Semi-Automatic Speech Transcription

US 20100121637A1
Filed: 11/12/2008
Published: 05/13/2010
Est. Priority Date: 11/12/2008
Status: Active Grant

Abstract

The semi-automatic speech transcription system of the invention leverages the complementary capabilities of human and machine by combining automatic and manual approaches. Collected audio data is automatically distilled into speech segments using signal processing and pattern recognition algorithms. The detected speech segments are presented to a human transcriber through a transcription tool with a streamlined transcription interface that requires the transcriber only to “listen and type”. This eliminates the need to navigate the audio manually and couples the human effort to the amount of speech rather than the amount of audio. The human transcriber can quickly identify errors produced by the automatic system, and these corrections are used to improve the automatic system's performance. The automatic system is tuned to maximize the human transcriber's efficiency. The result is a system that takes considerably less time than purely manual transcription approaches to produce a complete transcription.

11 Claims

1. A method for providing semi-automatic speech transcription, comprising:
- (a) receiving audio by an automatic speech detection component;
  
  (b) automatically detecting speech in the audio by the automatic speech detection component;
  
  (c) providing by the automatic speech detection component the detected speech as a plurality of speech segments to a transcription tool;
  
  (d) providing by the transcription tool each of the plurality of speech segments to a user via a transcription interface; and
  
  (e) receiving by the transcription tool via the transcription interface an indication for each of the plurality of speech segments from the user, wherein the indication comprises a transcription of the speech segment or an indication of non-speech for the speech segments.
- - 2. The method of claim 1, wherein the automatically detecting (b) comprises:
    - (b1) partitioning the audio into a plurality of frames;
      
      (b2) calculating a feature vector for each of the plurality of frames;
      
      (b3) classifying each of the plurality of frames as speech or non-speech based on the feature vector corresponding to each frame; and
      
      (b4) grouping the plurality of frames into the plurality of speech segments based on the classifications.
  - 3. The method of claim 2, wherein the feature vector of a frame of the plurality of frames comprises one or more of the following:
    - a frequency content of the frame;
      
      a power of the audio in the frame;
      
      a power ratio in different frequency bands of the frame; and
      
      a spectral entropy for the frame.
  - 4. The method of claim 2, wherein a length of each frame is set based on an amount of audio to be in each frame.
  - 5. The method of claim 2, wherein the classifying (b3) is based on a decision tree classifier, wherein a sequence of decisions is made through nodes of the decision tree based on an input feature vector to determine the classification.
  - 6. The method of claim 2, wherein the plurality of features comprises a feature vector, wherein the classifying (b3) further comprises:
    - (b3i) producing a confidence score for the classification.
  - 7. The method of claim 2, wherein the grouping (b4) comprises:
    - (b4i) for a sequence of classifications, switching or not switching the speech or non-speech classification in the sequence based on a comparison of a cost for switching the classification with a cost for not switching the classification; and
      
      (b4ii) grouping the frames in the sequence classified as speech into a speech segment of the plurality of speech segments.
  - 8. The method of claim 1, wherein the providing (d) and the receiving (e) comprise:
    - (d1) displaying a text box for each of the plurality of speech segments by the transcription tool;
      
      (d2) playing one of the plurality of speech segments by the transcription tool;
      
      (e1) receiving a completed transcription of the speech segment or a non-speech indication in the text box for the playing speech segment from the user; and
      
      (e2) automatically advancing to a next speech segment of the plurality of speech segments by the transcription tool, wherein (d2), (e1), and (e2) are repeated for each speech segment.
  - 9. The method of claim 1, wherein the indication further comprises an indication that the speech segment cannot be transcribed.
  - 10. The method of claim 1, further comprising:
    - (f) creating a training set from the speech segments with indications of speech and non-speech; and
      
      (g) using the training set to improve the detecting (b).
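
The detection steps recited in claims 2, 3 and 7 can be illustrated with a minimal Python sketch. It is one possible reading under stated assumptions, not the patented implementation. Every function name, the frame length, the power threshold and the two cost values are hypothetical. The single power threshold stands in for the decision tree of claim 5, and the dynamic program in `smooth` is only one way to compare the cost of switching a classification with the cost of not switching it.

```python
import cmath
import math


def split_into_frames(samples, frame_len):
    """(b1) Partition the audio into non-overlapping frames of frame_len samples."""
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


def feature_vector(frame):
    """(b2) Return (power, spectral entropy in bits) for one frame."""
    n = len(frame)
    power = sum(x * x for x in frame) / n
    # Naive DFT power spectrum over the non-negative frequency bins.
    spectrum = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]
    total = sum(spectrum)
    entropy = 0.0
    if total > 0:
        probs = [p / total for p in spectrum]
        entropy = -sum(q * math.log2(q) for q in probs if q > 0)
    return power, entropy


def classify(features, power_threshold=0.01):
    """(b3) Assumed one-node decision rule: speech if the frame power is high."""
    power, _entropy = features
    return power > power_threshold


def smooth(labels, flip_cost=1.0, change_cost=1.5):
    """(b4i) Relabel frames to minimise flip_cost per relabelled frame
    plus change_cost per speech/non-speech boundary (two-state dynamic program)."""
    if not labels:
        return []
    # cost[s]: cheapest labelling of the frames seen so far that ends in state s.
    cost = [0.0 if labels[0] == s else flip_cost for s in (False, True)]
    back = []
    for observed in labels[1:]:
        new_cost, pointers = [], []
        for s in (False, True):
            stay = cost[s]
            switch = cost[not s] + change_cost
            pointers.append(s if stay <= switch else (not s))
            new_cost.append(min(stay, switch)
                            + (0.0 if observed == s else flip_cost))
        cost = new_cost
        back.append(pointers)
    state = cost[1] < cost[0]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def group_speech_frames(labels):
    """(b4ii) Merge runs of speech frames into (start_frame, end_frame) pairs,
    with end_frame exclusive."""
    segments, start = [], None
    for i, is_speech in enumerate(list(labels) + [False]):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    return segments


def detect_speech(samples, frame_len=32):
    """Run (b1) to (b4) and return the detected speech segments."""
    frames = split_into_frames(samples, frame_len)
    labels = [classify(feature_vector(f)) for f in frames]
    return group_speech_frames(smooth(labels))
```

Calling `detect_speech(samples)` on a list of float samples returns frame-index pairs that a caller could convert to timestamps by multiplying by the frame duration. The naive DFT is chosen only to keep the sketch dependency-free; a real system would more plausibly use an FFT library.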

11. A computer readable medium with program instructions for providing semi-automatic speech transcription, the program instructions executed by a computer, the instructions comprising:
- (a) receiving audio by an automatic speech detection component;
  
  (b) automatically detecting speech in the audio by the automatic speech detection component;
  
  (c) providing by the automatic speech detection component the detected speech as a plurality of speech segments to a transcription tool;
  
  (d) providing by the transcription tool each of the plurality of speech segments to a user via a transcription interface; and
  
  (e) receiving by the transcription tool via the transcription interface an indication for each of the plurality of speech segments from the user, wherein the indication comprises a transcription of the speech segment or an indication of non-speech for the speech segments.
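
Steps (d) and (e), together with the indication types of claim 9 and the training set of claim 10, might be sketched as follows. This is an illustrative sketch only. The class names, the `ask` callback that stands in for the transcription interface, and the `/n` and `/x` markers for "non-speech" and "cannot be transcribed" are all assumptions and do not come from the patent. Leaving untranscribable segments out of the training set is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class Kind(Enum):
    TRANSCRIPT = "transcript"
    NON_SPEECH = "non-speech"
    UNTRANSCRIBABLE = "untranscribable"  # claim 9


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds


@dataclass
class Indication:
    segment: Segment
    kind: Kind
    text: str = ""


# Hypothetical markers the user types instead of a transcript.
NON_SPEECH_MARK = "/n"
UNTRANSCRIBABLE_MARK = "/x"


def transcribe(segments: Iterable[Segment],
               ask: Callable[[Segment], str]) -> List[Indication]:
    """Present each segment in turn and record the user's indication.

    `ask` stands in for the interface: it plays the segment and returns
    whatever the user typed. The loop then advances to the next segment.
    """
    results = []
    for segment in segments:
        typed = ask(segment).strip()
        if typed == NON_SPEECH_MARK:
            results.append(Indication(segment, Kind.NON_SPEECH))
        elif typed == UNTRANSCRIBABLE_MARK:
            results.append(Indication(segment, Kind.UNTRANSCRIBABLE))
        else:
            results.append(Indication(segment, Kind.TRANSCRIPT, typed))
    return results


def training_set(indications: Iterable[Indication]) -> List[Tuple[Segment, bool]]:
    """Label each segment as speech (True) or non-speech (False).

    Untranscribable segments are skipped because their label is ambiguous.
    """
    labelled = []
    for indication in indications:
        if indication.kind is Kind.TRANSCRIPT:
            labelled.append((indication.segment, True))
        elif indication.kind is Kind.NON_SPEECH:
            labelled.append((indication.segment, False))
    return labelled
```

In an interactive tool, `ask` would wrap audio playback and a text box. For a scripted run it can be any function that returns a string for each segment, and the labelled pairs from `training_set` could then be used to retrain the detector.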

Current Assignee: Massachusetts Institute of Technology, MIT Media Lab
Original Assignee: Massachusetts Institute of Technology
Inventors: Roy, Brandon Cain; Roy, Deb Kumar
Granted Patent: US 8,249,870 B2
US Class Current: 704/235
CPC Class Codes:
- G10L 15/22 Procedures used during a sp...
- G10L 25/78 Detection of presence or ab...
