Mixing digitized speech and text using reliability indices

US 6,151,576 A
Filed: 08/11/1998
Issued: 11/21/2000
Est. Priority Date: 08/11/1998
Status: Expired due to Term

Abstract

Methods and apparatus of processing, storing and transmitting an original data stream of digitized speech samples. The method converts a stream of digitized speech samples to a stream of text and associated reliability measures. A mixed-media data stream is created with the stream of text as a text component and selected portions of the digitized stream of speech as a speech component. The selected portions are those whose corresponding reliability measures fall below a threshold. The threshold can be changed to change the amount of storage or bandwidth used by the mixed-media data stream. The mixed-media data stream can be searched and the results can be spoken as synthetic speech derived from the text component or as speech samples taken from the digitized speech component.
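
The stream-creation step described in the abstract can be illustrated with a minimal sketch. It assumes a speech recognizer has already produced, for each word, a reliability value and the sample offsets it covers. The names `Word`, `Segment`, and `build_mixed_stream`, the 0-to-1 reliability scale, and the in-memory layout are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Word:
    text: str
    reliability: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)
    start: int          # index of the first speech sample for this word
    end: int            # index one past the last speech sample


@dataclass
class Segment:
    text: str
    speech: Optional[List[int]]  # retained samples, or None if the text is trusted


def build_mixed_stream(samples: Sequence[int], words: Sequence[Word],
                       threshold: float) -> List[Segment]:
    """Keep original samples only for words whose reliability is below threshold."""
    stream = []
    for w in words:
        doubtful = w.reliability < threshold
        speech = list(samples[w.start:w.end]) if doubtful else None
        stream.append(Segment(w.text, speech))
    return stream


# Toy input: 12 "samples" and three recognized words.
samples = list(range(12))
words = [Word("call", 0.95, 0, 4), Word("Raman", 0.40, 4, 9), Word("today", 0.90, 9, 12)]
mixed = build_mixed_stream(samples, words, threshold=0.5)
```

In this toy input only the low-reliability word "Raman" keeps its original samples; the other two words are carried as text alone.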

107 Citations

30 Claims

1. A method of processing an original data stream of digitized speech samples, comprising:
- converting a stream of digitized speech samples to a stream of text and associated reliability measures, the reliability measures indicating a level of confidence in the correctness of the speech to text conversion of the associated portions of the stream of text; and
  
  creating a mixed-media data stream comprising the stream of text as a text component and selected portions of the digitized stream of speech as a speech component, each selected portion corresponding to a portion of the stream of text having a reliability measure below a threshold.
  - 2. The method of claim 1 of processing and transmitting speech, further comprising:
    - transmitting the speech by transmitting the mixed-media data stream.
  - 3. The method of claim 2, further comprising:
    - receiving the mixed-media data stream;
      
      converting the text of the received mixed-media data stream to synthetic speech; and
      
      speaking the mixed-media data stream by speaking the synthetic speech for the portions of the original speech where no digitized speech is present in the mixed-media data stream and speaking the digitized speech from the mixed-media data stream where digitized speech is present in the mixed-media data stream.
  - 4. The method of claim 1 of processing and storing speech, further comprising:
    - storing the mixed-media data stream.
  - 5. The method of claim 4, further comprising:
    - receiving the mixed-media data stream;
      
      converting the text of the received mixed-media data stream to synthetic speech; and
      
      speaking the mixed-media data stream by speaking the synthetic speech for the portions of the original speech where no digitized speech is present in the mixed-media data stream and speaking the digitized speech from the mixed-media data stream where it is present in the mixed-media data stream.
  - 6. The method of claim 4, further comprising:
    - searching the text component of the mixed-media data stream for text matching a text search request.
  - 7. The method of claim 1, further comprising:
    - searching the text component of the mixed-media data stream for text matching a text search request.
  - 8. The method of claim 6, further comprising:
    - finding in the text component a segment of text matching a text search request; and
      
      speaking the segment of text.
  - 9. The method of claim 1, wherein the mixed-media data stream is created from two time-synchronized data streams, the first being the original data stream of digitized speech samples, and the second being the stream of text having associated reliability measures.
  - 10. The method of claim 1, wherein:
    - each word in the stream of text has an associated reliability measure, a reliability measure value below the threshold indicating that a corresponding portion of the converted text is unreliable.
  - 11. The method of claim 1, further comprising:
    - measuring the amount of storage required to store the mixed-media data stream as it is being created; and
      
      changing the threshold to change the amount of storage required to store the mixed-media data stream.
  - 12. The method of claim 1, wherein:
    - the original data stream of digitized speech samples is received in a computer;
      
      the digitized speech samples are provided as input to speech recognition engine, the speech recognition engine being a computer program running in the computer, to convert speech to text and produce reliability measures; and
      
      the mixed-media data stream is created by a second computer program receiving as inputs the digitized speech samples, the text produced by the speech recognition engine, and the reliability measures produced by the speech recognition engine.
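
The relationship in claim 11 between the threshold and the storage required can be sketched as follows. This is a simplified offline illustration, not the claimed procedure: it estimates the size of the whole stream at once rather than measuring it while the stream is being created. The tuple layout, the 16-bit sample size, the one-byte text separator, and the names `stream_size` and `fit_threshold` are assumptions made for the example.

```python
from typing import Sequence, Tuple

# (text, reliability, number_of_speech_samples); hypothetical layout.
WordInfo = Tuple[str, float, int]

BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM samples


def stream_size(words: Sequence[WordInfo], threshold: float) -> int:
    """Estimate bytes needed: all text, plus speech for words below threshold."""
    text_bytes = sum(len(t.encode("utf-8")) + 1 for t, _, _ in words)  # +1 separator
    speech_bytes = sum(n * BYTES_PER_SAMPLE for _, r, n in words if r < threshold)
    return text_bytes + speech_bytes


def fit_threshold(words: Sequence[WordInfo], budget: int,
                  start: float = 1.0, step: float = 0.05) -> float:
    """Lower the threshold step by step until the estimated size fits the budget."""
    threshold = start
    while threshold > 0 and stream_size(words, threshold) > budget:
        # Rounding keeps repeated subtraction from accumulating float error.
        threshold = round(threshold - step, 10)
    return max(threshold, 0.0)


words = [("call", 0.95, 100), ("Raman", 0.40, 120), ("today", 0.90, 80)]
chosen = fit_threshold(words, budget=300)
```

Lowering the threshold retains less digitized speech, so the stream shrinks toward the size of the text alone; if even the text exceeds the budget, the sketch simply returns a threshold of 0.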

13. A computer program, residing on a computer-readable medium, comprising instructions for causing a computer to:
- convert a stream of digitized speech samples to a stream of text and associated reliability measures, the reliability measures indicating a level of confidence in the correctness of the speech to text conversion of the associated portions of the stream of text; and
  
  create a mixed-media data stream comprising the stream of text as a text component and selected portions of the digitized stream of speech as a speech component, each selected portion corresponding to a portion of the stream of text having a reliability measure below a threshold.
  - 14. The computer program of claim 13, further comprising instructions to:
    - store the mixed-media data stream.
  - 15. The computer program of claim 13, further comprising instructions to:
    - receive the mixed-media data stream;
      
      convert the text of the received mixed-media data stream to synthetic speech; and
      
      speak the mixed-media data stream by speaking the synthetic speech for the portions of the original speech where no digitized speech is present in the mixed-media data stream and speaking the digitized speech from the mixed-media data stream where it is present in the mixed-media data stream.
  - 16. The computer program of claim 15, further comprising instructions to:
    - search the text component of the mixed-media data stream for text matching a text search request.
  - 17. The method of claim 15, further comprising instructions to:
    - find in the text component a segment of text matching a text search request; and
      
      speaking the segment of text.
  - 18. The computer program of claim 13, further comprising instructions to:
    - search the text component of the mixed-media data stream for text matching a text search request.
  - 19. The method of claim 13, further comprising instructions to:
    - find in the text component a segment of text matching a text search request; and
      
      speaking the segment of text.
  - 20. The method of claim 13, wherein the mixed-media data stream is created from two time-synchronized data streams, the first being the original data stream of digitized speech samples, and the second being the stream of text having associated reliability measures.
  - 21. The method of claim 13, wherein:
    - each word in the stream of text has an associated reliability measure, a reliability measure value below the threshold indicating that a corresponding portion of the converted text is unreliable.
  - 22. The method of claim 13, further comprising instructions to:
    - measure the amount of storage required to store the mixed-media data stream as it is being created; and
      
      change the threshold to change the amount of storage required to store the mixed-media data stream.
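
The text search recited in claims 16 through 19 can be sketched as a word-level match over the text component. This is only an illustration under assumptions: the claims do not specify a matching rule, so case-insensitive exact matching of consecutive words is chosen here for simplicity, and the `(text, speech)` tuple layout and the name `find_matches` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (text, retained speech bytes or None); hypothetical layout.
Segment = Tuple[str, Optional[bytes]]


def find_matches(stream: Sequence[Segment], query: str) -> List[Tuple[int, int]]:
    """Return (start, end) segment index ranges whose words match the query."""
    needle = [w.lower() for w in query.split()]
    if not needle:
        return []
    texts = [text.lower() for text, _ in stream]
    hits = []
    for i in range(len(texts) - len(needle) + 1):
        if texts[i:i + len(needle)] == needle:
            hits.append((i, i + len(needle)))
    return hits


stream = [("please", None), ("call", None), ("Raman", b"\x01\x02"), ("today", None)]
hits = find_matches(stream, "call raman")
```

Each returned range identifies the matching segments, which a player could then speak either from the text or from any retained speech samples.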

23. A method of processing a mixed-media data stream, comprising:
- reading a mixed-media data stream comprising a text component and a digitized speech component, the text component comprising text segments identified in the mixed-media data stream as doubtful and the digitized speech component comprising digitized speech corresponding to the text segments identified as doubtful;
  
  converting the text not identified as doubtful to synthetic speech; and
  
  generating an audio waveform from the mixed-media data stream by combining the digitized speech and the synthetic speech.
  - 24. The method of claim 23, further comprising:
    - generating an audio waveform from the mixed-media data stream by combining the synthetic speech for the portions of the data stream where no digitized speech is present in the data stream with the digitized speech from the data stream where it is present in the data stream.
  - 25. The method of claim 23, further comprising:
    - speaking the audio waveform.
  - 26. The method of claim 23, further comprising:
    - finding in the text component a segment of text matching a text search request; and
      
      speaking the segment of text.
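
The playback method of claims 23 and 24 can be sketched as a single pass over the stream that emits retained speech where it is present and synthetic speech elsewhere. The sketch assumes that the presence of retained samples is what marks a segment as doubtful. `fake_synthesize` is a hypothetical stand-in for a real text-to-speech engine, and `render_waveform` and the tuple layout are likewise invented for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (text, retained speech samples or None); hypothetical layout.
Segment = Tuple[str, Optional[List[int]]]


def fake_synthesize(text: str) -> List[int]:
    """Placeholder for a text-to-speech engine: one zero sample per character."""
    return [0] * len(text)


def render_waveform(stream: Sequence[Segment],
                    synthesize: Callable[[str], List[int]] = fake_synthesize
                    ) -> List[int]:
    """Combine digitized speech (doubtful text) with synthetic speech (the rest)."""
    waveform: List[int] = []
    for text, speech in stream:
        if speech is not None:
            waveform.extend(speech)            # play the original samples
        else:
            waveform.extend(synthesize(text))  # speak trusted text synthetically
    return waveform


stream = [("call", None), ("Raman", [7, 8, 9]), ("today", None)]
waveform = render_waveform(stream)
```

Passing the synthesizer as a parameter keeps the combining logic independent of any particular speech engine.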

27. Apparatus comprising a computer-readable storage medium tangibly embodying program instructions for presenting information in spoken form, the program instructions including instructions operable for causing a programmable processor to:
- read a mixed-media data stream comprising a text component and a digitized speech component, the text component comprising text segments identified in the mixed-media data stream as doubtful and the digitized speech component comprising digitized speech corresponding to the text segments identified as doubtful;
  
  convert the text not identified as doubtful to synthetic speech; and
  
  generate an audio waveform from the mixed-media data stream by combining the digitized speech and the synthetic speech.
  - 28. The apparatus of claim 27, wherein the program instructions further comprise instructions to:
    - generate an audio waveform from the mixed-media data stream by combining the synthetic speech for the portions of the original speech where no digitized speech is present in the mixed-media data stream and speaking the digitized speech from the mixed-media data stream where it is present in the mixed-media data stream.
  - 29. The method of claim 27, wherein the program instructions further comprise instructions to:
    - speak the audio waveform.
  - 30. The method of claim 27, wherein the program instructions further comprise instructions to:
    - find in the text component a segment of text matching a text search request; and
      
      speak the segment of text.

Current Assignee: Adobe Systems Incorporated (Adobe Inc.)
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Warnock, John E.; Raman, T. V.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Azad, Abul K.
Application Number: US09/132,875
Time in Patent Office: 833 Days
Field of Search: 704/235, 704/260, 704/239, 704/231, 704/257
US Class Current: 704/260
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
G10L 19/0018   Speech coding using phoneti...
