Content-based audio playback emphasis

US 20070033032A1
Filed: 07/22/2005
Published: 02/08/2007
Est. Priority Date: 07/22/2005
Status: Active Grant

First Claim

1. A method comprising steps of:

(A) identifying an estimate of a likelihood that a region of a document correctly represents content in a corresponding region of a spoken audio stream; and

(B) identifying, based on the identified likelihood, an emphasis factor for modifying emphasis placed on the region of the spoken audio stream when played back.

11 Assignments

0 Petitions

Abstract

Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
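
As a rough illustration of the abstract's core idea, the Python sketch below slows playback for regions that are unlikely to have been transcribed correctly and speeds it up for regions that probably were. The `Region` type, the linear mapping, and the 0.5x to 1.5x rate bounds are illustrative assumptions, not values taken from the patent.

```python
from dataclasses import dataclass

# Hypothetical rate bounds (1.0 = real time); the patent does not give values.
SLOW_RATE = 0.5   # used when the transcript text is almost certainly wrong
FAST_RATE = 1.5   # used when the transcript text is almost certainly right


@dataclass
class Region:
    """Hypothetical time-aligned region of the audio and its draft transcript."""
    start_s: float    # region start in the audio, seconds
    end_s: float      # region end in the audio, seconds
    text: str         # draft transcript text for the region
    p_correct: float  # estimated likelihood (0..1) that `text` is correct


def playback_rate(p_correct: float) -> float:
    """Linearly map likelihood of correctness to a playback rate."""
    p = min(max(p_correct, 0.0), 1.0)
    return SLOW_RATE + (FAST_RATE - SLOW_RATE) * p


def playback_duration(region: Region) -> float:
    """Seconds needed to play the region at its adjusted rate."""
    return (region.end_s - region.start_s) / playback_rate(region.p_correct)


if __name__ == "__main__":
    regions = [
        Region(0.0, 3.0, "the patient denies chest pain", 0.95),
        Region(3.0, 4.5, "metoprolol 25 mg", 0.40),
    ]
    for r in regions:
        print(f"{r.text!r}: rate {playback_rate(r.p_correct):.2f}x, "
              f"{playback_duration(r):.2f} s")
```

A linear mapping is only the simplest choice; any monotonic curve that slows down doubtful regions would express the same idea.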

52 Claims

1. A method comprising steps of:
- (A) identifying an estimate of a likelihood that a region of a document correctly represents content in a corresponding region of a spoken audio stream; and
  
  (B) identifying, based on the identified likelihood, an emphasis factor for modifying emphasis placed on the region of the spoken audio stream when played back.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 35, 41)
  - 2. The method of claim 1, wherein the step (B) comprises a step of identifying an emphasis factor for increasing the emphasis placed on the region of the spoken audio stream when played back.
  - 3. The method of claim 1, wherein the step (B) comprises a step of identifying an emphasis factor for decreasing the emphasis placed on the region of the spoken audio stream when played back.
  - 4. The method of claim 1, wherein the step (B) comprises a step of identifying, based on the identified likelihood, a timescale adjustment factor for adjusting a playback rate of the region of the spoken audio stream.
  - 5. The method of claim 1, wherein the step (B) comprises a step of identifying, based on the identified likelihood, a signal power adjustment factor for adjusting a signal power of the region of the spoken audio stream.
  - 6. The method of claim 1, further comprising a step of:
    - (C) modifying an emphasis of the region of the spoken audio stream in accordance with the emphasis factor to produce an emphasis-adjusted audio stream.
  - 7. The method of claim 6, further comprising a step of:
    - (D) playing back the emphasis-adjusted audio stream.
  - 8. The method of claim 7, further comprising a step of:
    - (E) correcting errors in the document based on the emphasis-adjusted audio stream.
  - 9. The method of claim 6, further comprising a step of:
    - (D) modifying an emphasis of the region of the document in accordance with the emphasis factor to produce an emphasis-adjusted document region.
  - 10. The method of claim 6, further comprising a step of:
    - (D) modifying an emphasis of a region adjacent to the region of the spoken audio stream to a lesser extent than specified by the emphasis factor.
  - 11. The method of claim 1, wherein the step (A) comprises a step of:
    - (A)(1) identifying the estimate of the likelihood based on a prior likelihood of correctness of the region of the document.
  - 12. The method of claim 1, wherein the step (A) comprises a step of:
    - (A)(1) identifying the estimate of the likelihood based on a feature of the spoken audio stream.
  - 13. The method of claim 12, wherein the feature comprises an identity of a speaker of the spoken audio stream.
  - 14. The method of claim 12, wherein the feature comprises a signal-to-noise ratio of the spoken audio stream.
  - 15. The method of claim 1, wherein the step (A) comprises a step of:
    - (A)(1) identifying the estimate of the likelihood based on a confidence measure representing a degree of confidence that the region of the document correctly represents the content in the corresponding region of the spoken audio stream, wherein the confidence measure is provided by an automatic transcription system that produced the region of the document based on the region of the spoken audio stream.
  - 16. The method of claim 15, wherein the step (A)(1) comprises a step of identifying the estimate of the likelihood based on the confidence measure, a prior likelihood of correctness of the region of the document, and a feature of the spoken audio stream.
  - 17. The method of claim 1 further comprising a step of:
    - (C) prior to the step (B), identifying a measure of relevance of the region of the spoken audio stream;
      
      wherein the step (B) comprises a step of identifying the emphasis factor based on the identified likelihood and the identified measure of relevance.
  - 18. The method of claim 17, wherein the step (C) comprises a step of:
    - (C)(1) identifying a prior relevance of the region of the document; and
      
      (C)(2) identifying the measure of relevance of the region of the spoken audio stream based on the identified prior relevance of the region of the document.
  - 19. The method of claim 18, wherein the step (C)(1) comprises a step of identifying the prior relevance of the region of the document as a relatively high prior relevance if the region of the document contains content in a predetermined set of highly-relevant content.
  - 20. The method of claim 17, wherein the step (C) comprises a step of identifying the measure of relevance of the region of the spoken audio stream as a relative relevance if the region of the spoken audio stream contains no speech.
  - 21. The method of claim 17, wherein the region of the document comprises a hypothesis generated by an automatic transcription system for the corresponding region of the spoken audio stream, and wherein the step (C) comprises steps of:
    - (C)(1) identifying a competing hypothesis generated by the automatic transcription system for the corresponding region of the spoken audio stream;
      
      (C)(2) identifying a prior relevance of the competing hypothesis; and
      
      (C)(3) identifying the measure of relevance based on the prior relevance of the competing hypothesis.
  - 22. The method of claim 21, wherein the step (C)(3) comprises a step of identifying the measure of relevance based on the prior relevance of the competing hypothesis and a prior relevance of the region of the document.
  - 23. The method of claim 17, wherein the step (B) comprises steps of:
    - (B)(1) identifying a rule for identifying the emphasis factor based on the identified likelihood and the identified measure of relevance; and
      
      (B)(2) applying the rule to the identified likelihood and the identified measure of relevance to identify the emphasis factor.
  - 24. The method of claim 23, wherein the step (B)(2) comprises steps of:
    - (B)(2)(a) identifying a first weight associated with the identified likelihood;
      
      (B)(2)(b) identifying a second weight associated with the measure of relevance; and
      
      (B)(2)(c) identifying the emphasis factor as a combination of the identified likelihood and the measure of relevance weighted by the first and second weights, respectively.
  - 25. The method of claim 1 further comprising a step of:
    - (C) prior to the step (A), generating the document based on the spoken audio stream.
  - 27. The method of claim 25, wherein the step (C) comprises a step of using an automated transcription system to generate the document based on the spoken audio stream.
  - 35. The apparatus of claim 6, further comprising:
    - means for modifying an emphasis of a region adjacent to the region of the spoken audio stream to a lesser extent than specified by the emphasis factor.
  - 41. The apparatus of claim 1 further comprising:
    - means for generating the document based on the spoken audio stream.
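
Claims 15, 16, and 24 combine several signals but leave the arithmetic open. One plausible reading, sketched in Python under stated assumptions: step (A) fuses the recognizer's confidence measure, a prior likelihood of correctness, and an audio feature (here the signal-to-noise ratio) by adding log-odds contributions, and step (B) takes a weighted sum of doubt (1 - p) and relevance. The log-odds fusion, the 20 dB SNR reference, all weights, and the function names are hypothetical, not the patent's method.

```python
import math


def _logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clamped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def estimate_p_correct(confidence: float, prior: float, snr_db: float,
                       snr_ref_db: float = 20.0, snr_weight: float = 0.1) -> float:
    """Hypothetical step (A): fuse the recognizer's confidence measure, a prior
    likelihood of correctness, and an audio feature (SNR in dB) by adding their
    log-odds contributions, then map the sum back to a probability."""
    z = (_logit(confidence) + _logit(prior)
         + snr_weight * (snr_db - snr_ref_db))  # below-reference SNR lowers z
    return 1.0 / (1.0 + math.exp(-z))


def emphasis_factor(p_correct: float, relevance: float,
                    w_likelihood: float = 0.6, w_relevance: float = 0.4) -> float:
    """Hypothetical step (B): weighted combination of doubt (1 - p_correct) and
    relevance (0..1), clipped to [0, 1]. Larger values mean more emphasis."""
    e = w_likelihood * (1.0 - p_correct) + w_relevance * relevance
    return min(max(e, 0.0), 1.0)


if __name__ == "__main__":
    p = estimate_p_correct(confidence=0.55, prior=0.7, snr_db=12.0)
    print(f"p_correct={p:.3f}, emphasis={emphasis_factor(p, relevance=0.9):.3f}")
```

Adding log-odds treats the inputs as independent evidence, which can double-count whatever the recognizer already folded into its confidence; the weights would need tuning on real proofreading data.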

26. A method comprising steps of:
- (A) identifying an estimate of a likelihood that a region of a document correctly represents content in a corresponding region of a spoken audio stream;
  
  (B) identifying a measure of relevance of the region of the spoken audio stream; and
  
  (C) identifying, based on the identified likelihood and the identified measure of relevance, a timescale adjustment factor for adjusting a playback rate of the region of the spoken audio stream when played back.
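
Claim 26 turns the likelihood and relevance into a timescale adjustment factor, and claims 10 and 35 allow an adjacent region to be adjusted to a lesser extent than the emphasis factor specifies. A minimal Python sketch, assuming each region already has an emphasis value in [0, 1] and that regions are contiguous: emphasis becomes a time-stretch factor (1.0 = real time, 2.0 = half speed), each immediate neighbor inherits half of a region's extra stretch, and a schedule maps original times to playback times. `MAX_EXTRA_STRETCH`, `NEIGHBOR_FRACTION`, and both function names are hypothetical.

```python
from typing import List, Tuple

MAX_EXTRA_STRETCH = 1.0   # hypothetical: full emphasis doubles the duration
NEIGHBOR_FRACTION = 0.5   # hypothetical: neighbors get half of a region's extra stretch


def timescale_factors(emphasis: List[float]) -> List[float]:
    """Map per-region emphasis (0..1) to stretch factors (>= 1.0 means slower).
    Each region's immediate neighbors receive a reduced share of its stretch,
    so speed changes around an emphasized region are less abrupt."""
    extra = [MAX_EXTRA_STRETCH * min(max(e, 0.0), 1.0) for e in emphasis]
    spread = list(extra)
    for i, x in enumerate(extra):
        for j in (i - 1, i + 1):
            if 0 <= j < len(extra):
                # A neighbor keeps its own stretch unless the spill-over is larger;
                # spill-over reads `extra`, so it never cascades past one region.
                spread[j] = max(spread[j], NEIGHBOR_FRACTION * x)
    return [1.0 + s for s in spread]


def playback_schedule(regions: List[Tuple[float, float]],
                      factors: List[float]) -> List[Tuple[float, float, float, float]]:
    """Return (orig_start, orig_end, play_start, play_end) for contiguous regions."""
    schedule = []
    t = 0.0
    for (start, end), f in zip(regions, factors):
        duration = (end - start) * f
        schedule.append((start, end, t, t + duration))
        t += duration
    return schedule


if __name__ == "__main__":
    regions = [(0.0, 2.0), (2.0, 3.0), (3.0, 5.0)]
    factors = timescale_factors([0.0, 1.0, 0.0])
    for row in playback_schedule(regions, factors):
        print(row)
```

The sketch only computes factors; actually stretching audio without shifting its pitch needs a time-scale modification algorithm such as WSOLA or a phase vocoder (the area covered by CPC G10L 21/04, time compression or expansion).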

28. An apparatus comprising:
- first identification means for identifying an estimate of a likelihood that a region of a document correctly represents content in a corresponding region of a spoken audio stream; and
  
  second identification means for identifying, based on the identified likelihood, an emphasis factor for modifying emphasis placed on the region of the spoken audio stream when played back.
- Dependent Claims (29, 30, 31, 32, 33, 34, 36, 37, 38, 39, 40)
  - 29. The apparatus of claim 28, wherein the second identification means comprises means for identifying an emphasis factor for increasing the emphasis placed on the region of the spoken audio stream when played back.
  - 30. The apparatus of claim 28, wherein the second identification means comprises means for identifying, based on the identified likelihood, a signal power adjustment factor for adjusting a signal power of the region of the spoken audio stream.
  - 31. The apparatus of claim 28, further comprising:
    - means for modifying an emphasis of the region of the spoken audio stream in accordance with the emphasis factor to produce an emphasis-adjusted audio stream.
  - 32. The apparatus of claim 31, further comprising:
    - means for playing back the emphasis-adjusted audio stream.
  - 33. The apparatus of claim 32, further comprising:
    - means for correcting errors in the document based on the emphasis-adjusted audio stream.
  - 34. The apparatus of claim 31, further comprising:
    - means for modifying an emphasis of the region of the document in accordance with the emphasis factor to produce an emphasis-adjusted document region.
  - 36. The apparatus of claim 28, wherein the first identification means comprises:
    - means for identifying the estimate of the likelihood based on a prior likelihood of correctness of the region of the document.
  - 37. The apparatus of claim 28, wherein the first identification means comprises:
    - means for identifying the estimate of the likelihood based on a feature of the spoken audio stream.
  - 38. The apparatus of claim 28, further comprising:
    - third identification means for identifying a measure of relevance of the region of the spoken audio stream;
      
      wherein the second identification means comprises means for identifying the emphasis factor based on the identified likelihood and the identified measure of relevance.
  - 39. The apparatus of claim 38, wherein the third identification means comprises:
    - means for identifying a prior relevance of the region of the document; and
      
      means for identifying the measure of relevance of the region of the spoken audio stream based on the identified prior relevance of the region of the document.
  - 40. The apparatus of claim 38, wherein the second identification means comprises:
    - means for identifying a rule for identifying the emphasis factor based on the identified likelihood and the identified measure of relevance; and
      
      means for applying the rule to the identified likelihood and the identified measure of relevance to identify the emphasis factor.
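
Claims 30 and 31 cover adjusting the signal power of a region to produce an emphasis-adjusted audio stream. A minimal Python sketch, assuming 16-bit PCM samples held in a plain list and a hypothetical mapping of emphasis to at most +6 dB of gain: amplitude is scaled by 10^(dB/20), so power changes by 10^(dB/10), and results are clipped to the int16 range. The function names and the 6 dB ceiling are illustrative.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def gain_db_for(emphasis: float, max_boost_db: float = 6.0) -> float:
    """Hypothetical mapping: emphasis 0..1 -> 0..max_boost_db of gain."""
    return max_boost_db * min(max(emphasis, 0.0), 1.0)


def apply_gain(samples: List[int], start: int, end: int, gain_db: float) -> List[int]:
    """Return a copy of 16-bit PCM `samples` with indices [start, end) scaled by
    gain_db. Amplitude ratio is 10**(dB/20), so signal power changes by 10**(dB/10).
    Results are clipped to the int16 range to avoid wrap-around."""
    scale = 10.0 ** (gain_db / 20.0)
    out = list(samples)
    for i in range(max(start, 0), min(end, len(out))):
        out[i] = max(INT16_MIN, min(INT16_MAX, round(out[i] * scale)))
    return out


if __name__ == "__main__":
    pcm = [1000, -2000, 3000, -4000]
    # Boost samples 1..2 (e.g. a doubtful word) by the gain for emphasis 0.5.
    print(apply_gain(pcm, 1, 3, gain_db_for(0.5)))
```

Hard clipping is the crudest safeguard; a real player would more likely apply a limiter or ramp the gain at region edges to avoid audible clicks.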

42. A method comprising steps of:
- (A) identifying an estimate of a likelihood that a region of a document correctly represents particular content;
  
  (B) identifying, based on the identified likelihood, an emphasis factor; and
  
  (C) using a text-to-speech engine to play an audio stream representing the region of the document with an emphasis specified by the emphasis factor.
- Dependent Claims (43, 44, 45, 46, 47)
  - 43. The method of claim 42, further comprising a step of:
    - (D) correcting errors in the document based on the audio stream.
  - 44. The method of claim 42, wherein the step (B) comprises a step of identifying, based on the identified likelihood, a timescale adjustment factor for adjusting a playback rate of the audio stream.
  - 45. The method of claim 42, wherein the step (B) comprises a step of identifying, based on the identified likelihood, a signal power adjustment factor for adjusting a signal power of the audio stream.
  - 46. The method of claim 42, further comprising a step of:
    - (D) modifying an emphasis of the region of the document in accordance with the emphasis factor to produce an emphasis-adjusted document region.
  - 47. The method of claim 42, wherein the step (A) comprises a step of:
    - (A)(1) identifying the estimate of the likelihood based on a prior likelihood of correctness of the region of the document.
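
Claim 42 plays the document text itself through a text-to-speech engine with an emphasis specified by the emphasis factor. One way to pass that emphasis to a TTS engine, assumed here rather than stated in the patent, is W3C SSML, whose `<prosody>` element accepts relative `rate` and `volume` values. The Python sketch below wraps each region in `<prosody>`; the 70% minimum rate, the +6 dB ceiling, and the function names are hypothetical, and engines differ in which SSML attributes they honor.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape

SSML_OPEN = ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
             'xml:lang="en-US">')


def region_to_ssml(text: str, emphasis: float,
                   min_rate_pct: int = 70, max_boost_db: float = 6.0) -> str:
    """Wrap one document region in <prosody>: more emphasis -> slower and louder.
    Hypothetical mapping: emphasis 0 -> 100% rate, +0 dB;
    emphasis 1 -> min_rate_pct rate, +max_boost_db."""
    e = min(max(emphasis, 0.0), 1.0)
    rate_pct = round(100 - (100 - min_rate_pct) * e)
    volume_db = max_boost_db * e
    return (f'<prosody rate="{rate_pct}%" volume="+{volume_db:.1f}dB">'
            f'{escape(text)}</prosody>')  # escape &, <, > in the region text


def document_to_ssml(regions: List[Tuple[str, float]]) -> str:
    """Build one SSML document from (region text, emphasis factor) pairs."""
    body = " ".join(region_to_ssml(text, e) for text, e in regions)
    return f"{SSML_OPEN}{body}</speak>"


if __name__ == "__main__":
    print(document_to_ssml([("Start metoprolol", 0.1), ("25 mg & aspirin", 0.9)]))
```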

48. An apparatus comprising:
- first identification means for identifying an estimate of a likelihood that a region of a document correctly represents particular content;
  
  second identification means for identifying, based on the identified likelihood, an emphasis factor; and
  
  a text-to-speech engine to play an audio stream representing the region of the document with an emphasis specified by the emphasis factor.
- Dependent Claims (49, 50, 51, 52)
  - 49. The apparatus of claim 48, further comprising:
    - means for correcting errors in the document based on the audio stream.
  - 50. The apparatus of claim 48, wherein the second identification means comprises means for identifying, based on the identified likelihood, a timescale adjustment factor for adjusting a playback rate of the audio stream.
  - 51. The apparatus of claim 48, wherein the second identification means comprises means for identifying, based on the identified likelihood, a signal power adjustment factor for adjusting a signal power of the audio stream.
  - 52. The apparatus of claim 48, wherein the first identification means comprises:
    - means for identifying the estimate of the likelihood based on a prior likelihood of correctness of the region of the document.

Current Assignee
3M Innovative Properties Company (3M Company)
Original Assignee
Multimodal Technologies Incorporated (3M Company)
Inventors
Koll, Detlef, Schubert, Kjell, Finke, Michael, Fritsch, Juergen

Granted Patent

US 7,844,464 B2
US Class Current

704/235
CPC Class Codes

G06F 40/232   Orthographic correction, e....

G06F 40/30   Semantic analysis

G10L 15/1807   using prosody or stress

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 21/04   Time compression or expansion
