AUTOMATIC SPEECH RECOGNITION WITH TEXTUAL CONTENT INPUT

US 20080270110A1
Filed: 04/30/2007
Published: 10/30/2008
Est. Priority Date: 04/30/2007
Status: Abandoned Application

Abstract

A method of recognizing speech includes extracting textual content from a visual content time segment associated with a rich media presentation. A textual content input comprising a word from the extracted textual content is created. The textual content input is provided to an automatic speech recognition algorithm such that there is an increased probability that the automatic speech recognition algorithm recognizes the word within an audio content time segment associated with the rich media presentation.

129 Citations

31 Claims

1. A method of recognizing speech, the method comprising:
- (a) extracting textual content from a visual content time segment associated with a rich media presentation;
  
  (b) creating a textual content input comprising a word from the extracted textual content; and
  
  (c) providing the textual content input to an automatic speech recognition algorithm such that there is an increased probability that the automatic speech recognition algorithm recognizes the word within an audio content time segment associated with the rich media presentation.
  - 2. The method of claim 1, wherein a textual content selection algorithm is used to ensure that the word appears in a dictionary.
  - 3. The method of claim 1, wherein a textual content selection algorithm is used to ensure that the word includes a minimum number of characters.
  - 4. The method of claim 1, wherein a textual content selection algorithm is used to ensure that the word is not a commonly used word.
  - 5. The method of claim 1, wherein a textual content selection algorithm is used to ensure that the word does not appear repetitively within the visual content segment.
  - 6. The method of claim 1, wherein a textual content selection algorithm is used to ensure that the word conforms to a word validity rule.
  - 7. The method of claim 1, wherein the textual content input further comprises a second word obtained from textual metadata content associated with the rich media presentation.
  - 8. The method of claim 7, wherein the textual metadata content associated with the rich media presentation comprises a description of the rich media presentation.
  - 9. The method of claim 7, wherein the textual metadata content associated with the rich media presentation comprises a title of the rich media presentation.
  - 10. The method of claim 7, wherein the textual metadata content associated with the rich media presentation comprises annotations provided by a viewer of the rich media presentation.
  - 11. The method of claim 7, wherein the textual metadata content associated with the rich media presentation comprises a presenter name.
  - 12. The method of claim 7, wherein the textual metadata content associated with the rich media presentation comprises a presentation date.
  - 13. The method of claim 1, wherein the textual content is extracted from the visual content segment with an optical character recognition algorithm.
  - 14. The method of claim 1, wherein the textual content is extracted from a software application file associated with the visual content segment.
  - 15. The method of claim 1, wherein the textual content is extracted from formatted text associated with the visual content segment.
  - 16. The method of claim 1, wherein the textual content input comprises a dynamic automatic speech recognition dictionary in which the word is an entry.
  - 17. The method of claim 16, wherein the word appears in the dynamic automatic speech recognition dictionary while the speech recognition algorithm is recognizing speech with a timestamp that falls within an in-interval, wherein the in-interval comprises a time interval during which the word appears in the visual content time segment.
  - 18. The method of claim 17, wherein the word further appears in the dynamic speech recognition dictionary while the speech recognition algorithm is recognizing speech with a timestamp that falls within a time interval before or after the in-interval.
  - 19. The method of claim 1, wherein the textual content input further comprises one or more timestamps associated with the word.
  - 20. The method of claim 19, wherein the increased probability that the automatic speech recognition algorithm recognizes the word is further increased when the speech recognition algorithm is recognizing speech with a timestamp that falls within an in-interval, wherein the in-interval comprises a time interval during which the word appears in the visual content time segment.
  - 21. The method of claim 20, further comprising assigning a decaying weight to the word at a beginning or end of the in-interval such that the increased probability decreases over a time period which precedes or follows the in-interval.
  - 22. The method of claim 1, wherein the increased probability is based at least in part on a weight assigned to the word by a frequency-based weighting algorithm, wherein the assigned weight is based on a frequency with which the word is generally used.
  - 23. The method of claim 22, wherein speech recognition training data is used to determine the frequency with which the word is generally used.
  - 24. The method of claim 1, wherein the textual content input is used to augment at least one of an existing automatic speech recognition dictionary and an existing speech recognition language model.
  - 25. The method of claim 1, wherein the textual content input is used to select at least one of an existing automatic speech recognition dictionary, an existing automatic speech recognition language model, and an existing automatic speech recognition acoustic model.
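
The word selection rules of claims 2 to 5 and the time-dependent weighting of claims 19 to 21 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the function names `select_words` and `boost`, the small `COMMON_WORDS` set, the 4-character minimum, the peak weight of 2.0, and the exponential shape of the decay. Claim 5 is read here as "keep only one copy of a repeated word", which is one possible interpretation.

```python
import math

# Hypothetical list of commonly used words; a real system would use a larger one.
COMMON_WORDS = {"the", "and", "of", "to", "in", "a", "is", "for", "with"}


def select_words(extracted, dictionary, min_chars=4):
    """Filter words extracted from one visual content time segment."""
    seen = set()
    selected = []
    for raw in extracted:
        word = raw.lower()
        if word in seen:            # claim 5 (as read here): drop repeats
            continue
        seen.add(word)
        if len(word) < min_chars:   # claim 3: minimum number of characters
            continue
        if word in COMMON_WORDS:    # claim 4: not a commonly used word
            continue
        if word not in dictionary:  # claim 2: must appear in a dictionary
            continue
        selected.append(word)
    return selected


def boost(t, start, end, peak=2.0, decay_seconds=30.0):
    """Weight multiplier for a word that is visible during [start, end].

    Inside the in-interval the full boost applies (claim 20). Outside it,
    the boost decays toward 1.0, meaning no boost (claim 21). The
    exponential form is an assumption; the claim only requires a decay.
    """
    if start <= t <= end:
        return peak
    gap = start - t if t < start else t - end
    return 1.0 + (peak - 1.0) * math.exp(-gap / decay_seconds)


if __name__ == "__main__":
    slide_words = ["Spectrogram", "the", "spectrogram", "FFT", "phoneme"]
    dictionary = {"spectrogram", "phoneme", "fft", "the"}
    print(select_words(slide_words, dictionary))
    print(boost(50.0, 40.0, 60.0), boost(75.0, 40.0, 60.0))
```

A word that passes `select_words` would carry the timestamps of the slide it came from, and a recognizer would call something like `boost` with the timestamp of the audio currently being decoded.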

26. A computer-readable medium having computer-readable instructions stored thereon that, upon execution by a processor, cause the processor to recognize speech, the instructions configured to:
- (a) create a textual content input comprising a word, wherein the word is obtained from textual content extracted from a visual content time segment associated with a rich media presentation; and
  
  (b) provide the textual content input to an automatic speech recognition algorithm such that there is an increased probability that the automatic speech recognition algorithm recognizes the word within an audio content time segment associated with the rich media presentation.
  - 27. The computer-readable medium of claim 26, wherein the instructions are further configured to extract the textual content from the visual content time segment using an optical character recognition algorithm.

28. A method of recognizing speech, the method comprising:
- (a) creating a textual content input comprising a word obtained from textual metadata content associated with a rich media presentation; and
  
  (b) providing the textual content input to an automatic speech recognition algorithm such that there is an increased probability that the automatic speech recognition algorithm recognizes the word within an audio content time segment associated with the rich media presentation.
  - 29. The method of claim 28, wherein the textual metadata content comprises at least one of an abstract describing the rich media presentation, a date of the rich media presentation, a presenter name, a title of the rich media presentation, and an annotation provided by a viewer of the rich media presentation.
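
As a rough illustration of claims 28 and 29, the sketch below collects candidate words from presentation metadata. The function name `metadata_words`, the metadata field names, and the tokenizing pattern are all hypothetical; the application does not specify a data format. Dates are left out because a date string does not map to spoken words without extra normalization.

```python
import re


def metadata_words(metadata, fields=("title", "abstract", "presenter", "annotations")):
    """Collect unique lowercase words from selected metadata fields.

    `metadata` is assumed to be a dict whose values are strings or lists
    of strings (for example, several viewer annotations).
    """
    words = []
    for field in fields:
        value = metadata.get(field, "")
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        # Keep alphabetic tokens, allowing internal apostrophes and hyphens.
        for token in re.findall(r"[A-Za-z][A-Za-z'-]*", value):
            token = token.lower()
            if token not in words:
                words.append(token)
    return words


if __name__ == "__main__":
    example = {
        "title": "Intro to Phonetics",
        "presenter": "Ada Lovelace",
        "annotations": ["great phonetics demo"],
    }
    print(metadata_words(example))
```

The resulting list could then be handed to the recognizer as the textual content input, optionally after the same kind of filtering applied to words taken from visual content.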

30. A system for recognizing speech comprising:
- (a) an automatic speech recognition application, wherein the automatic speech recognition application comprises computer code configured to
  
  receive a textual content input comprising a word, wherein the word is obtained from textual content extracted from a visual content time segment associated with a rich media presentation; and
  
  use the textual content input to increase a probability that the word is recognized within an audio content time segment associated with the rich media presentation;
  
  (b) a memory configured to store the automatic speech recognition application; and
  
  (c) a processor coupled to the memory, wherein the processor is configured to execute the automatic speech recognition application.

31. A method of recognizing speech, the method comprising:
- (a) extracting textual content from audiovisual content;
  
  (b) creating a textual content input comprising a word from the extracted textual content; and
  
  (c) providing the textual content input to an automatic speech recognition algorithm such that there is an increased probability that the automatic speech recognition algorithm recognizes the word within audio from the audiovisual content.
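
One way to picture step (c) is as a rescoring pass over candidate transcriptions. The sketch below assumes the recognizer exposes an n-best list of `(text, log_probability)` pairs and adds a fixed log-domain bonus for each boosted word a hypothesis contains. The n-best interface, the function name `rescore`, and the boost value are assumptions for illustration; the claims do not state how the increased probability is realized.

```python
import math


def rescore(nbest, boosted_words, boost=2.0):
    """Re-rank hypotheses, favoring those that contain boosted words.

    nbest: list of (text, log_prob) pairs from a hypothetical recognizer.
    boosted_words: set of lowercase words from the textual content input.
    """
    bonus = math.log(boost)
    rescored = []
    for text, log_prob in nbest:
        hits = sum(1 for w in text.lower().split() if w in boosted_words)
        rescored.append((text, log_prob + hits * bonus))
    # Highest adjusted log-probability first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    hypotheses = [("wreck a nice beach", -3.8), ("recognize speech", -4.0)]
    print(rescore(hypotheses, {"speech"}))
```

In this example the acoustically weaker hypothesis moves to the top because it contains a word that also appears in the textual content input.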

Current Assignee: Sonic Foundry Incorporated
Original Assignee: Sonic Foundry Incorporated
Inventors: Hancock, John; Scott, Jonathan; Knight, Michael J.; Yurick, Steven J.
Application Number: US11/742,150
Publication Number: US 20080270110A1
US Class Current: 704/3
CPC Class Codes:
- G06F 16/433   using audio data
- G06F 16/434   using image data, e.g. imag...
- G06F 16/4393   Multimedia presentations, e...
- G06F 16/48   Retrieval characterised by ...
- G06F 16/61   Indexing; Data structures t...
- G06F 16/685   using automatically derived...
- G10L 15/06   Creation of reference templ...
- G10L 2015/228   of application context