Custom language models for audio content

US 8,447,608 B1
Filed: 12/10/2008
Issued: 05/21/2013
Est. Priority Date: 12/10/2008
Status: Active Grant

Abstract

This specification describes technologies relating to generating custom language models for audio content. In some implementations, a computer-implemented method is provided that includes the actions of receiving a collection of source texts; identifying a type from a collection of types for each source text, each source text being associated with a particular type; generating, for each identified type, a type-specific language model using the source texts associated with the respective type; and storing the language models.
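The steps in the abstract (group source texts by type, then build one model per type) can be illustrated with a minimal Python sketch. It assumes each source text arrives as a hypothetical `(type, text)` pair and uses an add-one smoothed unigram model as a stand-in; the patent text here does not specify the model family, so `train_type_models`, `unigram_prob`, and the sample corpus are illustrative only.

```python
from collections import Counter, defaultdict


def train_type_models(source_texts):
    """Group (type, text) pairs by type and count words per type.

    Returns a dict mapping each type to a Counter of word frequencies,
    which stands in for a type-specific language model.
    """
    models = defaultdict(Counter)
    for text_type, text in source_texts:
        models[text_type].update(text.lower().split())
    return dict(models)


def unigram_prob(model, word, vocab_size):
    """Add-one smoothed unigram probability of `word` under `model`."""
    total = sum(model.values())
    return (model[word] + 1) / (total + vocab_size)


# Hypothetical toy corpus: two types, three source texts.
corpus = [
    ("western", "the sheriff rode into town"),
    ("western", "the outlaw left town"),
    ("sci-fi", "the ship left orbit"),
]
models = train_type_models(corpus)
vocab = {word for model in models.values() for word in model}
```

A real system would more likely train n-gram or neural models and persist them to storage; the dictionary of counters only shows the per-type partitioning.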

40 Claims

1. A computer-implemented method comprising:
- receiving a collection of source texts;
  
  identifying a type from a collection of types for each source text, each source text being associated with a particular type;
  
  generating, by data processing apparatus, a type-specific language model for each identified type using the source texts associated with the respective type;
  
  storing the language models in a computer-readable medium;
  
  receiving an audio source file to be processed, the audio source file having a particular type;
  
  selecting a particular weighted combination of the language models, wherein weighting the particular weighted combination of the language models depends on the type of the audio source file to be processed; and
  
  generating, by data processing apparatus, a text file from the audio source file based on the selected weighted combination of language models.
- Dependent claims (2 to 13):
  - 2. The method of claim 1, further comprising:
    - processing one or more source texts to extract particular content portions.
  - 3. The method of claim 2, where the one or more source texts are structured documents and the processing includes removing tags indicating document structure.
  - 4. The method of claim 3, where the one or more source texts are in XML format.
  - 5. The method of claim 1, where each source text includes an identifier of the particular type associated with the source text.
  - 6. The method of claim 1, further comprising:
    - using the weighted combination of language models to generate the text file from the audio source file using speech-to-text processing; and
      
      storing the text file.
  - 7. The method of claim 6, further comprising:
    - filtering the received audio source file to remove noise prior to generating the text file.
  - 8. The method of claim 6, further comprising:
    - receiving an input identifying a type of the collection of types associated with the audio source file; and
      
      selecting the weighted combination of language models according to the received input.
  - 9. The method of claim 6, wherein the weighted combination of language models includes a default language model generated using a second collection of text.
  - 10. The method of claim 1, further comprising:
    - evaluating the generated language models based on a word error rate.
  - 11. The method of claim 10, further comprising:
    - adjusting the generated language models based on the evaluation of the generated language models; and
      
      storing the adjusted language models.
  - 12. The method of claim 1, where the particular type is based at least in part on a genre of one of a group of cinematic, television, musical, or literary compositions.
  - 13. The method of claim 1, where generating each type-specific language model includes training the type-specific language model using the particular source texts of the type.
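One way to read the "weighted combination" of claim 1, together with the default model of claim 9, is linear interpolation of per-model word probabilities. The sketch below assumes that reading; the claims do not name a specific combination formula. The model names, probabilities, and the `WEIGHTS_BY_AUDIO_TYPE` table are hypothetical values chosen for illustration.

```python
# Hypothetical unigram probabilities for two type-specific models
# and a default model trained on a second collection of text.
MODELS = {
    "western": {"sheriff": 0.02, "orbit": 0.0001},
    "sci-fi": {"sheriff": 0.0005, "orbit": 0.03},
    "default": {"sheriff": 0.001, "orbit": 0.001},
}

# Hypothetical weight table: the weights depend on the type of the
# audio source file being transcribed.
WEIGHTS_BY_AUDIO_TYPE = {
    "western": {"western": 0.7, "sci-fi": 0.0, "default": 0.3},
    "sci-fi": {"western": 0.0, "sci-fi": 0.7, "default": 0.3},
}


def select_weights(audio_type):
    """Look up the model weights for the given audio type."""
    return WEIGHTS_BY_AUDIO_TYPE[audio_type]


def interpolate(models, weights, word):
    """Linearly interpolate P(word) across models using `weights`."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * models[name].get(word, 0.0) for name, w in weights.items())


p_sheriff = interpolate(MODELS, select_weights("western"), "sheriff")
p_orbit = interpolate(MODELS, select_weights("western"), "orbit")
```

With the western weights, "sheriff" scores far higher than "orbit", which is the bias a speech-to-text decoder would pick up when transcribing audio of that type.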

14. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
- receiving a collection of source texts;
  
  identifying a type from a collection of types for each source text, each source text being associated with a particular type;
  
  generating, for each identified type, a type-specific language model using the source texts associated with the respective type;
  
  storing the language models;
  
  receiving an audio source file to be processed, the audio source file having a particular type;
  
  selecting a particular weighted combination of the language models, wherein weighting the particular weighted combination of the language models depends on the type of the audio source file to be processed; and
  
  generating a text file from the audio source file based on the selected weighted combination of language models.
- Dependent claims (15 to 26):
  - 15. The computer program product of claim 14, further operable to perform operations comprising:
    - processing one or more source texts to extract particular content portions.
  - 16. The computer program product of claim 15, where the one or more source texts are structured documents and the processing includes removing tags indicating document structure.
  - 17. The computer program product of claim 16, where the one or more source texts are in XML format.
  - 18. The computer program product of claim 14, where each source text includes an identifier of the particular type associated with the source text.
  - 19. The computer program product of claim 14, further operable to perform operations comprising:
    - using the weighted combination of language models to generate the text file from the audio source file using speech-to-text processing; and
      
      storing the text file.
  - 20. The computer program product of claim 19, further operable to perform operations comprising:
    - filtering the received audio source file to remove noise prior to generating the text file.
  - 21. The computer program product of claim 19, further operable to perform operations comprising:
    - receiving an input identifying a type of the collection of types associated with the audio source file; and
      
      selecting the weighted combination of language models according to the received input.
  - 22. The computer program product of claim 19, wherein the weighted combination of language models includes a default language model generated using a second collection of text.
  - 23. The computer program product of claim 14, further operable to perform operations comprising:
    - evaluating the generated language models based on a word error rate.
  - 24. The computer program product of claim 23, further operable to perform operations comprising:
    - adjusting the generated language models based on the evaluation of the generated language models; and
      
      storing the adjusted language models.
  - 25. The computer program product of claim 14, where the particular type is based at least in part on a genre of one of a group of cinematic, television, musical, or literary compositions.
  - 26. The computer program product of claim 14, where generating each type-specific language model includes training the type-specific language model using the particular source texts of the type.
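Claims 10 and 23 evaluate the generated models by word error rate. A common definition, assumed here because the claims do not spell one out, is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (minus the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "the sheriff rode into town",
    "the sheriff road into town",
)
```

Comparing this score across candidate weightings is one plausible way to drive the adjustment step of claims 11 and 24.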

27. A system comprising:
- a processor and a memory operable to perform operations including:
  
  receiving a collection of source texts;
  
  identifying a type from a collection of types for each source text, each source text being associated with a particular type;
  
  generating, for each identified type, a type-specific language model using the source texts associated with the respective type;
  
  storing the language models;
  
  receiving an audio source file to be processed, the audio source file having a particular type;
  
  selecting a particular weighted combination of the language models, wherein weighting the particular weighted combination of the language models depends on the type of the audio source file to be processed; and
  
  generating a text file from the audio source file based on the selected weighted combination of language models.
- Dependent claims (28 to 39):
  - 28. The system of claim 27, further operable to perform operations comprising:
    - processing one or more source texts to extract particular content portions.
  - 29. The system of claim 28, where the one or more source texts are structured documents and the processing includes removing tags indicating document structure.
  - 30. The system of claim 29, where the one or more source texts are in XML format.
  - 31. The system of claim 27, where each source text includes an identifier of the particular type associated with the source text.
  - 32. The system of claim 27, further operable to perform operations comprising:
    - using the weighted combination of language models to generate the text file from the audio source file using speech-to-text processing; and
      
      storing the text file.
  - 33. The system of claim 32, further operable to perform operations comprising:
    - filtering the received audio source file to remove noise prior to generating the text file.
  - 34. The system of claim 32, further operable to perform operations comprising:
    - receiving an input identifying a type of the collection of types associated with the audio source file; and
      
      selecting the weighted combination of language models according to the received input.
  - 35. The system of claim 32, wherein the weighted combination of language models includes a default language model generated using a second collection of text.
  - 36. The system of claim 27, further operable to perform operations comprising:
    - evaluating the generated language models based on a word error rate.
  - 37. The system of claim 36, further operable to perform operations comprising:
    - adjusting the generated language models based on the evaluation of the generated language models; and
      
      storing the adjusted language models.
  - 38. The system of claim 27, where the particular type is based at least in part on a genre of one of a group of cinematic, television, musical, or literary compositions.
  - 39. The system of claim 27, where generating each type-specific language model includes training the type-specific language model using the particular source texts of the type.
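The preprocessing described in claims 28 to 31 (extract content portions from XML source texts, drop structural tags, and read a type identifier carried by the text itself) might look like the sketch below. The document layout, the `genre` attribute, and the `dialogue` element are assumptions made for illustration; the claims do not define a schema.

```python
import xml.etree.ElementTree as ET


def parse_source_text(xml_source, content_tag, type_attr="genre"):
    """Return (type, plain_text) for one XML source text.

    The type is read from a hypothetical attribute on the root element.
    Only the text inside `content_tag` elements is kept, with nested
    tags removed and whitespace collapsed.
    """
    root = ET.fromstring(xml_source)
    text_type = root.get(type_attr)
    parts = []
    for node in root.iter(content_tag):
        # itertext() yields the text of the node and all descendants,
        # which discards the markup itself.
        parts.append(" ".join("".join(node.itertext()).split()))
    return text_type, " ".join(p for p in parts if p)


# Hypothetical screenplay fragment.
doc = """<script genre="western">
  <scene>INT. SALOON</scene>
  <dialogue>Draw, <em>stranger</em>.</dialogue>
</script>"""

text_type, text = parse_source_text(doc, "dialogue")
```

Keeping only spoken dialogue and discarding scene directions is one example of extracting "particular content portions" that are likely to match what appears in the audio.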

40. A method performed by a computer programmed to provide one or more language models, the method comprising:
- receiving a collection of source texts;
  
  identifying a type from a collection of types for each source text, each source text being associated with a particular type;
  
  generating, by the computer, a type-specific language model for each identified type using the source texts associated with the respective type;
  
  storing the language models at the computer;
  
  receiving an audio source file to be processed, the audio source file having a particular type;
  
  selecting a weighted combination of the language models, wherein the weights are determined based on the type of the audio source file to be processed; and
  
  generating, by the computer, a text file from the audio source file based on the selected weighted combination of language models.

Current Assignee: Adobe Inc.
Original Assignee: Adobe Systems Incorporated (Adobe Inc.)
Inventors: Chang, Walter W.; Welch, Michael J.
Primary Examiner(s): Neway, Samuel G
Application Number: US12/332,297
Time in Patent Office: 1,623 Days
Field of Search: 704/9, 704/231-257
US Class Current: 704/257
CPC Class Codes:
- G06F 40/216   using statistical methods
- G06F 40/242   Dictionaries
- G10L 15/197   Probabilistic grammars, e.g...
- G10L 15/26   Speech to text systems G10L...
- G10L 2015/228   of application context
