AUTOMATIC SPEECH RECOGNITION BASED UPON INFORMATION RETRIEVAL METHODS

US 20110224982A1
Filed: 03/12/2010
Published: 09/15/2011
Est. Priority Date: 03/12/2010
Status: Abandoned Application

Abstract

Described is a technology in which information retrieval (IR) techniques are used in an automatic speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words). Also described is the use of IR techniques to provide a full large vocabulary continuous speech (LVCSR) recognizer.
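
As a rough illustration of the idea in the abstract, the following sketch treats each word's pronunciation as a "document" of phone n-grams and ranks words by TF-IDF cosine similarity to a decoded phone sequence. The lexicon, the function names (`ngrams`, `build_index`, `retrieve`), and the smoothed IDF formula are assumptions made for this example; the application itself does not specify them.

```python
import math
from collections import Counter

def ngrams(units, n_max=2):
    """Return all n-grams (n = 1..n_max) over a sequence of acoustic units."""
    return [tuple(units[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(units) - n + 1)]

# Hypothetical pronunciation lexicon: each word is a "document" of phones.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "bat": ["b", "ae", "t"],
}

def build_index(lexicon):
    """Compute a TF-IDF vector per word from its phone n-gram features."""
    tfs = {w: Counter(ngrams(p)) for w, p in lexicon.items()}
    df = Counter(f for tf in tfs.values() for f in tf)
    n_docs = len(lexicon)
    # Smoothed IDF (an assumption): features seen in every word keep weight 1.
    idf = {f: math.log(n_docs / df[f]) + 1.0 for f in df}
    vectors = {w: {f: c * idf[f] for f, c in tf.items()}
               for w, tf in tfs.items()}
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(f, 0.0) for f, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(decoded_units, vectors, idf):
    """Rank words by similarity to the decoded units, best first."""
    q_tf = Counter(ngrams(decoded_units))
    # Features never seen in the lexicon are dropped from the query.
    query = {f: c * idf[f] for f, c in q_tf.items() if f in idf}
    return sorted(((cosine(query, v), w) for w, v in vectors.items()),
                  reverse=True)

vectors, idf = build_index(LEXICON)
ranking = retrieve(["k", "ae", "t"], vectors, idf)
```

Here `ranking` is a list of `(score, word)` pairs. Because the scoring is similarity-based rather than an exact lookup, a decoded sequence containing a misrecognized phone can still retrieve a plausible word.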

20 Claims

1. In a computing environment, a system comprising:
- a recognition mechanism that processes audio input into acoustic units;
- a feature extraction mechanism that processes the acoustic units into features derived from the acoustic units; and
- an information retrieval-based scoring mechanism that inputs the features and determines one or more words or acoustic scores associated with words based upon the features.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The system of claim 1 wherein the recognition mechanism outputs information corresponding to sub-word units, comprising phonemes, multi-phones or syllables, as the acoustic units.
  - 3. The system of claim 1 wherein the recognition mechanism outputs information corresponding to words as the acoustic units.
  - 4. The system of claim 1 wherein the features comprise one or more n-gram unit features.
  - 5. The system of claim 1 wherein features comprise length-related information.
  - 6. The system of claim 1 wherein the one or more words or acoustic scores are used by a telephony application.
  - 7. The system of claim 1 wherein the one or more words or acoustic scores are used by a continuous speech recognizer, including by combining information retrieval-based acoustic scores associated with each word with a language model score to decode an utterance.
  - 8. The system of claim 7 wherein the acoustic score is variable depending on whether there is an exact match between acoustic units and units in a dictionary used by the continuous speech recognizer.
  - 9. The system of claim 1 wherein the one or more words or acoustic scores are used by a continuous speech recognizer, including by combining information retrieval-based acoustic scores associated with each word with length data and a language model score to decode an utterance.
  - 10. The system of claim 1 wherein the information retrieval-based scoring mechanism comprises a vector space model-based scoring mechanism.
  - 11. The system of claim 10 wherein the vector space model-based scoring mechanism is trained based upon TF-IDF counts in training data to determine term weights.
  - 12. The system of claim 10 wherein the vector space model-based scoring mechanism is trained based upon training data and discriminative training to determine term weights.
  - 13. The system of claim 1 wherein the information retrieval-based scoring mechanism comprises a language model-based scoring mechanism.
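
The combination recited in claims 7 to 9 (an IR-based acoustic score, length data, and a language model score, with the acoustic score varying on an exact dictionary match as in claim 8) could be sketched as a log-linear sum. The function name `combined_score`, the weights, the bonus value, and the example numbers are all assumptions chosen for illustration; the claims do not state how the terms are weighted.

```python
import math

def combined_score(ir_score, lm_logprob, n_units, expected_units,
                   exact_match=False, lm_weight=1.0,
                   length_weight=0.5, exact_bonus=1.0):
    """Log-linear combination of scores for one candidate word.

    All weights and the bonus are illustrative assumptions.
    """
    # IR similarity in (0, 1] is mapped to the log domain; floor avoids log(0).
    acoustic = math.log(max(ir_score, 1e-10))
    if exact_match:
        # Acoustic score varies when decoded units match a dictionary entry.
        acoustic += exact_bonus
    # Length data: penalize a mismatch between decoded and expected unit counts.
    length_term = -abs(n_units - expected_units)
    return acoustic + lm_weight * lm_logprob + length_weight * length_term

# Hypothetical candidates for a single word slot.
hypotheses = {
    "cat": dict(ir_score=0.9, lm_logprob=math.log(0.02),
                n_units=3, expected_units=3, exact_match=True),
    "cap": dict(ir_score=0.6, lm_logprob=math.log(0.05),
                n_units=3, expected_units=3),
}
scores = {w: combined_score(**h) for w, h in hypotheses.items()}
best = max(scores, key=scores.get)
```

In this toy case the exact-match bonus and higher IR score outweigh the candidate's lower language model probability. In practice the weights would presumably be tuned on held-out data.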

14. In a computing environment, a method performed on at least one processor, comprising, processing audio input into acoustic units, extracting features corresponding to the acoustic units, and using information retrieval-based scoring to determine acoustic scores for words based upon the features.
- Dependent claims (15, 16, 17):
  - 15. The method of claim 14 further comprising, providing a business listing based upon the acoustic scores for the words.
  - 16. The method of claim 14 further comprising, using the acoustic scores for a plurality of candidate words with length data and a language model score to decode an utterance.
  - 17. The method of claim 16 further comprising, determining whether there is an exact match between acoustic units and units in a dictionary, and if so, changing the acoustic score.

18. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising:
- receiving speech;
- extracting units based upon the speech and hypothesized word boundaries;
- determining candidate words that are associated with the units;
- computing an information-retrieval based acoustic score for each candidate word and associating that acoustic score with that candidate word; and
- sorting the candidate words by acoustic score.
- Dependent claims (19, 20):
  - 19. The one or more computer-readable media of claim 18 having further computer-executable instructions comprising, combining at least some of the candidate words into n-gram sequences, and determining an utterance based on the scores associated with candidate words of an n-gram sequence with a language model score.
  - 20. The one or more computer-readable media of claim 18 having further computer-executable instructions comprising, determining whether there is an exact match between a set of acoustic units corresponding to a word and units in a dictionary, and if so, changing the acoustic score associated with that word.
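
The steps of claim 18, together with the exact-match adjustment of claim 20, might look roughly like the sketch below: units are split at hypothesized word boundaries, an inverted index proposes candidate words, each candidate receives an IDF-weighted overlap score, and candidates are sorted. The dictionary, the scoring formula, the `exact_bonus` value, and all function names are assumptions for this example and are not taken from the application.

```python
import math
from collections import defaultdict

# Hypothetical dictionary mapping words to unit (phone) sequences.
DICTIONARY = {
    "new": ["n", "uw"],
    "nude": ["n", "uw", "d"],
    "york": ["y", "ao", "r", "k"],
    "yolk": ["y", "ow", "k"],
}

def build_inverted_index(dictionary):
    """Map each unit to the set of words whose pronunciation contains it."""
    index = defaultdict(set)
    for word, units in dictionary.items():
        for u in units:
            index[u].add(word)
    return index

def score_segment(segment, dictionary, index, exact_bonus=0.5):
    """Score and sort the candidate words for one hypothesized word segment."""
    # Candidate words: any word sharing at least one unit with the segment.
    candidates = set().union(*(index.get(u, set()) for u in segment))
    n_words = len(dictionary)
    scored = []
    for word in candidates:
        shared = set(segment) & set(dictionary[word])
        # IDF-style weight: units found in fewer words count for more.
        score = sum(math.log(1 + n_words / len(index[u])) for u in shared)
        score /= max(len(segment), len(dictionary[word]))
        if dictionary[word] == list(segment):
            score += exact_bonus  # exact dictionary match changes the score
        scored.append((word, score))
    return sorted(scored, key=lambda ws: ws[1], reverse=True)

def decode_candidates(units, boundaries, dictionary):
    """Split units at the boundaries and rank candidates per segment."""
    index = build_inverted_index(dictionary)
    segments = [units[s:e] for s, e in zip(boundaries, boundaries[1:])]
    return [score_segment(seg, dictionary, index) for seg in segments]

# Decoded units with a deleted "r"; boundaries are hypothesized, not given.
result = decode_candidates(["n", "uw", "y", "ao", "k"], [0, 2, 5], DICTIONARY)
```

Each entry of `result` is a sorted candidate list for one segment. A later stage, such as the n-gram and language model combination of claim 19, would choose among these candidates.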

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Xiao, Xiaoqiang; Zweig, Geoffrey G.; Acero, Alejandro; Droppo, James Garnet III
Application Number: US12/722,556
Publication Number: US 20110224982A1
US Class Current: 704/236
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 2015/025 Phonemes, fenemes or fenone...
