Speech Recognition By Post Processing Using Phonetic and Semantic Information

US 20150179169A1
Filed: 12/19/2013
Published: 06/25/2015
Est. Priority Date: 12/19/2013
Status: Abandoned Application

Abstract

A system is described for improving the results of Automatic Speech Recognition (ASR) systems. ASRs typically match patterns of incoming sounds to phonemes associated with sounds in a specified language, then associate phonemes with words. ASRs typically consider combinations of up to three phonemes and up to three words. This limitation to small combinations of phonemes and words is one source of errors in ASRs. The invention described here post-processes the output from ASRs. In one embodiment, the method forms long combinations of phonemes and words to improve ASR results. In another embodiment, the method detects errors by finding inconsistencies in the ASR's output and then corrects these errors. Other embodiments correct errors that are phonetically close to the correct words, determine the right list of words from a large expected list of sentences, and further improve recognition where word errors are phonetically close to the correct words.
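
As a rough illustration of associating a long run of phonemes with words, the sketch below splits a phoneme sequence into contiguous vocabulary words using exact matching only. The toy vocabulary, the ARPAbet-like symbols, and the function name `segment` are assumptions made for this example and are not taken from the application.

```python
from functools import lru_cache

# Hypothetical toy vocabulary: word -> pronunciation (illustrative symbols only).
VOCAB = {
    "recognize": ("R", "EH", "K", "AH", "G", "N", "AY", "Z"),
    "speech": ("S", "P", "IY", "CH"),
    "wreck": ("R", "EH", "K"),
    "a": ("AH",),
    "nice": ("N", "AY", "S"),
    "beach": ("B", "IY", "CH"),
}


def segment(phonemes, vocab=VOCAB):
    """Return every split of `phonemes` into contiguous vocabulary words."""
    phonemes = tuple(phonemes)

    @lru_cache(maxsize=None)
    def from_index(i):
        # All word sequences that exactly cover phonemes[i:].
        if i == len(phonemes):
            return ((),)  # exactly one way to cover an empty remainder
        results = []
        for word, pron in vocab.items():
            if phonemes[i:i + len(pron)] == pron:
                for rest in from_index(i + len(pron)):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(words) for words in from_index(0)]


print(segment(["R", "EH", "K", "AH", "G", "N", "AY", "Z", "S", "P", "IY", "CH"]))
print(segment(["R", "EH", "K", "AH", "N", "AY", "S", "B", "IY", "CH"]))
```

An empty result means no exact segmentation exists, which is one possible signal that the phoneme sequence contains an error; handling near matches would need an approximate comparison instead of the exact one used here.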

18 Claims

1. A method for improving speech recognition of an Automatic Speech Recognition System (ASR) comprising:
- providing, on a non-transitory computer readable storage medium, a vocabulary comprising words from a specified language and their corresponding phonemes;
  
  obtaining at least one sequence of phonemes generated by the ASR from at least one sentence spoken by a human user in a specified language into the ASR, the at least one sentence spoken by a human user comprising words occurring in the vocabulary;
  
  comparing the at least one sequence of phonemes obtained from the ASR for each sentence with the phonemes for at least one spoken word in the vocabulary;
  
  determining whether at least one error is present in the sequence of phonemes obtained from the ASR;
  
  assigning contiguous phonemes obtained from the ASR for each sentence to words in the vocabulary;
  
  producing at least one sequence of words from the assigned words in the vocabulary; and
  
  correcting the at least one error, if present, in the sequence of phonemes obtained from the ASR where the ASR is executed on a computer system with one or more processors.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method as in claim 1 where the ASR generates sequences of words and where the words are converted to a sequence of phonemes.
  - 3. The method as in claim 1 where the ASR generates at least one utterance that is an incomplete or ungrammatical sentence in the specified language.
  - 4. The method as in claim 1 where the at least one error is determined using a formula using one or more of the following variables:
    - the number of incorrectly inserted phonemes, the number of incorrectly deleted phonemes, and the number of incorrectly substituted phonemes.
  - 5. The method as in claim 1 where the ASR generates sequences of phonemes that are written using non-roman characters.
  - 6. The method as in claim 1 where the ASR generates phonemes belonging to a language where there are different tones for the same sound.
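
Claim 4 names three variables (inserted, deleted, and substituted phonemes) but does not give the formula itself. The sketch below counts those three quantities with a standard minimum-edit alignment and combines them into a conventional error rate. The function names and the choice of error-rate formula are assumptions for illustration, not the application's formula.

```python
def edit_counts(reference, hypothesis):
    """Return (insertions, deletions, substitutions) for a minimum-cost alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total, ins, dels, subs) aligning reference[:i] with hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference symbol was deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, j, 0, 0)  # every hypothesis symbol was inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, a, b, c = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (t, a, b, c)          # match, no cost
            else:
                diag = (t + 1, a, b, c + 1)  # substitution
            t, a, b, c = cost[i][j - 1]
            ins = (t + 1, a + 1, b, c)       # extra symbol in the hypothesis
            t, a, b, c = cost[i - 1][j]
            dele = (t + 1, a, b + 1, c)      # symbol missing from the hypothesis
            cost[i][j] = min(diag, ins, dele)  # ties broken by tuple order
    _, insertions, deletions, substitutions = cost[n][m]
    return insertions, deletions, substitutions


def error_rate(reference, hypothesis):
    """Assumed formula: (insertions + deletions + substitutions) / len(reference)."""
    return sum(edit_counts(reference, hypothesis)) / len(reference)


vocab_pron = ["S", "P", "IY", "CH"]  # hypothetical vocabulary pronunciation
asr_output = ["S", "B", "IY", "CH"]  # hypothetical ASR phonemes
print(edit_counts(vocab_pron, asr_output), error_rate(vocab_pron, asr_output))
```

Because the function only compares sequence elements for equality, the same code applies to word sequences, tone-marked phonemes, or phonemes written in non-roman characters.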

7. A method for improving speech recognition of an Automatic Speech Recognition System (ASR) comprising:
- providing, on a non-transitory computer readable storage medium, a vocabulary comprising words from a specified language and a collection of sentences of words;
  
  obtaining at least one sequence of words generated by the ASR from at least one sentence spoken by a human user in a specified language, the at least one sentence spoken by a human user comprising words occurring in the vocabulary;
  
  comparing the at least one sequence of words obtained from the ASR for each sentence with sequences of words that occur together in the collection of sentences;
  
  determining whether at least one error is present in the sequence of words obtained from the ASR;
  
  producing at least one sequence of words from the assigned words in the vocabulary; and
  
  correcting at least one error, if present, in the sequence of words obtained from the ASR where the ASR is executed on a computer system with one or more processors.
- Dependent claims (8, 9, 10, 11, 12, 13):
  - 8. A method as in claim 7 where the at least one sequences of words generated by the ASR is generated where any sequence of five or less contiguous words occur together in the collection of sentences.
  - 9. A method as in claim 7 where the ASR generates at least one utterance which is an incomplete or ungrammatical sentence in the specified language.
  - 10. The method as in claim 7 where the at least one error is determined using a formula using one or more of the following variables:
    - a number of incorrectly inserted words, a number of incorrectly deleted words, and a number of incorrectly substituted words.
  - 11. The method as in claim 7 where a search engine is used to determine whether the at least one sequence of words obtained from the ASR occurs in the collection of sentences in the language.
  - 12. The method as in claim 7 where the specified language is a language where sentences are not divided into words.
  - 13. The method as in claim 7 where at least one sentence in the collection of sentences from the specified language contains one or more words in another language.
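
One way to picture the comparison in claims 7 and 8 is to index every run of up to five contiguous words in the sentence collection and then look for runs in the ASR output that never occur there. The sketch below assumes whitespace-separated words (which would not hold for the languages of claim 12), and the sample sentences and function names are hypothetical.

```python
def build_ngrams(sentences, max_n=5):
    """Collect every run of 1..max_n contiguous words seen in the collection."""
    seen = set()
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                seen.add(tuple(words[i:i + n]))
    return seen


def first_unseen(words, seen, max_n=5):
    """Return the shortest run of words absent from the collection, or None."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            ngram = tuple(words[i:i + n])
            if ngram not in seen:
                return ngram
    return None


# Hypothetical collection of expected sentences.
collection = ["turn on the kitchen light", "turn off the kitchen light"]
seen = build_ngrams(collection)

print(first_unseen("turn on the chicken light".split(), seen))  # flags a suspect run
print(first_unseen("turn off the kitchen light".split(), seen))  # nothing flagged
```

An in-memory set is used here for brevity; claim 11 mentions a search engine for the same lookup, which would suit a much larger collection.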

14. A method for improving speech recognition of an Automatic Speech Recognition System (ASR) comprising:
- providing, on a non-transitory computer readable storage medium, a vocabulary comprising words from a specified language and a collection of sentences of words;
  
  obtaining at least one sequence of words generated by the ASR from at least one sentence spoken by a human user in a specified language, the at least one sentence spoken by a human user occurring in the collection of sentences;
  
  comparing the at least one sequence of words obtained from the ASR for each sentence with sequences of words that occur together in the collection of sentences;
  
  determining a distance of at least one sequence of words obtained from the ASR with the sequence of words occurring in each sentence in the collection of sentences; and
  
  obtaining from the vocabulary at least one sentence closest in distance to at least one sequence of words obtained from the ASR where the ASR is executed on a computer system with one or more processors.
- Dependent claims (15, 16, 17, 18):
  - 15. A method as in claim 14 where the ASR generates a sequence of phonemes that occur in one sequence of words, the one sequence of words being a sentence occurring in the collection of sentences.
  - 16. A method as in claim 14 where the distance between the one sequence of words and the sequence of words in one sentence in the collection is calculated using a method that finds the common longest sub-sequence of the two sequences of words.
  - 17. A method as in claim 14 where the collection of sentences include at least one sequence of words that may be an incomplete sentence in the language.
  - 18. A method as in claim 14 where at least one sentence in the collection of sentences from the specified language contains one or more words in another language.
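
Claim 16 says the distance is calculated with a method that finds the longest common subsequence of two word sequences, without fixing the exact distance formula. The sketch below assumes one common choice, `len(a) + len(b) - 2 * LCS(a, b)`, and picks the collection sentence with the smallest distance. The function names and sample sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Assumed distance: number of words not shared by the common subsequence."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def closest_sentence(asr_words, sentences):
    """Return the collection sentence with the smallest LCS-based distance."""
    return min(sentences, key=lambda s: lcs_distance(asr_words, s.split()))


# Hypothetical collection of expected sentences.
collection = [
    "turn on the kitchen light",
    "turn off the kitchen light",
    "play some music",
]
print(closest_sentence("turn on the chicken light".split(), collection))
```

A linear scan over the collection is the simplest option; for a large expected list of sentences, some form of indexing or candidate pruning would likely be needed before computing distances.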

Current Assignee: Thomas John, Vijay George John
Original Assignee: Thomas John, Vijay George John
Inventors: Thomas John, Vijay George John
Application Number: US14/134,710
Publication Number: US 20150179169A1
US Class Current: 1/1
CPC Class Codes:
G10L 15/1815 Semantic context, e.g. disa...
G10L 15/187 Phonemic context, e.g. pron...
