Systems and methods for improving the accuracy of a transcription using auxiliary data such as personal data

US 9,009,041 B2
Filed: 07/26/2011
Issued: 04/14/2015
Est. Priority Date: 07/26/2011
Status: Active Grant

Abstract

A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.
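
The replacement step summarized in the abstract can be illustrated with a minimal sketch. It assumes string similarity (Python's `difflib.SequenceMatcher`) as a stand-in for the acoustic or phonetic comparison a real system would use; the function names, the `low_conf` and `min_sim` cutoffs, and the sample words are hypothetical and do not come from the patent.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for phonetic matching)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def apply_personal_vocabulary(words, personal_vocab, low_conf=0.5, min_sim=0.6):
    """Replace low-confidence transcribed words with close personal-vocabulary words.

    words: list of (word, confidence) pairs from the ASR engine.
    personal_vocab: list of replacement words drawn from the user's personal data.
    low_conf and min_sim are illustrative cutoffs, not values from the patent.
    """
    result = []
    for word, conf in words:
        # Closest personal-vocabulary entry, or None if the vocabulary is empty.
        best = max(personal_vocab, key=lambda v: similarity(word, v), default=None)
        if conf < low_conf and best is not None and similarity(word, best) >= min_sim:
            result.append(best)   # low-confidence word replaced
        else:
            result.append(word)   # confident word kept as transcribed
    return result


transcription = [("call", 0.95), ("catlin", 0.3)]
corrected = apply_personal_vocabulary(transcription, ["Caitlin", "Uwe"])
```

Here only the low-confidence word is a candidate for replacement; a confident word is left alone even if it resembles a vocabulary entry.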

23 Claims

1. A personal computing device for use with a remote automatic speech recognition engine, the device comprising:
- a communications port configured to receive a data set and audio data from the remote automatic speech recognition engine, wherein the data set and the audio data reflect speech, wherein the data set is a rich data set that includes a word list for candidate words with confidence scores, and wherein the data set is generated by the remote automatic speech recognition engine in response to the audio data;
  
  a display device for displaying information to a user;
  
  memory for at least temporarily storing personal data and executable code for a re-recognition engine, wherein the re-recognition engine includes automatic speech recognition capability; and
  
  at least one processor coupled among the communications port, the display device, and the memory, wherein the at least one processor is configured to execute the code for the re-recognition engine and:
  
  access the personal data from the memory,
  
  generate a local transcription using the audio data, wherein the local transcription is generated using the speech recognition capability of the re-recognition engine and the accessed personal data,
  
  rescore the data set received from the remote automatic speech recognition engine, using the re-recognition engine, based on the accessed personal data and confidence scores associated with the local transcription,
  
  generate a final transcription of the speech using the rescored data set and the local transcription,
  
  present, via the display device, the final transcription of the speech to the user based on the rescored data set and local transcription, and
  
  create a rule that a particular word in the data set from the remote automatic speech recognition engine is to be replaced by a particular replacement word from the local transcription, and transmit, via the communications port, the rule or the rescored data set to the remote automatic speech recognition engine from which a vocabulary of the remote automatic speech recognition engine is modified,
  
  wherein the remote automatic speech recognition engine is hosted by a server accessible via a network, and the personal computing device is a cell phone, smart phone, tablet or portable telecommunications device.
- 2. The personal computing device of claim 1 wherein the data set further comprises a word lattice or a phoneme lattice.
- 3. The personal computing device of claim 2 wherein the data set includes a language model associated with the user.
- 4. The personal computing device of claim 1 wherein the personal data includes at least one acoustic model associated with the user.
- 5. The personal computing device of claim 1 wherein the personal data includes at least one of: address book information, text-based messages, and data from web-based sources.
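
As a rough illustration of the rescoring step in claim 1, the sketch below combines a remote candidate word list with a local hypothesis. The claim does not specify a scoring formula; the linear interpolation, the fixed `boost` for words found in personal data, and all names and numbers here are assumptions made for the example.

```python
def rescore(candidates, local_word, local_conf, personal_words,
            boost=0.2, local_weight=0.5):
    """Rescore remote candidates for one word slot.

    candidates: dict mapping candidate word -> remote confidence score.
    local_word, local_conf: the local engine's hypothesis and its confidence.
    personal_words: set of lower-cased words found in the user's personal data.
    boost and local_weight are hypothetical tuning values.
    Returns (word, score) pairs sorted from best to worst.
    """
    scores = dict(candidates)
    # Blend the remote score (0.0 if the remote engine never proposed the word)
    # with the local engine's confidence for the local hypothesis.
    remote = scores.get(local_word, 0.0)
    scores[local_word] = (1 - local_weight) * remote + local_weight * local_conf
    # Favor any candidate that also appears in the personal data.
    for word in scores:
        if word.lower() in personal_words:
            scores[word] = min(1.0, scores[word] + boost)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rescore({"Kate Lynn": 0.6, "Caitlin": 0.3},
                 local_word="Caitlin", local_conf=0.9,
                 personal_words={"caitlin"})
final_word = ranked[0][0]
```

In this example the remote engine preferred "Kate Lynn", but the local hypothesis backed by personal data ends up ranked first, so it would be used in the final transcription.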

6. A method of generating a secondary transcription from a primary transcription generated by a remote automatic speech recognition (ASR) engine, wherein the method is performed by a computing system having a processor and a memory, the method comprising:
- maintaining a personal vocabulary that includes replacement words, wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user;
  
  receiving primary transcription data from an audio recording, wherein the primary transcription data is generated by the remote ASR engine using an ASR vocabulary;
  
  wherein the primary transcription data includes a primary transcription and confidence scores associated with words in the primary transcription, and wherein the confidence scores are generated by the remote ASR engine;
  
  receiving audio data that corresponds at least in part to a portion of the received primary transcription data;
  
  generating a local transcription using the audio data, wherein the local transcription is generated by a local ASR engine at the computing system using the personal vocabulary;
  
  identifying at least one replacement word from the local transcription;
  
  comparing the replacement word to at least a portion of the received primary transcription;
  
  producing a modified score associated with the portion of the received primary transcription based at least in part on the comparison;
  
  generating a secondary transcription using the modified score, wherein the secondary transcription includes at least the one replacement word and the replacement word appears in the secondary transcription in place of at least one word from the primary transcription; and
  
  creating a rule that a particular word in the data set from the remote speech recognition engine is to be replaced by a particular replacement word from the secondary transcription, and transmitting the rule or the modified score to the remote ASR engine from which a vocabulary of the remote ASR engine is modified.
- 7. The method of claim 6, comprising:
  - receiving a rich-recognition-result for the portion of the received primary transcription data; and
  - modifying the score based at least in part on the local transcription.
- 8. The method of claim 6, wherein the received primary transcription data includes a rich-recognition-result with phoneme data associated with the audio recording.
- 9. The method of claim 6, wherein the replacement word is associated with a weighting, and wherein the weighting is indicative of a relative significance of the replacement word to the user.
- 10. The method of claim 6, wherein the personal data associated with the user includes data from one of an SMS or MMS message, an email, a contact, or a social network.
- 11. The method of claim 6, wherein the received primary transcription data includes a word lattice and generating the secondary transcription comprises finding a path through the word lattice that is different from a path through the word lattice that was used to generate the primary transcription.
- 12. The method of claim 6, wherein the replacement word is associated with a tag, and wherein the tag identifies that the replacement word is associated with one of a name, a location, or a family member or frequent contact.
- 13. The method of claim 6, further comprising obtaining user approval prior to forwarding the report to the remote ASR engine.
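
Claims 9 and 11 together suggest re-weighting a word lattice and then finding a different best path. The following is a minimal sketch under stated assumptions: the lattice is a small directed acyclic graph with integer nodes where every edge goes from a lower to a higher node, edge scores are positive probabilities, and a personal weighting multiplies the score of a matching word. None of these representation choices comes from the patent.

```python
import math


def best_path(edges, start, end):
    """Return the highest-scoring word sequence from start to end.

    edges: list of (src, dst, word, score) with src < dst and score > 0.
    Processing edges in order of src is a valid topological order here,
    because every edge into a node has a smaller source node.
    """
    best = {start: (0.0, [])}  # node -> (log score, words so far)
    for src, dst, word, score in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue  # node not reachable from start
        total = best[src][0] + math.log(score)
        if dst not in best or total > best[dst][0]:
            best[dst] = (total, best[src][1] + [word])
    return best[end][1]


def rescore_edges(edges, weights):
    """Scale scores of words in the personal vocabulary (hypothetical scheme)."""
    rescored = []
    for src, dst, word, score in edges:
        weight = weights.get(word.lower(), 0.0)
        rescored.append((src, dst, word, min(1.0, score * (1.0 + weight))))
    return rescored


lattice = [
    (0, 1, "call", 0.9),
    (1, 2, "kate", 0.6),
    (2, 3, "lynn", 0.6),
    (1, 3, "Caitlin", 0.3),
]
primary = best_path(lattice, 0, 3)
secondary = best_path(rescore_edges(lattice, {"caitlin": 1.0}), 0, 3)
```

With the original scores the two-word path wins; after the personal weighting is applied, the path through the replacement word scores higher, so the secondary transcription follows a different path than the primary one.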

14. A method of replacing one or more words in a transcription generated by a remote automatic speech recognition (ASR) engine, wherein the method is performed by a personal computing system having a processor and a memory, the method comprising:
- maintaining a personal vocabulary that includes replacement words;
  
  wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user; and
  
  receiving a transcription of an audio recording, wherein the transcription is generated by the remote ASR engine using an ASR vocabulary, wherein the ASR vocabulary is separate from the personal vocabulary, and wherein the transcription includes at least one transcribed word that represents at least one spoken word in the audio recording;
  
  receiving data associated with the transcribed word, wherein the received data is a rich data set that includes a word lattice, confidence scores, and a phoneme lattice;
  
  receiving audio data that includes the spoken word;
  
  generating a second transcription using the audio data, wherein the second transcription is generated by a local ASR engine using the personal vocabulary to rescore the rich data set;
  
  identifying a replacement word from the second transcription;
  
  replacing the transcribed word with the replacement word; and
  
  creating a rule that a particular word in the data set from the remote ASR engine is to be replaced by the replacement word from the second transcription and transmitting the rule or the rescored rich data set to the remote ASR engine from which a vocabulary of the remote ASR engine is modified;
  
  wherein the personal computing system is a mobile phone or tablet, wherein the remote ASR engine is located geographically remotely from the personal computing system; and
  
  wherein the replacement word is from the personal vocabulary.

15. A method of replacing one or more words in a transcription generated by a remote automatic speech recognition (ASR) engine, wherein the method is performed by a portable computing system having a processor and a memory, the method comprising:
- maintaining a personal vocabulary that includes replacement words;
  
  wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user; and
  
  wherein the personal data is obtained from: stored contact data for the user, stored calendar data for the user, text-based messages sent or received by the user, or a social network of which the user is a member;
  
  receiving a transcription of an audio recording, wherein the transcription is generated by the remote ASR engine using an ASR vocabulary, wherein the transcription includes a transcribed word that represents a spoken word in the audio recording, and wherein the remote ASR engine is located geographically remotely from the portable computing system;
  
  receiving data associated with the transcribed word, wherein the data associated with the transcribed word includes a word lattice and associated confidence scores, and wherein the confidence scores are generated by the remote ASR engine;
  
  receiving audio data that includes the spoken word;
  
  generating a second transcription using the audio data, wherein the second transcription is generated by a local ASR engine using the personal vocabulary and re-scores the word lattice;
  
  identifying a replacement word from the second transcription; and
  
  creating a rule to automatically replace the transcribed word with the replacement word whenever the transcribed word is found in a transcription, and transmitting the rule or the re-scored word lattice to the remote ASR engine from which a vocabulary of the remote ASR engine is modified;
  
  wherein the portable computing system is a mobile phone or tablet, and wherein the replacement word is from the personal vocabulary.
- 16. The method of claim 15, wherein the replacement word is associated with a confidence score that is greater than a confidence score associated with the transcribed word, and wherein the confidence score of the transcribed word is less than a threshold confidence level.
- 17. The method of claim 15, wherein the replacement word is associated with a weighting, and wherein the weighting is indicative of a relative significance of the word to the user.
- 18. The method of claim 15, wherein the data associated with the transcribed word includes a phonetic spelling of the transcribed word.
- 19. The method of claim 15, wherein the transcription includes metadata including a phone number or electronic address of a person, and wherein the replacement word is associated with a name of the person from the contact for the user.
- 20. The method of claim 15, wherein the replacement word is associated with a tag, and wherein the tag identifies that the replacement word is associated with one of a name, a location, or a family member or frequent contact.
- 21. The method of claim 15, further comprising obtaining user approval before sending the report to the remote ASR engine.
- 22. The method of claim 15, further comprising receiving a selection from the user of the replacement word, wherein upon the selection from the user, the replacement word is substituted in the transcription for the transcribed word.
- 23. The method of claim 15, wherein the threshold confidence level depends at least in part on a weighting associated with the replacement word.
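
The rule creation in claim 15, combined with the confidence conditions of claims 16 and 23, might look roughly like the sketch below. The claims only say that the threshold depends at least in part on the weighting; the linear formula in `threshold_for`, the assumption that a higher weighting raises the threshold, and every name and number in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementRule:
    """Rule: whenever `transcribed` appears, substitute `replacement`."""
    transcribed: str
    replacement: str


def threshold_for(weight, base=0.5, span=0.3):
    """Hypothetical threshold: more significant words (weight in [0, 1])
    get a higher threshold, so they replace transcribed words more readily."""
    return base + span * weight


def maybe_create_rule(transcribed, t_conf, replacement, r_conf, weight):
    """Create a rule only if the replacement is more confident than the
    transcribed word and the transcribed word falls below the threshold."""
    if r_conf > t_conf and t_conf < threshold_for(weight):
        return ReplacementRule(transcribed, replacement)
    return None


def apply_rules(words, rules):
    """Apply every rule to a list of transcribed words (case-insensitive match)."""
    table = {rule.transcribed.lower(): rule.replacement for rule in rules}
    return [table.get(word.lower(), word) for word in words]


rule = maybe_create_rule("catlin", 0.6, "Caitlin", 0.9, weight=1.0)
no_rule = maybe_create_rule("catlin", 0.6, "Caitlin", 0.9, weight=0.0)
fixed = apply_rules(["call", "catlin"], [rule])
```

In this example the same transcribed word yields a rule when the replacement word carries a high weighting and no rule when the weighting is zero, since only the first case puts the transcribed word's confidence below the threshold.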

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Zavaliagkos, George; Ganong, William F. III; Jost, Uwe H.; Madhavapeddi, Shreedhar; Clayton, Gary B.
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Sirjani, Fariba
Application Number: US13/190,749
Publication Number: US 20130030804A1
Time in Patent Office: 1,358 Days
Field of Search: 704/235
US Class Current: 704/235
CPC Class Codes:
G10L 15/065   Adaptation
G10L 15/08   Speech classification or se...
G10L 15/24   Speech recognition using no...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
G10L 2015/227   of the speaker; Human-fact...
