SYSTEMS AND METHODS FOR IMPROVING THE ACCURACY OF A TRANSCRIPTION USING AUXILIARY DATA SUCH AS PERSONAL DATA

US 20130030804A1
Filed: 07/26/2011
Published: 01/31/2013
Est. Priority Date: 07/26/2011
Status: Active Grant

Abstract

A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.

313 Citations

29 Claims

1. A personal computing device for use with an automatic speech recognition engine, the device comprising:
- a communications port configured to receive a data set from the automatic speech recognition engine, wherein the data set includes a word list for candidate words with confidence scores generated by the automatic speech recognition engine in response to audio data;
  
  a display device for displaying information to a user;
  
  memory for at least temporarily storing personal data and executable code for a re-recognition engine, wherein the re-recognition engine is similar to, but has less speech recognition functionality than, the automatic speech recognition engine; and
  
  at least one processor coupled among the communications port, the display device, and the memory, wherein the at least one processor is configured to perform a re-recognition of the data set to generate a transcription of the audio data, and wherein the at least one processor is further configured to:
  
  access the personal data from the memory,
  
  rescore the data set received from the automatic speech recognition engine, using the re-recognition engine, based on the accessed personal data, and
  
  present, via the display device, a transcription of the audio to the user based on the rescored data set, or transmit, via the communications port, the rescored data set to the automatic speech recognition engine.
- Dependent claims (2, 3, 4, 5, 6)
  - 2. The personal computing device of claim 1 wherein the automatic speech recognition engine is hosted by a server accessible via a network, and wherein the personal computing device is a cell phone, smart phone, tablet or portable telecommunications device.
  - 3. The personal computing device of claim 1 wherein the data set is a rich data set that includes a word lattice, confidence scores, and a phoneme lattice.
  - 4. The personal computing device of claim 3 wherein the data set includes a language model associated with the user.
  - 5. The personal computing device of claim 1 wherein the personal data includes at least one acoustic model associated with the user.
  - 6. The personal computing device of claim 1 wherein the personal data includes at least one of:
    - address book information, text-based messages, and data from web-based sources.
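
The rescoring recited in claim 1 can be pictured with a minimal sketch. It is not the patented implementation: the slot-based data shape, the additive boost, and the names `rescore_word_list` and `best_transcription` are all assumptions made for illustration. The sketch takes a word list of candidates with confidence scores, raises the score of any candidate that also appears in the personal data, and reads the transcription off the top candidates.

```python
from typing import Dict, List, Tuple

# Hypothetical data shape: one slot per spoken word, each holding
# (candidate word, ASR confidence score) pairs from the remote engine.
Slot = List[Tuple[str, float]]


def rescore_word_list(slots: List[Slot], personal_words: Dict[str, float]) -> List[Slot]:
    """Add an assumed additive boost to candidates found in the personal data."""
    rescored = []
    for slot in slots:
        new_slot = [
            (word, min(1.0, score + personal_words.get(word.lower(), 0.0)))
            for word, score in slot
        ]
        # Sort so the best-scoring candidate comes first.
        new_slot.sort(key=lambda pair: pair[1], reverse=True)
        rescored.append(new_slot)
    return rescored


def best_transcription(slots: List[Slot]) -> str:
    """Join the first candidate of each slot into a transcription."""
    return " ".join(slot[0][0] for slot in slots)


slots = [
    [("call", 0.9)],
    [("gary", 0.5), ("Ganong", 0.4)],
]
personal = {"ganong": 0.3}  # assumed boost for a surname from the address book
rescored = rescore_word_list(slots, personal)
print(best_transcription(rescored))  # prints "call Ganong"
```

An additive boost capped at 1.0 is only one possible choice; the claim does not specify how the personal data changes the scores.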

7. A method of generating a secondary transcription from a primary transcription generated by an automatic speech recognition (ASR) engine, wherein the method is performed by a computing system having a processor and a memory, the method comprising:
- maintaining a personal vocabulary that includes replacement words, wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user;
  
  receiving primary transcription data from an audio recording, wherein the primary transcription data is generated by the ASR engine using an ASR vocabulary, wherein the primary transcription data includes a primary transcription and a score associated with words in the primary transcription, and wherein the score is generated by the ASR engine;
  
  identifying at least one replacement word from the personal vocabulary;
  
  comparing the replacement word to at least a portion of the received primary transcription;
  
  producing a modified score associated with the portion of the received primary transcription based at least in part on the comparison; and
  
  generating a secondary transcription using the modified score, wherein the secondary transcription includes at least the one replacement word.
- Dependent claims (8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 8. The method of claim 7, wherein the computing system is a mobile phone or tablet, wherein the ASR engine is executed by a server accessible by the mobile phone or tablet via a network, and wherein the method further comprises:
    - receiving a rich-recognition-result for the portion of the received primary transcription data and an audio data that corresponds at least in part to the portion of the received primary transcription data; and
      
      generating a local transcription using the audio data, wherein the local transcription is generated by a local ASR engine at the mobile phone or tablet using the personal vocabulary, and wherein the score is modified based at least in part on the local transcription.
  - 9. The method of claim 7, wherein the received primary transcription data includes a rich-recognition-result with phoneme data associated with the audio recording.
  - 10. The method of claim 7, wherein the replacement word is associated with a weighting, and wherein the weighting is indicative of a relative significance of the replacement word to the user.
  - 11. The method of claim 7, wherein the personal data associated with the user includes data from one of an SMS or MMS message, an email, a contact, or a social network.
  - 12. The method of claim 7, wherein the received primary transcription data includes a word lattice and generating the secondary transcription comprises finding a path through the word lattice that is different from a path through the word lattice that was used to generate the primary transcription.
  - 13. The method of claim 7, wherein the replacement word is associated with a tag, and wherein the tag identifies that the replacement word is associated with one of a name, a location, or a family member or frequent contact.
  - 14. The method of claim 7, wherein the replacement word appears in the secondary transcription in place of at least one word from the primary transcription.
  - 15. The method of claim 14, further comprising generating a report that identifies the replacement word and the at least one word from the primary transcription, obtaining user approval, and then forwarding the report to the ASR engine.
  - 16. The method of claim 14, further comprising creating a rule to automatically replace the at least one word from the primary transcription with the replacement word whenever the at least one word is found in a transcription.
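
Claims 10 and 12 together suggest one way a secondary transcription could arise: weightings from the personal vocabulary change edge scores in a word lattice, and a different best path results. The following is a minimal sketch under stated assumptions. The lattice representation (a dictionary of scored edges), the additive scores, and the names `best_path` and `modify_scores` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical lattice: node -> list of (next_node, word, score) edges.
Lattice = Dict[int, List[Tuple[int, str, float]]]


def best_path(lattice: Lattice, start: int, end: int) -> Tuple[float, List[str]]:
    """Return the highest total-score path from start to end (acyclic lattice assumed)."""
    if start == end:
        return 0.0, []
    best: Tuple[float, List[str]] = (float("-inf"), [])
    for nxt, word, score in lattice.get(start, []):
        tail_score, tail_words = best_path(lattice, nxt, end)
        total = score + tail_score
        if total > best[0]:
            best = (total, [word] + tail_words)
    return best


def modify_scores(lattice: Lattice, weights: Dict[str, float]) -> Lattice:
    """Add each replacement word's weighting to the edges that carry that word."""
    return {
        node: [(nxt, w, s + weights.get(w, 0.0)) for nxt, w, s in edges]
        for node, edges in lattice.items()
    }


lattice = {
    0: [(1, "meet", 0.9)],
    1: [(2, "you", 0.6), (2, "Uwe", 0.4)],
}
primary = best_path(lattice, 0, 2)[1]
secondary = best_path(modify_scores(lattice, {"Uwe": 0.5}), 0, 2)[1]
print(primary, secondary)  # prints ['meet', 'you'] ['meet', 'Uwe']
```

The recursive search is adequate only for toy lattices; a production system would more likely use dynamic programming over log-probability scores.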

17. A method of replacing one or more words in a transcription generated by an automatic speech recognition (ASR) engine, wherein the method is performed by a personal computing system having a processor and a memory, the method comprising:
- maintaining a personal vocabulary that includes replacement words, wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user;
  
  receiving a transcription of an audio recording, wherein the transcription is generated by an ASR engine using an ASR vocabulary, wherein the ASR vocabulary is separate from the personal vocabulary, and wherein the transcription includes at least one transcribed word that represents at least one spoken word in the audio recording;
  
  receiving data associated with the transcribed word;
  
  identifying a replacement word from the personal vocabulary; and
  
  replacing the transcribed word with the replacement word.
- Dependent claims (18)
  - 18. The method of claim 17, wherein the portable computing system is a mobile phone or tablet, wherein the ASR engine is located geographically remotely from the portable computing system, and wherein the method further comprises:
    - receiving audio data that includes the spoken word; and
      
      generating a second transcription using the audio data, wherein the second transcription is generated by a local ASR engine using the personal vocabulary and the second transcription includes the replacement word.
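
The identify-and-replace steps of claim 17 can be sketched as follows. Claim 17 does not say what the "data associated with the transcribed word" is; this sketch assumes it is a phonetic spelling (as in claim 23) and compares spellings with Python's standard `difflib.SequenceMatcher`. The phonetic strings, the 0.7 cutoff, and the names `find_replacement` and `replace_word` are illustrative assumptions only.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def find_replacement(phonetic: str, personal_vocab: Dict[str, str],
                     min_ratio: float = 0.7) -> Optional[str]:
    """Return the personal-vocabulary word whose phonetic spelling is closest.

    personal_vocab maps a word to a hypothetical phonetic spelling.
    Returns None when no spelling reaches the assumed similarity cutoff.
    """
    best_word, best_ratio = None, min_ratio
    for word, spelling in personal_vocab.items():
        ratio = SequenceMatcher(None, phonetic, spelling).ratio()
        if ratio >= best_ratio:
            best_word, best_ratio = word, ratio
    return best_word


def replace_word(transcription: List[str], index: int, replacement: str) -> List[str]:
    """Return a copy of the transcription with one word swapped out."""
    out = list(transcription)
    out[index] = replacement
    return out


vocab = {"Jost": "yowst", "Clayton": "kleytn"}  # made-up phonetic spellings
transcription = ["email", "joust"]
match = find_replacement("jowst", vocab)  # "jowst": assumed data for "joust"
if match is not None:
    transcription = replace_word(transcription, 1, match)
print(transcription)
```

String similarity over spellings is a stand-in here; a real system might compare phoneme sequences from the recognition result instead.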

19. A method of replacing one or more words in a transcription generated by an automatic speech recognition (ASR) engine, wherein the method is performed by a portable computing system having a processor and a memory, the method comprising:
- maintaining a personal vocabulary that includes replacement words, wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user, and wherein the personal data is obtained from: stored contact data for the user, stored calendar data for the user, text-based messages sent or received by the user, or a social network of which the user is a member;
  
  receiving a transcription of an audio recording, wherein the transcription is generated by the ASR engine using an ASR vocabulary, wherein the transcription includes a transcribed word that represents a spoken word in the audio recording, and wherein the ASR engine is located geographically remotely from the portable computing system;
  
  receiving data associated with the transcribed word, wherein the data associated with the transcribed word includes a confidence score, and wherein the confidence score is generated by the ASR engine;
  
  identifying a replacement word from the personal vocabulary;
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
  - 20. The method of claim 19, wherein the portable computing system is a mobile phone or tablet, and wherein the method further comprises:
    - receiving audio data that includes the spoken word; and
      
      generating a second transcription using the audio data, wherein the second transcription is generated by a local ASR engine using the personal vocabulary and the second transcription includes the replacement word.
  - 21. The method of claim 19, wherein the replacement word is associated with a confidence score that is greater than a confidence score associated with the transcribed word, and wherein the confidence score of the transcribed word is less than a threshold confidence level.
  - 22. The method of claim 19, wherein the replacement word is associated with a weighting, and wherein the weighting is indicative of a relative significance of the word to the user.
  - 23. The method of claim 19, wherein the data associated with the transcribed word includes a phonetic spelling of the transcribed word.
  - 24. The method of claim 19, wherein the transcription includes metadata including a phone number or electronic address of a person, and wherein the replacement word is associated with a name of the person from the contact for the user.
  - 25. The method of claim 19, wherein the replacement word is associated with a tag, and wherein the tag identifies that the replacement word is associated with one of a name, a location, or a family member or frequent contact.
  - 26. The method of claim 19, further comprising generating a report that includes the transcribed word and the replacement word, and obtaining user approval before forwarding the report to the ASR engine.
  - 27. The method of claim 19, further comprising creating a rule to automatically replace the transcribed word with the replacement word whenever the transcribed word is found in a transcription.
  - 28. The method of claim 19, further comprising receiving a selection from the user of the replacement word, wherein upon the selection from the user, the replacement word is substituted in the transcription for the transcribed word.
  - 29. The method of claim 19, wherein the threshold confidence level depends at least in part on a weighting associated with the replacement word.
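
Claims 21, 22, 27, and 29 describe a confidence threshold that depends on a word's weighting, plus a rule for automatic replacement. A minimal sketch of how these could interact is given below. The threshold formula, the 0.5 base value, the 0.2 scaling factor, and all names (`Candidate`, `threshold_for`, `should_replace`, `apply_rules`) are assumptions for illustration and are not stated in the claims.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    word: str
    confidence: float  # hypothetical confidence for the replacement word
    weighting: float   # relative significance to the user, 0.0 to 1.0


def threshold_for(base: float, weighting: float) -> float:
    """Assumed formula: a heavier weighting raises the threshold,
    so the transcribed word is replaced more readily."""
    return min(1.0, base + 0.2 * weighting)


def should_replace(asr_confidence: float, cand: Candidate, base: float = 0.5) -> bool:
    """Replace only if the transcribed word is below the threshold
    and the replacement word scores higher (claim 21)."""
    return (asr_confidence < threshold_for(base, cand.weighting)
            and cand.confidence > asr_confidence)


def apply_rules(words: List[str], rules: Dict[str, str]) -> List[str]:
    """Swap every word that has a stored replacement rule (claim 27)."""
    return [rules.get(w, w) for w in words]


frequent = Candidate("Ganong", confidence=0.8, weighting=1.0)
rare = Candidate("Ganong", confidence=0.8, weighting=0.0)
print(should_replace(0.6, frequent))  # True: threshold is 0.7
print(should_replace(0.6, rare))      # False: threshold is 0.5
print(apply_rules(["email", "gannon"], {"gannon": "Ganong"}))
```

Making the threshold rise with the weighting means a frequently used contact name can displace a transcribed word that a rarely used name would not.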

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Zavaliagkos, George; Ganong, William F. III; Jost, Uwe H.; Madhavapeddi, Shreedhar; Clayton, Gary B.

Granted Patent

US 9,009,041 B2
US Class Current

704/235
CPC Class Codes

G10L 15/065   Adaptation

G10L 15/08   Speech classification or se...

G10L 15/24   Speech recognition using no...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/227   of the speaker; Human-fact...
