SYSTEMS AND METHODS FOR IMPROVING THE ACCURACY OF A TRANSCRIPTION USING AUXILIARY DATA SUCH AS PERSONAL DATA

US 20150221306A1
Filed: 04/13/2015
Published: 08/06/2015
Est. Priority Date: 07/26/2011
Status: Active Grant

Abstract

A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.
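As a rough illustration of how a personal vocabulary might be derived from personal data, the sketch below counts words that appear in a user's data but not in a list of common words. The function name `build_personal_vocabulary`, the `sources` mapping, and the `common_words` filter are all assumptions made for this example; the abstract does not specify how the vocabulary is built.

```python
import re
from collections import Counter


def build_personal_vocabulary(sources, common_words):
    """Count candidate vocabulary words found in a user's personal data.

    sources: dict mapping a source name (e.g. "address_book") to raw text.
    common_words: lowercase words assumed to be covered by the general
    ASR vocabulary already (a hypothetical filter, not from the patent).
    """
    counts = Counter()
    for text in sources.values():
        # Keep alphabetic tokens, allowing apostrophes and hyphens inside names.
        for token in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
            if token.lower() not in common_words:
                counts[token] += 1
    return counts


sources = {
    "address_book": "Siobhan Okafor",
    "sms": "Lunch with Siobhan at noon",
}
vocabulary = build_personal_vocabulary(sources, {"lunch", "with", "at", "noon"})
```

The counts could serve as a simple weighting for each word, although any real weighting scheme is left open here.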

57 Claims

1-29. (canceled)

30. A method of generating a personalized transcription from an audio recording, wherein the method is performed by a computing system, the method comprising:
- maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
  
  receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes a first word list and confidence scores associated with a plurality of words in the first word list;
  
  receiving audio data corresponding to at least a portion of the audio recording;
  
  generating a second transcription, wherein the second transcription is of the received audio data, wherein the second transcription comprises a second word list and confidence scores associated with a plurality of words in the second word list, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary;
  
  re-scoring the first transcription, the re-scoring comprising:
  
    comparing the first transcription with the second transcription, and modifying a confidence score associated with a word in the first word list or adding a word and associated confidence score from the second word list to the first word list; and
  
  generating a final transcription based on the re-scored first transcription.
  - 31. The method of claim 30, wherein the personal data associated with the user includes data from at least one of an address book of the user, a SMS message sent or received by the user, an email sent or received by the user, a social network of the user, or a website visited by the user.
  - 32. The method of claim 30, wherein the personal data associated with the user includes at least one acoustic model associated with the user.
  - 33. The method of claim 30, wherein the audio recording is of a second user, the first transcription includes metadata associated with the second user, and the word from the second word list is added to the first word list based on the metadata.
  - 34. The method of claim 30, further comprising generating a report based on re-scoring the first transcription, and wherein generating the final transcription is further based on a previously-generated report.
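The re-scoring step of claim 30 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ScoredWord` type, the position-based alignment of the two word lists, and the fixed `boost` value are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class ScoredWord:
    position: int      # word slot in the utterance (assumed alignment key)
    text: str
    confidence: float  # assumed to lie in [0, 1]


def rescore(first, second, boost=0.2):
    """Compare two word lists and return a re-scored copy of the first.

    A word present in both lists at the same position has its confidence
    raised; a word only in the second list is added with its own score.
    """
    rescored = {
        (w.position, w.text.lower()): ScoredWord(w.position, w.text, w.confidence)
        for w in first
    }
    for w in second:
        key = (w.position, w.text.lower())
        if key in rescored:
            entry = rescored[key]
            entry.confidence = min(1.0, entry.confidence + boost)
        else:
            rescored[key] = ScoredWord(w.position, w.text, w.confidence)
    return sorted(rescored.values(), key=lambda w: (w.position, -w.confidence))


def final_transcription(rescored):
    """Keep the highest-confidence candidate for each position."""
    best = {}
    for w in rescored:
        if w.position not in best or w.confidence > best[w.position].confidence:
            best[w.position] = w
    return " ".join(best[p].text for p in sorted(best))


first = [ScoredWord(0, "call", 0.9), ScoredWord(1, "Jon", 0.4), ScoredWord(2, "tomorrow", 0.8)]
second = [ScoredWord(0, "call", 0.85), ScoredWord(1, "Siobhan", 0.7)]
rescored = rescored_list = rescore(first, second)
result = final_transcription(rescored)
```

Here the server-side list (`first`) keeps its structure, and the on-device list (`second`) only adjusts scores or contributes new candidates, which mirrors the two alternatives named in the claim.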

35. A non-transitory computer-readable medium encoded with instructions that, when executed by a processor, perform a method in a computing system of generating a personalized transcription from an audio recording, the method comprising:
- maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
  
  receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes a first word list and confidence scores associated with a plurality of words in the first word list;
  
  receiving audio data corresponding to at least a portion of the audio recording;
  
  generating a second transcription, wherein the second transcription is of the received audio data, wherein the second transcription comprises a second word list and confidence scores associated with a plurality of words in the second word list, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary;
  
  re-scoring the first transcription, the re-scoring comprising:
  
    comparing the first transcription with the second transcription, and modifying a confidence score associated with a word in the first word list or adding a word and associated confidence score from the second word list to the first word list; and
  
  generating a final transcription based on the re-scored first transcription.
  - 36. The non-transitory computer-readable medium of claim 35, wherein the personal data associated with the user includes data from at least one of an address book of the user, a SMS message sent or received by the user, an email sent or received by the user, a social network of the user, or a website visited by the user.
  - 37. The non-transitory computer-readable medium of claim 35, wherein the personal data associated with the user includes at least one acoustic model associated with the user.
  - 38. The non-transitory computer-readable medium of claim 35, wherein the audio recording is of a second user, the first transcription includes metadata associated with the second user, and the word from the second word list is added to the first word list based on the metadata.
  - 39. The non-transitory computer-readable medium of claim 35, further comprising instructions for generating a report based on re-scoring the first transcription, and wherein generating the final transcription is further based on a previously-generated report.

40. A method of replacing a word in a transcription of an audio recording, wherein the method is performed by a computing system, the method comprising:
- maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
  
  receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription data is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes confidence scores associated with certain words in the transcription;
  
  receiving audio data corresponding to the first transcription;
  
  identifying a replaceable word from the first transcription;
  
  generating a second transcription of a portion of the received audio data corresponding to the replaceable word, wherein the second transcription includes phonetic data, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary; and
  
  identifying a replacement word that is used instead of the replaceable word in a final transcription, wherein the replacement word is identified based on a comparison between the phonetic data of the second transcription and the personal vocabulary, and wherein the replacement word is from the personal vocabulary.
  - 41. The method of claim 40, wherein the personal data associated with the user includes data from at least one of an address book of the user, a SMS message sent or received by the user, an email sent or received by the user, a social network of the user, or a website visited by the user.
  - 42. The method of claim 40, wherein the personal data associated with the user includes at least one acoustic model associated with the user.
  - 43. The method of claim 40, wherein the audio recording is of a second user, the first transcription includes metadata associated with the second user, and the replacement word is based on the metadata.
  - 44. The method of claim 40, wherein identifying a replaceable word comprises identifying a word from the first transcription having a confidence score that is below a threshold level.
  - 45. The method of claim 44, wherein the threshold level is based on a weighting associated with the replacement word or based on a word in the personal vocabulary having a similar phonetic spelling to the replaceable word.
  - 46. The method of claim 40, wherein identifying a replaceable word comprises identifying a word from the first transcription that has a similar phonetic spelling to a word in the personal vocabulary.
  - 47. The method of claim 40, wherein a confidence score associated with the replacement word is greater than a confidence score associated with the replaceable word.
  - 48. The method of claim 40, further comprising generating a report based on the identified replacement word, and wherein identifying the replacement word is further based on a previously-generated report.
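Claims 40 and 44 can be illustrated with a short sketch: a word whose confidence falls below a threshold is treated as replaceable, and the phonetic data for that stretch of audio is compared against pronunciations in the personal vocabulary. Everything concrete here is an assumption for illustration: the phoneme symbols, the `vocabulary` pronunciations, the threshold values, and the use of `difflib.SequenceMatcher` as the similarity measure (the claims do not name a comparison technique).

```python
from difflib import SequenceMatcher


def find_replaceable(words, threshold=0.5):
    """Return indexes of (text, confidence) pairs scoring below the threshold."""
    return [i for i, (_, confidence) in enumerate(words) if confidence < threshold]


def pick_replacement(phonemes, vocabulary, min_similarity=0.7):
    """Pick the vocabulary word whose pronunciation best matches the phonemes.

    vocabulary maps a word to a list of phoneme symbols (hypothetical data).
    Returns None when no candidate reaches min_similarity.
    """
    best_word, best_score = None, 0.0
    for word, pronunciation in vocabulary.items():
        score = SequenceMatcher(None, phonemes, pronunciation).ratio()
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score >= min_similarity else None


first_transcription = [("call", 0.9), ("shivon", 0.3), ("tomorrow", 0.8)]
vocabulary = {
    "Siobhan": ["SH", "IH", "V", "AO", "N"],
    "Steven": ["S", "T", "IY", "V", "AH", "N"],
}

replaceable = find_replaceable(first_transcription)
# Phonetic data assumed to come from the on-device second transcription.
replacement = pick_replacement(["SH", "IH", "V", "AA", "N"], vocabulary)
no_match = pick_replacement(["K", "AE", "T"], vocabulary)
```

Returning `None` when nothing is similar enough keeps the original word in place, which is one plausible way to avoid replacing words that the personal vocabulary does not actually cover.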

49. A non-transitory computer-readable medium encoded with instructions that, when executed by a processor, perform a method in a computing system of replacing a word in a transcription of an audio recording, the method comprising:
- maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
  
  receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription data is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes confidence scores associated with certain words in the transcription;
  
  receiving audio data corresponding to the first transcription;
  
  identifying a replaceable word from the first transcription;
  
  generating a second transcription of a portion of the received audio data corresponding to the replaceable word, wherein the second transcription includes phonetic data, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary; and
  
  identifying a replacement word that is used instead of the replaceable word in a final transcription, wherein the replacement word is identified based on a comparison between the phonetic data of the second transcription and the personal vocabulary, and wherein the replacement word is from the personal vocabulary.
  - 50. The non-transitory computer-readable medium of claim 49, wherein the personal data associated with the user includes data from at least one of an address book of the user, a SMS message sent or received by the user, an email sent or received by the user, a social network of the user, or a website visited by the user.
  - 51. The non-transitory computer-readable medium of claim 49, wherein the personal data associated with the user includes at least one acoustic model associated with the user.
  - 52. The non-transitory computer-readable medium of claim 49, wherein the audio recording is of a second user, the first transcription includes metadata associated with the second user, and the replacement word is based on the metadata.
  - 53. The non-transitory computer-readable medium of claim 49, wherein identifying a replaceable word comprises identifying a word from the first transcription having a confidence score that is below a threshold level.
  - 54. The non-transitory computer-readable medium of claim 53, wherein the threshold level is based on a weighting associated with the replacement word or based on a word in the personal vocabulary having a similar phonetic spelling to the replaceable word.
  - 55. The non-transitory computer-readable medium of claim 49, wherein identifying the replaceable word comprises identifying a word from the first transcription that has a phonetic spelling similar to a word in the personal vocabulary.
  - 56. The non-transitory computer-readable medium of claim 49, wherein a confidence score associated with the replacement word is greater than a confidence score associated with the replaceable word.
  - 57. The non-transitory computer-readable medium of claim 49, further comprising instructions for generating a report based on the identified replacement word, and wherein identifying the replacement word is further based on a previously-generated report.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: George Zavaliagkos; William F. Ganong III; Uwe H. Jost; Shreedhar Madhavapeddi; Gary B. Clayton

Granted Patent: US 9,626,969 B2
US Class Current: 1/1
CPC Class Codes

G10L 15/065   Adaptation

G10L 15/08   Speech classification or se...

G10L 15/24   Speech recognition using no...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 2015/227   of the speaker; Human-fact...
