SYSTEMS AND METHODS FOR IMPROVING THE ACCURACY OF A TRANSCRIPTION USING AUXILIARY DATA SUCH AS PERSONAL DATA
Abstract
A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription of an audio recording is received. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data associated with the transcribed word is received. A replacement word from the personal vocabulary is identified and is used to re-score the transcription and replace the transcribed word.
-
57 Claims
-
1-29. (canceled)
-
30. A method of generating a personalized transcription from an audio recording, wherein the method is performed by a computing system, the method comprising:
maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes a first word list and confidence scores associated with a plurality of words in the first word list;
receiving audio data corresponding to at least a portion of the audio recording;
generating a second transcription, wherein the second transcription is of the received audio data, wherein the second transcription comprises a second word list and confidence scores associated with a plurality of words in the second word list, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary;
re-scoring the first transcription, the re-scoring comprising:
    comparing the first transcription with the second transcription, and
    modifying a confidence score associated with a word in the first word list or adding a word and associated confidence score from the second word list to the first word list; and
generating a final transcription based on the re-scored first transcription.
Dependent claims: 31, 32, 33, 34
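The re-scoring step of claim 30 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that both transcriptions are already aligned position by position, that each position holds a list of candidate words with confidence scores, and that agreement between the two engines is rewarded with a fixed boost. The names `Candidate`, `rescore`, `final_transcription`, and the `boost` value are hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    confidence: float


def rescore(first, second, boost=0.2):
    """Re-score the first (server-side) word list using the second (on-device) one.

    Both arguments are lists of slots; each slot is a list of Candidate objects
    for one word position. Slot-by-slot alignment is an assumption of this sketch.
    """
    rescored = []
    for slot_first, slot_second in zip(first, second):
        merged = {c.word: c.confidence for c in slot_first}
        for cand in slot_second:
            if cand.word in merged:
                # Both engines agree on this word: modify its confidence score.
                merged[cand.word] = min(1.0, merged[cand.word] + boost)
            else:
                # Word known only to the personal vocabulary: add it with its score.
                merged[cand.word] = cand.confidence
        rescored.append([Candidate(w, s) for w, s in merged.items()])
    return rescored


def final_transcription(rescored):
    """Pick the highest-confidence candidate in each slot."""
    return " ".join(max(slot, key=lambda c: c.confidence).word for slot in rescored)


# Hypothetical data: the server engine mishears a contact name.
first = [[Candidate("call", 0.9)], [Candidate("karen", 0.4)], [Candidate("tomorrow", 0.95)]]
second = [[Candidate("call", 0.8)], [Candidate("Kieran", 0.7)], [Candidate("tomorrow", 0.6)]]

print(final_transcription(rescore(first, second)))  # prints "call Kieran tomorrow"
```

In this example the name from the personal vocabulary is added to the first word list and outscores the server-side guess, while words on which both engines agree simply gain confidence.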
-
35. A non-transitory computer-readable medium encoded with instructions that, when executed by a processor, perform a method in a computing system of generating a personalized transcription from an audio recording, the method comprising:
maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes a first word list and confidence scores associated with a plurality of words in the first word list;
receiving audio data corresponding to at least a portion of the audio recording;
generating a second transcription, wherein the second transcription is of the received audio data, wherein the second transcription comprises a second word list and confidence scores associated with a plurality of words in the second word list, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary;
re-scoring the first transcription, the re-scoring comprising:
    comparing the first transcription with the second transcription, and
    modifying a confidence score associated with a word in the first word list or adding a word and associated confidence score from the second word list to the first word list; and
generating a final transcription based on the re-scored first transcription.
Dependent claims: 36, 37, 38, 39
-
40. A method of replacing a word in a transcription of an audio recording, wherein the method is performed by a computing system, the method comprising:
maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription data is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes confidence scores associated with certain words in the transcription;
receiving audio data corresponding to the first transcription;
identifying a replaceable word from the first transcription;
generating a second transcription of a portion of the received audio data corresponding to the replaceable word, wherein the second transcription includes phonetic data, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary; and
identifying a replacement word that is used instead of the replaceable word in a final transcription, wherein the replacement word is identified based on a comparison between the phonetic data of the second transcription and the personal vocabulary, and wherein the replacement word is from the personal vocabulary.
Dependent claims: 41, 42, 43, 44, 45, 46, 47, 48
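The word-replacement flow of claim 40 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation. It assumes that a replaceable word is one whose confidence falls below a threshold, that phonetic data is a list of phoneme symbols, and that the comparison with the personal vocabulary is a sequence-similarity ratio from the standard-library `difflib` module. The claims do not specify any of these choices. The names `find_replaceable`, `best_replacement`, `replace_words`, the thresholds, and the phoneme spellings are hypothetical.

```python
from difflib import SequenceMatcher


def find_replaceable(words, threshold=0.5):
    """Return indices of (word, confidence) pairs whose confidence is below threshold."""
    return [i for i, (_, conf) in enumerate(words) if conf < threshold]


def best_replacement(phonemes, personal_vocab, min_similarity=0.6):
    """Return the personal-vocabulary word whose phonemes best match, or None."""
    best_word, best_score = None, min_similarity
    for word, vocab_phonemes in personal_vocab.items():
        score = SequenceMatcher(None, phonemes, vocab_phonemes).ratio()
        if score >= best_score:
            best_word, best_score = word, score
    return best_word


def replace_words(words, phonetic_data, personal_vocab):
    """Build a final transcription, swapping in personal-vocabulary words.

    phonetic_data maps a word index to the phonemes produced for that audio
    portion; here it stands in for the output of the on-device ASR engine.
    """
    result = [w for w, _ in words]
    for i in find_replaceable(words):
        phonemes = phonetic_data.get(i)
        if phonemes is None:
            continue
        replacement = best_replacement(phonemes, personal_vocab)
        if replacement is not None:
            result[i] = replacement
    return " ".join(result)


# Hypothetical data: a contact name from the user's address book.
personal_vocab = {
    "Kieran": ["K", "IH", "R", "AH", "N"],
    "Siobhan": ["SH", "IH", "V", "AO", "N"],
}
first = [("call", 0.9), ("karen", 0.3), ("tomorrow", 0.95)]
phonetic_data = {1: ["K", "IH", "R", "IH", "N"]}

print(replace_words(first, phonetic_data, personal_vocab))  # prints "call Kieran tomorrow"
```

Only the low-confidence portion is re-examined, which matches the claim's focus on a portion of the audio corresponding to the replaceable word. A word is left unchanged when no personal-vocabulary entry is similar enough.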
-
49. A non-transitory computer-readable medium encoded with instructions that, when executed by a processor, perform a method in a computing system of replacing a word in a transcription of an audio recording, the method comprising:
maintaining a personal vocabulary of words on a mobile device associated with a user, wherein the personal vocabulary is based on personal data associated with the user;
receiving, at the mobile device, a first transcription of an audio recording, wherein the first transcription data is generated by an automatic speech recognition (ASR) engine not on the mobile device using an ASR vocabulary associated with a population of users, and wherein the first transcription includes confidence scores associated with certain words in the transcription;
receiving audio data corresponding to the first transcription;
identifying a replaceable word from the first transcription;
generating a second transcription of a portion of the received audio data corresponding to the replaceable word, wherein the second transcription includes phonetic data, and wherein the second transcription is generated by an ASR engine on the mobile device using the maintained personal vocabulary; and
identifying a replacement word that is used instead of the replaceable word in a final transcription, wherein the replacement word is identified based on a comparison between the phonetic data of the second transcription and the personal vocabulary, and wherein the replacement word is from the personal vocabulary.
Dependent claims: 50, 51, 52, 53, 54, 55, 56, 57
-
Specification