Systems and methods for improving the accuracy of a transcription using auxiliary data such as personal data
Abstract
A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription of an audio recording is received. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data associated with the transcribed word is received. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.
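The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how a replacement word is identified, so this sketch assumes a simple string-similarity test (`difflib.SequenceMatcher`) applied only to low-confidence words. The function names and both thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an assumed stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def replace_from_personal_vocabulary(words, personal_vocabulary,
                                     min_similarity=0.8, max_confidence=0.5):
    """Return the transcription with low-confidence words replaced.

    words: list of (transcribed_word, confidence) pairs from the ASR engine.
    personal_vocabulary: list of replacement words taken from personal data.
    Both thresholds are illustrative assumptions, not values from the source.
    """
    result = []
    for word, confidence in words:
        chosen = word
        if confidence < max_confidence and personal_vocabulary:
            # Pick the personal-vocabulary entry closest to the transcribed word.
            candidate = max(personal_vocabulary, key=lambda v: similarity(word, v))
            if similarity(word, candidate) >= min_similarity:
                chosen = candidate
        result.append(chosen)
    return result


if __name__ == "__main__":
    transcribed = [("call", 0.95), ("Jon", 0.30), ("Smyth", 0.35)]
    print(replace_from_personal_vocabulary(transcribed, ["John", "Smythe"]))
```

High-confidence words are left alone in this sketch, so the personal vocabulary only overrides the ASR output where the engine itself was unsure.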
23 Claims
1. A personal computing device for use with a remote automatic speech recognition engine, the device comprising:
  a communications port configured to receive a data set and audio data from the remote automatic speech recognition engine, wherein the data set and the audio data reflect speech, wherein the data set is a rich data set that includes a word list for candidate words with confidence scores, and wherein the data set is generated by the remote automatic speech recognition engine in response to the audio data;
  a display device for displaying information to a user;
  memory for at least temporarily storing personal data and executable code for a re-recognition engine, wherein the re-recognition engine includes automatic speech recognition capability; and
  at least one processor coupled among the communications port, the display device, and the memory, wherein the at least one processor is configured to execute the code for the re-recognition engine and:
    access the personal data from the memory,
    generate a local transcription using the audio data, wherein the local transcription is generated using the speech recognition capability of the re-recognition engine and the accessed personal data,
    rescore the data set received from the remote automatic speech recognition engine, using the re-recognition engine, based on the accessed personal data and confidence scores associated with the local transcription,
    generate a final transcription of the speech using the rescored data set and the local transcription,
    present, via the display device, the final transcription of the speech to the user based on the rescored data set and local transcription, and
    create a rule that a particular word in the data set from the remote automatic speech recognition engine is to be replaced by a particular replacement word from the local transcription, and transmit, via the communications port, the rule or the rescored data set to the remote automatic speech recognition engine from which a vocabulary of the remote automatic speech recognition engine is modified,
  wherein the remote automatic speech recognition engine is hosted by a server accessible via a network, and the personal computing device is a cell phone, smart phone, tablet or portable telecommunications device.
Dependent claims: 2, 3, 4, 5.
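As an illustration of the rescoring step in claim 1, the sketch below takes a word list of candidate words with confidence scores and raises the score of any candidate that also appears in the personal data. The claim does not give a scoring formula, so the additive `boost` and the function names are assumptions made for this example only.

```python
def rescore_candidates(candidates, personal_vocabulary, boost=0.3):
    """Return a new {word: score} map with personal-vocabulary words boosted.

    candidates: {candidate_word: confidence} for one spoken word, as might be
    found in the rich data set received from the remote engine.
    boost: hypothetical additive bonus; scores are capped at 1.0.
    """
    rescored = {}
    for word, score in candidates.items():
        bonus = boost if word in personal_vocabulary else 0.0
        rescored[word] = min(1.0, score + bonus)
    return rescored


def best_word(rescored):
    """Pick the candidate with the highest rescored confidence."""
    return max(rescored, key=rescored.get)


if __name__ == "__main__":
    word_list = {"Jon": 0.5, "Joan": 0.4, "John": 0.3}
    contacts = {"Joan"}  # e.g. a name drawn from stored contact data
    print(best_word(rescore_candidates(word_list, contacts)))
```

Before rescoring, the top candidate in this example is "Jon". After the boost, the contact name "Joan" ranks first.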
6. A method of generating a secondary transcription from a primary transcription generated by a remote automatic speech recognition (ASR) engine, wherein the method is performed by a computing system having a processor and a memory, the method comprising:
  maintaining a personal vocabulary that includes replacement words, wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user;
  receiving primary transcription data from an audio recording, wherein the primary transcription data is generated by the remote ASR engine using an ASR vocabulary, wherein the primary transcription data includes a primary transcription and confidence scores associated with words in the primary transcription, and wherein the confidence scores are generated by the remote ASR engine;
  receiving audio data that corresponds at least in part to a portion of the received primary transcription data;
  generating a local transcription using the audio data, wherein the local transcription is generated by a local ASR engine at the computing system using the personal vocabulary;
  identifying at least one replacement word from the local transcription;
  comparing the replacement word to at least a portion of the received primary transcription;
  producing a modified score associated with the portion of the received primary transcription based at least in part on the comparison;
  generating a secondary transcription using the modified score, wherein the secondary transcription includes at least the one replacement word and the replacement word appears in the secondary transcription in place of at least one word from the primary transcription; and
  creating a rule that a particular word in the data set from the remote speech recognition engine is to be replaced by a particular replacement word from the secondary transcription, and transmitting the rule or the modified score to the remote ASR engine from which a vocabulary of the remote ASR engine is modified.
Dependent claims: 7, 8, 9, 10, 11, 12, 13.
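The comparing, modified-score, and secondary-transcription steps of claim 6 can be sketched as follows. This is only one possible reading: it assumes the primary and local transcriptions are already aligned word for word, and it uses a made-up rule for the modified score (the primary score discounted by the local engine's confidence). `ScoredWord` and `generate_secondary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScoredWord:
    text: str
    score: float  # confidence in [0, 1]


def generate_secondary(primary, local, personal_vocabulary):
    """Build a secondary transcription from aligned primary and local words.

    Assumption: primary[i] and local[i] cover the same stretch of audio.
    A local word is treated as a replacement word only if it differs from the
    primary word and comes from the personal vocabulary.
    """
    secondary = []
    for p, l in zip(primary, local):
        if l.text != p.text and l.text in personal_vocabulary:
            # Hypothetical modified score: discount the primary word by how
            # confident the local engine is in its disagreeing word.
            modified = p.score * (1.0 - l.score)
            if l.score > modified:
                secondary.append(ScoredWord(l.text, l.score))
                continue
        secondary.append(p)
    return secondary


if __name__ == "__main__":
    primary = [ScoredWord("meet", 0.9), ScoredWord("Jon", 0.4)]
    local = [ScoredWord("meet", 0.8), ScoredWord("Joan", 0.7)]
    print([w.text for w in generate_secondary(primary, local, {"Joan"})])
```

A real system would need an alignment step, for example over a word lattice, before a position-by-position comparison like this one is meaningful.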
14. A method of replacing one or more words in a transcription generated by a remote automatic speech recognition (ASR) engine, wherein the method is performed by a personal computing system having a processor and a memory, the method comprising:
  maintaining a personal vocabulary that includes replacement words, wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user;
  receiving a transcription of an audio recording, wherein the transcription is generated by the remote ASR engine using an ASR vocabulary, wherein the ASR vocabulary is separate from the personal vocabulary, and wherein the transcription includes at least one transcribed word that represents at least one spoken word in the audio recording;
  receiving data associated with the transcribed word, wherein the received data is a rich data set that includes a word lattice, confidence scores, and a phoneme lattice;
  receiving audio data that includes the spoken word;
  generating a second transcription using the audio data, wherein the second transcription is generated by a local ASR engine using the personal vocabulary to rescore the rich data set;
  identifying a replacement word from the second transcription;
  replacing the transcribed word with the replacement word; and
  creating a rule that a particular word in the data set from the remote ASR engine is to be replaced by the replacement word from the second transcription and transmitting the rule or the rescored rich data set to the remote ASR engine from which a vocabulary of the remote ASR engine is modified;
  wherein the personal computing system is a mobile phone or tablet, wherein the remote ASR engine is located geographically remotely from the personal computing system, and wherein the replacement word is from the personal vocabulary.
15. A method of replacing one or more words in a transcription generated by a remote automatic speech recognition (ASR) engine, wherein the method is performed by a portable computing system having a processor and a memory, the method comprising:
  maintaining a personal vocabulary that includes replacement words, wherein the replacement words in the personal vocabulary are obtained from personal data associated with a user, and wherein the personal data is obtained from: stored contact data for the user, stored calendar data for the user, text-based messages sent or received by the user, or a social network of which the user is a member;
  receiving a transcription of an audio recording, wherein the transcription is generated by the remote ASR engine using an ASR vocabulary, wherein the transcription includes a transcribed word that represents a spoken word in the audio recording, and wherein the remote ASR engine is located geographically remotely from the portable computing system;
  receiving data associated with the transcribed word, wherein the data associated with the transcribed word includes a word lattice and associated confidence scores, and wherein the confidence scores are generated by the remote ASR engine;
  receiving audio data that includes the spoken word;
  generating a second transcription using the audio data, wherein the second transcription is generated by a local ASR engine using the personal vocabulary and re-scores the word lattice;
  identifying a replacement word from the second transcription; and
  creating a rule to automatically replace the transcribed word with the replacement word whenever the transcribed word is found in a transcription, and transmitting the rule or the re-scored word lattice to the remote ASR engine from which a vocabulary of the remote ASR engine is modified;
  wherein the portable computing system is a mobile phone or tablet, and wherein the replacement word is from the personal vocabulary.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23.
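The replacement rule in claim 15 might be represented as a small value object that can be applied to later transcriptions and serialized for transmission to the remote ASR engine. The sketch below is an assumption-laden illustration: the claim specifies neither whole-word matching nor a wire format, so the regular-expression matching and the JSON payload, including its `"type"` field, are hypothetical.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReplacementRule:
    transcribed_word: str   # word as produced by the remote ASR engine
    replacement_word: str   # word from the personal vocabulary

    def apply(self, transcription: str) -> str:
        """Replace every whole-word occurrence of the transcribed word."""
        pattern = r"\b" + re.escape(self.transcribed_word) + r"\b"
        # A function replacement avoids backslash handling in the new word.
        return re.sub(pattern, lambda _match: self.replacement_word, transcription)

    def to_payload(self) -> str:
        """Serialize the rule; this JSON layout is an assumed wire format."""
        return json.dumps({"type": "replacement_rule", **asdict(self)})


if __name__ == "__main__":
    rule = ReplacementRule("Jon", "Joan")
    print(rule.apply("Call Jon and Jonas"))
    print(rule.to_payload())
```

Whole-word matching keeps the rule from rewriting longer words that merely contain the transcribed word, such as "Jonas" in the usage example.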
Specification