Transcription generation from multiple speech recognition systems
Abstract
A method may include obtaining first audio data originating at a first device during a communication session between the first device and a second device. The method may also include obtaining a first text string that is a transcription of the first audio data, where the first text string may be generated using automatic speech recognition technology using the first audio data. The method may also include obtaining a second text string that is a transcription of second audio data, where the second audio data may include a revoicing of the first audio data by a captioning assistant and the second text string may be generated by the automatic speech recognition technology using the second audio data. The method may further include generating an output text string from the first text string and the second text string and using the output text string as a transcription of the speech.
20 Claims
1. A method comprising:
obtaining first audio data originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the first audio data includes speech;
obtaining a first text string that is a transcription of the first audio data, the first text string generated by a first automatic speech recognition system using the first audio data and using a first model trained for a plurality of individuals;
obtaining a second text string that is a transcription of second audio data, the second audio data including a revoicing of the first audio data by a captioning assistant and the second text string generated by a second automatic speech recognition system using the second audio data and using a second model trained for the captioning assistant;
obtaining a third text string that is a transcription of the first audio data or the second audio data, the third text string generated by a third automatic speech recognition system using a third model;
generating an output text string from the first text string, the second text string, and the third text string; and
providing the output text string as a transcription of the speech to the second device for presentation during the communication session concurrently with the presentation of the first audio data by the second device.
11. A method comprising:
obtaining first audio data originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the first audio data includes speech;
obtaining a first text string that is a transcription of the first audio data, the first text string generated using automatic speech recognition technology using the first audio data;
obtaining a second text string that is a transcription of second audio data, the second audio data including a revoicing of the first audio data by a captioning assistant and the second text string generated by the automatic speech recognition technology using the second audio data;
obtaining a third text string that is a transcription of the first audio data or the second audio data, the third text string generated by the automatic speech recognition technology;
generating an output text string from the first text string, the second text string, and the third text string; and
using the output text string as a transcription of the speech.
19. A method comprising:
obtaining first audio data originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the first audio data includes speech;
obtaining a first text string that is a transcription of the first audio data, the first text string generated using automatic speech recognition technology using the first audio data;
obtaining a second text string that is a transcription of second audio data, the second audio data including a revoicing of the first audio data and the second text string generated by the automatic speech recognition technology using the second audio data;
obtaining a third text string that is a transcription of the first audio data or the second audio data, the third text string generated by the automatic speech recognition technology;
generating an output text string from the first text string, the second text string, and the third text string, the output text string including one or more words based on at least two of the first text string, the second text string, and the third text string including the one or more words; and
providing the output text string as a transcription of the speech to the second device for presentation during the communication session by the second device.
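The "at least two of" condition in claim 19 reads like a word-level majority vote across the three text strings. The following is a minimal illustrative sketch of one way such a vote could work, not the method disclosed in the specification. The function names `align_to_pivot` and `fuse_by_vote`, the choice of the first text string as the alignment pivot, and the fallback to the pivot word when no two strings agree are all assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher


def align_to_pivot(pivot, other):
    """Map each pivot word position to the word `other` has there, or None.

    Only equal runs and same-length substitutions are aligned; insertions
    and deletions relative to the pivot are left unaligned (assumption).
    """
    aligned = [None] * len(pivot)
    matcher = SequenceMatcher(a=pivot, b=other, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            for offset in range(i2 - i1):
                aligned[i1 + offset] = other[j1 + offset]
    return aligned


def fuse_by_vote(first, second, third):
    """Fuse three hypotheses: keep a word when at least two strings contain it
    at the same aligned position, otherwise fall back to the pivot word."""
    pivot = first.split()
    columns = [pivot] + [align_to_pivot(pivot, h.split()) for h in (second, third)]
    output = []
    for candidates in zip(*columns):
        votes = Counter(word for word in candidates if word is not None)
        word, count = votes.most_common(1)[0]
        output.append(word if count >= 2 else candidates[0])
    return " ".join(output)


if __name__ == "__main__":
    print(fuse_by_vote(
        "the cat sad on the mat",   # hypothetical ASR output on the original audio
        "the cat sat on the mat",   # hypothetical ASR output on the revoiced audio
        "the cat sat on the mat",   # hypothetical third ASR output
    ))
```

Because the first string fixes the output length, words that only the other two strings contain as insertions are dropped. A fuller implementation would need a symmetric multi-way alignment, and might weight votes by recognizer confidence.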
20. A system comprising:
one or more processors; and
at least one non-transitory computer-readable media coupled to the one or more processors, the at least one non-transitory computer-readable media configured to store one or more instructions that in response to being executed by the one or more processors cause the system to perform operations, the operations comprising:
obtain first audio data originating at a first device during a communication session between the first device and a second device, the communication session configured for verbal communication such that the first audio data includes speech;
obtain a first text string that is a transcription of the first audio data, the first text string generated using automatic speech recognition technology using the first audio data;
obtain a second text string that is a transcription of second audio data, the second audio data including a revoicing of the first audio data and the second text string generated by the automatic speech recognition technology using the second audio data;
obtain a third text string that is a transcription of the first audio data or the second audio data, the third text string generated by the automatic speech recognition technology;
generate an output text string from the first text string, the second text string, and the third text string; and
provide the output text string as a transcription of the speech.
Specification