Methods and Systems for Recognizing Simultaneous Speech by Multiple Speakers
Abstract
Systems and methods for a speech recognition system for recognizing speech, including overlapping speech by multiple speakers. The system includes a hardware processor and a computer storage memory that stores data along with computer-executable instructions that, when executed by the processor, implement a stored speech recognition network. An input interface receives an acoustic signal that includes a mixture of speech signals by multiple speakers, wherein the multiple speakers include target speakers. An encoder network and a decoder network of the stored speech recognition network are trained to transform the received acoustic signal into a text for each target speaker: the encoder network outputs a set of recognition encodings, and the decoder network uses the set of recognition encodings to output the text for each target speaker. An output interface transmits the text for each target speaker.
21 Claims
1. A speech recognition system for recognizing speech including overlapping speech by multiple speakers, comprising:
a hardware processor;
computer storage memory to store data along with having computer-executable instructions stored thereon that, when executed by the processor is to implement a stored speech recognition network;
an input interface to receive an acoustic signal, the received acoustic signal including a mixture of speech signals by multiple speakers, wherein the multiple speakers include target speakers;
an encoder network and a decoder network of the stored speech recognition network are trained to transform the received acoustic signal into a text for each target speaker, such that the encoder network outputs a set of recognition encodings, and the decoder network uses the set of recognition encodings to output the text for each target speaker; and
an output interface to transmit the text for each target speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A speech recognition system for recognizing speech including overlapping speech by multiple speakers, comprising:
a hardware processor;
computer storage memory to store data along with having computer-executable instructions stored thereon that, when executed by the processor, is to implement a stored speech recognition network;
an input interface to receive an acoustic signal, the received acoustic signal includes a mixture of speech signals by multiple speakers, wherein the multiple speakers include target speakers;
an encoder network and a decoder network of the stored speech recognition network are trained to transform the received acoustic signal into a text for each target speaker, such that the encoder network outputs a set of recognition encodings, and the decoder network uses the set of recognition encodings to output the text for each target speaker, such that the encoder network also includes a mixture encoder network, a set of speaker-differentiating encoder networks, and a recognition encoder network; and
an output interface to transmit the text for each target speaker.
Dependent claims: 19, 20
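As an illustration only, the three encoder components named in claim 18 can be sketched as a small data flow. The claim lists the components but does not state how they are chained, so the ordering used here (mixture encoder, then one speaker-differentiating encoder per target speaker, then a shared recognition encoder) is an assumption. Every function name and every arithmetic operation below is a hypothetical stand-in for a trained network, not the claimed implementation.

```python
from typing import Callable, List

Frame = List[float]      # one feature vector per time step
Encoding = List[Frame]   # a sequence of frames


def mixture_encoder(signal: Encoding) -> Encoding:
    """Toy stand-in for the mixture encoder: copies the input features."""
    return [list(frame) for frame in signal]


def make_speaker_differentiating_encoder(scale: float) -> Callable[[Encoding], Encoding]:
    """Toy stand-in: each speaker branch applies its own scaling (assumption)."""
    def encode(mixture_encoding: Encoding) -> Encoding:
        return [[scale * x for x in frame] for frame in mixture_encoding]
    return encode


def recognition_encoder(speaker_encoding: Encoding) -> Encoding:
    """Toy stand-in shared by all branches: adds a constant offset."""
    return [[x + 1.0 for x in frame] for frame in speaker_encoding]


def encoder_network(
    signal: Encoding,
    sd_encoders: List[Callable[[Encoding], Encoding]],
) -> List[Encoding]:
    """Return one recognition encoding per speaker-differentiating branch."""
    mixed = mixture_encoder(signal)
    return [recognition_encoder(sd(mixed)) for sd in sd_encoders]


signal = [[1.0, 2.0], [3.0, 4.0]]
branches = [
    make_speaker_differentiating_encoder(1.0),
    make_speaker_differentiating_encoder(2.0),
]
recognition_encodings = encoder_network(signal, branches)
```

The point of the sketch is the shape of the result: a single mixed input yields a set of recognition encodings, one per branch, which a decoder network could then consume to produce one text per target speaker.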
21. A method using a speech recognition system to recognize separate speaker signals within an audio signal having overlapping speech by multiple speakers, comprising:
receiving an acoustic signal including a mixture of speech signals by multiple speakers via an input interface, wherein the multiple speakers include target speakers;
inputting the received audio signal using a hardware processor into a pre-trained speech recognition network stored in a computer readable memory, such that the pre-trained speech recognition network is configured for transforming the received acoustic signal into a text for each target speaker using an encoder network and a decoder network of the pre-trained speech recognition network by, using the encoder network to output a set of recognition encodings, and the decoder network uses the set of recognition encodings to output the text for each target speaker; and
transmitting the text for each target speaker using an output interface.
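The steps of claim 21 (receive, encode, decode, transmit) can be sketched as a short pipeline. This is a minimal sketch under stated assumptions: `SpeechRecognitionNetwork`, `recognize`, `toy_encoder`, and `toy_decoder` are hypothetical names, and the toy encoder and decoder merely split and label numbers so that the control flow is visible. They are not trained networks and do not reflect how the patented system separates speakers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Signal = Sequence[float]
Encoding = List[float]


@dataclass
class SpeechRecognitionNetwork:
    """Hypothetical container for a pre-trained encoder and decoder."""
    encoder: Callable[[Signal], List[Encoding]]   # signal -> set of recognition encodings
    decoder: Callable[[Encoding], str]            # one recognition encoding -> text


def recognize(
    signal: Signal,
    network: SpeechRecognitionNetwork,
    transmit: Callable[[int, str], None],
) -> List[str]:
    """Encode the mixture, decode one text per encoding, then transmit each text."""
    encodings = network.encoder(signal)
    texts = [network.decoder(encoding) for encoding in encodings]
    for speaker_index, text in enumerate(texts):
        transmit(speaker_index, text)
    return texts


def toy_encoder(signal: Signal) -> List[Encoding]:
    """Toy stand-in: positive samples go to one stream, negative to another."""
    return [[s for s in signal if s > 0], [s for s in signal if s < 0]]


def toy_decoder(encoding: Encoding) -> str:
    """Toy stand-in: emit one placeholder token per sample."""
    return " ".join("hi" if value > 0 else "lo" for value in encoding)


sent = []  # stands in for the output interface
network = SpeechRecognitionNetwork(encoder=toy_encoder, decoder=toy_decoder)
texts = recognize([1.0, -2.0, 3.0], network, lambda i, t: sent.append((i, t)))
```

Passing `transmit` as a callable keeps the output interface separate from the recognition network, mirroring the way the claim lists them as distinct elements.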
Specification