
LOW-LATENCY MULTI-SPEAKER SPEECH RECOGNITION

  • US 20200135209A1
  • Filed: 08/07/2019
  • Published: 04/30/2020
  • Est. Priority Date: 10/26/2018
  • Status: Active Grant
First Claim

1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs including instructions, which when executed by one or more processors of an electronic device, cause the electronic device to:

    receive mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources, wherein the utterances of the target speaker and the utterances of the one or more interfering audio sources at least partially overlap;

    obtain a target speaker representation representing speech characteristics of the target speaker, wherein the target speaker representation is generated by a first learning network pre-trained for speaker verification;

    determine, using a second learning network, probability distributions of phonetic elements directly from the mixed speech data, wherein inputs of the second learning network include the mixed speech data and the target speaker representation, wherein an output of the learning network includes the probability distributions of phonetic elements, and wherein the first learning network and the second learning network are different learning networks;

    generate text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and

    provide a response based on the text corresponding to the utterances of the target speaker.
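
The data flow recited in the claim can be illustrated with a minimal sketch. This is not the patented implementation, and every name in it is hypothetical. It assumes the target speaker representation is already available as a fixed-length vector (the first, speaker-verification network is not modeled), that the representation is concatenated with each frame of mixed speech features (the claim does not say how the two inputs are combined), that an untrained random linear layer with a softmax stands in for the second learning network, and that text is produced by a simple greedy collapse of the per-frame distributions (the claim does not name a decoder).

```python
import math
import random

# Hypothetical phonetic inventory; "_" stands for a blank/silence symbol.
PHONES = ["_", "h", "e", "l", "o"]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_second_network(input_dim, num_phones, seed=0):
    """Build a stand-in for the second learning network.

    Assumption: one random, untrained linear layer followed by a softmax.
    A real system would use a trained acoustic model.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
               for _ in range(num_phones)]

    def network(mixed_frames, speaker_representation):
        distributions = []
        for frame in mixed_frames:
            # Both inputs reach the network: mixed speech features plus
            # the target speaker representation (concatenation assumed).
            x = list(frame) + list(speaker_representation)
            scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
            distributions.append(softmax(scores))
        return distributions

    return network


def greedy_decode(distributions, phones=PHONES, blank="_"):
    """Pick the most likely phone per frame, merge repeats, drop blanks."""
    out = []
    prev = None
    for dist in distributions:
        best = phones[max(range(len(dist)), key=dist.__getitem__)]
        if best != prev and best != blank:
            out.append(best)
        prev = best
    return "".join(out)


# Usage with made-up numbers: two 3-dimensional frames of mixed speech
# features and a 2-dimensional target speaker representation.
mixed_frames = [[0.1, 0.3, -0.2], [0.0, 0.5, 0.4]]
speaker_representation = [0.9, -0.1]
network = make_toy_second_network(input_dim=5, num_phones=len(PHONES))
distributions = network(mixed_frames, speaker_representation)
text = greedy_decode(distributions)
```

Because the stand-in weights are random, the decoded text here is meaningless; the sketch only shows that the phonetic probability distributions are computed directly from the mixed speech data conditioned on the speaker representation, without a separate speech-separation step.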
