
Improving speaker verification across locations, languages, and/or dialects

  • US 10,403,291 B2
  • Filed: 06/01/2018
  • Issued: 09/03/2019
  • Est. Priority Date: 07/15/2016
  • Status: Active Grant
First Claim

1. A system comprising:

  • a user device comprising one or more processors and one or more storage devices storing instructions that are operable, when executed by the one or more processors, to cause the user device to perform operations comprising:

    receiving, by the user device, audio data representing an utterance, spoken by a user, of a predetermined word or phrase designated as a hotword for a language or location associated with the user, wherein the user device is configured to perform an action or change a state of the user device in response to detecting an utterance of the hotword;

    providing, as input to a neural network stored on the user device, a set of input data derived from the audio data and a language identifier or location identifier associated with the user device, the neural network having parameters trained using speech data representing speech in different languages or different dialects, wherein the parameters of the neural network have been trained using training examples including (i) training utterances of words or phrases designated as hotwords for different languages or locations, and (ii) language identifiers or location identifiers for languages or locations of the respective speakers of the training utterances;

    generating, based on output of the neural network produced in response to receiving the set of input data, a first speaker representation indicative of characteristics of the voice of the user;

    determining, based on the first speaker representation and a second speaker representation, that the utterance is an utterance, spoken by the user, of the predetermined word or phrase designated as a hotword for the language or location associated with the user, wherein the second speaker representation is derived from a previous utterance, spoken by the user, of the predetermined word or phrase designated as a hotword for the language or location associated with the user; and

    providing the user access to the user device based on determining that the utterance is an utterance, spoken by the user, of the predetermined word or phrase designated as a hotword for the language or location associated with the user.
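The claimed flow can be sketched in a few lines. This is a minimal, hypothetical sketch. The claim does not give the network architecture, the language or location identifiers, or how the two speaker representations are compared. Everything below is assumed for illustration: the identifier list `LANGUAGES`, the one-layer `ToySpeakerNet` with random weights standing in for trained parameters, cosine similarity as the comparison, and the `0.8` threshold in `verify`. The code shows the structure the claim describes. Features derived from the hotword audio are combined with a one-hot language or location identifier and passed to a network, which produces a speaker representation. That representation is compared with one enrolled from an earlier utterance of the same hotword, and access is granted on a match.

```python
import math
import random
from typing import List, Sequence

# Hypothetical language/location identifiers (not specified in the claim).
LANGUAGES = ["en_US", "en_GB", "es_ES", "de_DE"]


def one_hot(language_id: str) -> List[float]:
    """Encode a language or location identifier as a one-hot vector."""
    vec = [0.0] * len(LANGUAGES)
    vec[LANGUAGES.index(language_id)] = 1.0
    return vec


class ToySpeakerNet:
    """Hypothetical stand-in for the on-device network: one dense layer + tanh.

    The weights are random here. In the claimed system they would be trained on
    hotword utterances paired with the speakers' language/location identifiers.
    """

    def __init__(self, n_features: int, embed_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        n_in = n_features + len(LANGUAGES)  # audio features + identifier
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)]
                        for _ in range(embed_dim)]

    def embed(self, features: Sequence[float], language_id: str) -> List[float]:
        """Return a unit-length speaker representation for one utterance."""
        x = list(features) + one_hot(language_id)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
             for row in self.weights]
        norm = math.sqrt(sum(v * v for v in h)) or 1.0  # avoid divide-by-zero
        return [v / norm for v in h]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two representations (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def verify(net: ToySpeakerNet, features: Sequence[float], language_id: str,
           enrolled: Sequence[float], threshold: float = 0.8) -> bool:
    """Grant access (True) if the new hotword utterance matches the enrolled one."""
    first = net.embed(features, language_id)  # first speaker representation
    return cosine(first, enrolled) >= threshold


# Usage: enroll from an earlier utterance, then verify a new one.
net = ToySpeakerNet(n_features=4)
enrolled = net.embed([0.2, -0.1, 0.4, 0.3], "en_US")  # second representation
print(verify(net, [0.21, -0.12, 0.39, 0.31], "en_US", enrolled))
```

Concatenating the identifier with the acoustic input is one simple way to condition a single shared network on language or location. The claim only requires that both be provided as input, so other conditioning schemes would fit equally well.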

  • 2 Assignments