Multiple speech locale-specific hotword classifiers for selection of a speech locale

US 10,269,346 B2
Filed: 01/19/2017
Issued: 04/23/2019
Est. Priority Date: 02/05/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech in an utterance. The methods, systems, and apparatus include actions of receiving an utterance and obtaining acoustic features from the utterance. Further actions include providing the acoustic features from the utterance to multiple speech locale-specific hotword classifiers. Each speech locale-specific hotword classifier (i) may be associated with a respective speech locale, and (ii) may be configured to classify audio features as corresponding to, or as not corresponding to, a respective predefined term. Additional actions may include selecting a speech locale for use in transcribing the utterance based on one or more results from the multiple speech locale-specific hotword classifiers in response to providing the acoustic features from the utterance to the multiple speech locale-specific hotword classifiers. Further actions may include selecting parameters for automated speech recognition based on the selected speech locale.
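The abstract describes a flow: extract acoustic features, give them to one classifier per speech locale, then pick the locale whose classifier fires. That flow can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The hypothetical `HotwordClassifier` below is a stand-in that compares feature frames against one stored template using dynamic time warping (DTW), a classic template-based keyword-spotting technique. Production hotword classifiers are usually trained models. The names `dtw_distance` and `select_speech_locale`, along with the templates, thresholds, and locale tags, are all invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

Frame = Sequence[float]  # one acoustic feature vector (illustrative)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Dynamic time warping distance, normalised by the combined sequence length."""
    n, m = len(query), len(template)
    if n == 0 or m == 0:
        return math.inf
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


@dataclass
class HotwordClassifier:
    """Hypothetical per-locale classifier: accepts when input is close to its template."""
    locale: str                  # illustrative locale tag, e.g. "en-US"
    template: List[List[float]]  # stored feature frames of this locale's hotword
    threshold: float             # maximum normalised DTW distance that counts as a match

    def classify(self, features: Sequence[Frame]) -> bool:
        # Decided on acoustic features alone: nothing is transcribed or interpreted.
        return dtw_distance(features, self.template) < self.threshold


def select_speech_locale(
    features: Sequence[Frame], classifiers: Sequence[HotwordClassifier]
) -> Optional[str]:
    """Return the locale of the first classifier that accepts the hotword, or None."""
    for classifier in classifiers:
        if classifier.classify(features):
            return classifier.locale
    return None


if __name__ == "__main__":
    classifiers = [
        HotwordClassifier("en-US", [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]], threshold=0.2),
        HotwordClassifier("fr-FR", [[5.0, 5.0], [6.0, 5.0], [6.0, 6.0]], threshold=0.2),
    ]
    # A time-stretched rendition of the en-US template (first frame held twice).
    utterance = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
    print(select_speech_locale(utterance, classifiers))  # en-US
```

Each classifier only measures acoustic distance, so no words are transcribed before a locale is chosen. Returning the first accepting classifier is an assumed tie-break. A confidence-based choice is the natural alternative when several classifiers fire.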

118 Citations

20 Claims

1. A computer-implemented method comprising:
- receiving, by a mobile computing device that is configured to exit a low power mode upon detection of one of a set of predefined hotwords that are each associated with a respective language or dialect, audio data corresponding to a user speaking a particular, predefined hotword of the set;
  
  in response to receiving the audio data corresponding to the user speaking the particular, predefined hotword, providing acoustic features of the audio data to multiple hotword classifiers, wherein each hotword classifier is (i) associated with a single language or single dialect of language and (ii) configured to classify acoustic features as either corresponding to, or not corresponding to, an utterance of a respective predefined term in the associated single language or single dialect of language without transcribing words corresponding to the acoustic features and without semantically interpreting the acoustic features;
  
  identifying a respective language or dialect associated with the particular, predefined hotword by determining one hotword classifier of the multiple hotword classifiers that classifies the particular, predefined hotword as corresponding to an utterance of a respective predefined term in the associated single language or single dialect of language of the hotword classifier; and
  
  generating a transcription of subsequently received audio data by an automated speech recognizer that is configured for the identified respective language or dialect associated with the particular, predefined hotword.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein the mobile computing device is configured to detect the set of predefined hotwords using two or more language or dialect-specific hotword classifiers that are each associated with a different language or dialect, and that are each associated with the same particular, predefined hotword.
  - 3. The method of claim 1, wherein the mobile computing device is configured to detect the set of predefined hotwords using two or more language or dialect-specific hotword classifiers that are each associated with a different language or dialect, and that are associated with a different, locale-specific hotword.
  - 4. The method of claim 1, comprising selecting a subset of language or dialect-specific hotword classifiers to provide acoustic features of an initial portion of the audio data from among a set of language or dialect-specific hotword classifiers, based on previous selections of languages or dialects used to transcribe previously received utterances.
  - 5. The method of claim 1, comprising identifying, based on confidence scores that are each output by a different locale-specific hotword detector, the particular language or dialect that is associated with the particular, predefined hotword.
  - 6. The method of claim 1, comprising selecting a set of speech recognition parameters for use by the automated speech recognizer based at least on the particular language or dialect.
  - 7. The method of claim 1, comprising processing, at least partially in parallel, the audio data by multiple locale-specific hotword detectors.
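Dependent claims 5 and 7 add two things: confidence scores, and processing that runs at least partially in parallel. The sketch below pictures both under assumptions. Each hypothetical `Detector` is any callable that returns a confidence. All detectors are submitted to a `concurrent.futures.ThreadPoolExecutor`. The highest-scoring locale is accepted only if it clears an assumed `min_confidence` threshold. The claims do not say how scores are compared; taking the arg-max and applying a threshold is one plausible choice. The function names and locale tags are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Sequence

# Hypothetical detector interface: acoustic features in, hotword confidence in [0, 1] out.
Detector = Callable[[Sequence[float]], float]


def score_in_parallel(features: Sequence[float], detectors: Dict[str, Detector]) -> Dict[str, float]:
    """Run every locale-specific detector on the same features concurrently."""
    if not detectors:
        return {}
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {locale: pool.submit(detect, features) for locale, detect in detectors.items()}
        return {locale: future.result() for locale, future in futures.items()}


def identify_locale(
    features: Sequence[float],
    detectors: Dict[str, Detector],
    min_confidence: float = 0.5,
) -> Optional[str]:
    """Pick the locale whose detector is most confident, if it clears min_confidence."""
    scores = score_in_parallel(features, detectors)
    best = max(scores, key=scores.__getitem__, default=None)
    if best is None or scores[best] < min_confidence:
        return None  # no detector is confident that a hotword was spoken
    return best


if __name__ == "__main__":
    # Fixed scores stand in for real models; the locale tags are illustrative.
    detectors = {
        "en-US": lambda feats: 0.91,
        "en-GB": lambda feats: 0.62,
        "de-DE": lambda feats: 0.08,
    }
    print(identify_locale([0.1, 0.2, 0.3], detectors))  # en-US
```

Threads only overlap work that releases the GIL, such as native inference calls. For pure-Python models, a process pool or a single batched accelerator call is the more likely route to real parallelism.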

8. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by a mobile computing device that is configured to exit a low power mode upon detection of one of a set of predefined hotwords that are each associated with a respective language or dialect, audio data corresponding to a user speaking a particular, predefined hotword of the set;
  
  in response to receiving the audio data corresponding to the user speaking the particular, predefined hotword, providing acoustic features of the audio data to multiple hotword classifiers, wherein each hotword classifier is (i) associated with a single language or single dialect of language and (ii) configured to classify acoustic features as either corresponding to, or not corresponding to, an utterance of a respective predefined term in the associated single language or single dialect of language without transcribing words corresponding to the acoustic features and without semantically interpreting the acoustic features;
  
  identifying a respective language or dialect associated with the particular, predefined hotword by determining one hotword classifier of the multiple hotword classifiers that classifies the particular, predefined hotword as corresponding to an utterance of a respective predefined term in the associated single language or single dialect of language of the hotword classifier; and
  
  generating a transcription of subsequently received audio data by an automated speech recognizer that is configured for the identified respective language or dialect associated with the particular, predefined hotword.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein the mobile computing device is configured to detect the set of predefined hotwords using two or more language or dialect-specific hotword classifiers that are each associated with a different language or dialect, and that are each associated with the same particular, predefined hotword.
  - 10. The system of claim 8, wherein the mobile computing device is configured to detect the set of predefined hotwords using two or more language or dialect-specific hotword classifiers that are each associated with a different language or dialect, and that are associated with a different, locale-specific hotword.
  - 11. The system of claim 8, wherein the operations comprise selecting a subset of language or dialect-specific hotword classifiers to provide acoustic features of an initial portion of the audio data from among a set of language or dialect-specific hotword classifiers, based on previous selections of languages or dialects used to transcribe previously received utterances.
  - 12. The system of claim 8, wherein the operations comprise identifying, based on confidence scores that are each output by a different locale-specific hotword detector, the particular language or dialect that is associated with the particular, predefined hotword.
  - 13. The system of claim 8, wherein the operations comprise selecting a set of speech recognition parameters for use by the automated speech recognizer based at least on the particular language or dialect.
  - 14. The system of claim 8, wherein the operations comprise processing, at least partially in parallel, the audio data by multiple locale-specific hotword detectors.
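Claim 11 (like claims 4 and 18) restricts which classifiers receive the audio. It selects a subset based on the languages or dialects chosen for previous utterances. A minimal frequency-based sketch follows. It makes three assumptions: the function name `select_classifier_subset` is hypothetical, `max_classifiers` is an assumed budget, and the history is simply a list of previously selected locale tags.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def select_classifier_subset(
    available_locales: Sequence[str],
    previous_selections: Iterable[str],
    max_classifiers: int = 2,
) -> List[str]:
    """Choose which locale-specific hotword classifiers to run for the next utterance.

    Locales selected most often for earlier utterances come first. Ties, including
    locales never selected, keep their order in available_locales because sorted()
    is stable.
    """
    counts = Counter(previous_selections)
    ranked = sorted(available_locales, key=lambda locale: -counts[locale])
    return ranked[:max_classifiers]


if __name__ == "__main__":
    available = ["en-US", "es-ES", "fr-FR", "de-DE"]
    history = ["es-ES", "en-US", "es-ES", "it-IT"]  # it-IT has no classifier here
    print(select_classifier_subset(available, history))  # ['es-ES', 'en-US']
```

The claims do not specify a ranking rule. Counting past selections is one simple option, and a recency-weighted count would be an equally reasonable variant.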

15. A non-transitory computer-readable medium storing instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by a mobile computing device that is configured to exit a low power mode upon detection of one of a set of predefined hotwords that are each associated with a respective language or dialect, audio data corresponding to a user speaking a particular, predefined hotword of the set;
  
  in response to receiving the audio data corresponding to the user speaking the particular, predefined hotword, providing acoustic features of the audio data to multiple hotword classifiers, wherein each hotword classifier is (i) associated with a single language or single dialect of language and (ii) configured to classify acoustic features as either corresponding to, or not corresponding to, an utterance of a respective predefined term in the associated single language or single dialect of language without transcribing words corresponding to the acoustic features and without semantically interpreting the acoustic features;
  
  identifying a respective language or dialect associated with the particular, predefined hotword by determining one hotword classifier of the multiple hotword classifiers that classifies the particular, predefined hotword as corresponding to an utterance of a respective predefined term in the associated single language or single dialect of language of the hotword classifier; and
  
  generating a transcription of subsequently received audio data by an automated speech recognizer that is configured for the identified respective language or dialect associated with the particular, predefined hotword.
- Dependent Claims (16, 17, 18, 19, 20)
  - 16. The medium of claim 15, wherein the mobile computing device is configured to detect the set of predefined hotwords using two or more language or dialect-specific hotword classifiers that are each associated with a different speech locale, and that are each associated with the same particular, predefined hotword.
  - 17. The medium of claim 15, wherein the mobile computing device is configured to detect the set of predefined hotwords using two or more speech locale-specific hotword classifiers that are each associated with a different speech locale, and that are associated with a different, locale-specific hotword.
  - 18. The medium of claim 15, wherein the operations comprise selecting a subset of language or dialect-specific hotword classifiers to provide acoustic features of an initial portion of the audio data from among a set of language or dialect-specific hotword classifiers, based on previous selections of languages or dialects used to transcribe previously received utterances.
  - 19. The medium of claim 15, wherein the operations comprise identifying, based on confidence scores that are each output by a different locale-specific hotword detector, the particular language or dialect that is associated with the particular, predefined hotword.
  - 20. The medium of claim 15, comprising selecting a set of speech recognition parameters for use by the automated speech recognizer based at least on the particular language or dialect.
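Claim 20 (with claims 6 and 13) ends the pipeline: it selects speech recognition parameters for the automated speech recognizer based on the identified language or dialect. The sketch below rests on several assumptions. The hypothetical `AsrParameters` fields, the catalog contents, and BCP 47-style tags such as `pt-BR` are all illustrative. Falling back from a dialect to its base language is an assumed policy; the claims do not recite it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AsrParameters:
    """Hypothetical bundle of locale-dependent recognizer settings."""
    acoustic_model: str
    language_model: str
    lexicon: str


def select_asr_parameters(
    locale: str, catalog: Dict[str, AsrParameters]
) -> Optional[AsrParameters]:
    """Prefer dialect-specific parameters, then fall back to the base language."""
    if locale in catalog:
        return catalog[locale]
    base_language = locale.split("-", 1)[0]  # "pt-BR" -> "pt"
    return catalog.get(base_language)  # None when neither entry exists


if __name__ == "__main__":
    catalog = {
        "en-GB": AsrParameters("am-en-GB", "lm-en-GB", "lex-en-GB"),
        "pt": AsrParameters("am-pt", "lm-pt", "lex-pt"),
    }
    print(select_asr_parameters("pt-BR", catalog).language_model)  # lm-pt
```

Keeping the catalog as plain data means a new dialect can be supported by adding an entry rather than changing the selection logic.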

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Sharifi, Matthew
Primary Examiner(s): Sirjani, Fariba
Application Number: US15/410,732
Publication Number: US 20170140756A1
Time in Patent Office: 824 Days
Field of Search: None
CPC Class Codes:

G10L 15/005   Language recognition

G10L 15/02   Feature extraction for spee...

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 15/32   Multiple recognisers used i...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...