Methods and systems for detecting and processing speech signals

US 9,779,735 B2
Filed: 02/24/2016
Issued: 10/03/2017
Est. Priority Date: 02/24/2016
Status: Active Grant

Abstract

Provided are methods, systems, and apparatuses for detecting, processing, and responding to audio signals, including speech signals, within a designated area or space. A platform for multiple media devices connected via a network is configured to process speech, such as voice commands, detected at the media devices, and respond to the detected speech by causing the media devices to simultaneously perform one or more requested actions. The platform is capable of scoring the quality of a speech request, handling speech requests from multiple end points of the platform using a centralized processing approach, a de-centralized processing approach, or a combination thereof, and also manipulating partial processing of speech requests from multiple end points into a coherent whole when necessary.

170 Citations

20 Claims

1. A computer-implemented method comprising:
- receiving, by a first computing device, audio data that corresponds to an utterance;
  
  processing the audio data using a hotword data module that is configured to detect a particular, predefined hotword;
  
  based on processing the audio data using the hotword data module, generating a first hotword confidence score that reflects a likelihood that the audio data received by the first computing device includes the particular, predefined hotword;
  
  receiving, from a second computing device, a second hotword confidence score that reflects a likelihood that the audio data received by the second computing device includes the particular, predefined hotword;
  
  receiving, from a third computing device, a third hotword confidence score that reflects a likelihood that the audio data received by the third computing device includes the particular, predefined hotword;
  
  comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score;
  
  based on comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score:
  
  determining, by the first computing device, to process additional audio that corresponds to a subsequent utterance; and
  
  selecting, from among the second computing device and the third computing device, one or more computing devices to process the additional audio data that corresponds to the subsequent utterance; and
  
  providing, to the selected one or more computing devices, an instruction to process the additional audio data that corresponds to the subsequent utterance.
- Dependent claims: 2, 3, 4, 5, 6, 7, 8
- - 2. The computer-implemented method of claim 1, comprising:
    - activating a microphone of the first computing device in response to determining, by the first computing device, to process the additional audio that corresponds to the subsequent utterance.
  - 3. The computer-implemented method of claim 1, wherein:
    - the selected one or more computing devices comprises the second computing device, and the method comprises:
      
      providing an instruction to the second computing device to activate a microphone of the second computing device; and
      
      providing an instruction to the third computing device to deactivate a microphone of the third computing device.
  - 4. The computer-implemented method of claim 1, wherein:
    - comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score comprises:
      
      determining that the first hotword confidence score and the second hotword confidence score satisfy a hotword score threshold,
      
      determining, by the first computing device, to process the additional audio that corresponds to the subsequent utterance based on determining that the first hotword confidence score satisfies the hotword score threshold, and
      
      selecting, from among the second computing device and the third computing device, the one or more computing devices to process the additional audio data that corresponds to the subsequent utterance comprises:
      
      selecting the second computing device to process the additional audio based on determining that the second hotword confidence score satisfies the hotword score threshold.
  - 5. The computer-implemented method of claim 1, wherein:
    - comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score comprises:
      
      determining that the first hotword confidence score and the second hotword confidence score are greater than the third hotword confidence score,
      
      determining, by the first computing device, to process the additional audio that corresponds to the subsequent utterance based on determining that the first hotword confidence score and the second hotword confidence score are greater than the third hotword confidence score, and
      
      selecting, from among the second computing device and the third computing device, the one or more computing devices to process the additional audio data that corresponds to the subsequent utterance comprises:
      
      selecting the second computing device to process the additional audio based on determining that the first hotword confidence score and the second hotword confidence score are greater than the third hotword confidence score.
  - 6. The computer-implemented method of claim 1, wherein:
    - the selected one or more computing devices comprises the second computing device, and the method comprises:
      
      receiving, by the first computing device, first additional audio data that corresponds to the subsequent utterance;
      
      receiving, from the second computing device, second additional audio data that corresponds to the subsequent utterance; and
      
      providing, to a voice search server, the received first additional audio data and the received second additional audio data.
  - 7. The computer-implemented method of claim 6, further comprising:
    - in response to providing the received first additional audio data and the received second additional audio data, receiving, from the voice search server, an action;
      
      providing, to the second computing device and to the third computing device, the action and an instruction to perform the action; and
      
      performing, by the first computing device, the action.
  - 8. The computer-implemented method of claim 1, wherein:
    - the first computing device generates the first hotword confidence score using a first localizer of a first beamformer to obtain a first angle of a user relative to the first computing device,
      
      the second computing device generates the second hotword confidence score using a second localizer of a second beamformer to obtain a second angle of the user relative to the second computing device, and
      
      the third computing device generates the third hotword confidence score using a third localizer of a third beamformer to obtain a third angle of the user relative to the third computing device.
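
As an illustration of the comparison step in claims 1, 4, and 5, the following Python sketch shows one way a first device might decide whether to keep listening and which peer devices to select. The names `Decision`, `decide_by_threshold`, and `decide_by_ranking`, the reading of "satisfies" as `>=`, and the generalization of claim 5 to "outscores the lowest-scoring device" are assumptions made for the example; the claims do not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    process_locally: bool            # first device processes the subsequent utterance
    selected_peers: tuple[str, ...]  # peers instructed to process it as well


def decide_by_threshold(first: float, peers: dict[str, float], threshold: float) -> Decision:
    """Claim 4 style: a device participates when its score satisfies the threshold.

    Assumption: "satisfies" is interpreted as score >= threshold.
    """
    selected = tuple(name for name, score in peers.items() if score >= threshold)
    return Decision(first >= threshold, selected)


def decide_by_ranking(first: float, peers: dict[str, float]) -> Decision:
    """Claim 5 style: devices whose score exceeds the lowest score participate.

    Assumption: with three devices this matches "first and second are greater
    than third"; the behavior for ties is a choice made for this sketch.
    """
    lowest = min([first, *peers.values()])
    selected = tuple(name for name, score in peers.items() if score > lowest)
    return Decision(first > lowest, selected)


if __name__ == "__main__":
    # Hypothetical scores received from the second and third devices.
    peer_scores = {"second": 0.82, "third": 0.35}
    print(decide_by_threshold(0.91, peer_scores, threshold=0.6))
    print(decide_by_ranking(0.91, peer_scores))
```

In both variants the first device compares all three scores itself, which corresponds to the claim language "determining, by the first computing device"; a deployment could equally run the same comparison on every device.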

9. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by a first computing device, audio data that corresponds to an utterance;
  
  processing the audio data using a hotword data module that is configured to detect a particular, predefined hotword;
  
  based on processing the audio data using the hotword data module, generating a first hotword confidence score that reflects a likelihood that the audio data received by the first computing device includes the particular, predefined hotword;
  
  receiving, from a second computing device, a second hotword confidence score that reflects a likelihood that the audio data received by the second computing device includes the particular, predefined hotword;
  
  receiving, from a third computing device, a third hotword confidence score that reflects a likelihood that the audio data received by the third computing device includes the particular, predefined hotword;
  
  comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score;
  
  based on comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score:
  
  determining, by the first computing device, to process additional audio that corresponds to a subsequent utterance; and
  
  selecting, from among the second computing device and the third computing device, one or more computing devices to process the additional audio data that corresponds to the subsequent utterance; and
  
  providing, to the selected one or more computing devices, an instruction to process the additional audio data that corresponds to the subsequent utterance.
- Dependent claims: 10, 11, 12, 13, 14, 15, 16
- - 10. The system of claim 9, wherein the operations further comprise:
    - activating a microphone of the first computing device in response to determining, by the first computing device, to process the additional audio that corresponds to the subsequent utterance.
  - 11. The system of claim 9, wherein:
    - the selected one or more computing devices comprises the second computing device, and the operations further comprise:
      
      providing an instruction to the second computing device to activate a microphone of the second computing device; and
      
      providing an instruction to the third computing device to deactivate a microphone of the third computing device.
  - 12. The system of claim 9, wherein:
    - comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score comprises:
      
      determining that the first hotword confidence score and the second hotword confidence score satisfy a hotword score threshold,
      
      determining, by the first computing device, to process the additional audio that corresponds to the subsequent utterance based on determining that the first hotword confidence score satisfies the hotword score threshold, and
      
      selecting, from among the second computing device and the third computing device, the one or more computing devices to process the additional audio data that corresponds to the subsequent utterance comprises:
      
      selecting the second computing device to process the additional audio based on determining that the second hotword confidence score satisfies the hotword score threshold.
  - 13. The system of claim 9, wherein:
    - comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score comprises:
      
      determining that the first hotword confidence score and the second hotword confidence score are greater than the third hotword confidence score,
      
      determining, by the first computing device, to process the additional audio that corresponds to the subsequent utterance based on determining that the first hotword confidence score and the second hotword confidence score are greater than the third hotword confidence score, and
      
      selecting, from among the second computing device and the third computing device, the one or more computing devices to process the additional audio data that corresponds to the subsequent utterance comprises:
      
      selecting the second computing device to process the additional audio based on determining that the first hotword confidence score and the second hotword confidence score are greater than the third hotword confidence score.
  - 14. The system of claim 9, wherein:
    - the selected one or more computing devices comprises the second computing device, and the operations further comprise:
      
      receiving, by the first computing device, first additional audio data that corresponds to the subsequent utterance;
      
      receiving, from the second computing device, second additional audio data that corresponds to the subsequent utterance; and
      
      providing, to a voice search server, the received first additional audio data and the received second additional audio data.
  - 15. The system of claim 14, wherein the operations further comprise:
    - in response to providing the received first additional audio data and the received second additional audio data, receiving, from the voice search server, an action;
      
      providing, to the second computing device and to the third computing device, the action and an instruction to perform the action; and
      
      performing, by the first computing device, the action.
  - 16. The system of claim 9, wherein:
    - the first computing device generates the first hotword confidence score using a first localizer of a first beamformer to obtain a first angle of a user relative to the first computing device,
      
      the second computing device generates the second hotword confidence score using a second localizer of a second beamformer to obtain a second angle of the user relative to the second computing device, and
      
      the third computing device generates the third hotword confidence score using a third localizer of a third beamformer to obtain a third angle of the user relative to the third computing device.
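
The flow recited in claims 14 and 15 (and in method claims 6 and 7) can be sketched in Python as follows. This is only an illustration under stated assumptions: the voice search server is modeled as a plain callable, and the names `VoiceSearch`, `handle_subsequent_utterance`, and the device labels are hypothetical rather than taken from the patent.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the voice search server: it receives the audio
# clips captured by the devices and returns the name of an action.
VoiceSearch = Callable[[List[bytes]], str]


def handle_subsequent_utterance(
    first_audio: bytes,
    second_audio: bytes,
    voice_search: VoiceSearch,
    peers: List[str],
) -> Dict[str, str]:
    """Return a mapping of device label to the action it should perform."""
    # Provide both captures of the subsequent utterance to the server.
    action = voice_search([first_audio, second_audio])
    # Provide the action, with an instruction to perform it, to each peer.
    performed = {peer: action for peer in peers}
    # The first device performs the action as well.
    performed["first"] = action
    return performed


if __name__ == "__main__":
    # A fake server used only for demonstration.
    def fake_server(clips: List[bytes]) -> str:
        return "play_music" if len(clips) == 2 else "noop"

    print(handle_subsequent_utterance(b"clip-a", b"clip-b", fake_server, ["second", "third"]))
```

Passing the server as a callable keeps the sketch independent of any transport; a real system would replace `fake_server` with a network client.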

17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by a first computing device, audio data that corresponds to an utterance;
  
  processing the audio data using a hotword data module that is configured to detect a particular, predefined hotword;
  
  based on processing the audio data using the hotword data module, generating a first hotword confidence score that reflects a likelihood that the audio data received by the first computing device includes the particular, predefined hotword;
  
  receiving, from a second computing device, a second hotword confidence score that reflects a likelihood that the audio data received by the second computing device includes the particular, predefined hotword;
  
  receiving, from a third computing device, a third hotword confidence score that reflects a likelihood that the audio data received by the third computing device includes the particular, predefined hotword;
  
  comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score;
  
  based on comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score:
  
  determining, by the first computing device, to process additional audio that corresponds to a subsequent utterance; and
  
  selecting, from among the second computing device and the third computing device, one or more computing devices to process the additional audio data that corresponds to the subsequent utterance; and
  
  providing, to the selected one or more computing devices, an instruction to process the additional audio data that corresponds to the subsequent utterance.
- Dependent claims: 18, 19, 20
- - 18. The computer-readable medium of claim 17, wherein:
    - the selected one or more computing devices comprises the second computing device, and the operations further comprise:
      
      providing an instruction to the second computing device to activate a microphone of the second computing device; and
      
      providing an instruction to the third computing device to deactivate a microphone of the third computing device.
  - 19. The computer-readable medium of claim 17, wherein:
    - comparing the first hotword confidence score, the second hotword confidence score, and the third hotword confidence score comprises:
      
      determining that the first hotword confidence score and the second hotword confidence score satisfy a hotword score threshold,
      
      determining, by the first computing device, to process the additional audio that corresponds to the subsequent utterance based on determining that the first hotword confidence score satisfies the hotword score threshold, and
      
      selecting, from among the second computing device and the third computing device, the one or more computing devices to process the additional audio data that corresponds to the subsequent utterance comprises:
      
      selecting the second computing device to process the additional audio based on determining that the second hotword confidence score satisfies the hotword score threshold.
  - 20. The computer-readable medium of claim 17, wherein:
    - the selected one or more computing devices comprises the second computing device, and the operations further comprise:
      
      receiving, by the first computing device, first additional audio data that corresponds to the subsequent utterance;
      
      receiving, from the second computing device, second additional audio data that corresponds to the subsequent utterance; and
      
      providing, to a voice search server, the received first additional audio data and the received second additional audio data.
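
The microphone instructions of claim 18 (also claims 3 and 11) might be represented as a small mapping from device to instruction. The sketch below is an assumption-laden illustration: the function name `microphone_instructions` and the instruction strings are hypothetical, and treating every unselected peer as "deactivate" generalizes the two-peer case the claim actually recites.

```python
from typing import Dict, Iterable, Set


def microphone_instructions(peers: Iterable[str], selected: Set[str]) -> Dict[str, str]:
    """Map each peer device to the microphone instruction it should receive."""
    return {
        # Selected peers keep listening; the rest are told to stop.
        peer: "activate_microphone" if peer in selected else "deactivate_microphone"
        for peer in peers
    }


if __name__ == "__main__":
    # The second device was selected; the third was not.
    print(microphone_instructions(["second", "third"], {"second"}))
```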

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Civelli, Jay Pierre; Shemer, Mikhal; Shabestary, Turaj Zakizadeh; Tapuska, David
Primary Examiner(s): Chawan, Vijay B
Application Number: US15/052,426
Publication Number: US 20170243586A1
Time in Patent Office: 587 Days
Field of Search: 704270, 7042701, 704271, 704272, 704273, 704274, 704275, 704500, 345 23
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
G10L 2015/088   Word spotting
G10L 2015/223   Execution procedure of a sp...
