Collaborative voice controlled devices

US 10,559,309 B2
Filed: 12/22/2016
Issued: 02/11/2020
Est. Priority Date: 12/22/2016
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for collaboration between multiple voice controlled devices are disclosed. In one aspect, a method includes the actions of identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword; receiving audio data that corresponds to an utterance; receiving a transcription of additional audio data outputted by the second computing device in response to the utterance; based on the transcription of the additional audio data and based on the utterance, generating a transcription that corresponds to a response to the additional audio data; and providing, for output, the transcription that corresponds to the response.
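
The sequence of actions in the abstract can be pictured with a minimal Python sketch. Everything named here (`Device`, `identify_peers`, `generate_response`, `collaborate`) is hypothetical and chosen for illustration only; plain strings stand in for audio data, a substring test stands in for hotword detection (a step recited in claim 1 rather than in the abstract), and the response logic is a placeholder rather than anything the patent specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a voice controlled device."""
    name: str
    hotword: str
    peers: list = field(default_factory=list)

    def identify_peers(self, nearby):
        # Keep only other devices configured to respond to the same hotword.
        self.peers = [d for d in nearby
                      if d is not self and d.hotword == self.hotword]
        return self.peers

    def generate_response(self, utterance_text, peer_transcription):
        # Placeholder: a real system would derive a useful follow-up from
        # both the utterance and the other device's response.
        return (f"Following up on '{peer_transcription}' "
                f"for the request '{utterance_text}'.")


def collaborate(first, nearby, utterance_text, peer_transcription):
    """Return the first device's response transcription, or None."""
    # Identify a second device that responds to the same hotword.
    if not first.identify_peers(nearby):
        return None
    # Assumption: the utterance is already text, so a substring test
    # stands in for hotword detection on audio.
    if first.hotword.lower() not in utterance_text.lower():
        return None
    # Generate a response from the utterance plus the transcription of
    # what the second device said.
    return first.generate_response(utterance_text, peer_transcription)
```

In this sketch the first device stays silent (returns `None`) when no peer shares its hotword or when the hotword is absent from the utterance; otherwise the returned string is the transcription that would be provided for output.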

19 Claims

1. A computer-implemented method, comprising:
- identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword;
  
  receiving, by the first computing device, audio data of an utterance spoken by a user;
  
  determining, by the first computing device, that the utterance includes the particular, predefined hotword;
  
  receiving, by the first computing device, a transcription of an initial response to the utterance provided by the second computing device, wherein the initial response comprises synthesized speech audibly outputted by the second computing device;
  
  based on the transcription of the initial response and based on the utterance, generating, by the first computing device, a subsequent response to the initial response; and
  
  providing, for output by the first computing device, audio data of the subsequent response or a transcription of the subsequent response.
- - 2. The method of claim 1, wherein providing, for output, audio data of the subsequent response or a transcription of the subsequent response comprises:
    - providing, to a speech synthesizer of the first computing device, an initial portion of the transcription of the subsequent response; and
      
      providing, to the second computing device, (i) a remaining portion of the transcription of the subsequent response and (ii) instructions to audibly output synthesized speech of the remaining portion of the transcription of the subsequent response.
  - 3. The method of claim 1, comprising:
    - after providing, for output, the transcription that corresponds to the response, receiving, by the first computing device, audio data of a second utterance;
      
      based on the transcription of the initial response, based on the utterance, and based on the second utterance, generating, by the first computing device, an additional transcription of an additional response to the second utterance; and
      
      providing, for output by the first computing device, the additional transcription.
  - 4. The method of claim 1, wherein providing, for output, audio data of the subsequent response or a transcription of the subsequent response comprises:
    - providing the transcription to a display of the first computing device.
  - 5. The method of claim 1, wherein generating a subsequent response to the initial response comprises:
    - determining user information that is associated with a first user of the first computing device or with a second user of the second computing device; and
      
      wherein generating the subsequent response to the initial response is based on the user information.
  - 6. The method of claim 1, wherein generating a subsequent response to the initial response comprises:
    - accessing data that is associated with the initial response; and
      
      wherein generating the subsequent response to the initial response is based on the accessed data.
  - 7. The method of claim 1, comprising:
    - determining, by the first computing device, a location of the first computing device, wherein generating the subsequent response to the initial response is based on the location of the first computing device.
  - 8. The method of claim 1, comprising:
    - generating, by the first computing device, a first audio fingerprint of the audio data of the utterance;
      
      receiving, from the second computing device, a second audio fingerprint of the audio data of the utterance;
      
      comparing, by the first computing device, the first audio fingerprint to the second audio fingerprint; and
      
      based on comparing the first audio fingerprint to the second audio fingerprint, determining, by the first computing device, that the audio data received by the first computing device represents the same utterance as the audio data received by the second computing device.
  - 9. The method of claim 1, wherein the first computing device and the second computing device are configured to communicate with each other using short range radio.
  - 10. The method of claim 1, wherein the first computing device and the second computing device are co-located.
  - 11. The method of claim 1, wherein receiving the transcription of the initial response to the utterance provided by the second computing device comprises:
    - receiving, from the second computing device, the transcription of the initial response to the utterance provided by the second computing device.
  - 12. The method of claim 1, wherein providing, for output, audio data of the subsequent response or a transcription of the subsequent response comprises:
    - providing the transcription of the subsequent response to a speech synthesizer.
  - 13. The method of claim 12, wherein synthesized speech of a transcription of the subsequent response is received by a third computing device that is configured to generate a response based on the synthesized speech of the transcription of the subsequent response, the transcription of the initial response, and the utterance.
  - 14. The method of claim 1, comprising:
    - determining that the utterance includes the particular, predefined hotword by:
      
      determining that the utterance includes the particular, predefined hotword without performing speech recognition on the audio data; and
      
      receiving data indicating that the second computing device will output the initial response to the utterance.
  - 15. The method of claim 14, comprising:
    - receiving data indicating that the second computing device will output the initial response to the utterance by:
      
      receiving, from the second computing device, a short range radio signal that indicates that the second computing device will output the initial response to the utterance;
      
      receiving, from the second computing device and through a local network, data indicating that the second computing device will output the initial response to the utterance; or
      
      receiving, from a server, data indicating that the second computing device will output the initial response to the utterance.
  - 16. The method of claim 14, wherein determining that the utterance includes the particular, predefined hotword without performing speech recognition on the audio data comprises:
    - extracting audio features of the audio data of the utterance;
      
      generating a hotword confidence score by processing the audio features;
      
      determining that the hotword confidence score satisfies a hotword confidence threshold; and
      
      based on determining that the hotword confidence score satisfies a hotword confidence threshold, determining that the utterance includes the particular, predefined hotword.
  - 17. The method of claim 15, comprising:
    - in response to receiving data indicating that the second computing device will output the initial response to the utterance, providing, by the first computing device and to the second computing device or to a server, the audio data of the utterance.
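
Two of the dependent claims above describe concrete checks: claim 16 scores audio features against a hotword confidence threshold without running speech recognition, and claim 8 compares audio fingerprints from two devices to decide whether they heard the same utterance. The claims do not say how features, scores, or fingerprints are computed, so the Python sketch below fills those gaps with toy assumptions: per-frame energy as the feature, cosine similarity against a hypothetical hotword template as the score, and rise/fall bits between frames as the fingerprint. All function names and thresholds are illustrative.

```python
import math


def frame_energies(samples, frame_size=4):
    """Toy feature extractor: mean squared amplitude per frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def hotword_confidence(features, template):
    """Assumed score: cosine similarity to a hypothetical hotword template."""
    n = min(len(features), len(template))
    dot = sum(features[i] * template[i] for i in range(n))
    norm = (math.sqrt(sum(f * f for f in features[:n]))
            * math.sqrt(sum(t * t for t in template[:n])))
    return dot / norm if norm else 0.0


def includes_hotword(samples, template, threshold=0.8):
    # Claim 16 steps: extract features, score them, compare to a threshold.
    # No transcription of the audio is produced at any point.
    return hotword_confidence(frame_energies(samples), template) >= threshold


def fingerprint(samples, frame_size=4):
    """Toy fingerprint: 1 where energy rises between adjacent frames."""
    e = frame_energies(samples, frame_size)
    return [1 if later > earlier else 0 for earlier, later in zip(e, e[1:])]


def same_utterance(fp_a, fp_b, min_match=0.9):
    # Claim 8 steps: compare two fingerprints and decide whether both
    # devices captured the same utterance.
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return False
    matches = sum(1 for i in range(n) if fp_a[i] == fp_b[i])
    return matches / n >= min_match
```

Because the toy fingerprint only records whether energy rises or falls, a quieter capture of the same audio on a second device yields the same bits, which is the property the comparison in claim 8 relies on. A production system would use far more robust features than this sketch.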

18. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword;
  
  receiving, by the first computing device, audio data of an utterance spoken by a user;
  
  determining, by the first computing device, that the utterance includes the particular, predefined hotword;
  
  receiving, by the first computing device, a transcription of an initial response to the utterance provided by the second computing device, wherein the initial response comprises synthesized speech audibly outputted by the second computing device;
  
  based on the transcription of the initial response and based on the utterance, generating, by the first computing device, a subsequent response to the initial response; and
  
  providing, for output by the first computing device, audio data of the subsequent response or a transcription of the subsequent response.

19. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword;
  
  receiving, by the first computing device, audio data of an utterance spoken by a user;
  
  determining, by the first computing device, that the utterance includes the particular, predefined hotword;
  
  receiving, by the first computing device, a transcription of an initial response to the utterance provided by the second computing device, wherein the initial response comprises synthesized speech audibly outputted by the second computing device;
  
  based on the transcription of the initial response and based on the utterance, generating, by the first computing device, a subsequent response to the initial response; and
  
  providing, for output by the first computing device, audio data of the subsequent response or a transcription of the subsequent response.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Carbune, Victor; Gonnet Anders, Pedro; Deselaers, Thomas; Feuz, Sandro
Primary Examiner(s): Opsasnick, Michael N
Application Number: US15/387,884
Publication Number: US 20180182397A1
Time in Patent Office: 1,146 Days

CPC Class Codes

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/08   Text analysis or generation...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 2015/228   of application context

H04W 4/80   Services using short range ...
