Enhanced speech endpointing

US 10,339,917 B2
Filed: 09/03/2015
Issued: 07/02/2019
Est. Priority Date: 09/03/2015
Status: Active Grant


2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.
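The Python sketch below roughly illustrates the flow in the abstract and claim 1. Expected results are stored per context identifier before audio arrives. Each intermediate result is compared only with the results stored for the active context. The end of speech condition is set as soon as one matches, and the usual silence timeout remains the fallback. The class name `ContextEndpointer`, the 0.5 s timeout, the case-insensitive matching rule and the `(text, silence_s)` event format are all assumptions made for illustration. None of them comes from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


def normalize(text: str) -> str:
    # Assumed matching rule: case-insensitive, whitespace-collapsed equality.
    return " ".join(text.lower().split())


@dataclass
class ContextEndpointer:
    # Hypothetical silence period (seconds) after which the conventional
    # endpointer sets the end of speech condition anyway.
    silence_timeout_s: float = 0.5
    # Context identifier -> expected speech recognition results.
    expected: Dict[str, List[str]] = field(default_factory=dict)

    def store(self, context_id: str, results: Iterable[str]) -> None:
        """Store expected results in association with a context identifier."""
        self.expected[context_id] = list(results)

    def endpoint(
        self, context_id: str, partials: Iterable[Tuple[str, float]]
    ) -> Optional[Tuple[str, str]]:
        """Consume (intermediate_result, silence_s) pairs from a recognizer.

        silence_s is how long the user has been silent when the intermediate
        result arrives. Returns (final_result, reason), or None if the stream
        ends before any end of speech condition is set.
        """
        # Only the results stored under this context identifier are compared.
        candidates = {normalize(r) for r in self.expected.get(context_id, [])}
        for text, silence_s in partials:
            if silence_s >= self.silence_timeout_s:
                # Conventional path: the silence period elapsed first.
                return text, "silence_timeout"
            if normalize(text) in candidates:
                # Early path: end of speech is set before the silence period
                # elapses, and the matching intermediate result becomes final.
                return text, "expected_match"
        return None


ep = ContextEndpointer()
ep.store("confirm", ["yes", "no"])
ep.store("call_contact", ["call alice", "call bob"])
stream = [("call", 0.0), ("call ali", 0.0), ("Call Alice", 0.1)]
print(ep.endpoint("call_contact", stream))  # ('Call Alice', 'expected_match')
print(ep.endpoint("confirm", [("maybe", 0.0), ("maybe later", 0.6)]))
# ('maybe later', 'silence_timeout')
```

The timeout is checked before the match. This mirrors the claim's requirement that the early end of speech condition be set before the silence period has elapsed. When no expected result matches, the behaviour falls back to ordinary silence-based endpointing.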

119 Citations

11 Claims

1. A computer implemented method, comprising:
- storing, by a computing device that is configured to set an end of speech condition after a user has stopped speaking for a period of time, (i) a context identifier in association with one or more expected speech recognition results for a first context and (ii) an additional context identifier in association with one or more additional expected speech recognition results for a second context;
  
  after storing the context identifier in association with the one or more expected speech recognition results for the first context and the additional context identifier in association with the one or more additional expected speech recognition results for the second context, receiving, by the computing device, audio data corresponding to an utterance spoken by the user of the client device;
  
  while receiving the audio data corresponding to the utterance spoken, receiving, by the computing device, the context identifier that indicates a context associated with (i) the client device or (ii) the user of the client device;
  
  accessing, by the computing device and from among the one or more expected speech recognition results and the one or more additional expected speech recognition results, the one or more expected speech recognition results based on the one or more expected speech recognition results being stored in association with the context identifier;
  
  before an automated speech recognizer provides a final speech recognition result for the audio data for output, comparing, by the computing device, an intermediate speech recognition result generated for the audio data by the automated speech recognizer to each of the one or more expected speech recognition results with the context identifier;
  
  based at least on comparing the intermediate speech recognition result generated for the audio data by the automated speech recognizer to the one or more expected speech recognition results associated with the context identifier, determining, by the computing device, that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches one of the one or more expected speech recognition results associated with the context identifier; and
  
  based on determining that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches the one of the one or more expected speech recognition results associated with the context identifier and before the period of time has elapsed after the user stopped speaking, setting the end of speech condition and providing, for output by the computing device, the intermediate speech recognition result that matches the one of the one or more expected speech recognition results as the final speech recognition result based on the audio data.
- Dependent claims (2, 3, 9, 10, 11):
  - 2. The method of claim 1, wherein the one or more expected speech recognition results associated with the context identifier are stored on the client device.
  - 3. The method of claim 1, wherein setting the end of speech condition comprises turning off an audio input device into which the utterance was made.
  - 9. The method of claim 1, wherein:
    - the context associated with (i) the client device or (ii) the user of the client device comprises a query prompting the user for a name, and
    - the one or more expected speech recognition results associated with the context identifier comprises names in a contact list stored on the client device.
  - 10. The method of claim 1, wherein:
    - the context associated with (i) the client device or (ii) the user of the client device comprises a query prompting the user for a filename, and
    - the one or more expected speech recognition results associated with the context identifier comprises filenames of files accessible by the client device.
  - 11. The method of claim 1, wherein the one or more expected speech recognition results associated with the context identifier are displayed on the client device.
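Claims 9 and 10 describe where the expected results can come from. The sketch below populates two contexts before any audio is received: a name prompt backed by an on-device contact list, and a filename prompt backed by a folder listing. An intermediate result is then compared only with the results stored for the active context. The identifiers `NAME_PROMPT` and `FILENAME_PROMPT`, the single-folder listing, and the case-insensitive comparison are hypothetical simplifications. In particular, a real recognizer would rarely emit a literal "Budget.xlsx".

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical context identifiers; the patent does not name any.
NAME_PROMPT = "prompt:name"
FILENAME_PROMPT = "prompt:filename"


def build_context_store(contacts: Iterable[str], folder: Path) -> Dict[str, List[str]]:
    """Store expected results for two contexts before any audio arrives.

    Claim 9: a prompt for a name -> names in the on-device contact list.
    Claim 10: a prompt for a filename -> names of files in an accessible folder.
    """
    return {
        NAME_PROMPT: sorted(contacts),
        FILENAME_PROMPT: sorted(p.name for p in folder.iterdir() if p.is_file()),
    }


def matches_expected(store: Dict[str, List[str]], context_id: str, intermediate: str) -> bool:
    """Compare an intermediate result only with results stored for context_id."""
    wanted = intermediate.strip().lower()
    return any(wanted == r.lower() for r in store.get(context_id, []))


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "budget.xlsx").write_text("")
    store = build_context_store(["Alice Smith", "Bob Jones"], Path(tmp))
    print(matches_expected(store, NAME_PROMPT, "alice smith"))      # True
    print(matches_expected(store, FILENAME_PROMPT, "alice smith"))  # False
    print(matches_expected(store, FILENAME_PROMPT, "Budget.xlsx"))  # True
```

A contact name therefore cannot end the utterance early while the device is asking for a filename. This reflects the claim's step of accessing only the results stored under the received context identifier.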

4. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
- storing, by a computing device that is configured to set an end of speech condition after a user has stopped speaking for a period of time, (i) a context identifier in association with one or more expected speech recognition results for a first context and (ii) an additional context identifier in association with one or more additional expected speech recognition results for a second context;
  
  after storing the context identifier in association with the one or more expected speech recognition results for the first context and the additional context identifier in association with the one or more additional expected speech recognition results for the second context, receiving, by the computing device, audio data corresponding to an utterance spoken by the user of the client device;
  
  while receiving the audio data corresponding to the utterance spoken, receiving, by the computing device, the context identifier that indicates a context associated with (i) the client device or (ii) the user of the client device;
  
  accessing, by the computing device and from among the one or more expected speech recognition results and the one or more additional expected speech recognition results, the one or more expected speech recognition results based on the one or more expected speech recognition results being stored in association with the context identifier;
  
  before an automated speech recognizer provides a final speech recognition result for the audio data for output, comparing, by the computing device, an intermediate speech recognition result generated for the audio data by the automated speech recognizer to each of the one or more expected speech recognition results with the context identifier;
  
  based at least on comparing the intermediate speech recognition result generated for the audio data by the automated speech recognizer to the one or more expected speech recognition results associated with the context identifier, determining, by the computing device, that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches one of the one or more expected speech recognition results associated with the context identifier; and
  
  based on determining that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches the one of the one or more expected speech recognition results associated with the context identifier and before the period of time has elapsed after the user stopped speaking, setting the end of speech condition and providing, for output by the computing device, the intermediate speech recognition result that matches the one of the one or more expected speech recognition results as the final speech recognition result based on the audio data.
- Dependent claims (5, 6):
  - 5. The system of claim 4, wherein the one or more expected speech recognition results associated with the context identifier are stored on the client device.
  - 6. The system of claim 4, wherein setting the end of speech condition comprises turning off an audio input device into which the utterance was made.
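Claims 3 and 6 tie the end of speech condition to turning off the audio input device. The sketch below models that coupling. Capture stops as soon as an intermediate result matches an expected one, so later audio chunks are never consumed. `AudioInput` is a hypothetical stand-in rather than a real device API. The `decode_partial` callable is a placeholder for an incremental recognizer, and in the usage line it simply decodes the bytes.

```python
from typing import Callable, Iterator, List, Optional, Set, Tuple


class AudioInput:
    """Hypothetical stand-in for a microphone; not a real device API."""

    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = chunks
        self.is_on = False

    def stream(self) -> Iterator[bytes]:
        self.is_on = True
        for chunk in self._chunks:
            if not self.is_on:  # no more audio once the input is turned off
                return
            yield chunk

    def turn_off(self) -> None:
        self.is_on = False


def capture_until_end_of_speech(
    mic: AudioInput,
    decode_partial: Callable[[bytes], str],
    expected: Set[str],
) -> Tuple[Optional[str], int]:
    """Return (final_result, chunks_consumed)."""
    audio = b""
    consumed = 0
    for chunk in mic.stream():
        audio += chunk
        consumed += 1
        partial = decode_partial(audio)  # hypothetical incremental recognizer
        if partial in expected:
            # Setting the end of speech condition turns the audio input off.
            mic.turn_off()
            return partial, consumed
    return None, consumed


mic = AudioInput([b"call ", b"bob", b" please", b" now"])
print(capture_until_end_of_speech(mic, lambda a: a.decode().strip(), {"call bob"}))
# ('call bob', 2)
print(mic.is_on)  # False
```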

7. A computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- storing, by a computing device that is configured to set an end of speech condition after a user has stopped speaking for a period of time, (i) a context identifier in association with one or more expected speech recognition results for a first context and (ii) an additional context identifier in association with one or more additional expected speech recognition results for a second context;
  
  after storing the context identifier in association with the one or more expected speech recognition results for the first context and the additional context identifier in association with the one or more additional expected speech recognition results for the second context, receiving, by the computing device, audio data corresponding to an utterance spoken by the user of the client device;
  
  while receiving the audio data corresponding to the utterance spoken, receiving, by the computing device, the context identifier that indicates a context associated with (i) the client device or (ii) the user of the client device;
  
  accessing, by the computing device and from among the one or more expected speech recognition results and the one or more additional expected speech recognition results, the one or more expected speech recognition results based on the one or more expected speech recognition results being stored in association with the context identifier;
  
  before an automated speech recognizer provides a final speech recognition result for the audio data for output, comparing, by the computing device, an intermediate speech recognition result generated for the audio data by the automated speech recognizer to each of the one or more expected speech recognition results with the context identifier;
  
  based at least on comparing the intermediate speech recognition result generated for the audio data by the automated speech recognizer to the one or more expected speech recognition results associated with the context identifier, determining, by the computing device, that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches one of the one or more expected speech recognition results associated with the context identifier; and
  
  based on determining that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches the one of the one or more expected speech recognition results associated with the context identifier and before the period of time has elapsed after the user stopped speaking, setting the end of speech condition and providing, for output by the computing device, the intermediate speech recognition result that matches the one of the one or more expected speech recognition results as the final speech recognition result based on the audio data.
- Dependent claims (8):
  - 8. The device of claim 7, wherein the one or more expected speech recognition results associated with the context identifier are stored on the client device.


Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Aleksic, Petar; Shires, Glen; Buchanan, Michael
Primary Examiner(s)
Mishra, Richa

Application Number

US14/844,563
Publication Number

US 20170069308A1
Time in Patent Office

1,398 Days
Field of Search

None
US Class Current
CPC Class Codes

G06F 40/279   Recognition of textual enti...

G10L 15/04   Segmentation; Word boundary...

G10L 15/18   using natural language mode...

G10L 2015/228   of application context
