Disambiguation of a spoken query term

US 10,210,267 B1
Filed: 08/10/2016
Issued: 02/19/2019
Est. Priority Date: 07/28/2010
Status: Active Grant

First Claim

1. A computer-implemented method for speech recognition comprising:

receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;

generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;

selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;

determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and

based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.
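
The frequency test recited in the first claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the whitespace tokenization, the default threshold of 2, and the choice to scan candidates in order are all assumptions made for illustration.

```python
from collections import Counter


def count_terms(previous_transcriptions):
    """Count how often each term appears across the user's earlier transcriptions."""
    counts = Counter()
    for text in previous_transcriptions:
        # Assumption: a "term" is a lowercase, whitespace-separated word.
        counts.update(text.lower().split())
    return counts


def has_frequent_term(candidate, term_counts, threshold):
    """Return True if any term in the candidate appears more than `threshold` times."""
    return any(term_counts[term] > threshold for term in candidate.lower().split())


def choose_transcription(candidates, previous_transcriptions, threshold=2):
    """Return the first candidate that contains a frequently spoken term, else None.

    `threshold` stands in for the claim's "predetermined number of times";
    the value 2 is a hypothetical default.
    """
    term_counts = count_terms(previous_transcriptions)
    for candidate in candidates:
        if has_frequent_term(candidate, term_counts, threshold):
            return candidate
    return None
```

With a hypothetical history in which the user has said "gym" three times, the candidate containing "gym" passes the test while a candidate containing only rarely spoken terms does not. Returning `None` when no candidate qualifies is a design choice of this sketch; the claim itself does not say what happens in that case.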

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing spoken query terms. In one aspect, a method includes performing speech recognition on an audio signal to select two or more textual, candidate transcriptions that match a spoken query term, and to establish a speech recognition confidence value for each candidate transcription, obtaining a search history for a user who spoke the spoken query term, where the search history references one or more past search queries that have been submitted by the user, generating one or more n-grams from each candidate transcription, where each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription, and determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency.
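
The n-gram step described in the abstract can be sketched as follows. This is a hedged illustration only: it uses word-level n-grams (the abstract also allows phonemes, syllables, letters, or characters), counts raw occurrences as the "frequency", and uses a weighting of `1 + log(1 + frequency)`, which is an assumed formula that does not come from the patent. All function names are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return all word-level n-grams of `text` as tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_frequencies(past_queries, max_n=2):
    """Count how often each n-gram (n = 1..max_n) occurs in the past queries."""
    counts = Counter()
    for query in past_queries:
        for n in range(1, max_n + 1):
            counts.update(word_ngrams(query, n))
    return counts


def ngram_weights(candidate, past_queries, max_n=2):
    """Map each n-gram of `candidate` to a (frequency, weight) pair.

    Assumed weighting: 1 + log(1 + frequency), so an n-gram that never
    occurred in the search history gets a neutral weight of 1.0.
    """
    frequencies = ngram_frequencies(past_queries, max_n)
    result = {}
    for n in range(1, max_n + 1):
        for gram in word_ngrams(candidate, n):
            freq = frequencies[gram]
            result[gram] = (freq, 1.0 + math.log1p(freq))
    return result
```

A logarithmic weight is used here so that very frequent n-grams do not dominate; any monotonic function of the frequency would equally fit the abstract's description of "a weighting value that is based on the respective frequency".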

24 Claims

1. A computer-implemented method for speech recognition comprising:
- receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;
  
  generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;
  
  selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;
  
  determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and
  
  based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.
- - 2. The method of claim 1, comprising:
    - in response to receiving the audio data that corresponds to the spoken utterance, determining context data that indicates a context of the computing device, wherein selecting the particular candidate transcription comprises selecting the particular candidate transcription based on the context data.
  - 3. The method of claim 2, wherein the context data comprises:
    - a time and date when the computing device received the audio data that corresponds to the spoken utterance;
      
      data that specifies a type of the computing device;
      
      data that specifies a type of audio subsystem of the computing device;
      
      data that specifies whether the computing device is plugged in;
      
      data that specifies a location of the computing device where the computing device received the audio data that corresponds to the spoken utterance; and
      
      audio data that corresponds to ambient noise near the computing device.
  - 4. The method of claim 2, wherein the context data comprises a time and date when the computing device received the audio data that corresponds to the spoken utterance.
  - 5. The method of claim 2, wherein the context data comprises data that specifies a type of the computing device.
  - 6. The method of claim 2, wherein the context data comprises data that specifies a type of audio subsystem of the computing device.
  - 7. The method of claim 2, wherein the context data comprises data that specifies whether the computing device is plugged in.
  - 8. The method of claim 2, wherein the context data comprises data that specifies a location of the computing device where the computing device received the audio data that corresponds to the spoken utterance.
  - 9. The method of claim 2, wherein the context data comprises audio data that corresponds to ambient noise near the computing device.
  - 10. The method of claim 1, comprising:
    - selecting, from among the multiple candidate transcriptions of the spoken utterance, an additional candidate transcription;
      
      determining that the additional candidate transcription includes an additional term that appears, less than the predetermined number of times, in the transcriptions of utterances previously spoken by the user; and
      
      based on determining that the additional candidate transcription includes the additional term that appears, less than the predetermined number of times, in the transcriptions of utterances previously spoken by the user, providing, for display on the computing device, the additional transcription.
  - 11. The method of claim 1, comprising:
    - selecting, from among the multiple candidate transcriptions of the spoken utterance, an additional transcription;
      
      determining that the additional transcription includes the term that appears, more than the predetermined number of times, in the transcriptions of utterances previously spoken by the user; and
      
      based on determining that the additional transcription includes the term that appears, more than the predetermined number of times, in the transcriptions of utterances previously spoken by the user, providing, for display on the computing device, the additional transcription.
  - 12. The method of claim 1, wherein providing, for display on the computing device, the particular candidate transcription comprises:
    - providing, for display on the computing device, search engine results based on the particular candidate transcription.
  - 13. The method of claim 1, wherein the spoken utterance is a search query.
  - 14. The method of claim 1, comprising:
    - determining, for each of the multiple candidate transcriptions of the spoken utterance, a confidence score that reflects a likelihood that the candidate transcription is an acoustic match for the spoken utterance, wherein selecting the particular transcription is based on the confidence scores of the multiple candidate transcriptions of the spoken utterance.
  - 15. The method of claim 14, comprising:
    - determining a frequency that the term appears in each of the utterances previously spoken by the user before speaking the spoken utterance; and
      
      based on the frequency that the term appears in each of the utterances previously spoken by the user before speaking the spoken utterance, weighting, for each of the multiple candidate transcriptions, the confidence score, wherein providing the particular candidate transcription as the transcription of the spoken utterance is further based on the weighted confidence scores.
  - 16. The method of claim 1, comprising:
    - determining, for each of the multiple candidate transcriptions of the spoken utterance and using an acoustic model, a confidence score that reflects a likelihood that the candidate transcription is an acoustic match for the spoken utterance; and
      
      bypassing providing, for output as a transcription of the spoken utterance, a candidate transcription that has a highest confidence score, wherein the computing device selects, from among the multiple candidate transcriptions of the spoken utterance, the particular candidate transcription after bypassing providing, for output as a transcription of the spoken utterance, the candidate transcription that has the highest confidence score.
  - 17. The method of claim 16, wherein bypassing providing, for output as a transcription of the spoken utterance, a candidate transcription that has a highest confidence score comprises:
    - bypassing providing, for visual output as the transcription of the spoken utterance on a display of the computing device, the candidate transcription that has the highest confidence score, wherein the computing device selects, from among the multiple candidate transcriptions of the spoken utterance, the particular candidate transcription after bypassing providing, for visual output as a transcription of the spoken utterance, the candidate transcription that has the highest confidence score.
  - 18. The method of claim 16, wherein bypassing providing, for output as a transcription of the spoken utterance, a candidate transcription that has a highest confidence score comprises:
    - bypassing providing, for audible output as the transcription of the spoken utterance through a speaker of the computing device, the candidate transcription that has the highest confidence score, wherein the computing device selects, from among the multiple candidate transcriptions of the spoken utterance, the particular candidate transcription after bypassing providing, for audible output as a transcription of the spoken utterance, the candidate transcription that has the highest confidence score.
  - 19. The method of claim 1, comprising:
    - providing, by the computing device and to a server, a request for a number of times that the user has previously spoken each term, wherein the server includes a data storage device that stores transcriptions of utterances previously spoken by the user; and
      
      receiving, by the computing device and from the server, data identifying the number of times that the user has previously spoken each term, wherein determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance is based on the data identifying the number of times that the user has previously spoken each term.
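
Claims 14 through 16 describe weighting acoustic confidence scores by how often a term was previously spoken, which can lead to a candidate other than the acoustically best one being output. The Python sketch below illustrates that idea under stated assumptions: the multiplicative weighting `confidence * (1 + boost * frequency)`, the `boost` value, and the function name `rerank` are hypothetical and are not taken from the claims.

```python
from collections import Counter


def rerank(candidates, previous_transcriptions, boost=0.1):
    """Re-rank (text, confidence) pairs using the user's term history.

    Assumed weighting: each candidate's confidence is multiplied by
    1 + boost * (total count of its terms in the previous transcriptions).
    Returns (text, weighted_score) pairs, best first.
    """
    term_counts = Counter(
        word
        for transcription in previous_transcriptions
        for word in transcription.lower().split()
    )
    weighted = []
    for text, confidence in candidates:
        frequency = sum(term_counts[word] for word in text.lower().split())
        weighted.append((text, confidence * (1.0 + boost * frequency)))
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)
```

If a hypothetical candidate with the highest acoustic confidence contains no previously spoken terms, its score is unchanged, while a slightly lower-scoring candidate built from frequently spoken terms can overtake it. In that case the acoustically best candidate is passed over, in the spirit of claim 16.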

20. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;
  
  generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;
  
  selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;
  
  determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and
  
  based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.
- - 21. The system of claim 20, wherein the operations further comprise:
    - in response to receiving the audio data that corresponds to the spoken utterance, determining context data that indicates a context of the computing device, wherein selecting the particular candidate transcription comprises selecting the particular candidate transcription based on the context data.
  - 22. The system of claim 21, wherein the context data comprises:
    - a time and date when the computing device received the audio data that corresponds to the spoken utterance;
      
      data that specifies a type of the computing device;
      
      data that specifies a type of audio subsystem of the computing device;
      
      data that specifies whether the computing device is plugged in;
      
      data that specifies a location of the computing device where the computing device received the audio data that corresponds to the spoken utterance; and
      
      audio data that corresponds to ambient noise near the computing device.
  - 23. The system of claim 20, wherein the operations further comprise:
    - selecting, from among the multiple candidate transcriptions of the spoken utterance, an additional candidate transcription;
      
      determining that the additional candidate transcription includes an additional term that appears, less than the predetermined number of times, in the transcriptions of utterances previously spoken by the user; and
      
      based on determining that the additional candidate transcription includes the additional term that appears, less than the predetermined number of times, in the transcriptions of utterances previously spoken by the user, providing, for display on the computing device, the additional transcription.
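
The six kinds of context data listed in claims 3 and 22 could be carried in a single record. The Python sketch below shows one possible shape; the class name, the field names, and the field types (for example, a latitude/longitude pair for location and raw bytes for ambient noise) are assumptions, since the claims only name the kinds of data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class ContextData:
    """Hypothetical container for the context data named in claims 3 and 22."""

    received_at: datetime      # time and date the audio data was received
    device_type: str           # type of the computing device
    audio_subsystem: str       # type of audio subsystem of the device
    plugged_in: bool           # whether the device is plugged in
    # Assumed representation: (latitude, longitude), or None if unknown.
    location: Optional[Tuple[float, float]]
    ambient_noise: bytes       # audio data for ambient noise near the device
```

The record is frozen so that the context captured when the audio was received cannot be changed later. How the context data influences the selection of a candidate transcription is not specified in the claims and is left out of this sketch.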

24. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;
  
  generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;
  
  selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;
  
  determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and
  
  based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Lloyd, Matthew I.; Schalkwyk, Johan; Risbood, Pankaj
Primary Examiner(s): Saint Cyr, Leonard
Application Number: US15/233,141
Time in Patent Office: 923 Days
Field of Search: 704246, 704247, 704251, 704252
US Class Current
CPC Class Codes

G06F 16/3344   using natural language anal...

G06F 16/90332   Natural language query form...

G06F 16/9535   Search customisation based ...

G10L 15/01   Assessment or evaluation of...

G10L 15/183   using context dependencies,...

G10L 15/197   Probabilistic grammars, e.g...

G10L 15/26   Speech to text systems G10L...
