Disambiguation of a spoken query term

US 8,521,526 B1
Filed: 07/28/2010
Issued: 08/27/2013
Est. Priority Date: 07/28/2010
Status: Expired due to Fees

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing spoken query terms. In one aspect, a method includes performing speech recognition on an audio signal to select two or more textual, candidate transcriptions that match a spoken query term, and to establish a speech recognition confidence value for each candidate transcription, obtaining a search history for a user who spoke the spoken query term, where the search history references one or more past search queries that have been submitted by the user, generating one or more n-grams from each candidate transcription, where each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription, and determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency.
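
The scoring flow summarized in the abstract can be illustrated with a minimal sketch. The patent text here does not fix a weighting formula or a combination rule, so the choices below are assumptions made only for illustration: word-level n-grams of length 1 and 2, a weighting value equal to the n-gram's count divided by the number of past queries, and a combined value of `confidence * (1 + boost * sum_of_weights)`. All function names and the sample data are hypothetical.

```python
from collections import Counter

def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from text."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    ]

def ngram_frequencies(past_queries, n_max=2):
    """Count how often each n-gram occurs across the past search queries."""
    counts = Counter()
    for query in past_queries:
        counts.update(word_ngrams(query, n_max))
    return counts

def combined_value(confidence, transcription, counts, total_queries, boost=1.0):
    """Combine recognizer confidence with history-based n-gram weights.

    Assumption: weight = count / number of past queries, and the
    combination rule is confidence * (1 + boost * sum of weights).
    """
    if total_queries == 0:
        return confidence  # no history, fall back to the recognizer alone
    weights = [counts[g] / total_queries for g in word_ngrams(transcription)]
    return confidence * (1.0 + boost * sum(weights))

def select_intended_query(candidates, past_queries):
    """Score every (transcription, confidence) pair and pick the highest."""
    counts = ngram_frequencies(past_queries)
    scored = {
        text: combined_value(conf, text, counts, len(past_queries))
        for text, conf in candidates
    }
    return max(scored, key=scored.get), scored

# Hypothetical recognizer output and search history.
candidates = [("wreck a nice beach", 0.55), ("recognize speech", 0.45)]
history = ["speech recognition api", "recognize speech python", "beach weather"]
best, scores = select_intended_query(candidates, history)
print(best)  # recognize speech
```

In this example the recognizer prefers "wreck a nice beach", but the user's history contains "recognize" and "speech" more often, so the combined value reverses the ranking.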

145 Citations

20 Claims

1. A system comprising:
- one or more computers; and
  
  a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  
  receiving an audio signal that corresponds to a spoken query term;
  
  performing speech recognition on the audio signal to select two or more textual, candidate transcriptions that match the spoken query term, and to establish a speech recognition confidence value for each candidate transcription;
  
  obtaining a search history for a user who spoke the spoken query term, wherein the search history references one or more past search queries that have been submitted by the user;
  
  generating one or more n-grams from each candidate transcription, wherein each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription;
  
  determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency;
  
  generating, for each of the candidate transcriptions, a combined value based on combining the speech recognition confidence value for the candidate transcription with the weighting value for one or more of the n-grams that are generated from the candidate transcription;
  
  selecting an intended query term from among the candidate transcriptions based on the combined values; and
  
  causing a search engine to perform a search query that includes the intended query term.
- - 2. The system of claim 1, wherein the operations further comprise:
    - determining a context associated with the spoken query term; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term, wherein determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries further comprises determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries that are members of the subset only.
  - 3. The system of claim 2, wherein:
    - determining a context associated with the spoken query term further comprises determining a date that the spoken query term was spoken by the user; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user on the same date, that were submitted within a predetermined time period before the date, that were submitted on a same day of the week as the date, that were submitted on the same day, week, month or year as the date, that were submitted on a weekend when the date occurs on a weekend, or that were submitted on a weekday when the date occurs on a weekday.
  - 4. The system of claim 2, wherein:
    - determining a context associated with the spoken query term further comprises determining a time that the spoken query term was spoken by the user; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user within a predetermined amount of period before the time, that were submitted on a same time of day as the time, that were submitted on a same minute or hour as the time, that were submitted during daytime when the time occurs during daytime, or that were submitted during nighttime when the time occurs during nighttime.
  - 5. The system of claim 2, wherein:
    - determining a context associated with the spoken query term further comprises determining that a device used by the user to enter the spoken query term was docked or holstered when the user spoke the spoken query terms; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user when a device used by the user to enter the past search queries was docked or holstered when the user entered the past search queries.
  - 6. The system of claim 2, wherein:
    - determining a context associated with the spoken query term further comprises determining a location of the user when the spoken query term was spoken by the user; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user when the user was within a predetermined distance from the location, or when the user was in a same geographic region as the location.
  - 7. The system of claim 2, wherein:
    - determining a context associated with the spoken query term further comprises determining information that specifies a device or a type of device that the user used to enter the spoken query term; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user using the same device or the same type of device that the user used to enter the spoken query term.
  - 8. The system of claim 2, wherein:
    - the operations further comprise selecting a subset of the past search queries that were submitted by the user as voice search queries and that generated one or more search results that were selected by the user; and
      
      determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries further comprises determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries that are members of the subset only.
  - 9. The system of claim 2, wherein:
    - the operations further comprise selecting a subset of the past search queries that were selected by the user from a list of m-best search queries; and
      
      determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries further comprises determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries that are members of the subset only.
  - 10. The system of claim 1, wherein:
    - the operations further comprise selecting a subset of the past search queries that include query terms that were entered by the user using a keyboard; and
      
      determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries further comprises determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries that are members of the subset only.
  - 11. The system of claim 1, wherein:
    - the operations further comprise determining one or more categories that are associated with each of the past search queries; and
      
      for each n-gram, determining a frequency with which the n-gram occurs in the past search queries further comprises:
      
      determining a particular category that is associated with the n-gram; and
      
      determining a quantity of the past search queries that are associated with the particular category.
  - 12. The system of claim 1, wherein:
    - the operations further comprise selecting the candidate transcriptions that have an m highest combined values and providing information to the user that references the selected candidate transcriptions that have the m highest combined values; and
      
      selecting the intended query term from among the candidate transcriptions based on the combined values further comprises receiving an indication of one of the selected candidate transcriptions that have the m highest combined values that has been selected by the user.
  - 13. The system of claim 1, wherein selecting an intended query term from among the candidate transcriptions based on the combined values further comprises automatically selecting, as the intended query term, the candidate transcription that has a highest combined value from among the candidate transcriptions.
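
Claims 12 and 13 describe two ways of turning combined values into a final choice: presenting the m highest-scoring candidates so the user can pick one, or automatically taking the single highest-scoring candidate. A minimal sketch follows, assuming the combined values are already available as a dictionary keyed by transcription. The function names, the `user_choice` parameter, and the sample values are hypothetical.

```python
import heapq

def m_best(combined_values, m):
    """Return the m candidate transcriptions with the highest combined values."""
    return heapq.nlargest(m, combined_values, key=combined_values.get)

def select_intended(combined_values, m=None, user_choice=None):
    """Select the intended query term.

    With m=None the top candidate is chosen automatically (as in claim 13).
    Otherwise the user's pick must come from the m-best shortlist (as in claim 12).
    """
    if m is None:
        return max(combined_values, key=combined_values.get)
    shortlist = m_best(combined_values, m)
    if user_choice not in shortlist:
        raise ValueError("user_choice must be one of the m-best candidates")
    return user_choice

# Hypothetical combined values for three candidate transcriptions.
values = {"gym new york": 0.82, "jim new york": 0.64, "gem new york": 0.31}

shortlist = m_best(values, 2)
automatic = select_intended(values)
picked = select_intended(values, m=2, user_choice="jim new york")
```

Rejecting a `user_choice` outside the shortlist is a design decision of this sketch; the claim only requires receiving an indication of one of the presented candidates.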

14. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by data processing apparatus cause the data processing apparatus to perform operations comprising:
- receiving an audio signal that corresponds to a spoken query term;
  
  performing speech recognition on the audio signal to select two or more textual, candidate transcriptions that match the spoken query term, and to establish a speech recognition confidence value for each candidate transcription;
  
  obtaining a search history for a user who spoke the spoken query term, wherein the search history references one or more past search queries that have been submitted by the user;
  
  generating one or more n-grams from each candidate transcription, wherein each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription;
  
  determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency;
  
  generating, for each of the candidate transcriptions, a combined value based on combining the speech recognition confidence value for the candidate transcription with the weighting value for one or more of the n-grams that are generated from the candidate transcription;
  
  selecting an intended query term from among the candidate transcriptions based on the combined values; and
  
  causing a search engine to perform a search query that includes the intended query term.
- - 15. The non-transitory computer storage medium of claim 14, wherein the operations further comprise:
    - determining a context associated with the spoken query term; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term, wherein determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries further comprises determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries that are members of the subset only.
  - 16. The non-transitory computer storage medium of claim 15, wherein:
    - determining a context associated with the spoken query term further comprises determining a date that the spoken query term was spoken by the user; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user on the same date, that were submitted within a predetermined time period before the date, that were submitted on a same day of the week as the date, that were submitted on the same day, week, month or year as the date, that were submitted on a weekend when the date occurs on a weekend, or that were submitted on a weekday when the date occurs on a weekday.
  - 17. The non-transitory computer storage medium of claim 15, wherein:
    - determining a context associated with the spoken query term further comprises determining a time that the spoken query term was spoken by the user; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user within a predetermined amount of period before the time, that were submitted on a same time of day as the time, that were submitted on a same minute or hour as the time, that were submitted during daytime when the time occurs during daytime, or that were submitted during nighttime when the time occurs during nighttime.
  - 18. The non-transitory computer storage medium of claim 15, wherein:
    - determining a context associated with the spoken query term further comprises determining that a device used by the user to enter the spoken query term was docked or holstered when the user spoke the spoken query terms; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user when a device used by the user to enter the past search queries was docked or holstered when the user entered the past search queries.
  - 19. The non-transitory computer storage medium of claim 15, wherein:
    - determining a context associated with the spoken query term further comprises determining a location of the user when the spoken query term was spoken by the user; and
      
      selecting a subset of the past search queries that include contexts which are similar to the context of the spoken query term further comprises selecting a subset of the past search queries that were submitted by the user when the user was within a predetermined distance from the location, or when the user was in a same geographic region as the location.
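
The context-based dependent claims (date, time, docked state, location) all narrow the search history to a subset before n-gram frequencies are counted. The sketch below shows one possible filter. It is an assumption-laden illustration: it requires three tests to pass together (weekend versus weekday, same docked state, within a distance threshold), whereas the claims recite these as separate alternatives. The `PastQuery` record, the 25 km default, and the sample coordinates are hypothetical.

```python
from collections import namedtuple
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

# Hypothetical record of one past search query and its context.
PastQuery = namedtuple("PastQuery", "text when lat lon docked")

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def is_weekend(moment):
    """Saturday and Sunday have weekday() values 5 and 6."""
    return moment.weekday() >= 5

def similar_context_subset(history, now, lat, lon, docked, max_km=25.0):
    """Keep past queries whose context resembles the current one.

    Assumption: all three tests must pass; the claims list such
    tests as alternatives rather than as a conjunction.
    """
    return [
        q for q in history
        if is_weekend(q.when) == is_weekend(now)
        and q.docked == docked
        and haversine_km(q.lat, q.lon, lat, lon) <= max_km
    ]

history = [
    PastQuery("pizza near me", datetime(2010, 7, 24, 19, 0), 37.42, -122.08, True),
    PastQuery("quarterly report template", datetime(2010, 7, 28, 10, 0), 37.42, -122.08, False),
    PastQuery("pier 39 parking", datetime(2010, 7, 25, 12, 0), 37.81, -122.41, True),
]

# Current query: a Saturday evening, device docked, close to the first entry.
subset = similar_context_subset(history, datetime(2010, 7, 31, 18, 30), 37.40, -122.10, True)
```

Only the queries in `subset` would then feed the n-gram frequency count.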

20. A method comprising:
- receiving an audio signal that corresponds to a spoken query term;
  
  performing speech recognition on the audio signal to select two or more textual, candidate transcriptions that match the spoken query term, and to establish a speech recognition confidence value for each candidate transcription;
  
  obtaining a search history for a user who spoke the spoken query term, wherein the search history references one or more past search queries that have been submitted by the user;
  
  generating one or more n-grams from each candidate transcription, wherein each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription;
  
  determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency;
  
  generating, by one or more computers, and for each of the candidate transcriptions, a combined value based on combining the speech recognition confidence value for the candidate transcription with the weighting value for one or more of the n-grams that are generated from the candidate transcription;
  
  selecting an intended query term from among the candidate transcriptions based on the combined values; and
  
  causing a search engine to perform a search query that includes the intended query term.
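
The independent claims allow an n-gram to be built from phonemes, syllables, letters, characters, words or terms. A minimal sketch of the word-level and character-level cases is given below. The helper names are hypothetical, and phoneme or syllable units would need a pronunciation resource that is outside this sketch.

```python
def ngrams_of(units, n):
    """Return every contiguous subsequence of n units as a tuple."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

def word_level(text, n):
    """n-grams over whitespace-separated words."""
    return ngrams_of(text.lower().split(), n)

def char_level(text, n):
    """n-grams over individual characters."""
    return ngrams_of(list(text.lower()), n)

word_bigrams = word_level("New York gym", 2)
char_bigrams = char_level("gym", 2)
too_long = char_level("gym", 4)  # empty: the text has fewer than n units
```

Character-level n-grams can still match the history when a candidate differs from a past query by a single word boundary or spelling, at the cost of many more n-grams per candidate.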

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Lloyd, Matthew I.; Schalkwyk, Johan; Risbood, Pankaj
Primary Examiner(s): Saint Cyr, Leonard
Application Number: US12/845,034
Time in Patent Office: 1,126 Days
Field of Search: 704/236, 704/246, 704/247, 704/251, 704/252
US Class Current: 704/236
CPC Class Codes:
G06F 16/3344   using natural language anal...
G06F 16/90332   Natural language query form...
G06F 16/9535   Search customisation based ...
G10L 15/01   Assessment or evaluation of...
G10L 15/183   using context dependencies,...
G10L 15/197   Probabilistic grammars, e.g...
G10L 15/26   Speech to text systems G10L...
