Speaker and call characteristic sensitive open voice search

US 8,630,860 B1
Filed: 03/03/2011
Issued: 01/14/2014
Est. Priority Date: 03/03/2011
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
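
The caller-clustering step described in the abstract can be illustrated with a nearest-centroid classifier. This is only a minimal sketch, not the patented implementation: the cluster names, the centroid values, the choice of two features (mean pitch and mean volume), and the scaling constants are all assumptions made for illustration.

```python
import math

# Hypothetical cluster centroids: (mean pitch in Hz, mean volume in dB).
# These values are invented for illustration and do not come from the patent.
CENTROIDS = {
    "cluster_a": (120.0, 60.0),
    "cluster_b": (210.0, 65.0),
    "cluster_c": (300.0, 55.0),
}

# Assumed per-feature scales so that pitch does not dominate the distance.
SCALES = (50.0, 10.0)


def classify_voice(features, centroids=CENTROIDS, scales=SCALES):
    """Return the name of the centroid closest to the scaled feature vector."""
    def distance(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

    return min(centroids, key=lambda name: distance(features, centroids[name]))


# Example: a query with mean pitch 200 Hz and mean volume 64 dB.
print(classify_voice((200.0, 64.0)))
```

Call metadata such as area code or time of day could be appended to the feature vector in the same way, provided each added dimension is given a sensible scale.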

71 Citations

17 Claims

1. A computer-implemented method for executing a voice search, the computer-implemented method comprising:
- receiving a spoken query;
  
  converting the spoken query into a first text query using a speech recognition process, the speech recognition process using a language model that assigns a score to respective text strings associated with the first text query, the score of each respective text string being used to compute a probability of correct conversion of the spoken query to the text query;
  
  identifying acoustic features of a voice signal corresponding to the spoken query, the identified acoustic features including at least one of pitch, frequency, or volume associated with the voice signal;
  
  classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster, the text cluster including a corpus of text that has been determined to be relevant or commonly used with users having the identified acoustic features;
  
  re-assigning scores to the respective text strings based on evaluating the respective text strings with the respective language model of the at least one voice cluster;
  
  identifying a second text query based on the re-assigned scores;
  
  receiving search results from an information retrieval system based on the second text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
  
  modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The computer-implemented method of claim 1, further comprising:
    - using a user interaction log of user activity with the search results to update the voice cluster and text cluster.
  - 3. The computer-implemented method of claim 1, further comprising:
    - accessing utterances from a collection of utterances;
      
      automatically separating utterances into groups of utterances based on identified acoustic voice features and a predetermined measure of similarity among acoustic voice features, wherein a given group of utterances represents a set of speakers having similar acoustic voice features; and
      
      for each group of utterances, creating a statistical language model specific to a respective group of utterances.
  - 4. The computer-implemented method of claim 1, further comprising:
    - playing utterances, from a collection of utterances, via a user interface; and
      
      receiving manual input, based on acoustic voice features, that classifies each utterance into at least one group of utterances, wherein each group of utterances represents a set of speakers having similar acoustic voice features.
  - 5. The computer-implemented method of claim 1, further comprising:
    - receiving metadata that corresponds to the spoken query; and
      
      wherein classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal includes classifying the spoken query based on the metadata in addition to the identified acoustic voice features.
  - 6. The computer-implemented method of claim 5, wherein receiving the metadata includes at least one of an area code of a telephone that captured the spoken query, a location of a mobile device that captured the spoken query, or a time of day when the spoken query was captured.
  - 7. The computer-implemented method of claim 1, wherein receiving a spoken query includes receiving the spoken query from a wireless mobile device; and
    - wherein receiving search results from an information retrieval system based on the text query includes receiving search results from an open domain search executed by a search engine.
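
The rescoring and re-ranking steps of claim 1 can be sketched as follows. This is a hedged illustration under stated assumptions: the cluster language model is stood in for by an add-one smoothed unigram model built from the cluster's text corpus, the new score is a linear interpolation with weight `weight`, and search results are re-ranked by word overlap with the corpus. The claims do not specify any of these choices, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_logprob(text, counts, total, vocab_size):
    """Add-one smoothed unigram log-probability of a text string."""
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in text.lower().split()
    )


def rescore(nbest, cluster_corpus, weight=0.5):
    """Re-assign scores to (text, base_logprob) hypotheses; return best first."""
    counts = Counter(w for line in cluster_corpus for w in line.lower().split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    rescored = [
        (text, (1 - weight) * base
         + weight * unigram_logprob(text, counts, total, vocab_size))
        for text, base in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def rerank(results, cluster_corpus):
    """Reorder results by the fraction of their words found in the cluster corpus.

    sorted() is stable, so results with equal overlap keep their original order.
    """
    cluster_words = {w for line in cluster_corpus for w in line.lower().split()}

    def overlap(result):
        words = result.lower().split()
        return sum(w in cluster_words for w in words) / max(len(words), 1)

    return sorted(results, key=overlap, reverse=True)


# Invented example data for one voice cluster.
corpus = ["pizza delivery near me", "cheap pizza coupons", "pizza hut menu"]
nbest = [("pete's a hut", -4.0), ("pizza hut", -4.2)]

second_query = rescore(nbest, corpus)[0][0]
print(second_query)
print(rerank(["Hut architecture history", "Pizza Hut menu and coupons"], corpus))
```

In this example the recognizer's first choice ("pete's a hut") is displaced after rescoring because the cluster corpus makes "pizza hut" far more likely, which is the effect the claim's "second text query" describes.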

8. A computer-implemented method for executing a voice search, the computer-implemented method comprising:
- receiving a spoken query;
  
  converting the spoken query into one or more text strings;
  
  assigning a score to each of the text strings based on a language model, the score of each respective text string being used to compute a probability of correct text conversion of the spoken query;
  
  identifying acoustic features of a voice signal corresponding to the spoken query, the identified acoustic features including at least one of pitch, frequency, or volume associated with the voice signal;
  
  classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster, the text cluster including a corpus of text that has been determined to be relevant or commonly used with users having the identified acoustic features;
  
  re-assigning scores to the text strings based on evaluating the text strings with the respective language model of the voice cluster, the text query being selected based on the re-assigned scores; and
  
  receiving search results from an information retrieval system based on the text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
  
  modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
- Dependent claims (9, 10)
- - 9. The computer-implemented method of claim 8, further comprising:
    - accessing utterances from a collection of utterances;
      
      separating utterances into groups of utterances based on identified acoustic voice features, wherein a given group of utterances represents a set of speakers having similar acoustic voice features; and
      
      for each group of utterances, creating a statistical language model specific to a respective group of utterances.
  - 10. The computer-implemented method of claim 9, further comprising:
    - receiving metadata that corresponds to the spoken query;
      
      wherein classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal includes classifying the spoken query based on the metadata in addition to the identified acoustic voice features; and
      
      using a user interaction log of user activity with the search results to update the voice cluster and text cluster.
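
Claims 3 and 9 describe separating utterances into groups by acoustic similarity and building a statistical language model per group. A minimal sketch is given below, assuming each utterance is already reduced to a mean pitch and a transcript, that similarity is a fixed pitch gap (`max_pitch_gap`, an invented threshold), and that the language model is a plain unigram frequency table. A production system would likely use richer features and a proper clustering algorithm.

```python
from collections import Counter


def group_utterances(utterances, max_pitch_gap=40.0):
    """Greedily group (pitch_hz, transcript) pairs.

    An utterance joins the first group whose anchor pitch (the pitch of the
    group's first member) is within max_pitch_gap; otherwise it starts a new group.
    """
    groups = []  # each group: {"anchor": pitch, "transcripts": [...]}
    for pitch, transcript in utterances:
        for group in groups:
            if abs(pitch - group["anchor"]) <= max_pitch_gap:
                group["transcripts"].append(transcript)
                break
        else:
            groups.append({"anchor": pitch, "transcripts": [transcript]})
    return groups


def build_language_model(transcripts):
    """Return relative unigram frequencies for one group's transcripts."""
    counts = Counter(w for t in transcripts for w in t.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Invented utterance collection: (mean pitch in Hz, transcript).
utterances = [
    (110.0, "football scores"),
    (290.0, "cartoon videos"),
    (125.0, "football news"),
]

groups = group_utterances(utterances)
models = [build_language_model(g["transcripts"]) for g in groups]
print(len(groups), models[0]["football"])
```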

11. A system for executing a voice search, the system comprising:
- a processor; and
  
  a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to perform the operations of:
  
  receiving a spoken query;
  
  converting the spoken query into a first text query using a speech recognition process, the speech recognition process using a language model that assigns a score to respective text strings associated with the first text query, the score of each respective text string being used to compute a probability of correct conversion of the spoken query to the text query;
  
  identifying acoustic features of a voice signal corresponding to the spoken query, the identified acoustic features including at least one of pitch, frequency, or volume associated with the voice signal;
  
  classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster, the text cluster including a corpus of text that has been determined to be relevant or commonly used with users having the identified acoustic features;
  
  re-assigning scores to one or more respective text strings based on evaluating the respective text strings with the respective language model of the at least one voice cluster;
  
  identifying a second text query based on the re-assigned scores;
  
  receiving search results from an information retrieval system based on the second text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
  
  modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
- Dependent claims (12, 13, 14, 15, 16, 17)
- - 12. The system of claim 11, the memory stores further instructions that, when executed by the processor, cause the system to perform the operation of:
    - using a user interaction log of user activity with the search results to update the voice cluster and text cluster.
  - 13. The system of claim 11, the memory stores further instructions that, when executed by the processor, cause the system to perform the operations of:
    - accessing utterances from a collection of utterances;
      
      automatically separating utterances into groups of utterances based on identified acoustic voice features and a predetermined measure of similarity among acoustic voice features, wherein a given group of utterances represents a set of speakers having similar acoustic voice features; and
      
      for each group of utterances, creating a statistical language model specific to a respective group of utterances.
  - 14. The system of claim 11, the memory stores further instructions that, when executed by the processor, cause the system to perform the operations of:
    - playing utterances, from a collection of utterances, via a user interface; and
      
      receiving manual input, acoustic voice features, that classifies each utterance into at least one group of utterances, wherein each group of utterances represents a set of speakers having similar acoustic voice features.
  - 15. The system of claim 11, the memory stores further instructions that, when executed by the processor, cause the system to perform the operations of:
    - receiving metadata that corresponds to the spoken query; and
      
      wherein classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal includes classifying the spoken query based on the metadata in addition to the identified acoustic voice features.
  - 16. The system of claim 15, wherein receiving the metadata includes at least one of an area code of a telephone that captured the spoken query, a location of a mobile device that captured the spoken query, or a time of day when the spoken query was captured.
  - 17. The system of claim 11, wherein receiving a spoken query includes receiving the spoken query from a wireless mobile device; and
    - wherein receiving search results from an information retrieval system based on the text query includes receiving search results from an open domain search executed by a search engine.
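
Claims 2, 10, and 12 recite updating the voice cluster and text cluster from a user interaction log. One way this might look for the text cluster is sketched below. The log schema (`cluster`, `clicked`, `title` keys) and the rule of counting only clicked results are assumptions for illustration; the claims do not define a log format or an update rule.

```python
from collections import Counter


def update_text_cluster(term_counts, interaction_log, cluster_id):
    """Add the words of clicked result titles to one cluster's term counts."""
    for entry in interaction_log:
        if entry["cluster"] == cluster_id and entry["clicked"]:
            term_counts.update(entry["title"].lower().split())
    return term_counts


# Invented log entries for two clusters.
log = [
    {"cluster": "a", "clicked": True, "title": "Pizza Hut menu"},
    {"cluster": "a", "clicked": False, "title": "Hut history"},
    {"cluster": "b", "clicked": True, "title": "Pizza oven"},
]

counts = update_text_cluster(Counter({"pizza": 2}), log, "a")
print(counts["pizza"], counts["hut"], counts["history"])
```

Only the first entry changes the counts for cluster "a": the second was not clicked and the third belongs to a different cluster.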

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Bao, Shenghua; Liu, Wen; Qin, Yong; Shuang, Zhiwei; Chen, Jian; Su, Zhong; Shi, Qin; Ganong, William F. III; Zhang, Shilei
Primary Examiner(s): Shah, Paras D
Assistant Examiner(s): Thomas-Homescu, Anne L

Application Number: US13/039,467
Time in Patent Office: 1,048 Days
Field of Search: 704/275, 704/249, 704/235, 704/9, 704/257, 704/252, 704/276, 704/253, 704/251, 704/236, 707/707, 707/752, 707/723
US Class Current: 704/275
CPC Class Codes:
- G06F 16/24578 using ranking
- G06F 16/35 Clustering; Classification
- G06F 16/433 using audio data
- G06F 16/9535 Search customisation based ...
- G10L 15/18 using natural language mode...
- G10L 15/1807 using prosody or stress
- G10L 15/183 using context dependencies,...
- G10L 15/22 Procedures used during a sp...
- G10L 15/26 Speech to text systems G10L...
