SPEAKER AND CALL CHARACTERISTIC SENSITIVE OPEN VOICE SEARCH

US 20140129220A1
Filed: 01/10/2014
Published: 05/08/2014
Est. Priority Date: 03/03/2011
Status: Active Grant

Abstract

Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously: a text matching process based on the output of speech recognition, and a voice matching process based on characteristics of the caller or user voicing the query. Characteristics of the caller can include the output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results as well as search results.
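
As a rough illustration of the voice matching side described in the abstract, the Python sketch below assigns a query to the closest voice cluster. The feature layout, the centroid values, the cluster names, and the function `classify_voice_cluster` are hypothetical and do not come from the patent; nearest-centroid matching is only a stand-in for whatever classifier an actual implementation would use.

```python
import math

# Hypothetical centroids over a toy feature vector:
# (mean pitch in Hz / 100, speaking rate in syllables per second, hour of day / 24).
# The last component stands in for call metadata such as time of day.
CENTROIDS = {
    "cluster_a": (1.2, 4.0, 0.4),
    "cluster_b": (2.1, 5.5, 0.8),
}


def classify_voice_cluster(features, centroids=CENTROIDS):
    """Return the name of the centroid closest (Euclidean distance) to `features`."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


if __name__ == "__main__":
    # Features extracted from one spoken query (values are made up).
    print(classify_voice_cluster((1.3, 4.2, 0.5)))
```

A production system would likely normalize each feature and use a trained model; the sketch only shows how acoustic features and call metadata can share one feature vector.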

75 Citations

20 Claims

1. A computer-implemented method for executing a voice search, the computer-implemented method comprising:
- receiving a spoken query;
  
  converting the spoken query into a text query using a speech recognition process, the speech recognition process using a language model that assigns a score to respective text strings, the score of each respective text string being used to compute a probability of correct conversion of the spoken query to the text query;
  
  identifying acoustic features of a voice signal corresponding to the spoken query;
  
  classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster;
  
  receiving search results from an information retrieval system based on the text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
  
  modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The computer-implemented method of claim 1, further comprising:
    - re-assigning scores to the text strings based on evaluating the text strings with the respective language model of the voice cluster, the text query being selected based on the re-assigned scores.
  - 3. The computer-implemented method of claim 2, further comprising:
    - using a user interaction log of user activity with the search results to update the voice cluster and text cluster.
  - 4. The computer-implemented method of claim 1, further comprising:
    - accessing utterances from a collection of utterances;
      
      automatically separating utterances into groups of utterances based on identified acoustic voice features and a predetermined measure of similarity among acoustic voice features, wherein a given group of utterances represents a set of speakers having similar acoustic voice features; and
      
      for each group of utterances, creating a statistical language model specific to a respective group of utterances.
  - 5. The computer-implemented method of claim 1, further comprising:
    - playing utterances, from a collection of utterances, via a user interface; and
      
      receiving manual input, based on acoustic voice features, that classifies each utterance into at least one group of utterances, wherein each group of utterances represents a set of speakers having similar acoustic voice features.
  - 6. The computer-implemented method of claim 1, further comprising:
    - receiving metadata that corresponds to the spoken query; and
      
      wherein classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal includes classifying the spoken query based on the metadata in addition to the identified acoustic voice features.
  - 7. The computer-implemented method of claim 6, wherein receiving the metadata includes at least one of an area code of a telephone that captured the spoken query, a location of a mobile device that captured the spoken query, or a time of day when the spoken query was captured.
  - 8. The computer-implemented method of claim 1, wherein receiving a spoken query includes receiving the spoken query from a wireless mobile device; and
    - wherein receiving search results from an information retrieval system based on the text query includes receiving search results from an open domain search executed by a search engine.
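
The final step of claim 1, modifying result rankings by evaluating results against the text cluster of the voice cluster, can be sketched as follows. This is a minimal illustration under stated assumptions: the text cluster is modeled as a hypothetical dictionary of term weights, each result is a `(title, base_score)` pair, and the additive boost in `rerank` is an invented scoring rule, not the method the patent specifies.

```python
def rerank(results, text_cluster_terms, weight=1.0):
    """Re-order search results using term weights from a text cluster.

    results: list of (title, base_score) pairs; a higher base_score ranks first.
    text_cluster_terms: dict mapping a term to a boost (hypothetical representation).
    """
    def boosted(item):
        title, base_score = item
        words = title.lower().split()
        # Sum the cluster weight of every word that appears in the title.
        boost = sum(text_cluster_terms.get(word, 0.0) for word in words)
        return base_score + weight * boost

    # sorted() is stable, so results with equal boosted scores keep their order.
    return sorted(results, key=boosted, reverse=True)


if __name__ == "__main__":
    results = [("jaguar car dealer", 0.9), ("jaguar animal facts", 0.8)]
    cluster_terms = {"animal": 0.3, "zoo": 0.2}
    for title, _ in rerank(results, cluster_terms):
        print(title)
```

With an empty term dictionary the original ranking is returned unchanged, which makes the cluster-based adjustment easy to switch off.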

9. A computer-implemented method for executing a voice search, the computer-implemented method comprising:
- receiving a spoken query;
  
  converting the spoken query into a text query using a speech recognition process, the speech recognition process using a language model that assigns a score to text strings, the score of each respective text string being used to compute a probability of correct text conversion of the spoken query;
  
  identifying acoustic features of a voice signal corresponding to the spoken query;
  
  classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster;
  
  re-assigning scores to the text strings based on evaluating the text strings with the respective language model of the voice cluster, the text query being selected based on the re-assigned scores; and
  
  receiving search results from an information retrieval system based on the text query, each respective search result having a ranking indicating a measure of importance relative to other search results.
- View Dependent Claims (10, 11, 12)
- - 10. The computer-implemented method of claim 9, further comprising:
    - modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
  - 11. The computer-implemented method of claim 10, further comprising:
    - accessing utterances from a collection of utterances;
      
      separating utterances into groups of utterances based on identified acoustic voice features, wherein a given group of utterances represents a set of speakers having similar acoustic voice features; and
      
      for each group of utterances, creating a statistical language model specific to a respective group of utterances.
  - 12. The computer-implemented method of claim 11, further comprising:
    - receiving metadata that corresponds to the spoken query;
      
      wherein classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal includes classifying the spoken query based on the metadata in addition to the identified acoustic voice features; and
      
      using a user interaction log of user activity with the search results to update the voice cluster and text cluster.
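
The re-scoring step of claim 9 can be illustrated with an N-best list whose scores are combined with a cluster-specific language model. The sketch below is an assumption-laden example: `UnigramLM`, the add-one smoothing, the `lm_weight` interpolation, and the sample data are all hypothetical, and a real recognizer would use a much richer language model.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram model trained on a cluster's past queries."""

    def __init__(self, sentences):
        self.counts = Counter(w for s in sentences for w in s.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen words

    def logprob(self, text):
        return sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab_size))
            for w in text.lower().split()
        )


def rescore(nbest, cluster_lm, lm_weight=0.5):
    """Pick the hypothesis with the best combined score.

    nbest: list of (text, recognizer_log_score) pairs.
    """
    rescored = [
        (text, score + lm_weight * cluster_lm.logprob(text)) for text, score in nbest
    ]
    return max(rescored, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    lm = UnigramLM(["pizza near me", "best pizza place"])
    nbest = [("pete's a near me", -1.0), ("pizza near me", -1.2)]
    print(rescore(nbest, lm))
```

Setting `lm_weight` to zero reproduces the recognizer's original choice, so the weight controls how strongly the voice cluster influences the selected text query.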

13. A system for executing a voice search, the system comprising:
- a processor; and
  
  a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to perform the operations of:
  
  receiving a spoken query;
  
  converting the spoken query into a text query using a speech recognition process, the speech recognition process using a language model that assigns a score to respective text strings, the score of each respective text string being used to compute a probability of correct conversion of the spoken query to the text query;
  
  identifying acoustic features of a voice signal corresponding to the spoken query;
  
  classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster;
  
  receiving search results from an information retrieval system based on the text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
  
  modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
- - 14. The system of claim 13, wherein the memory stores further instructions that, when executed by the processor, cause the system to perform the operation of:
    - re-assigning scores to the text strings based on evaluating the text strings with the respective language model of the voice cluster, the text query being selected based on the re-assigned scores.
  - 15. The system of claim 14, wherein the memory stores further instructions that, when executed by the processor, cause the system to perform the operation of:
    - using a user interaction log of user activity with the search results to update the voice cluster and text cluster.
  - 16. The system of claim 13, wherein the memory stores further instructions that, when executed by the processor, cause the system to perform the operations of:
    - accessing utterances from a collection of utterances;
      
      automatically separating utterances into groups of utterances based on identified acoustic voice features and a predetermined measure of similarity among acoustic voice features, wherein a given group of utterances represents a set of speakers having similar acoustic voice features; and
      
      for each group of utterances, creating a statistical language model specific to a respective group of utterances.
  - 17. The system of claim 13, wherein the memory stores further instructions that, when executed by the processor, cause the system to perform the operations of:
    - playing utterances, from a collection of utterances, via a user interface; and
      
      receiving manual input, based on acoustic voice features, that classifies each utterance into at least one group of utterances, wherein each group of utterances represents a set of speakers having similar acoustic voice features.
  - 18. The system of claim 13, wherein the memory stores further instructions that, when executed by the processor, cause the system to perform the operations of:
    - receiving metadata that corresponds to the spoken query; and
      
      wherein classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal includes classifying the spoken query based on the metadata in addition to the identified acoustic voice features.
  - 19. The system of claim 18, wherein receiving the metadata includes at least one of an area code of a telephone that captured the spoken query, a location of a mobile device that captured the spoken query, or a time of day when the spoken query was captured.
  - 20. The system of claim 13, wherein receiving a spoken query includes receiving the spoken query from a wireless mobile device; and
    - wherein receiving search results from an information retrieval system based on the text query includes receiving search results from an open domain search executed by a search engine.
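
Claims 4 and 16 describe separating utterances into groups by acoustic similarity and then building a statistical language model per group. A minimal sketch of that idea follows, assuming a greedy "leader" clustering with a fixed distance threshold and plain word counts as the per-group model. Both choices, along with the function names and sample data, are hypothetical simplifications and are not taken from the patent.

```python
import math
from collections import Counter, defaultdict


def group_utterances(utterances, threshold):
    """Greedy leader clustering of utterances.

    utterances: list of (feature_vector, transcript) pairs.
    threshold: predetermined maximum distance to a group's first member.
    Returns a dict mapping group index to a list of transcripts.
    """
    leaders = []
    groups = defaultdict(list)
    for features, transcript in utterances:
        for idx, leader in enumerate(leaders):
            if math.dist(features, leader) <= threshold:
                groups[idx].append(transcript)
                break
        else:
            # No existing group is close enough, so start a new one.
            leaders.append(features)
            groups[len(leaders) - 1].append(transcript)
    return dict(groups)


def build_language_models(groups):
    """Build one word-count table per group as a stand-in for a statistical LM."""
    return {
        idx: Counter(w for t in transcripts for w in t.lower().split())
        for idx, transcripts in groups.items()
    }


if __name__ == "__main__":
    data = [
        ((1.0, 1.0), "pizza near me"),
        ((1.1, 0.9), "pizza delivery"),
        ((5.0, 5.0), "stock prices"),
    ]
    groups = group_utterances(data, threshold=0.5)
    print(groups)
    print(build_language_models(groups))
```

Leader clustering depends on the order of the input; an implementation that needs order-independent groups would use a different algorithm, such as k-means or agglomerative clustering.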

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Shilei Zhang, William F. Ganong III
Inventors
Zhang, Shilei; Bao, Shenghua; Liu, Wen; Qin, Yong; Shuang, Zhiwei; Chen, Jian; Su, Zhong; Shi, Qin; Ganong, William F., III

Granted Patent

US 9,099,092 B2
US Class Current

704/235
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/35   Clustering; Classification

G06F 16/433   using audio data

G06F 16/9535   Search customisation based ...

G10L 15/18   using natural language mode...

G10L 15/1807   using prosody or stress

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
