Speaker and call characteristic sensitive open voice search

US 9,099,092 B2
Filed: 01/10/2014
Issued: 08/04/2015
Est. Priority Date: 03/03/2011
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify both speech recognition results and search results.
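The abstract says callers are clustered by voice features and call metadata but does not name a clustering algorithm. The sketch below assumes plain k-means over a vector that concatenates already-extracted, normalized acoustic features with a cyclic encoding of the hour the call was placed (time of day is the metadata example given in claim 7). The feature layout and the names `caller_vector` and `kmeans` are hypothetical illustrations, not the patented implementation.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def caller_vector(acoustic_features, hour_of_day):
    """Hypothetical caller representation: normalized acoustic features plus call time.

    The hour is mapped onto a circle so that 23:00 and 00:00 end up close together.
    """
    angle = 2 * math.pi * hour_of_day / 24.0
    return list(acoustic_features) + [math.cos(angle), math.sin(angle)]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each caller joins the nearest voice cluster.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two normalized acoustic features per caller (e.g. pitch, speaking rate), plus call hour.
callers = [
    caller_vector([0.10, 0.20], 9),
    caller_vector([0.90, 0.80], 21),
    caller_vector([0.15, 0.25], 10),
    caller_vector([0.85, 0.75], 20),
]
labels, _ = kmeans(callers, k=2)
print(labels)  # [0, 1, 0, 1]
```

The morning callers with similar voices land in one cluster and the evening callers in the other. In practice the acoustic part would come from a real feature extractor, and the relative weighting of voice features versus metadata is a design choice this sketch leaves at 1:1.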

20 Claims

1. A method comprising:

receiving, by a microphone of a computing device, a spoken query from a user;

processing, by the computing device, the spoken query using parallel processes, wherein a first process of the parallel processes comprises:

converting, by the computing device, the spoken query into one or more text strings using a speech recognition process; and

assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings;

wherein a second process of the parallel processes comprises:

identifying, by the computing device, acoustic features of a voice signal corresponding to the spoken query; and

classifying, by the computing device, the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user;

selecting a text query based on the one or more text strings and the customized language model;

receiving, by the computing device, search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and

re-ranking, by the computing device, the search results based on re-scoring the search results using the respective text cluster.

2. The method of claim 1, further comprising:

re-assigning scores to the one or more text strings based on evaluating the one or more text strings with the customized language model, the text query being selected based on the re-assigned scores.

3. The method of claim 2, further comprising:

using a user interaction log of user activity of the user with the search results to update the at least one voice cluster and the text cluster.

4. The method of claim 1, further comprising:

accessing utterances from a collection of utterances;

separating the utterances into groups of utterances based on the identified acoustic features of the voice signal and a predetermined measure of similarity among acoustic voice features, wherein a given group of the utterances represents a set of speakers having similar acoustic voice features; and

for each group of utterances of the groups of utterances, creating a statistical language model specific to the group of utterances.

5. The method of claim 1, further comprising:

playing utterances, from a collection of utterances, via a user interface; and

receiving manual input, based on acoustic voice features, that classifies each utterance into at least one group of utterances representing a set of speakers having similar acoustic voice features.

6. The method of claim 1, further comprising:

receiving metadata that corresponds to the spoken query, wherein classifying the spoken query into the at least one voice cluster based on the identified acoustic features of the voice signal comprises classifying the spoken query based on the metadata in addition to the identified acoustic features of the voice signal.

7. The method of claim 6, wherein the metadata comprises a time of day when the spoken query was captured.

8. The method of claim 1, wherein receiving the spoken query from the user comprises receiving the spoken query from a wireless mobile device, and wherein receiving the search results from the information retrieval system based on the text query comprises receiving search results from an open domain search executed by a search engine.
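Claims 1, 2, and 4 describe scoring recognition hypotheses with an initial language model, using those scores to compute probabilities, and re-scoring the hypotheses with a language model built for the caller's voice cluster. The claims fix neither the model type nor the combination rule. The sketch below assumes a softmax over log-domain n-best scores, an add-one-smoothed unigram model trained on a cluster's transcripts as a stand-in for the "statistical language model specific to the group of utterances", and log-domain linear interpolation controlled by a hypothetical `weight` parameter. All function names are illustrative.

```python
import math
from collections import Counter


def softmax(scores):
    """Convert log-domain hypothesis scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def train_unigram_lm(transcripts):
    """Add-one-smoothed unigram model over one voice cluster's transcripts."""
    counts = Counter(w for t in transcripts for w in t.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot so unseen words get nonzero probability

    def log_prob(text):
        return sum(math.log((counts[w] + 1) / (total + vocab)) for w in text.lower().split())

    return log_prob


def select_text_query(nbest, cluster_lm, weight=0.5):
    """Re-score (text, initial_log_score) hypotheses with the cluster model and pick the best."""
    texts = [text for text, _ in nbest]
    initial_probs = softmax([score for _, score in nbest])
    rescored = [
        (1 - weight) * math.log(p) + weight * cluster_lm(text)
        for text, p in zip(texts, initial_probs)
    ]
    best = max(range(len(texts)), key=lambda i: rescored[i])
    return texts[best], rescored


cluster_lm = train_unigram_lm(["weather in boston", "boston red sox score", "red sox tickets"])
nbest = [("red socks score", -1.0), ("red sox score", -1.2)]
query, _ = select_text_query(nbest, cluster_lm)
print(query)  # red sox score
```

The initial model slightly prefers "red socks score", but the cluster model, trained on queries from callers who sound alike, pulls the selection toward "red sox score". Setting `weight=0.0` reproduces the initial ranking.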

9. A system for executing a voice search, the system comprising:

a microphone;

a processor; and

non-transitory memory storing instructions that, when executed by the processor, cause the system to:

receive, via the microphone, a spoken query from a user;

process the spoken query using parallel processes, wherein a first process of the parallel processes comprises:

converting the spoken query into one or more text strings using a speech recognition process; and

assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings;

wherein a second process of the parallel processes comprises:

identifying acoustic features of a voice signal corresponding to the spoken query; and

classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user;

select a text query based on the one or more text strings and the customized language model;

receive search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and

re-rank the search results based on re-scoring the search results using the respective text cluster.

16. The system of claim 9, wherein the non-transitory memory stores instructions that, when executed by the processor, cause the system to:

re-assign scores to the one or more text strings based on evaluating the one or more text strings with the customized language model of the at least one voice cluster, the text query being selected based on the re-assigned scores.

17. The system of claim 16, wherein the non-transitory memory stores instructions that, when executed by the processor, cause the system to:

use a user interaction log of user activity of the user with the search results to update the at least one voice cluster and the text cluster.

18. The system of claim 9, wherein the non-transitory memory stores instructions that, when executed by the processor, cause the system to:

play utterances, from a collection of utterances, via a user interface; and

receive manual input, based on acoustic voice features, that classifies each utterance into at least one group of utterances representing a set of speakers having similar acoustic voice features.

19. The system of claim 9, wherein the non-transitory memory stores instructions that, when executed by the processor, cause the system to:

receive metadata that corresponds to the spoken query, wherein classifying the spoken query into the at least one voice cluster based on the identified acoustic features of the voice signal comprises classifying the spoken query based on the metadata in addition to the identified acoustic features of the voice signal.

20. The system of claim 9, wherein receiving the spoken query from the user comprises receiving the spoken query from a wireless mobile device, and wherein receiving the search results from the information retrieval system based on the text query comprises receiving search results from an open domain search executed by a search engine.
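The last step of claims 1, 9, and 10 re-ranks the engine's results by re-scoring them against the voice cluster's text cluster. The claims specify neither how a text cluster is represented nor the re-scoring formula. The sketch below assumes a text cluster is a bag of term counts, measures a result's affinity to it with cosine similarity, and blends that with a reciprocal-rank stand-in for the engine's original importance measure. `rerank`, `cosine`, and the mixing weight `alpha` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, text_cluster_terms, alpha=0.5):
    """Blend the engine's order with affinity to the caller's text cluster.

    results: result snippets in the engine's original order (rank 1 first).
    text_cluster_terms: Counter of terms representing the voice cluster's text cluster.
    """
    scored = []
    for rank, snippet in enumerate(results, start=1):
        original = 1.0 / rank  # reciprocal rank as a proxy for the engine's importance score
        affinity = cosine(Counter(snippet.lower().split()), text_cluster_terms)
        scored.append(((1 - alpha) * original + alpha * affinity, snippet))
    # Stable sort: ties keep the engine's original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored]


cluster = Counter("jazz concert tickets jazz festival".split())
results = ["pop concert tickets", "jazz festival tickets"]
print(rerank(results, cluster, alpha=0.8))  # ['jazz festival tickets', 'pop concert tickets']
```

With `alpha=0.0` the engine's order is returned unchanged; raising `alpha` lets the cluster's interests override the engine's ranking more aggressively.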

10. A non-transitory computer-readable medium storing executable instructions that, when executed by a processor, cause a device to:

receive, via a microphone, a spoken query from a user;

process the spoken query using parallel processes, wherein a first process of the parallel processes comprises:

converting the spoken query into one or more text strings using a speech recognition process; and

assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings;

wherein a second process of the parallel processes comprises:

identifying acoustic features of a voice signal corresponding to the spoken query; and

classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user;

select a text query based on the one or more text strings and the customized language model;

receive search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and

re-rank the search results based on re-scoring the search results using the respective text cluster.

11. The non-transitory computer-readable medium of claim 10, storing executable instructions that, when executed by the processor, cause the device to:

re-assign scores to the one or more text strings based on evaluating the one or more text strings with the customized language model of the at least one voice cluster, the text query being selected based on the re-assigned scores.

12. The non-transitory computer-readable medium of claim 11, storing executable instructions that, when executed by the processor, cause the device to:

use a user interaction log of user activity of the user with the search results to update the at least one voice cluster and the text cluster.

13. The non-transitory computer-readable medium of claim 10, storing executable instructions that, when executed by the processor, cause the device to:

play utterances, from a collection of utterances, via a user interface; and

receive manual input, based on acoustic voice features, that classifies each utterance into at least one group of utterances representing a set of speakers having similar acoustic voice features.

14. The non-transitory computer-readable medium of claim 10, storing executable instructions that, when executed by the processor, cause the device to:

receive metadata that corresponds to the spoken query, wherein classifying the spoken query into the at least one voice cluster based on the identified acoustic features of the voice signal comprises classifying the spoken query based on the metadata in addition to the identified acoustic features of the voice signal.

15. The non-transitory computer-readable medium of claim 10, wherein receiving the spoken query from the user comprises receiving the spoken query from a wireless mobile device, and wherein receiving the search results from the information retrieval system based on the text query comprises receiving search results from an open domain search executed by a search engine.
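Claims 3, 12, and 17 use a log of the user's interaction with the search results to update both the voice cluster and the text cluster, without saying how. The sketch below assumes two simple update rules: the text cluster's term counts are decayed and then incremented with the terms of clicked results, and the voice-cluster centroid is moved toward a newly assigned caller with an incremental mean. The log format `(query, clicked_text)`, the `decay` factor, and both function names are hypothetical.

```python
from collections import Counter


def update_text_cluster(text_cluster, interaction_log, decay=0.9):
    """Decay old term counts, then add the terms of results the user clicked.

    interaction_log: iterable of (query, clicked_text) pairs; clicked_text is None
    when the user did not select any result for that query.
    """
    updated = Counter({term: count * decay for term, count in text_cluster.items()})
    for _query, clicked in interaction_log:
        if clicked is None:
            continue  # no click, so this sketch treats the entry as carrying no signal
        updated.update(clicked.lower().split())
    return updated


def update_voice_centroid(centroid, member_count, new_vector):
    """Incremental mean: fold one newly assigned caller into a voice-cluster centroid."""
    new_count = member_count + 1
    moved = [c + (x - c) / new_count for c, x in zip(centroid, new_vector)]
    return moved, new_count


cluster_terms = Counter({"jazz": 2.0})
log = [("jazz near me", "Jazz festival"), ("rock shows", None)]
print(update_text_cluster(cluster_terms, log, decay=0.5))
print(update_voice_centroid([0.0, 0.0], 1, [1.0, 2.0]))  # ([0.5, 1.0], 2)
```

The decay keeps a cluster's text profile tracking recent interests; a production system would also need a policy for skipped or quickly abandoned results, which this sketch ignores.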

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Zhang, Shilei, Bao, Shenghua, Liu, Wen, Qin, Yong, Shuang, Zhiwei, Chen, Jian, Su, Zhong, Shi, Qin, Ganong, William F. III
Primary Examiner(s)
Desir, Pierre-Louis
Assistant Examiner(s)
THOMAS-HOMESCU, ANNE L

Application Number
US14/152,136
Publication Number
US 20140129220A1
Time in Patent Office
571 Days
Field of Search
704/235, 704/249, 704/9, 704/257, 704/252, 704/276, 704/275, 704/253, 704/251, 704/236, 704/200, 704231- 23, 704/245, 704/240, 704/270, 704/254, 704/256.6, 704/244, 707/723, 707/752, 707/707, 707/731, 707/722, 707/759, 707/751, 707/769, 707/999.003, 707/999.103, 707/999.104, 707/999.007, 707/999.01
US Class Current
1/1
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/35   Clustering; Classification

G06F 16/433   using audio data

G06F 16/9535   Search customisation based ...

G10L 15/18   using natural language mode...

G10L 15/1807   using prosody or stress

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...