Speaker and call characteristic sensitive open voice search

US 10,032,454 B2
Filed: 06/25/2015
Issued: 07/24/2018
Est. Priority Date: 03/03/2011
Status: Active Grant

Abstract

Techniques disclosed herein include systems and methods for speaker-sensitive, open-domain, voice-enabled searching. The techniques use speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously: a text matching process based on the output of speech recognition, and a voice matching process based on characteristics of the caller or user voicing the query. Characteristics of the caller can include the output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results as well as search results.
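
As an illustration of the clustering step described in the abstract, the following minimal sketch assigns a spoken query to the nearest voice cluster by comparing an acoustic feature vector against cluster centroids. The cluster names, the feature values, and the choice of Euclidean distance are assumptions made for this example; the patent text above does not prescribe a particular feature representation or distance measure.

```python
import math

# Hypothetical centroids: each voice cluster is summarized by two acoustic
# features (mean pitch in Hz, speaking rate in syllables per second).
# All names and values here are invented for illustration.
VOICE_CLUSTERS = {
    "cluster_a": (120.0, 4.0),
    "cluster_b": (210.0, 5.0),
    "cluster_c": (300.0, 3.5),
}


def classify_speech(features, clusters=VOICE_CLUSTERS):
    """Return the name of the cluster whose centroid is closest to `features`."""
    return min(clusters, key=lambda name: math.dist(features, clusters[name]))


# Features extracted from one query (assumed values).
assigned = classify_speech((200.0, 4.8))
```

In a realistic setting the features would likely be normalized so that no single dimension dominates the distance, and call metadata could be appended to the feature vector before classification.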

16 Claims

1. A method comprising:
- classifying, by a computing device, speech into at least one voice cluster based on identified acoustic features of the speech, the at least one voice cluster corresponding to a text cluster and a customized language model that reflects characteristics of a speaker of the speech;
  
  determining, by the computing device, a text query based on the customized language model and one or more text strings determined based on the speech;
  
  receiving, by the computing device, search results based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and
  
  re-ranking, by the computing device, the search results based on re-scoring the search results using the text cluster;
  
  receiving a user interaction log comprising click data associated with a user interaction with the re-ranked search results;
  
  updating the at least one voice cluster based on the user interaction with the re-ranked search results; and
  
  updating the customized language model based on the click data associated with the user interaction with the re-ranked search results.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1, comprising:
    - re-assigning scores to the one or more text strings based on evaluating the one or more text strings using the customized language model.
  - 3. The method of claim 1, comprising:
    - receiving, by the computing device, the speech;
      
      converting, by the computing device, the speech into the one or more text strings using a speech recognition process; and
      
      determining, using an initial language model, a score for each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the speech into the one or more text strings.
  - 4. The method of claim 1, comprising:
    - receiving metadata that corresponds to the speech, wherein classifying the speech into the at least one voice cluster based on the identified acoustic features of the speech comprises classifying the speech into the at least one voice cluster based on the identified acoustic features of the speech and the metadata that corresponds to the speech.
  - 5. The method of claim 4, wherein the metadata that corresponds to the speech comprises at least one of a time of day when the speech was captured, a telephone number associated with a telephone that captured the speech, model information of a mobile device that captured the speech, hardware information of the mobile device that captured the speech, or profile information of a profile of the speaker of the speech on the mobile device that captured the speech.
  - 6. The method of claim 5, comprising:
    - receiving previous spoken queries requested by the telephone associated with the telephone number; and
      
      modifying rankings of the search results based on the previous spoken queries.
  - 7. The method of claim 5, wherein the metadata that corresponds to the speech comprises an area code of the telephone number of the telephone that captured the speech, and wherein the method comprises:
    - using the area code to determine regional word choices and interests; and
      
      modifying rankings of the search results based on the regional word choices and interests.
  - 8. The method of claim 4, wherein the metadata comprises a location of a mobile device that captured the speech, wherein the location of the mobile device comprises a location of a moving vehicle.
  - 9. The method of claim 1, wherein classifying the speech into the at least one voice cluster based on the identified acoustic features of the speech comprises determining that the identified acoustic features are associated with one of an age of the speaker or an accent of a geographic region.
  - 10. The method of claim 1, comprising:
    - receiving the speech from a wireless mobile device, wherein receiving the search results based on the text query comprises receiving the search results from an open domain search executed by a search engine based on the one or more text strings.
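
The query-selection and re-ranking steps recited in claims 1 and 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the customized language model is stood in for by a unigram probability table, the text cluster by a set of terms, and the interpolation weight, floor probability, and all sample data are hypothetical.

```python
import math


def lm_log_prob(text, unigram_probs, floor=1e-6):
    """Log-probability of `text` under a unigram model (assumed form).

    Words missing from the model receive the small `floor` probability.
    """
    return sum(math.log(unigram_probs.get(w, floor)) for w in text.lower().split())


def pick_query(nbest, custom_lm, weight=0.5):
    """Re-score (text, initial_log_score) hypotheses and return the best text."""
    rescored = [(score + weight * lm_log_prob(text, custom_lm), text)
                for text, score in nbest]
    return max(rescored)[1]


def rerank(results, cluster_terms):
    """Re-rank (title, base_score) pairs by adding text-cluster term overlap."""
    def score(item):
        title, base = item
        overlap = sum(1 for w in title.lower().split() if w in cluster_terms)
        return base + overlap
    return sorted(results, key=score, reverse=True)


# Recognizer hypotheses with initial log scores (invented values).
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.2)]
custom_lm = {"recognize": 0.05, "speech": 0.05}
query = pick_query(nbest, custom_lm)

# Search results with their original ranking scores (invented values).
results = [("beach holidays", 1.0),
           ("speech recognition tutorial", 0.8),
           ("speech therapy", 0.5)]
reranked = rerank(results, {"speech", "recognition"})
```

Here the hypothesis with the lower initial score wins once the speaker-specific model is applied, and results that share terms with the text cluster move ahead of the originally top-ranked result.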

11. Non-transitory computer-readable media storing executable instructions that, when executed by one or more processors, cause a system to:
- classify speech into at least one voice cluster based on identified acoustic features of the speech, the at least one voice cluster corresponding to a text cluster and a customized language model that reflects characteristics of a speaker of the speech;
  
  determine a text query based on the customized language model and one or more text strings determined based on the speech;
  
  receive search results based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results;
  
  re-rank the search results based on re-scoring the search results using the text cluster;
  
  receive a user interaction log comprising click data associated with a user interaction with the re-ranked search results;
  
  update the at least one voice cluster based on the user interaction with the re-ranked search results; and
  
  update the customized language model based on the click data associated with the user interaction with the re-ranked search results.
- View Dependent Claims (15, 16)
- - 15. The non-transitory computer-readable media of claim 11, storing further executable instructions that, when executed by the one or more processors, cause the system to:
    - re-assign scores to the one or more text strings based on evaluating the one or more text strings using the customized language model.
  - 16. The non-transitory computer-readable media of claim 11, storing further executable instructions that, when executed by the one or more processors, cause the system to:
    - receive metadata that corresponds to the speech, wherein classifying the speech into the at least one voice cluster based on the identified acoustic features of the speech comprises classifying the speech into the at least one voice cluster based on the identified acoustic features of the speech and the metadata that corresponds to the speech.

12. A system comprising:
- one or more processors; and
  
  non-transitory memory storing executable instructions that, when executed by the one or more processors, cause the system to:
  
  classify speech into at least one voice cluster based on identified acoustic features of the speech, the at least one voice cluster corresponding to a text cluster and a customized language model that reflects characteristics of a speaker of the speech;
  
  determine a text query based on the customized language model and one or more text strings determined based on the speech;
  
  receive search results based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results;
  
  re-rank the search results based on re-scoring the search results using the text cluster;
  
  receive a user interaction log comprising click data associated with a user interaction with the re-ranked search results;
  
  update the at least one voice cluster based on the user interaction with the re-ranked search results; and
  
  update the customized language model based on the click data associated with the user interaction with the re-ranked search results.
- View Dependent Claims (13, 14)
- - 13. The system of claim 12, wherein the non-transitory memory stores further executable instructions that, when executed by the one or more processors, cause the system to:
    - re-assign scores to the one or more text strings based on evaluating the one or more text strings using the customized language model.
  - 14. The system of claim 12, wherein the non-transitory memory stores further executable instructions that, when executed by the one or more processors, cause the system to:
    - receive metadata that corresponds to the speech, wherein classifying the speech into the at least one voice cluster based on the identified acoustic features of the speech comprises classifying the speech into the at least one voice cluster based on the identified acoustic features of the speech and the metadata that corresponds to the speech.
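
The feedback steps shared by independent claims 1, 11, and 12 (updating the voice cluster and the customized language model from click data) might look like the sketch below. The log format, the word-count representation of the language model, and the running-mean centroid update are all assumptions chosen for illustration; the claims do not specify how either update is computed.

```python
from collections import Counter


def update_language_model(word_counts, click_log):
    """Add the words of each clicked result title to the cluster's word counts."""
    for entry in click_log:
        if entry["clicked"]:
            word_counts.update(entry["title"].lower().split())
    return word_counts


def update_centroid(centroid, n_members, features):
    """Fold one more speaker's features into a running-mean centroid.

    Assumes a click on a re-ranked result is taken as confirmation that
    the speaker was assigned to the right voice cluster.
    """
    new_n = n_members + 1
    new_centroid = tuple(c + (f - c) / new_n for c, f in zip(centroid, features))
    return new_centroid, new_n


# Hypothetical interaction log for one query.
click_log = [
    {"title": "Speech recognition tutorial", "clicked": True},
    {"title": "Beach holidays", "clicked": False},
]
counts = update_language_model(Counter({"speech": 2}), click_log)
centroid, members = update_centroid((200.0, 4.0), 3, (240.0, 8.0))
```

Only clicked results contribute to the word counts, so the model drifts toward vocabulary the cluster's speakers actually select.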

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Zhang, Shilei; Bao, Shenghua; Liu, Wen; Qin, Yong; Shuang, Zhiwei; Chen, Jian; Su, Zhong; Shi, Qin; Ganong, III, William F.
Primary Examiner(s): Thomas-Homescu, Anne L
Application Number: US14/750,000
Publication Number: US 20150294669A1
Time in Patent Office: 1,125 Days
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/35   Clustering; Classification

G06F 16/433   using audio data

G06F 16/9535   Search customisation based ...

G10L 15/18   using natural language mode...

G10L 15/1807   using prosody or stress

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
