Speaker and call characteristic sensitive open voice search
First Claim
1. A computer-implemented method for executing a voice search, the computer-implemented method comprising:
receiving a spoken query;
converting the spoken query into a first text query using a speech recognition process, the speech recognition process using a language model that assigns a score to respective text strings associated with the first text query, the score of each respective text string being used to compute a probability of correct conversion of the spoken query to the text query;
identifying acoustic features of a voice signal corresponding to the spoken query, the identified acoustic features including at least one of pitch, frequency, or volume associated with the voice signal;
classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster, the text cluster including a corpus of text that has been determined to be relevant or commonly used with users having the identified acoustic features;
re-assigning scores to the respective text strings based on evaluating the respective text strings with the respective language model of the at least one voice cluster;
identifying a second text query based on the re-assigned scores;
receiving search results from an information retrieval system based on the second text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
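The re-scoring and re-ranking steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the add-one smoothed unigram model, the interpolation `weight`, the word-overlap ranking heuristic, and the function names `unigram_logprob`, `rescore`, and `rerank` are all assumptions introduced for illustration. The sketch takes an n-best list of (text string, score) pairs from a recognizer, re-assigns scores using a language model built from the voice cluster's text corpus, and then reorders search results by how closely they match that corpus.

```python
import math
from collections import Counter


def unigram_logprob(text, counts, total, vocab_size):
    """Add-one smoothed unigram log-probability of `text` (assumed model)."""
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in text.lower().split()
    )


def rescore(nbest, cluster_corpus, weight=0.5):
    """Re-assign scores to (text, base_score) pairs using the cluster corpus.

    `weight` is a hypothetical interpolation factor between the recognizer's
    original log-score and the cluster language model's log-probability.
    Returns the hypotheses sorted best first.
    """
    counts = Counter(w for line in cluster_corpus for w in line.lower().split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    rescored = [
        (text, (1 - weight) * base
         + weight * unigram_logprob(text, counts, total, vocab_size))
        for text, base in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def rerank(results, cluster_corpus):
    """Reorder results by the fraction of their words found in the cluster corpus.

    Python's sort is stable, so results with equal overlap keep the order
    given by the information retrieval system.
    """
    cluster_words = {w for line in cluster_corpus for w in line.lower().split()}

    def overlap(doc):
        words = doc.lower().split()
        return sum(w in cluster_words for w in words) / max(len(words), 1)

    return sorted(results, key=overlap, reverse=True)


# Illustrative data only; none of it comes from the patent.
cluster_corpus = ["flower delivery", "flower shop hours", "rose bouquet"]
nbest = [("flour shop near me", -2.0), ("flower shop near me", -2.1)]
second_text_query = rescore(nbest, cluster_corpus)[0][0]
results = ["hardware store flour mill", "flower shop hours and delivery"]
reranked = rerank(results, cluster_corpus)
```

In this toy run the recognizer slightly prefers "flour shop near me", but the cluster corpus favors "flower", so the second hypothesis becomes the second text query, and the result that overlaps the cluster corpus moves to the top.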
Abstract
Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
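The caller-clustering idea in the abstract can be sketched as nearest-centroid classification over acoustic features. This is only an illustration under stated assumptions: the two features (mean pitch and mean volume), the centroid values, the cluster names, and the function `classify_voice` are hypothetical, and call metadata is left out. A practical system would also normalize features so that no single feature dominates the distance.

```python
import math

# Hypothetical centroids over (mean pitch in Hz, mean volume in dB).
# These values are illustrative and are not taken from the patent.
CENTROIDS = {
    "cluster_a": (120.0, 60.0),
    "cluster_b": (210.0, 65.0),
}


def classify_voice(features, centroids=CENTROIDS):
    """Return the name of the centroid closest (Euclidean) to `features`."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


cluster = classify_voice((130.0, 62.0))
```

Each cluster name would then key into that cluster's own language model and text corpus, which the later re-scoring and re-ranking steps consult.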
17 Claims
8. A computer-implemented method for executing a voice search, the computer-implemented method comprising:
receiving a spoken query;
converting the spoken query into one or more text strings;
assigning a score to each of the text strings based on a language model, the score of each respective text string being used to compute a probability of correct text conversion of the spoken query;
identifying acoustic features of a voice signal corresponding to the spoken query, the identified acoustic features including at least one of pitch, frequency, or volume associated with the voice signal;
classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster, the text cluster including a corpus of text that has been determined to be relevant or commonly used with users having the identified acoustic features;
re-assigning scores to the text strings based on evaluating the text strings with the respective language model of the voice cluster, the text query being selected based on the re-assigned scores; and
receiving search results from an information retrieval system based on the text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
11. A system for executing a voice search, the system comprising:
a processor; and
a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to perform the operations of;
receiving a spoken query;
converting the spoken query into a first text query using a speech recognition process, the speech recognition process using a language model that assigns a score to respective text strings associated with the first text query, the score of each respective text string being used to compute a probability of correct conversion of the spoken query to the text query;
identifying acoustic features of a voice signal corresponding to the spoken query, the identified acoustic features including at least one of pitch, frequency, or volume associated with the voice signal;
classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the voice cluster having a respective language model and a respective text cluster, the text cluster including a corpus of text that has been determined to be relevant or commonly used with users having the identified acoustic features;
re-assigning scores to one or more respective text strings based on evaluating the respective text strings with the respective language model of the at least one voice cluster;
identifying a second text query based on the re-assigned scores;
receiving search results from an information retrieval system based on the second text query, each respective search result having a ranking indicating a measure of importance relative to other search results; and
modifying rankings of the search results based on evaluating the search results with the respective text cluster of the voice cluster.
Specification