Speaker and call characteristic sensitive open voice search
First Claim
1. A method comprising:
- receiving, by a microphone of a computing device, a spoken query from a user;
processing, by the computing device, the spoken query using parallel processes, wherein a first process of the parallel processes comprises:
converting, by the computing device, the spoken query into one or more text strings using a speech recognition process; and
assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings;
wherein a second process of the parallel processes comprises:
identifying, by the computing device, acoustic features of a voice signal corresponding to the spoken query; and
classifying, by the computing device, the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user;
selecting a text query based on the one or more text strings and the customized language model;
receiving, by the computing device, search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and
re-ranking, by the computing device, the search results based on re-scoring the search results using the respective text cluster.
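The flow recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in chosen for illustration, not the patent's implementation: the toy unigram "language models", the nearest-centroid cluster assignment, the equal-weight score interpolation, the term-overlap re-scoring, and all names (`first_process`, `second_process`, `select_text_query`, `rerank`, `voice_search`). Speech recognition and the information retrieval system are stubbed: the recognizer's text strings are passed in directly, and search is a caller-supplied function.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Assumed toy data: unigram probabilities standing in for language models.
INITIAL_LM = {"pizza": 0.02, "pisa": 0.02, "near": 0.05, "me": 0.05}
CLUSTERS = {
    "cluster_a": {
        "centroid": (120.0, 0.3),  # hypothetical acoustic feature vector
        "lm": {"pizza": 0.08, "pisa": 0.001, "near": 0.05, "me": 0.05},
        "text_cluster": {"pizza", "delivery", "restaurant"},
    },
    "cluster_b": {
        "centroid": (210.0, 0.6),
        "lm": {"pizza": 0.001, "pisa": 0.08, "near": 0.05, "me": 0.05},
        "text_cluster": {"pisa", "tower", "tours"},
    },
}

def lm_log_score(text, lm, floor=1e-4):
    """Sum of unigram log-probabilities; unseen words get a small floor."""
    return sum(math.log(lm.get(word, floor)) for word in text.split())

def first_process(hypotheses):
    """Score each recognizer text string with the initial language model."""
    return {h: lm_log_score(h, INITIAL_LM) for h in hypotheses}

def second_process(features):
    """Classify the query into the voice cluster with the nearest centroid."""
    return min(CLUSTERS, key=lambda c: math.dist(features, CLUSTERS[c]["centroid"]))

def select_text_query(initial_scores, custom_lm, weight=0.5):
    """Pick the text string with the best interpolated score (assumed rule)."""
    def combined(h):
        return (1 - weight) * initial_scores[h] + weight * lm_log_score(h, custom_lm)
    return max(initial_scores, key=combined)

def rerank(results, text_cluster):
    """Re-score (title, score) results by term overlap with the text cluster."""
    def rescored(item):
        title, score = item
        return score + len(set(title.lower().split()) & text_cluster)
    return sorted(results, key=rescored, reverse=True)

def voice_search(hypotheses, features, search_fn):
    # Run the text process and the voice process in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(first_process, hypotheses)
        voice_future = pool.submit(second_process, features)
        initial_scores = text_future.result()
        cluster = CLUSTERS[voice_future.result()]
    query = select_text_query(initial_scores, cluster["lm"])
    return query, rerank(search_fn(query), cluster["text_cluster"])
```

In this sketch, two hypotheses that the initial model scores equally ("pisa near me" and "pizza near me") are separated only by the customized model of the speaker's cluster, and the same cluster's text terms then reorder the returned results.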
Abstract
Techniques disclosed herein include systems and methods for speaker-sensitive, open-domain, voice-enabled searching. The techniques use speech information, speaker information, and information associated with a spoken query to enhance open voice search results, including by integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously: a text matching process based on the output of speech recognition, and a voice matching process based on characteristics of the caller or user voicing the query. Characteristics of the caller can include the output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics, and it can use specific voice and text clusters to modify both speech recognition results and search results.
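As a rough illustration of clustering callers by voice features and call metadata, the following Python sketch runs a plain k-means loop over small feature vectors. The abstract does not name a clustering algorithm, so k-means is an assumption here, as are the function name `kmeans`, the two-dimensional features (a mean pitch in Hz and a 0/1 call-metadata flag), and the deterministic initialization from the first `k` points.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means (assumed technique); returns (centroids, assignments)."""
    # Deterministic initialization: use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each caller goes to the nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical caller vectors: (mean pitch in Hz, mobile-call flag).
# Features are left unscaled for brevity, so pitch dominates the distance.
callers = [(110.0, 0.0), (220.0, 1.0), (115.0, 0.0), (225.0, 1.0)]
centroids, assignments = kmeans(callers, k=2)
```

Each resulting cluster could then be paired with its own text cluster and language model; a production system would normally scale the features before computing distances.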
20 Claims
1. A method comprising:
receiving, by a microphone of a computing device, a spoken query from a user;
processing, by the computing device, the spoken query using parallel processes, wherein a first process of the parallel processes comprises:
converting, by the computing device, the spoken query into one or more text strings using a speech recognition process; and
assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings;
wherein a second process of the parallel processes comprises:
identifying, by the computing device, acoustic features of a voice signal corresponding to the spoken query; and
classifying, by the computing device, the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user;
selecting a text query based on the one or more text strings and the customized language model;
receiving, by the computing device, search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and
re-ranking, by the computing device, the search results based on re-scoring the search results using the respective text cluster.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for executing a voice search, the system comprising:
a microphone;
a processor; and
non-transitory memory storing instructions that, when executed by the processor, cause the system to:
receive, via the microphone, a spoken query from a user;
process the spoken query using parallel processes, wherein a first process of the parallel processes comprises:
converting the spoken query into one or more text strings using a speech recognition process; and
assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings;
wherein a second process of the parallel processes comprises:
identifying acoustic features of a voice signal corresponding to the spoken query; and
classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user;
select a text query based on the one or more text strings and the customized language model;
receive search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and
re-rank the search results based on re-scoring the search results using the respective text cluster.
Dependent claims: 16, 17, 18, 19, 20
10. A non-transitory computer-readable medium storing executable instructions that, when executed by a processor, cause a device to:
receive, via a microphone, a spoken query from a user;
process the spoken query using parallel processes, wherein a first process of the parallel processes comprises:
converting the spoken query into one or more text strings using a speech recognition process; and
assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings;
wherein a second process of the parallel processes comprises:
identifying acoustic features of a voice signal corresponding to the spoken query; and
classifying the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user;
select a text query based on the one or more text strings and the customized language model;
receive search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and
re-rank the search results based on re-scoring the search results using the respective text cluster.
Dependent claims: 11, 12, 13, 14, 15
Specification