Information retrieval and speech recognition based on language models
Abstract
A language model is used in a speech recognition system which has access to a first, smaller data store and a second, larger data store. The language model is adapted by formulating an information retrieval query based on information contained in the first data store and querying the second data store. Information retrieved from the second data store is used in adapting the language model. Also, language models are used in retrieving information from the second data store. Language models are built based on information in the first data store, and based on information in the second data store. The perplexity of a document in the second data store is determined, given the first language model, and given the second language model. Relevancy of the document is determined based upon the first and second perplexities. Documents are retrieved which have a relevancy measure that exceeds a threshold level.
36 Claims
-
1. A method of adapting a language model used in a speech recognition system which has access to a first data store and a second data store, the second data store being large relative to the first data store, the method comprising:
-
formulating an information retrieval query based on information contained in the first data store;
querying the second data store based on the query formulated;
retrieving information from the second data store based on the query; and
adapting the language model based on the information retrieved and the information in the first data store.
-
2. The method of claim 1 and further comprising:
repeating the steps of formulating, querying, retrieving, and adapting while a user is using the speech recognition system.
-
-
3. The method of claim 2 wherein the steps of formulating, querying, retrieving, and adapting are performed intermittently while a user is using the speech recognition system.
-
4. The method of claim 1 wherein formulating an information retrieval query comprises:
formulating an information retrieval query based on documents previously created by the user and stored in the first data store.
-
5. The method of claim 1 wherein formulating an information retrieval query comprises:
formulating an information retrieval query based on information contained in a document then being prepared by the user.
-
6. The method of claim 1 wherein formulating an information retrieval query comprises:
formulating an information retrieval query based on information related to a type of document then being prepared by the user.
-
7. The method of claim 6 wherein formulating an information retrieval query comprises:
formulating an information retrieval query based on a template then being used by the user to prepare the document.
-
8. The method of claim 6 wherein formulating an information retrieval query comprises:
formulating an information retrieval query based on an application program then being used by the user to prepare the document.
-
9. The method of claim 6 wherein formulating an information retrieval query comprises:
formulating an information retrieval query based on a time of day during which the user is preparing the document.
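As an illustration of the query-formulation step recited in claims 4 through 9, the following is a minimal sketch that assumes the query is simply the most frequent content words in the first data store. The claims do not prescribe this technique; the function name `formulate_query`, the stop-word list, and the `max_terms` parameter are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def formulate_query(first_store_docs, max_terms=5):
    """Return the most frequent non-stop-word terms across the documents."""
    counts = Counter()
    for text in first_store_docs:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]

# Toy stand-in for documents previously created by the user.
docs = [
    "The patient reported chest pain and shortness of breath.",
    "Chest x-ray ordered for the patient.",
]
query = formulate_query(docs, max_terms=3)
print(query)
```

The same function could be pointed at the document currently being prepared, or at text associated with a template or application, to approximate the variants in claims 5 through 8.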
-
10. The method of claim 1 wherein retrieving information comprises:
-
retrieving a plurality of documents from the second information store; and
determining a relevance measure associated with each document retrieved.
-
-
11. The method of claim 10 wherein adapting the language model comprises:
adapting the language model based on relevant documents retrieved which have a relevance measure which meets a threshold value.
-
12. The method of claim 11 wherein adapting the language model comprises:
-
assigning a weight to each relevant document; and
adapting the language model based on the relevant documents according to the weight assigned to each relevant document.
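The thresholding and weighting of claims 10 through 12 can be sketched as follows, assuming a unigram count model and assuming that each document's relevance score doubles as its weight. Both are simplifications chosen for illustration; the names `adapt_counts` and `to_probabilities` are hypothetical and not taken from the patent.

```python
from collections import Counter

def adapt_counts(base_counts, retrieved, threshold):
    """Add weighted word counts from documents whose relevance meets the threshold.

    `retrieved` is a list of (tokens, relevance) pairs. Using the relevance
    score itself as the document weight is an assumption of this sketch.
    """
    adapted = Counter(base_counts)
    for tokens, relevance in retrieved:
        if relevance < threshold:
            continue  # document does not meet the threshold; skip it
        weight = relevance
        for token in tokens:
            adapted[token] += weight
    return adapted

def to_probabilities(counts):
    """Normalize (possibly fractional) counts into probability estimates."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Counts from the first data store, plus two retrieved documents.
base = Counter({"chest": 2, "pain": 1})
retrieved = [(["chest", "x-ray"], 0.5), (["stock", "market"], 0.1)]

adapted = adapt_counts(base, retrieved, threshold=0.25)
probs = to_probabilities(adapted)
print(probs)
```

Only the first retrieved document meets the threshold here, so the second contributes nothing to the adapted estimates.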
-
-
13. The method of claim 1 wherein retrieving information from the second data store comprises retrieving a plurality of documents from the second data store and further comprising:
-
weighting the documents retrieved from the second data store lower than the information in the first data store; and
wherein adapting the language model comprises adapting the language model based on the information in the first data store and the documents retrieved, as weighted against the information in the first data store.
-
-
14. The method of claim 1 wherein the language model includes probability estimates of word sequences, and wherein adapting the language model comprises:
adjusting the probability estimates based on the information in the first data store and the information retrieved from the second data store.
-
15. The method of claim 12 wherein assigning a weight to the documents retrieved from the second data store comprises:
assigning an increased weight to the documents retrieved from the second data store as a number of times the second data store is queried increases, at least until the increased weight reaches a weight threshold.
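One way to picture the weight schedule of claim 15 is a linear ramp that stops at a cap. The claim only requires that the weight increase with the number of queries until it reaches a weight threshold; the linear form and the values of `start`, `step`, and `cap` below are assumptions made for illustration.

```python
def retrieved_weight(num_queries, start=0.1, step=0.1, cap=0.5):
    """Weight given to retrieved documents after `num_queries` queries.

    The weight grows linearly and is clamped at `cap`. Keeping `cap` below
    1.0 keeps retrieved documents weighted lower than the first data store,
    if the first data store is taken to have weight 1.0.
    """
    return min(cap, start + step * num_queries)

weights = [retrieved_weight(n) for n in range(8)]
print(weights)
```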
-
16. The method of claim 1 wherein querying the second data store comprises:
querying information through a global computer network.
-
17. The method of claim 1 wherein adapting comprises:
constructing a first language model based on the information retrieved from a first query and the information in the first data store.
-
18. The method of claim 17 wherein adapting further comprises:
-
constructing a second language model based on the information retrieved from a subsequent query; and
combining the first and second language models.
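Claims 17 and 18 leave open how the first and second language models are combined. A common choice, used here purely as an assumption, is linear interpolation of the two probability distributions. The function name `interpolate` and the mixing weight `lam` are hypothetical.

```python
def interpolate(model_a, model_b, lam=0.5):
    """Linearly interpolate two unigram models given as word -> probability dicts.

    Words missing from one model are treated as having probability 0.0 there.
    """
    vocab = set(model_a) | set(model_b)
    return {
        word: lam * model_a.get(word, 0.0) + (1.0 - lam) * model_b.get(word, 0.0)
        for word in vocab
    }

# Model built after a first query, and model built after a subsequent query.
first = {"chest": 0.5, "pain": 0.5}
second = {"chest": 0.25, "x-ray": 0.75}

combined = interpolate(first, second, lam=0.5)
print(combined)
```

Because both inputs sum to one and the mixing weights sum to one, the combined model is also a valid probability distribution.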
-
-
19. A method of retrieving information from a second data store which is relevant to information stored in a first data store wherein the second data store is larger than the first data store, the method comprising:
-
providing a first language model based on information stored in the first data store;
providing a second language model;
determining a first perplexity of a document in the second data store, given the first language model;
determining a second perplexity of the document, given the second language model;
determining a relevancy measure of the document based on the first and second perplexities; and
selectively retrieving the document based on the relevancy measure.
-
20. The method of claim 19 and further comprising:
-
repeating the steps of determining a first perplexity, determining a second perplexity and determining a relevancy measure, for a plurality of documents in the second data store; and
retrieving relevant documents from the plurality of documents which have a relevancy measure which meets a threshold level.
-
-
21. The method of claim 19 wherein providing a second language model comprises:
providing the second language model based on information stored in the second data store.
-
22. The method of claim 19 wherein determining a relevancy measure comprises:
-
determining a ratio of the first and second perplexities relative to one another; and
determining the relevancy measure based on the ratio.
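The perplexity-ratio relevancy of claims 19 through 22 can be sketched with two add-one-smoothed unigram models. The unigram form, the smoothing, and the direction of the ratio (second perplexity divided by first, so that larger means more relevant) are assumptions of this sketch; the claims only require a ratio of the two perplexities relative to one another.

```python
import math
from collections import Counter

def train_unigram(tokens, vocab):
    """Add-one-smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {word: (counts[word] + 1) / total for word in vocab}

def perplexity(model, tokens):
    """exp of the average negative log-probability; tokens must be in the vocabulary."""
    log_prob = sum(math.log(model[token]) for token in tokens)
    return math.exp(-log_prob / len(tokens))

def relevancy(first_model, second_model, doc_tokens):
    """Ratio above 1.0 means the first model predicts the document better."""
    return perplexity(second_model, doc_tokens) / perplexity(first_model, doc_tokens)

# Toy stand-ins for the small first data store and the larger second one.
first_store = "chest pain chest xray".split()
second_store = "stock market chest pain stock price market news".split()
vocab = set(first_store) | set(second_store)

first_lm = train_unigram(first_store, vocab)
second_lm = train_unigram(second_store, vocab)

medical_doc = ["chest", "pain", "xray"]
finance_doc = ["stock", "market", "price"]
medical_score = relevancy(first_lm, second_lm, medical_doc)
finance_score = relevancy(first_lm, second_lm, finance_doc)
print(medical_score, finance_score)
```

A document resembling the first data store scores above 1.0, while a document typical only of the second data store scores below it, so a threshold or ranking on this score separates the two.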
-
-
23. The method of claim 20 wherein retrieving relevant documents comprises:
ranking documents according to the relevancy measure determined for each document.
-
24. A method of retrieving information from a second data store which is relevant to information stored in a first data store wherein the second data store is larger than the first data store, the method comprising:
-
providing a first context dependent language model based on information in the first data store;
providing a second context dependent language model based on information in the second data store;
determining a relevancy of a document in the second data store based on a predictive capability of the first language model given the document and based on a predictive capability of the second language model given the document; and
retrieving the document if the relevancy meets a relevancy threshold value.
-
25. The method of claim 24 wherein determining a relevancy comprises:
determining the relevancy based on a branching factor of the first language model given the document and based on a branching factor of the second language model given the document.
-
-
26. The method of claim 24 and further comprising:
-
repeating the steps of determining a relevancy for a plurality of documents in the second data store;
comparing the relevancy determined to the relevancy threshold; and
retrieving the documents having a relevancy which meets the relevancy threshold.
-
-
27. The method of claim 26 and further comprising:
adapting the relevancy threshold based on a number of documents which meet the relevancy threshold.
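Claim 27 does not say how the threshold is adapted. The sketch below assumes a simple rule: raise the threshold when too many documents meet it and lower it when too few do. The function name and the values of `target_min`, `target_max`, and `factor` are hypothetical.

```python
def adapt_threshold(threshold, num_matching, target_min=5, target_max=50, factor=1.5):
    """Return a new relevancy threshold given how many documents met the old one."""
    if num_matching > target_max:
        return threshold * factor  # too many matches: be stricter
    if num_matching < target_min:
        return threshold / factor  # too few matches: be more lenient
    return threshold  # match count is within the target band

stricter = adapt_threshold(1.0, num_matching=100)
looser = adapt_threshold(1.5, num_matching=2)
unchanged = adapt_threshold(1.0, num_matching=10)
print(stricter, looser, unchanged)
```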
-
28. The method of claim 24 wherein providing the first language model comprises:
-
querying the second data store based on information in the first data store; and
constructing the first language model based on the information in the first data store and based on information from the second data store retrieved based on the query.
-
-
29. The method of claim 24 wherein providing the first language model comprises:
-
constructing a preliminary language model based on information in the first data store; and
combining the preliminary language model with the second language model to obtain the first language model.
-
-
30. The method of claim 24 wherein providing the second language model comprises:
constructing the second language model based on a subset of all information stored in the second data store.
-
31. A method of retrieving information from a second data store which is relevant to information stored in a first data store wherein the second data store is larger than the first data store, the method comprising:
-
providing a first language model based on information stored in the first data store;
determining a first perplexity of a document in the second data store, given the first language model;
determining a relevancy measure of the document based on the first perplexity;
repeating the steps of determining a first perplexity, and determining a relevancy measure, for a plurality of documents in the second data store; and
retrieving relevant documents from the plurality of documents which have a relevancy measure which meets a threshold level.
-
32. The method of claim 31 and further comprising:
-
providing a second language model based on information stored in the second data store;
determining a second perplexity of the document, given the second language model;
wherein determining a relevancy measure comprises determining the relevancy measure of the document based on the first perplexity and based on the second perplexity; and
wherein repeating comprises repeating the steps of determining a first perplexity, determining a second perplexity and determining a relevancy measure, for a plurality of documents in the second data store.
-
-
33. A method of recognizing speech, comprising:
-
providing a first data store;
providing a second data store, the second data store being large relative to the first data store;
providing a language model;
formulating an information retrieval query based on information contained in the first data store;
querying the second data store based on the query formulated;
retrieving information from the second data store based on the query; and
adapting the language model based on the information retrieved and the information in the first data store.
-
34. The method of claim 33 and further comprising:
repeating the steps of formulating, querying, retrieving, and adapting, intermittently, while a user is using the speech recognition system.
-
-
35. The method of claim 34 wherein repeating comprises:
repeating the steps intermittently based on time.
-
36. The method of claim 34 wherein repeating comprises:
repeating the steps while the user is preparing a document using the speech recognition system after a predetermined number of words have been recognized during preparation of the document.
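The word-count trigger of claim 36 can be sketched as a small counter that signals when a new adaptation pass is due. The class name `AdaptationTrigger` and the parameter `words_per_adaptation` are hypothetical; the claim only requires that the steps repeat after a predetermined number of words have been recognized.

```python
class AdaptationTrigger:
    """Signals that adaptation is due after a fixed number of recognized words."""

    def __init__(self, words_per_adaptation=100):
        self.words_per_adaptation = words_per_adaptation
        self.words_since_last = 0

    def on_words_recognized(self, n):
        """Record n newly recognized words; return True when adaptation is due."""
        self.words_since_last += n
        if self.words_since_last >= self.words_per_adaptation:
            self.words_since_last = 0  # start counting toward the next pass
            return True
        return False

# Simulate five recognition results of four words each, adapting every ten words.
trigger = AdaptationTrigger(words_per_adaptation=10)
results = [trigger.on_words_recognized(4) for _ in range(5)]
print(results)
```

A time-based variant, as in claim 35, could follow the same shape with elapsed time in place of the word count.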
Specification