Information retrieval and speech recognition based on language models

US 6,418,431 B1
Filed: 03/30/1998
Issued: 07/09/2002
Est. Priority Date: 03/30/1998
Status: Expired due to Term

Abstract

A language model is used in a speech recognition system which has access to a first, smaller data store and a second, larger data store. The language model is adapted by formulating an information retrieval query based on information contained in the first data store and querying the second data store. Information retrieved from the second data store is used in adapting the language model. Also, language models are used in retrieving information from the second data store. Language models are built based on information in the first data store, and based on information in the second data store. The perplexity of a document in the second data store is determined, given the first language model, and given the second language model. Relevancy of the document is determined based upon the first and second perplexities. Documents are retrieved which have a relevancy measure that exceeds a threshold level.

338 Citations

36 Claims

1. A method of adapting a language model used in a speech recognition system which has access to a first data store and a second data store, the second data store being large relative to the first data store, the method comprising:
- formulating an information retrieval query based on information contained in the first data store;
  
  querying the second data store based on the query formulated;
  
  retrieving information from the second data store based on the query; and
  
  adapting the language model based on the information retrieved and the information in the first data store.
2. The method of claim 1 and further comprising:
3. The method of claim 2 wherein the steps of formulating, querying, retrieving, and adapting are performed intermittently while a user is using the speech recognition system.
4. The method of claim 1 wherein formulating an information retrieval query comprises:
- formulating an information retrieval query based on documents previously created by the user and stored in the first data store.
5. The method of claim 1 wherein formulating an information retrieval query comprises:
- formulating an information retrieval query based on information contained in a document then being prepared by the user.
6. The method of claim 1 wherein formulating an information retrieval query comprises:
- formulating an information retrieval query based on information related to a type of document then being prepared by the user.
7. The method of claim 6 wherein formulating an information retrieval query comprises:
- formulating an information retrieval query based on a template then being used by the user to prepare the document.
8. The method of claim 6 wherein formulating an information retrieval query comprises:
- formulating an information retrieval query based on an application program then being used by the user to prepare the document.
9. The method of claim 6 wherein formulating an information retrieval query comprises:
- formulating an information retrieval query based on a time of day during which the user is preparing the document.
10. The method of claim 1 wherein retrieving information comprises:
- retrieving a plurality of documents from the second information store; and
  
  determining a relevance measure associated with each document retrieved.
11. The method of claim 10 wherein adapting the language model comprises:
- adapting the language model based on relevant documents retrieved which have a relevance measure which meets a threshold value.
12. The method of claim 11 wherein adapting the language model comprises:
- assigning a weight to each relevant document; and
  
  adapting the language model based on the relevant documents according to the weight assigned to each relevant document.
13. The method of claim 1 wherein retrieving information from the second data store comprises retrieving a plurality of documents from the second data store and further comprising:
- weighting the documents retrieved from the second data store lower than the information in the first data store; and
  
  wherein adapting the language model comprises adapting the language model based on the information in the first data store and the documents retrieved, as weighted against the information in the first data store.
14. The method of claim 1 wherein the language model includes probability estimates of word sequences, and wherein adapting the language model comprises:
- adjusting the probability estimates based on the information in the first data store and the information retrieved from the second data store.
15. The method of claim 12 wherein assigning a weight to the documents retrieved from the second data store comprises:
- assigning an increased weight to the documents retrieved from the second data store as a number of times the second data store is queried increases, at least until the increased weight reaches a weight threshold.
16. The method of claim 1 wherein querying the second data store comprises:
- querying information through a global computer network.
17. The method of claim 1 wherein adapting comprises:
- constructing a first language model based on the information retrieved from a first query and the information in the first data store.
18. The method of claim 17 wherein adapting further comprises:
- constructing a second language model based on the information retrieved from a subsequent query; and
  
  combining the first and second language models.
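
Claims 12 through 15 describe weighting retrieved documents against the information in the first data store and adjusting the model's probability estimates accordingly. The following is a minimal sketch of that idea, assuming a simple unigram model built from fractional word counts. The function names, the default weights, and the linear weight schedule are illustrative assumptions and do not come from the patent.

```python
from collections import Counter


def retrieved_weight_for_query(query_count, base=0.1, step=0.1, cap=0.5):
    """Hypothetical schedule: the weight grows with each query, up to a cap."""
    return min(cap, base + step * query_count)


def adapt_unigram_model(first_store_docs, retrieved_docs, retrieved_weight=0.3):
    """Return a word -> probability map from weighted word counts.

    Words from the first (smaller) data store count fully; words from
    retrieved documents count at a lower, fractional weight.
    """
    counts = Counter()
    for doc in first_store_docs:
        for word in doc.lower().split():
            counts[word] += 1.0
    for doc in retrieved_docs:
        for word in doc.lower().split():
            counts[word] += retrieved_weight
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    weight = retrieved_weight_for_query(query_count=2)
    model = adapt_unigram_model(
        ["dictate the quarterly report"],
        ["the quarterly earnings report was filed"],
        retrieved_weight=weight,
    )
    print(sorted(model.items(), key=lambda item: -item[1])[:3])
```

A real recognizer would more likely adapt n-gram estimates with smoothing; the unigram form is used here only to keep the weighting step visible.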

19. A method of retrieving information from a second data store which is relevant to information stored in a first data store wherein the second data store is larger than the first data store, the method comprising:
- providing a first language model based on information stored in the first data store;
  
  providing a second language model;
  
  determining a first perplexity of a document in the second data store, given the first language model;
  
  determining a second perplexity of the document, given the second language model;
  
  determining a relevancy measure of the document based on the first and second perplexities; and
  
  selectively retrieving the document based on the relevancy measure.
20. The method of claim 19 and further comprising:
21. The method of claim 19 wherein providing a second language model comprises:
- providing the second language model based on information stored in the second data store.
22. The method of claim 19 wherein determining a relevancy measure comprises:
- determining a ratio of the first and second perplexities relative to one another; and
  
  determining the relevancy measure based on the ratio.
23. The method of claim 20 wherein retrieving relevant documents comprises:
- ranking documents according to the relevancy measure determined for each document.
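
Claims 19 through 23 describe scoring a document by its perplexity under two language models and ranking documents by the resulting relevancy measure. The sketch below is one way to picture this, under stated assumptions: both models are add-one-smoothed unigram models, and relevancy is taken as the second perplexity divided by the first, so a higher value means the document fits the first data store better than the general collection. The claims do not fix the direction of the ratio; that choice, and all names here, are assumptions.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram model (an assumed, simplified model type)."""

    def __init__(self, texts, vocab):
        self.counts = Counter(w for t in texts for w in t.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)

    def perplexity(self, text):
        words = text.lower().split()
        if not words:
            raise ValueError("cannot compute perplexity of an empty document")
        log_sum = sum(math.log(self.prob(w)) for w in words)
        return math.exp(-log_sum / len(words))


def relevancy(doc, first_lm, second_lm):
    """Assumed measure: second perplexity divided by first perplexity."""
    return second_lm.perplexity(doc) / first_lm.perplexity(doc)


def rank_documents(docs, first_lm, second_lm):
    """Order documents from most to least relevant."""
    return sorted(docs, key=lambda d: relevancy(d, first_lm, second_lm), reverse=True)


if __name__ == "__main__":
    first_store = ["speech recognition language model"]
    second_store = [
        "cooking pasta with tomato sauce",
        "speech recognition language model adaptation",
        "stock market prices fell",
    ]
    vocab = {w for t in first_store + second_store for w in t.lower().split()}
    first_lm = UnigramLM(first_store, vocab)
    second_lm = UnigramLM(second_store, vocab)
    for doc in rank_documents(second_store, first_lm, second_lm):
        print(round(relevancy(doc, first_lm, second_lm), 3), doc)
```

With this toy data, the document that shares vocabulary with the first data store scores above 1 and ranks first, while the unrelated documents score below 1.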

24. A method of retrieving information from a second data store which is relevant to information stored in a first data store wherein the second data store is larger than the first data store, the method comprising:
- providing a first context dependent language model based on information in the first data store;
  
  providing a second context dependent language model based on information in the second data store;
  
  determining a relevancy of a document in the second data store based on a predictive capability of the first language model given the document and based on a predictive capability of the second language model given the document; and
  
  retrieving the document if the relevancy meets a relevancy threshold value.
25. The method of claim 24 wherein determining a relevancy of the document based on a predictive capability of the first and second language models comprises:
26. The method of claim 24 and further comprising:
- repeating the steps of determining a relevancy for a plurality of documents in the second data store;
  
  comparing the relevancy determined to the relevancy threshold; and
  
  retrieving the documents having a relevancy which meets the relevancy threshold.
27. The method of claim 26 and further comprising:
- adapting the relevancy threshold based on a number of documents which meet the relevancy threshold.
28. The method of claim 24 wherein providing the first language model comprises:
- querying the second data store based on information in the first data store; and
  
  constructing the first language model based on the information in the first data store and based on information from the second data store retrieved based on the query.
29. The method of claim 24 wherein providing the first language model comprises:
- constructing a preliminary language model based on information in the first data store; and
  
  combining the preliminary language model with the second language model to obtain the first language model.
30. The method of claim 24 wherein providing the second language model comprises:
- constructing the second language model based on a subset of all information stored in the second data store.
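
Claims 26 and 27 describe comparing each document's relevancy to a threshold and adapting that threshold based on how many documents meet it. A minimal sketch follows, assuming relevancy scores have already been computed and that the threshold moves by a fixed step until the hit count falls inside a target range. The target range, the step size, and the round limit are hypothetical parameters introduced for illustration.

```python
def retrieve_with_adaptive_threshold(
    scores, threshold, min_docs, max_docs, step=0.1, max_rounds=50
):
    """Return (retrieved document ids, final threshold).

    scores maps a document id to its precomputed relevancy. The threshold
    is lowered when too few documents qualify and raised when too many do.
    """
    hits = []
    for _ in range(max_rounds):
        hits = [doc for doc, score in scores.items() if score >= threshold]
        if len(hits) < min_docs:
            threshold -= step
        elif len(hits) > max_docs:
            threshold += step
        else:
            break
    return hits, threshold


if __name__ == "__main__":
    scores = {"a": 1.3, "b": 0.9, "c": 0.76, "d": 0.4}
    hits, final = retrieve_with_adaptive_threshold(
        scores, threshold=1.5, min_docs=2, max_docs=3, step=0.25
    )
    print(hits, final)
```

The round limit guards against oscillation when no threshold yields a hit count inside the target range.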

31. A method of retrieving information from a second data store which is relevant to information stored in a first data store wherein the second data store is larger than the first data store, the method comprising:
- providing a first language model based on information stored in the first data store;
  
  determining a first perplexity of a document in the second data store, given the first language model;
  
  determining a relevancy measure of the document based on the first perplexity;
  
  repeating the steps of determining a first perplexity, and determining a relevancy measure, for a plurality of documents in the second data store; and
  
  retrieving relevant documents from the plurality of documents which have a relevancy measure which meets a threshold level.
32. The method of claim 31 and further comprising:

33. A method of recognizing speech, comprising:
- providing a first data store;
  
  providing a second data store, the second data store being large relative to the first data store;
  
  providing a language model;
  
  formulating an information retrieval query based on information contained in the first data store;
  
  querying the second data store based on the query formulated;
  
  retrieving information from the second data store based on the query; and
  
  adapting the language model based on the information retrieved and the information in the first data store.
34. The method of claim 33 and further comprising:
35. The method of claim 34 wherein repeating comprises:
- repeating the steps intermittently based on time.
36. The method of claim 34 wherein repeating comprises:
- repeating the steps while the user is preparing a document using the speech recognition system after a predetermined number of words have been recognized during preparation of the document.
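
Claim 36 describes repeating the adaptation steps after a predetermined number of words have been recognized. One way to picture such a trigger is a simple word counter, sketched below. The class name and the default interval are assumptions made for illustration and are not specified in the claims.

```python
class ReadaptationTrigger:
    """Signals when enough words have been recognized to re-run adaptation."""

    def __init__(self, word_interval=200):
        self.word_interval = word_interval  # hypothetical default
        self.words_since_update = 0

    def words_recognized(self, count):
        """Record newly recognized words; return True when adaptation is due."""
        self.words_since_update += count
        if self.words_since_update >= self.word_interval:
            self.words_since_update = 0
            return True
        return False


if __name__ == "__main__":
    trigger = ReadaptationTrigger(word_interval=5)
    for batch in (3, 2, 1):
        print(batch, trigger.words_recognized(batch))
```

A time-based variant, as in claim 35, could compare a stored timestamp against the current time instead of counting words.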

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Mahajan, Milind V.; Huang, Xuedong D.
Primary Examiner(s): Black, Thomas
Assistant Examiner(s): Rones, Charles
Application Number: US09/050,286
Time in Patent Office: 1,562 Days
Field of Search: 704/253, 704/243, 704/252, 704/258, 704/268, 704/9, 704/201, 704/257, 382/159, 382/116, 707/4
US Class Current: 1/1
CPC Class Codes:
G06F 16/3346   using probabilistic model
G10L 15/183   using context dependencies,...
G10L 15/197   Probabilistic grammars, e.g...
Y10S 707/99934   Query formulation, input pr...
