Vocabulary and/or language model training
Abstract
A system includes means for creating a vocabulary and/or statistical language model from a textual training corpus. The vocabulary and/or language model are used in a pattern recognition system, such as a speech recognition system or a handwriting recognition system, for recognizing a time-sequential input pattern. The system includes means for determining at least one context identifier and means for deriving at least one search criterion, such as a keyword, from the context identifier. The system further includes means for selecting documents from a set of documents based on the search criterion. Advantageously, an Internet search engine is used for selecting the documents. Means are used for composing the training corpus from the selected documents.
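The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the stop-word list, the in-memory document lists standing in for an Internet search engine, and every function name (`tokenize`, `derive_search_criteria`, `select_documents`, `compose_corpus`) are assumptions made for this example only.

```python
import re

# Hypothetical stop-word list; the source does not say how keywords are derived.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def tokenize(text):
    """Lower-case a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def derive_search_criteria(context_identifier):
    """Derive keywords (search criteria) from a context identifier."""
    return [w for w in tokenize(context_identifier) if w not in STOP_WORDS]

def select_documents(documents, criteria, exclude=()):
    """Select documents that contain at least one keyword, skipping excluded ones."""
    wanted = set(criteria)
    return [d for d in documents
            if d not in exclude and wanted & set(tokenize(d))]

def compose_corpus(documents):
    """Compose a training corpus as a list of tokenized documents."""
    return [tokenize(d) for d in documents]

# Toy stand-in for a searchable document collection (assumed data).
initial_docs = [
    "The patient shows signs of cardiac arrhythmia.",
    "Stock markets closed higher on Friday.",
]
later_docs = initial_docs + ["Cardiac surgery was scheduled for the patient."]

criteria = derive_search_criteria("report on cardiac surgery")

# First selection: compose the training corpus.
first_set = select_documents(initial_docs, criteria)
corpus = compose_corpus(first_set)

# Second selection with the same criteria: extend the existing corpus.
second_set = select_documents(later_docs, criteria, exclude=first_set)
corpus.extend(compose_corpus(second_set))
```

In this sketch the second selection reuses the same search criteria and only adds documents not already in the corpus, which is one plausible reading of "modifying said training corpus to incorporate said second set".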
14 Claims
-
1. A method for establishing a vocabulary model and/or a statistical language model for subsequent use by a pattern recognition system, said method comprising:
receiving at least one context identifier;
deriving at least one search criterion from said at least one context identifier;
selecting a first set of at least one document based on said at least one search criterion;
composing a training corpus from said first set of at least one document;
selecting a second set of at least one document based on said at least one search criterion; and
modifying said training corpus to incorporate said second set of at least one document.
-
2. The method of claim 1, further comprising:
creating a vocabulary model from said training corpus.
-
3. The method of claim 1, further comprising:
creating a statistical language model from said training corpus.
-
4. The method of claim 1, further comprising:
modifying a vocabulary model in response to said modification of said training corpus.
-
5. The method of claim 1, further comprising:
modifying a statistical language model in response to said modification of said training corpus.
-
6. A system for establishing a language model, said system comprising:
a means for deriving at least one search criterion from said at least one context identifier;
a means for selecting a first set of at least one document based on said at least one search criterion;
a means for composing a training corpus from said first set of at least one document;
a means for selecting a second set of at least one document based on said at least one search criterion; and
a means for modifying said training corpus to incorporate said second set of at least one document.
-
7. The system of claim 6, further comprising:
a means for creating a vocabulary model from said training corpus.
-
8. The system of claim 6, further comprising:
a means for creating a statistical language model from said training corpus.
-
9. The system of claim 6, further comprising:
a means for modifying a vocabulary model in response to said modification of said training corpus.
-
10. The system of claim 6, further comprising:
a means for modifying a statistical language model in response to said modification of said training corpus.
-
11. In combination,
a computer system operable to establish a model, said computer system including a means for receiving at least one context identifier, a means for deriving at least one search criterion from said at least one context identifier, a means for selecting a first set of at least one document based on said at least one search criterion, a means for composing a training corpus from said first set of at least one document, and a means for creating said model from said training corpus; and
a pattern recognition system operable to recognize a time-sequential input pattern based on said model.
a means for selecting a second set of at least one document based on said at least one search criterion;
a means for modifying said training corpus to incorporate said second set of at least one document; and
a means for modifying said model in response to said modification of said training corpus.
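The claims recite creating a vocabulary model and a statistical language model from the training corpus, and modifying them when the corpus is modified. A minimal sketch of that idea follows. It assumes a bigram model with unsmoothed maximum-likelihood estimates, which the claims do not specify; the class name `CorpusModels` and its methods are hypothetical.

```python
from collections import Counter

class CorpusModels:
    """Vocabulary and bigram counts derived from a tokenized training corpus."""

    def __init__(self):
        self.unigrams = Counter()  # word -> occurrence count
        self.bigrams = Counter()   # (previous word, word) -> occurrence count

    def add_documents(self, tokenized_docs):
        """Update both models with new documents (initial build or later update)."""
        for tokens in tokenized_docs:
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))

    def vocabulary(self, min_count=1):
        """Return the words seen at least min_count times."""
        return {w for w, c in self.unigrams.items() if c >= min_count}

    def bigram_probability(self, prev, word):
        """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
        context_total = sum(c for (p, _), c in self.bigrams.items() if p == prev)
        if context_total == 0:
            return 0.0
        return self.bigrams[(prev, word)] / context_total

models = CorpusModels()
# Models created from the initial training corpus.
models.add_documents([["the", "patient", "rests"]])
# Models modified after the corpus incorporates a second set of documents.
models.add_documents([["the", "patient", "recovers"], ["the", "doctor", "rests"]])
```

Because the models are kept as running counts, incorporating a second document set only requires adding its counts, so nothing has to be rebuilt from scratch. A practical recognizer would likely add smoothing for unseen word pairs.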
Specification