Apparatus and method for forming a filtered inflected language model for automatic speech recognition
First Claim
1. A method of forming a language model for a language having a selected vocabulary of word forms, the method comprising the steps of:
(a) mapping the word forms into integer vectors in accordance with frequencies of word form occurrence;
(b) partitioning the integer vectors into subsets, the subsets respectively having ranges of frequencies of word form occurrence associated therewith, the subsets being arranged in a descending order of ranges;
(c) respectively assigning maps to the subsets;
(d) filtering a textual corpus using the maps assigned to the subsets in order to generate indexed integers;
(e) determining n-gram statistics for the indexed integers;
(f) estimating n-gram language model probabilities from the n-gram statistics to form the language model; and
(g) determining a probability of a word sequence uttered by a speaker, using said language model.
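Steps (a) and (b) can be illustrated with a minimal Python sketch. The claim does not say what the components of an integer vector are, so the sketch assumes, purely for illustration, that each word form becomes a pair (subset index, position within subset) after the vocabulary is ranked by corpus frequency. The function name build_integer_vectors and the subset_sizes parameter are hypothetical.

```python
from collections import Counter


def build_integer_vectors(corpus_words, subset_sizes):
    """Map each word form to a hypothetical (subset, position) integer vector.

    Word forms are ranked by descending corpus frequency and cut into
    consecutive subsets of the given sizes, so subset 0 covers the highest
    frequency range. Word forms ranked past the last subset are left unmapped.
    """
    counts = Counter(corpus_words)
    # Descending frequency; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    vectors = {}
    start = 0
    for subset_id, size in enumerate(subset_sizes):
        for position, word in enumerate(ranked[start:start + size]):
            vectors[word] = (subset_id, position)
        start += size
    return vectors
```

For the toy corpus "the cat sat on the mat the cat ran" with subset_sizes=[2, 3], "the" and "cat" land in the highest-frequency subset 0, "mat", "on" and "ran" in subset 1, and "sat" is ranked past both subsets and receives no vector.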
Abstract
A method of forming a language model for a language having a selected vocabulary of word forms comprises: (a) mapping the word forms into integer vectors in accordance with frequencies of word form occurrence; (b) partitioning the integer vectors into subsets, the subsets respectively having ranges of frequencies of word form occurrence associated therewith, the subsets being arranged in a descending order of frequency ranges; (c) respectively assigning maps to the subsets; (d) filtering a textual corpus using the maps assigned to the subsets in order to generate indexed integers; (e) determining n-gram statistics for the indexed integers; and (f) estimating n-gram language model probabilities from the n-gram statistics to form the language model.
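Steps (c) and (d) of the abstract can be sketched in the same hedged way. The abstract does not define the maps, so this sketch assumes that each subset's map simply shifts positions by an offset, keeping integer indices from different subsets disjoint, and that index 0 is reserved for word forms outside the selected vocabulary. The names assign_maps and filter_corpus and the offset scheme are hypothetical.

```python
def assign_maps(subset_sizes):
    """Give each subset a hypothetical map: position -> global integer index.

    Subset k's map shifts positions by 1 plus the total size of subsets
    0..k-1, so indices never collide across subsets. Index 0 is reserved
    for out-of-vocabulary word forms.
    """
    maps = []
    offset = 1  # 0 is reserved for unknown word forms
    for size in subset_sizes:
        # Bind the current offset via a default argument (avoids late binding).
        maps.append(lambda pos, off=offset: off + pos)
        offset += size
    return maps


def filter_corpus(corpus_words, vectors, maps, unk_index=0):
    """Replace each word form by the integer its subset's map assigns to it."""
    indexed = []
    for word in corpus_words:
        if word in vectors:
            subset_id, position = vectors[word]
            indexed.append(maps[subset_id](position))
        else:
            indexed.append(unk_index)
    return indexed
```

With vectors {"the": (0, 0), "cat": (0, 1), "mat": (1, 0), "on": (1, 1)} and subset_sizes=[2, 2], filtering "the cat sat on the mat" yields [1, 2, 0, 4, 1, 3]; the unknown form "sat" becomes 0.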
27 Claims
14. Apparatus for forming a language model for a language having a selected vocabulary of word forms, the apparatus comprising:
means for mapping the word forms into integer vectors in accordance with frequencies of word form occurrence;
means for partitioning the integer vectors into subsets, the subsets respectively having ranges of frequencies of word form occurrence associated therewith, the subsets being arranged in a descending order of ranges;
means for respectively assigning maps to the subsets;
means for filtering a textual corpus using the maps assigned to the subsets in order to generate indexed integers;
means for determining n-gram statistics for the indexed integers;
means for estimating n-gram language model probabilities from the n-gram statistics to form the language model; and
means for determining a probability of a word sequence uttered by a speaker, using said language model.
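The means for determining n-gram statistics and estimating probabilities might look like the following for n = 2. The claims do not name an estimator; this sketch substitutes add-one (Laplace) smoothing only because it is short, whereas a production recognizer would more likely use a back-off or interpolated scheme. The function names are hypothetical.

```python
from collections import Counter


def ngram_counts(indexed, n):
    """Count every n-gram (as a tuple) in a sequence of indexed integers."""
    return Counter(tuple(indexed[i:i + n]) for i in range(len(indexed) - n + 1))


def estimate_bigram_model(indexed, vocab_size):
    """Return an add-one (Laplace) smoothed bigram model prob(v, w) = P(w | v).

    vocab_size must count every integer that can occur, including the
    reserved unknown index.
    """
    bigrams = ngram_counts(indexed, 2)
    # Count only tokens that start a bigram, so each history's
    # conditional distribution sums to 1 over the vocabulary.
    histories = Counter(indexed[:-1])

    def prob(v, w):
        return (bigrams[(v, w)] + 1) / (histories[v] + vocab_size)

    return prob
```

For indexed = [1, 2, 0, 4, 1, 3] and vocab_size=5, prob(1, 2) is (1 + 1) / (2 + 5) = 2/7, and the values prob(1, w) for w in 0..4 sum to 1.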
27. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for forming a language model for a language having a selected vocabulary of word forms, the method comprising the steps of:
(a) mapping the word forms into integer vectors in accordance with frequencies of word form occurrence;
(b) partitioning the integer vectors into subsets, the subsets respectively having ranges of frequencies of word form occurrence associated therewith, the subsets being arranged in a descending order of ranges;
(c) respectively assigning maps to the subsets;
(d) filtering a textual corpus using the maps assigned to the subsets in order to generate indexed integers;
(e) determining n-gram statistics for the indexed integers;
(f) estimating n-gram language model probabilities from the n-gram statistics to form the language model; and
(g) determining a probability of a word sequence uttered by a speaker, using said language model.
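Step (g) scores a word sequence once the uttered words have been filtered into indexed integers in the same way as the training corpus. A minimal sketch, assuming a bigram model exposed as a hypothetical function prob(v, w) that returns P(w | v); sequence_log_prob is likewise a hypothetical name. Summing log-probabilities avoids the numerical underflow that multiplying many small probabilities would cause.

```python
import math


def sequence_log_prob(indexed_words, prob):
    """Log-probability of a word sequence under a bigram model.

    Applies the chain rule log P(w_1..w_m) = sum_i log P(w_i | w_{i-1}),
    conditioning on the first word (no sentence-start token is modelled).
    """
    return sum(math.log(prob(v, w)) for v, w in zip(indexed_words, indexed_words[1:]))
```

With a uniform toy model prob = lambda v, w: 0.2, the three-word sequence [1, 2, 0] has log-probability 2 * log(0.2), i.e. a probability of 0.04. A recognizer would typically combine such language model scores with acoustic scores when ranking competing hypotheses.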
Specification