Hierarchical word indexes used for efficient N-gram storage
First Claim
1. A method for storing an N-gram model in a memory of a device, comprising:
- identifying a plurality of word classes;
receiving a vocabulary of words, wherein each word in the vocabulary is associated with at least one of the plurality of classes;
associating a follower list with each word in the vocabulary;
storing in the memory information associated with a first word in the vocabulary, the information comprising;
(1) a first class index corresponding to a class in which at least a subset of the follower list is a member, and(2) a first plurality of word indexes corresponding to at least a subset of the follower list for the first word, wherein said word indexes are indexed based on the first class index.
1 Assignment
0 Petitions
Accused Products
Abstract
Systems and methods are provided for compressing data models, for example, N-gram language models used in speech recognition applications. Words in the vocabulary of the language model are assigned to classes of words, for example, by syntactic criteria, semantic criteria, or statistical analysis of an existing language model. After word classes are defined, the follower lists for words in the vocabulary may be stored as hierarchical sets of class indexes and word indexes within each class. Hierarchical word indexes may reduce the storage requirements for the N-gram language model by more efficiently representing multiple words in a single list in the same follower list.
20 Citations
35 Claims
-
1. A method for storing an N-gram model in a memory of a device, comprising:
-
identifying a plurality of word classes; receiving a vocabulary of words, wherein each word in the vocabulary is associated with at least one of the plurality of classes; associating a follower list with each word in the vocabulary; storing in the memory information associated with a first word in the vocabulary, the information comprising; (1) a first class index corresponding to a class in which at least a subset of the follower list is a member, and (2) a first plurality of word indexes corresponding to at least a subset of the follower list for the first word, wherein said word indexes are indexed based on the first class index. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. An electronic device comprising:
-
a processor controlling at least some operations of the electronic device; a memory storing computer executable instructions that, when executed by the processor, cause the electronic device to perform a method for storing an N-gram model, the method comprising; identifying a plurality of word classes; receiving a vocabulary of words, wherein each word in the vocabulary is associated with at least one of the plurality of classes; associating a follower list with each word in the vocabulary; storing in the memory information associated with a first word in the vocabulary, the information comprising; (1) a first class index corresponding to a class in which at least a subset of the follower list is a member, and (2) a first plurality of word indexes corresponding to at least a subset of the follower list for the first word, wherein said word indexes are indexed based on the first class index. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
-
-
18. One or more computer readable media storing computer-executable instructions which, when executed on a computer system, perform a method for storing an N-gram model in a memory of a device, the method comprising:
-
identifying a plurality of word classes; receiving a vocabulary of words, wherein each word in the vocabulary is associated with at least one of the plurality of classes; associating a follower list with each word in the vocabulary; storing in the memory information associated with a first word in the vocabulary, the information comprising; (1) a first class index corresponding to a class in which at least a subset of the follower list is a member, and (2) a first plurality of word indexes corresponding to at least a subset of the follower list for the first word, wherein said word indexes are indexed based on the first class index. - View Dependent Claims (19, 20, 21, 22, 23)
-
-
24. An electronic device comprising:
-
an input component for receiving input from a user of the electronic device; a processor controlling at least some operations of the electronic device; and a memory storing computer executable instructions that, when executed by the processor, cause the electronic device to perform a method for retrieving follower words from an N-gram model, said method comprising; receiving an input corresponding to a sequence of words; retrieving from the memory a first word identifier corresponding to a first word in the sequence of words; retrieving from the memory a follower list associated with the first word, the follower list comprising a class index and a plurality of word indexes, wherein said word indexes are indexed based on the class index; and retrieving from the memory a plurality of follower words corresponding to the combinations of the class index with the plurality of word indexes. - View Dependent Claims (25, 26, 27, 28)
-
-
29. A method for retrieving follower words from an N-gram model in a memory of a device, comprising:
-
receiving an input corresponding to a sequence of words; retrieving from the memory a first word identifier corresponding to a first word in the sequence of words; retrieving from the memory a follower list associated with the first word, the follower list comprising a class index and a plurality of word indexes, wherein said word indexes are indexed based on the class index; and retrieving from the memory a plurality of follower words corresponding to the combinations of the class index with the plurality of word indexes. - View Dependent Claims (30, 31, 32, 33)
-
-
34. An electronic device comprising:
-
a storage means for storing an N-gram model of follower words; an input means for receiving an input corresponding to a sequence of words; means for retrieving from the storage means a first word identifier corresponding to a first word in the sequence of words; means for retrieving from the storage means a follower list associated with the first word, the follower list comprising a class index and a plurality of word indexes, wherein said word indexes are indexed based on the class index; and means for retrieving from the storage means a plurality of follower words corresponding to the combinations of the class index with the plurality of word indexes. - View Dependent Claims (35)
-
Specification