Method and apparatus to provide a hierarchical index for a language model data structure
Abstract
A method for storing bigram word indexes of a language model for a consecutive speech recognition system (200) is described. The bigram word indexes (321) are stored as a common two-byte base with a specific one-byte offset to significantly reduce storage requirements of the language model data file. In one embodiment the storage space required for storing the bigram word indexes (321) sequentially is compared to the storage space required to store the bigram word indexes as a common base with specific offset. The bigram word indexes (321) are then stored so as to minimize the size of the language model data file.
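The base-plus-offset layout described in the abstract can be illustrated with a minimal sketch. It assumes each bigram word index fits in three bytes, so the upper two bytes serve as the common base and the low byte as the offset; the function names `encode_base_offset` and `decode_base_offset` are hypothetical and do not come from the patent.

```python
from itertools import groupby


def encode_base_offset(word_indexes):
    """Group word indexes by a shared two-byte base, keeping one-byte offsets.

    Assumption: every index is in the range 0 .. 2**24 - 1, so that
    index >> 8 fits in two bytes and index & 0xFF fits in one byte.
    """
    for w in word_indexes:
        if not 0 <= w < (1 << 24):
            raise ValueError(f"index {w} does not fit in three bytes")
    groups = []
    # groupby only merges adjacent items, so the indexes are sorted first.
    for base, members in groupby(sorted(word_indexes), key=lambda w: w >> 8):
        offsets = bytes(w & 0xFF for w in members)
        groups.append((base, offsets))
    return groups


def decode_base_offset(groups):
    """Rebuild the full word indexes from (base, offsets) groups."""
    return [(base << 8) | off for base, offsets in groups for off in offsets]


indexes = [0x012345, 0x012346, 0x0123FF, 0x020001]
groups = encode_base_offset(indexes)
restored = decode_base_offset(groups)
```

In this example the first three indexes share the base `0x0123` and are stored as three single-byte offsets, while the fourth index needs its own base entry.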
22 Claims
- 1. A method for storing a plurality of bigram word indexes corresponding to a specified unigram as a common base with a specific offset characterized in that the bigram word indexes are part of a trigram language model of a consecutive speech recognition system wherein language model models the Wall Street Journal task.
- 3. A method for storing a plurality of bigram word indexes, each bigram word index corresponding to a specified unigram as a common base with a specific offset, the bigram word indexes part of a trigram language model of a consecutive speech recognition system wherein language model models the Wall Street Journal task, the method comprising:
  determining storage space required for sequential storage of the plurality of bigram word indexes corresponding to a specified unigram;
  determining storage space required for hierarchical data structure storage of the plurality of bigram word indexes; and
  implementing hierarchical data structure storage of the plurality of bigram word indexes if the storage space required for hierarchical data structure storage of the plurality of bigram word indexes is less than the storage space required for sequential storage of the plurality of bigram word indexes.
  Dependent claims: 4, 5.
- 6. A machine-readable medium that provides executable instructions which, when executed by a processor, cause the processor to perform a method for storing a plurality of bigram word indexes, the bigram word indexes part of a trigram language model of a consecutive speech recognition system wherein language model models the Wall Street Journal task, the method comprising:
  determining storage space required for sequential storage of the plurality of bigram word indexes corresponding to a specified unigram;
  determining storage space required for hierarchical data structure storage of the plurality of bigram word indexes; and
  implementing hierarchical data structure storage of the plurality of bigram word indexes if the storage space required for hierarchical data structure storage of the plurality of bigram word indexes is less than the storage space required for sequential storage of the plurality of bigram word indexes.
  Dependent claims: 7, 8.
- 9. An apparatus comprising a processor with a memory coupled thereto, characterized in that the memory has stored therein instructions which, when executed by the processor, cause the processor to (a) determine storage space required for sequential storage of a plurality of bigram word indexes, the bigram word indexes part of a trigram language model of a consecutive speech recognition system wherein language model models the Wall Street Journal task (b) determine storage space required for hierarchical data structure storage of the plurality of bigram word indexes, and (c) implement hierarchical data structure storage of the plurality of bigram word indexes if the storage space required for hierarchical data structure storage of the plurality of bigram word indexes is less than the storage space required for sequential storage of the plurality of bigram word indexes.
- 12. A method for storing a plurality of bigram word indexes corresponding to a specified unigram as a common base with a specific offset characterized in that the bigram word indexes are part of a trigram language model of a consecutive speech recognition system wherein language model models the Chinese Task 863.
- 14. A method for storing a plurality of bigram word indexes, each bigram word index corresponding to a specified unigram as a common base with a specific offset, the bigram word indexes part of a trigram language model of a consecutive speech recognition system wherein language model models the Chinese Task 863, the method comprising:
  determining storage space required for sequential storage of the plurality of bigram word indexes corresponding to a specified unigram;
  determining storage space required for hierarchical data structure storage of the plurality of bigram word indexes; and
  implementing hierarchical data structure storage of the plurality of bigram word indexes if the storage space required for hierarchical data structure storage of the plurality of bigram word indexes is less than the storage space required for sequential storage of the plurality of bigram word indexes.
  Dependent claims: 15, 16.
- 17. A machine-readable medium that provides executable instructions which, when executed by a processor, cause the processor to perform a method for storing a plurality of bigram word indexes, the bigram word indexes part of a trigram language model of a consecutive speech recognition system wherein language model models the Chinese Task 863, the method comprising:
  determining storage space required for sequential storage of the plurality of bigram word indexes corresponding to a specified unigram;
  determining storage space required for hierarchical data structure storage of the plurality of bigram word indexes; and
  implementing hierarchical data structure storage of the plurality of bigram word indexes if the storage space required for hierarchical data structure storage of the plurality of bigram word indexes is less than the storage space required for sequential storage of the plurality of bigram word indexes.
  Dependent claims: 18, 19.
- 20. An apparatus comprising a processor with a memory coupled thereto, characterized in that the memory has stored therein instructions which, when executed by the processor, cause the processor to (a) determine storage space required for sequential storage of a plurality of bigram word indexes, the bigram word indexes part of a trigram language model of a consecutive speech recognition system wherein language model models the Chinese Task 863 (b) determine storage space required for hierarchical data structure storage of the plurality of bigram word indexes, and (c) implement hierarchical data structure storage of the plurality of bigram word indexes if the storage space required for hierarchical data structure storage of the plurality of bigram word indexes is less than the storage space required for sequential storage of the plurality of bigram word indexes.
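The two determining steps and the conditional implementing step recited in the independent claims can be sketched as a size comparison. The byte costs below are assumptions chosen for illustration only: three bytes per index for sequential storage, and for hierarchical storage a two-byte base plus a one-byte group count per distinct base and one byte per offset. The claims do not fix these figures, and the function names are hypothetical.

```python
def sequential_size(word_indexes, bytes_per_index=3):
    """Bytes needed to store every index in full (assumed 3 bytes each)."""
    return len(word_indexes) * bytes_per_index


def hierarchical_size(word_indexes, base_bytes=2, count_bytes=1, offset_bytes=1):
    """Bytes needed for base-plus-offset storage under the assumed layout.

    Each distinct base (index >> 8) costs base_bytes + count_bytes once,
    and every index costs offset_bytes for its offset.
    """
    distinct_bases = {w >> 8 for w in word_indexes}
    header = len(distinct_bases) * (base_bytes + count_bytes)
    return header + len(word_indexes) * offset_bytes


def choose_layout(word_indexes):
    """Pick hierarchical storage only when it is strictly smaller."""
    seq = sequential_size(word_indexes)
    hier = hierarchical_size(word_indexes)
    return "hierarchical" if hier < seq else "sequential"


# Ten indexes sharing one base: 13 bytes hierarchical vs 30 sequential.
dense = [0x000100 + i for i in range(10)]
# Ten indexes each with its own base: 40 bytes hierarchical vs 30 sequential.
sparse = [i << 8 for i in range(10)]

dense_choice = choose_layout(dense)
sparse_choice = choose_layout(sparse)
```

Under these assumed costs, clustered indexes favor the hierarchical layout, while widely scattered indexes are cheaper to store sequentially, which is why the comparison is made per unigram before a layout is chosen.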
Specification