Method and apparatus for generating and managing a language model data structure
Abstract
The generation and management of a language model data structure include assigning each segment of a received corpus to a node in a data structure that denotes dependencies between the respective nodes. A transitional probability between each of the nodes in the data structure is calculated. A frequency of occurrence is calculated for each item of the respective segments, and those nodes of the data structure associated with items that do not meet a minimum frequency of occurrence threshold are removed. The data structure may be managed across a system memory of a computer system and an extended memory of the computer system.
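The steps summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Node`, `build_tree`, `prune`, and `transition_probability`, the fixed window length `order`, and the use of raw count ratios as the probability estimate are all assumptions made for illustration.

```python
from collections import defaultdict


class Node:
    """One tree node: an item plus how often the path leading to it was seen."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(Node)


def build_tree(items, order=3):
    """Insert every window of up to `order` consecutive items as a path from the root.

    Assumption: segments are fixed-length windows; the source does not fix a scheme.
    """
    root = Node()
    for start in range(len(items)):
        root.count += 1  # one path starts at every position
        node = root
        for item in items[start:start + order]:
            node = node.children[item]
            node.count += 1
    return root


def prune(node, min_count):
    """Remove every sub-tree whose frequency of occurrence is below `min_count`."""
    for item in list(node.children):
        child = node.children[item]
        if child.count < min_count:
            del node.children[item]
        else:
            prune(child, min_count)


def transition_probability(root, context, item):
    """Estimate P(item | context) as count(context + item) / count(context)."""
    node = root
    for c in context:
        if c not in node.children:
            return 0.0
        node = node.children[c]
    child = node.children.get(item)
    return child.count / node.count if child is not None else 0.0
```

For example, after `root = build_tree("a b a b a c".split(), order=2)`, calling `transition_probability(root, ["a"], "b")` divides the two occurrences of "a b" by the three occurrences of "a". Calling `prune(root, 2)` then removes the paths seen only once.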
26 Claims
1. A method comprising:
assigning each of a plurality of segments comprising a received corpus to a node in a data structure denoting dependencies between nodes;
calculating a transitional probability between each of the nodes in the data structure; and
managing storage of the data structure across a system memory of a computer system and an extended memory of the computer system such that at least one said node is stored in the system memory and another said node is stored in the extended memory simultaneously.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A method for predicting a likelihood of an item in a corpus comprised of a plurality of items, the method comprising:
building a data structure, across a system memory of a computer system and an extended memory of the computer system, of corpus segments representing a dynamic context of item dependencies within the segments;
calculating the likelihood of each item based, at least in part, on a likelihood of preceding items within the dynamic context;
iteratively re-segmenting the corpus; and
predicting a likelihood of an item in the re-segmented corpus.
Dependent claims: 11, 12, 13, 14, 15

16. A storage medium comprising executable instructions that are configured to generate, from a corpus, a data structure representing a statistical language model, the data structure for storage across a system memory and an extended memory, the data structure including:
one or more root nodes; and
a plurality of subordinate nodes, ultimately linked to a root node, cumulatively comprising one or more sub-trees, wherein each node of a sub-tree represents one or more items of a corpus and includes a measure of a Markov transition probability between the node and another linked node.
Dependent claims: 17, 18, 19, 20, 21

22. A modeling agent comprising:
a controller, to receive a corpus; and
a data structure generator, responsive to and selectively invoked by the controller, to assign each of a plurality of segments comprising the received corpus to a node in a data structure denoting dependencies between nodes;
wherein the modeling agent calculates a transitional probability between each of the nodes of the data structure to determine a predictive capability of a language model represented by the data structure and iteratively re-segments the received corpus until a threshold predictive capability is reached.
Dependent claims: 23, 24, 25

26. A storage medium comprising a plurality of executable instructions including at least a subset of which, when executed, implement a language modeling agent to assign each of a plurality of segments of a received corpus to a node in a data structure denoting dependencies between nodes, and to calculate a transitional probability between each of the nodes in the data structure to determine a predictive capability of a language model denoted by the data structure, wherein the modeling agent dynamically re-segments the received corpus to remove segments which do not meet a minimum frequency threshold.
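Claims 1, 10, and 16 above recite a data structure held partly in system memory and partly in extended memory at the same time. The following is a minimal sketch of one way such a split could work, not the claimed mechanism. It models extended memory as pickle files on disk and uses a least-recently-used eviction policy; both choices are assumptions, as are the names `TieredNodeStore` and `capacity`.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredNodeStore:
    """Keep at most `capacity` sub-trees in RAM; spill the least recently used to disk."""

    def __init__(self, capacity, directory=None):
        self.capacity = capacity
        self.directory = directory or tempfile.mkdtemp(prefix="lm_nodes_")
        self.in_memory = OrderedDict()  # key -> sub-tree, oldest first
        self.on_disk = {}               # key -> path of the spilled sub-tree
        self._next_id = 0               # used to build safe file names

    def put(self, key, subtree):
        """Store a sub-tree in RAM, replacing any stale spilled copy."""
        stale = self.on_disk.pop(key, None)
        if stale is not None:
            os.remove(stale)
        self.in_memory[key] = subtree
        self.in_memory.move_to_end(key)
        self._evict()

    def get(self, key):
        """Return a sub-tree, loading it back from disk if it was spilled."""
        if key in self.in_memory:
            self.in_memory.move_to_end(key)
            return self.in_memory[key]
        path = self.on_disk.pop(key)  # raises KeyError for unknown keys
        with open(path, "rb") as f:
            subtree = pickle.load(f)
        os.remove(path)
        self.put(key, subtree)
        return subtree

    def _evict(self):
        """Move the least recently used sub-trees to disk until RAM fits."""
        while len(self.in_memory) > self.capacity:
            key, subtree = self.in_memory.popitem(last=False)
            path = os.path.join(self.directory, f"node_{self._next_id}.pkl")
            self._next_id += 1
            with open(path, "wb") as f:
                pickle.dump(subtree, f)
            self.on_disk[key] = path
```

With `capacity=2`, storing three sub-trees leaves two in `in_memory` and one in `on_disk`, so nodes reside in both tiers simultaneously. A later `get` on the spilled key loads it back into RAM and spills another sub-tree in its place.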
Specification