Method for dynamic context scope selection in hybrid N-GRAM+LSA language modeling
Abstract
A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in an LSA space is computed. Further, the local probabilities and the global probabilities are combined to produce the language modeling.
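As an informal illustration only (not part of the patent text), the final combining step described in the abstract can be sketched as a linear interpolation of the local (n-gram) and global (LSA) probability streams; the weight `alpha`, the interpolation rule itself, and the toy probability values are all assumptions for illustration, since the patent does not prescribe this particular combination formula here:

```python
# Illustrative sketch (assumption): blend a local n-gram probability with a
# global LSA-based probability by linear interpolation to score a next word.
def combined_probability(p_local, p_global, alpha=0.5):
    """Combine local and global probabilities; alpha is a hypothetical
    interpolation weight with 0 <= alpha <= 1 (an assumption, not from
    the patent)."""
    return alpha * p_local + (1.0 - alpha) * p_global

# Toy example with made-up probabilities for a candidate next word.
p_ngram = 0.02   # local probability from an n-gram model (assumed value)
p_lsa = 0.10     # global probability from the LSA document vector (assumed value)
p = combined_probability(p_ngram, p_lsa, alpha=0.6)
print(round(p, 3))   # -> 0.052
```

In a real hybrid model the two streams would more likely be combined through a Bayesian or multiplicative rule; linear interpolation is used here only because it is the simplest combination that respects the claim's structure.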
16 Citations
28 Claims
1. A method of language modeling of a document comprising:
- computing, by a computer processor, a plurality of local probabilities of a current document;
- determining a vector representation of the current document in a latent semantic analysis (LSA) space at a first time based on a first number of words present in the current document from a second time to the first time, wherein the second time precedes the first time;
- computing, by a computer processor, a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and
- combining the local probabilities and the global probabilities to produce a language modeling.
(Dependent claims: 2-13)
14. A computer system for language modeling of a document comprising:
- means for computing a plurality of local probabilities of a current document;
- means for determining a vector representation of the current document in a latent semantic analysis (LSA) space at a first time based on a first number of words present in the current document from a second time to the first time, wherein the second time precedes the first time;
- means for computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and
- means for combining the local probabilities and the global probabilities to produce a language modeling.
15. A non-transitory computer readable storage medium comprising instructions, which when executed on a processor, perform a method for language modeling of a document, comprising:
- computing a plurality of local probabilities of a current document;
- determining a vector representation of the current document in a latent semantic analysis (LSA) space at a first time based on a first number of words present in the current document from a second time to the first time, wherein the second time precedes the first time;
- computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and
- combining the local probabilities and the global probabilities to produce a language modeling.
16. A system for language modeling of a document comprising:
- a digital signal processor; and
- a hybrid training/recognition processor coupled to the digital signal processor, configured to: compute a plurality of local probabilities of a current document; determine a vector representation of the current document in a latent semantic analysis (LSA) space at a first time based on a first number of words present in the current document from a second time to the first time, wherein the second time precedes the first time; compute a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and combine the local probabilities and the global probabilities to produce a language modeling.
(Dependent claims: 17-27)
28. A system for language modeling of a document comprising:
- a digital signal processor; and
- a hybrid training/recognition processor coupled to the digital signal processor, configured to: compute a plurality of local probabilities of a current document; determine a vector representation of the current document in a latent semantic analysis (LSA) space based on a first number of words in the current document; compute a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and combine the local probabilities and the global probabilities to produce a language modeling, wherein the processor is further configured to generate the vector representation of the current document in an LSA space, ṽ, at time q, as

  ṽ_q = (1/n_q) Σ_{p=1}^{n_q} (1 − ε_{i_p}) λ^{q−p} u_{i_p} S^{−1}

wherein n_q is the total number of words in the current document, i_p is the index of the word observed at time p, ε_{i_p} is the normalized entropy of the word observed at time p within a text T, 0 < λ ≤ 1, u_{i_p} is the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W.
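As an informal sketch (not part of the patent text), the pseudo-document vector ṽ_q that claim 28 builds from the left singular vectors of W can be computed with NumPy roughly as follows; the toy matrix W, the word indices, the entropy values, and λ = 0.95 are all illustrative assumptions:

```python
import numpy as np

# Illustrative sketch (assumption): pseudo-document vector v~_q following the
# symbol definitions in claim 28. W is a toy word-by-document matrix; indices,
# entropies, and the forgetting factor lambda are made-up values.
rng = np.random.default_rng(0)
W = rng.random((6, 4))                    # toy 6-word x 4-document matrix
U, s, Vt = np.linalg.svd(W, full_matrices=False)
S_inv = np.diag(1.0 / s)                  # inverse of the singular-value matrix S

word_indices = [2, 0, 5, 1]               # i_p: index of the word observed at time p
entropies = [0.3, 0.5, 0.1, 0.4]          # eps_{i_p}: normalized entropies (assumed)
lam = 0.95                                # forgetting factor, 0 < lam <= 1
q = len(word_indices)                     # current time
n_q = q                                   # total number of words in the document

# v~_q = (1/n_q) * sum_p (1 - eps_{i_p}) * lam^(q-p) * u_{i_p} * S^{-1}
v_q = np.zeros(S_inv.shape[0])
for p, (i_p, eps) in enumerate(zip(word_indices, entropies), start=1):
    v_q += (1.0 - eps) * lam ** (q - p) * (U[i_p] @ S_inv)
v_q /= n_q
print(v_q.shape)   # -> (4,): a vector in the r-dimensional LSA space
```

The exponential factor λ^(q−p) is what implements the "dynamic context scope" of the title: words observed long before time q are progressively down-weighted, so the document vector tracks the current topic rather than the whole history.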
Specification