Method for dynamic context scope selection in hybrid n-gram+LSA language modeling

US 6,477,488 B1
Filed: 03/10/2000
Issued: 11/05/2002
Est. Priority Date: 03/10/2000
Status: Active Grant

First Claim

Patent Images

1. A method of dynamic language modeling of a document comprising:

computing a plurality of local probabilities of a current document;

determining a vector representation of the current document in a latent semantic analysis (LSA) space, wherein the vector representation of the current document in an LSA space is based upon a plurality of temporally ordered words and is generated from at least one decomposition matrix of a singular value decomposition of a co-occurrence matrix, W, between M words in a vocabulary V and N documents in a text corpus T;

computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and

combining the local probabilities and the global probabilities to produce the language modeling.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in an LSA space is computed. Further, the local probabilities and the global probabilities are combined to produce the language modeling.

251 Citations

44 Claims

1. A method of dynamic language modeling of a document comprising:
- computing a plurality of local probabilities of a current document;
  
  determining a vector representation of the current document in a latent semantic analysis (LSA) space, wherein the vector representation of the current document in an LSA space is based upon a plurality of temporally ordered words and is generated from at least one decomposition matrix of a singular value decomposition of a co-occurrence matrix, W, between M words in a vocabulary V and N documents in a text corpus T;
  
  computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and
  
  combining the local probabilities and the global probabilities to produce the language modeling.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 22)
- - 2. The method of claim 1 wherein the plurality of local probabilities is based upon an n-gram paradigm.
  - 3. The method of claim 1 wherein the plurality of local probabilities Pr(w_q|H_q^(t)) for a particular word w_q, drawn from a vocabulary V comprising a plurality of words w_i, given a local contextual history H_q^(T)of n−
    - 1 words w_q−
      
      1, w_q−
      
      2, . . . w_q−
      
      n+1, as;
4. The method of claim 1 wherein the vector representation of the current document in an LSA space is based upon all words from a beginning of a session.
5. The method of claim 4 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  ${\tilde{v}}_{q} = \frac{1}{n_{q}} \sum_{p = 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
6. The method of claim 1 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, P is the number of temporally adjacent words up to the current word, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  ${\tilde{v}}_{q} = \frac{1}{P} \sum_{p = q - P + 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
7. The method of claim 1 wherein the vector representation of the current document in an LSA space is based upon a plurality of exponentially weighted temporally ordered words.
8. The method of claim 7 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, 0<
  
  λ
  
  ≦
  
  1, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  $\tilde{v} = \frac{1}{n_{q}} \sum_{p = 1}^{q} λ^{(nq - np)} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
9. The method of claim 1 wherein the plurality of global probabilities is based upon a latent semantic paradigm.
10. The method of claim 1 wherein the plurality of global probabilities Pr(w_q|H_q−
- 1) for a particular word w_q, for an associated history of the word, H_q−
  
  1, for the current document {tilde over (d)}_q−
  
  i, as;
11. The method of claim 10 wherein combining the local probabilities and the global probabilities is computed as follows:
- $\Pr (w_{q} \langle {\tilde{H}}_{q - 1}) = \frac{\Pr (w_{q} \langle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \langle w_{q})}{\sum_{w_{i} \in V} \Pr (w_{i} \langle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \langle w_{i})} .$
21. The method of claim 1 wherein the plurality of global probabilities Pr(w_q|H_q−
- 1) for a particular word w_q, for an associated history of the word, H_q−
  
  1, for the current document {tilde over (d)}_q−
  
  i, as;
  
  ${\tilde{v}}_{q} = \frac{1}{n_{q}} \sum_{p = 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
  
  Pr(w_q|H_q−
  
  1)=Pr(w_q|{tilde over (d)}_q−
  
  i),based upon the vector representation of the current document in an LSA space.
22. The method of claim 21 wherein combining the local probabilities and the global probabilities is computed as follows:
- $\Pr (w_{q} \langle {\tilde{H}}_{q - 1}) = \frac{\Pr (w_{q} \langle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \langle w_{q})}{\sum_{w_{i} \in V} \Pr (w_{i} \langle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \langle w_{i})} .$

12. A system for dynamic language modeling of a document comprising:
- means for computing a plurality of local probabilities of a current document;
  
  means for determining a vector representation of the current document in a latent semantic analysis (LSA) space based upon a plurality of temporally ordered words and is generated from at least one decomposition matrix of a singular value decomposition of a co-occurrence matrix, W, between M words in a vocabulary V and N documents in a text corpus T;
  
  means for computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and
  
  means for combining the local probabilities and the global probabilities to produce the language modeling.
- View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20)
- - 13. The system of claim 12 wherein the plurality of local probabilities is based upon an n-gram paradigm.
  - 14. The system of claim 12 wherein the plurality of local probabilities Pr(w_q|H_q^(t)) for a particular word w_q, drawn from a vocabulary V comprising a plurality of words w_i, given a local contextual history H_q^(T)of n−
    - 1 words w_q−
      
      1, w_q−
      
      2, . . . w_q−
      
      n+1, as;
15. The method of claim 12 wherein the vector representation of the current document in an LSA space is based upon all words from a beginning of a session.
16. The method of claim 15 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  ${\tilde{v}}_{q} = \frac{1}{n_{q}} \sum_{p = 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
17. The method of claim 12 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, P is the number of temporally adjacent words up to the current word, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  ${\tilde{v}}_{q} = \frac{1}{P} \sum_{p = q - P + 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
18. The system of claim 12 wherein the vector representation of the current document in an LSA space is based upon a plurality of exponentially weighted temporally ordered words.
19. The method of claim 18 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, 0<
  
  λ
  
  ≦
  
  1, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, as;
  
  $\tilde{v} = \frac{1}{n_{q}} \sum_{p = 1}^{q} λ^{(nq - np)} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
20. The method of claim 12 wherein the plurality of global probabilities is based upon a latent semantic paradigm.

23. A computer readable medium comprising instructions, which when executed on a processor, perform a method for dynamic language modeling of a document, comprising:
- computing a plurality of local probabilities of a current document;
  
  determining a vector representation of the current document in a latent semantic analysis (LSA) space based upon a plurality of temporally ordered words and is generated from at least one decomposition matrix of a singular value decomposition of a co-occurrence matrix, W, between M words in a vocabulary V and N documents in a text corpus T;
  
  computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and
  
  combining the local probabilities and the global probabilities to produce the language modeling.
- View Dependent Claims (24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
- - 24. The computer-readable medium of claim 23 wherein the plurality of local probabilities is based upon an n-gram paradigm.
  - 25. The computer-readable medium of claim 23 wherein the plurality of local probabilities Pr(w_q|H_q^(t)) for a particular word w_q, drawn from a vocabulary V comprising a plurality of words w_i, given a local contextual history H_q^(T)of n−
    - 1 words w_q−
      
      1, w_q−
      
      2, . . . w_q−
      
      n+1, as;
26. The computer-readable medium of claim 23 wherein the vector representation of the current document in an LSA space is based upon all words from a beginning of a session.
27. The computer-readable medium of claim 26 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  ${\tilde{v}}_{q} = \frac{1}{n_{q}} \sum_{p = 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
28. The computer-readable medium of claim 23 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, P is the number of temporally adjacent words up to the current word, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  ${\tilde{v}}_{q} = \frac{1}{P} \sum_{p = q - P + 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
29. The computer-readable medium of claim 23 wherein the vector representation of the current document in an LSA space is based upon a plurality of exponentially weighted temporally ordered words.
30. The computer-readable medium of claim 29 wherein the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
- _i_pis the normalized entropy of the word observed at time p within a text T, 0<
  
  λ
  
  ≦
  
  1, μ
  
  _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
  
  $\tilde{v} = \frac{1}{n_{q}} \sum_{p = 1}^{q} λ^{(nq - np)} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
31. The computer-readable medium of claim 23 wherein the plurality of global probabilities is based upon a latent semantic paradigm.
32. The computer-readable medium of claim 23 wherein the plurality of global probabilities Pr(w_q|H_q−
- 1) for a particular word w_q, for an associated history of the word, H_q−
  
  1, for the current document {tilde over (d)}_q−
  
  i, as;
33. The computer-readable medium of claim 32 wherein combining the local probabilities and the global probabilities is computed as follows:
- $\Pr (w_{q} \langle {\tilde{H}}_{q - 1}) = \frac{\Pr (w_{q} \langle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \langle w_{q})}{\sum_{w_{i} \in V} \Pr (w_{i} \langle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \langle w_{i})}$

34. A system for dynamic language modeling of a document comprising a hybrid training/recognition processor configured to compute a plurality of local probabilities of a current document, determine a vector representation of the current document in a latent semantic analysis (LSA) space, compute a plurality of local probabilities based upon the vector representation of the current document in an LSA space, and combine the local probabilities and the global probabilities to produce the language modeling, wherein the processor is further configured to generate the vector representation of the current document in an LSA space based upon a plurality of temporally ordered words from at least one decomposition matrix of a singular value decomposition of a co-occurrence matrix, W, between M words in a vocabulary V and N documents in a text corpus T.
- View Dependent Claims (35, 36, 37, 38, 39, 40, 41, 42, 43, 44)
- - 35. The system of claim 34 wherein the processor is further configured to generate the plurality of local probabilities based upon an n-gram paradigm.
  - 36. The system of claim 34 wherein the processor is further configured to generate the plurality of local probabilities Pr(w_q|H_q^(t)) for a particular word w_q, drawn from a vocabulary V comprising a plurality of words w_i, given a local contextual history H_q^(T)of n−
    - 1 words w_q−
      
      1, w_q−
      
      2, . . . w_q−
      
      n+1, as;
  - 37. The system of claim 34 wherein the processor is further configured to generate the vector representation of the current document in an LSA space based upon all words from a beginning of a session.
  - 38. The method of claim 37 wherein the the processor is further configured to generate the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
    - _i_pis the normalized entropy of the word observed at time p within a text T, μ
      
      _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
      
      ${\tilde{v}}_{q} = \frac{1}{n_{q}} \sum_{p = 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
  - 39. The system of claim 34 wherein the processor is further configured to generate the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
    - _i_pis the normalized entropy of the word observed at time p within a text T, P is the number of temporally adjacent words up to the current word, μ
      
      _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
      
      ${\tilde{v}}_{q} = \frac{1}{P} \sum_{p = q - P + 1}^{q} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
  - 40. The method of claim 34 wherein the processor is further configured to generate the vector representation of the current document in an LSA space is based upon a plurality of exponentially weighted temporally ordered words.
  - 41. The method of claim 40 wherein the the processor is further configured to generate the vector representation of the current document in an LSA space, v_q, at time q, wherein n_qis the total number of words in the current document, i_pis the index of the word observed at time p, ε
    - _i_pis the normalized entropy of the word observed at time p within a text T, 0<
      
      λ
      
      ≦
      
      1, μ
      
      _i_pis the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as;
      
      $\tilde{v} = \frac{1}{n_{q}} \sum_{p = 1}^{q} λ^{(nq - np)} (1 - ɛ_{i_{p}}) u_{i_{p}} S^{- 1} .$
  - 42. The system of claim 34 wherein the processor is further configured to generate the plurality of global probabilities based upon a latent semantic paradigm.
  - 43. The system of claim 34 wherein the processor is further configured to generate the plurality of local probabilities Pr(w_q|H_q−
    - 1) for a particular word w_q, for an associated history of the word, H_q−
      
      1, for the current document {tilde over (d)}_q−
      
      i, as;
  - 44. The system of claim 43 wherein the processor is further configured to combine the local probabilities and the global probabilities as follows:
    - $\Pr (w_{q} \rangle {\tilde{H}}_{q - 1}) = \frac{\Pr (w_{q} \rangle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \rangle w_{q})}{\sum_{w_{i} \in V} \Pr (w_{i} \rangle w_{q - 1} w_{q - 2} \dots w_{q - n + 1}) \Pr ({\tilde{d}}_{q - 1} \rangle w_{i})} .$

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Apple Inc.
Original Assignee
Apple Computer Incorporated (Apple Inc.)
Inventors
Bellegarda, Jerome R.
Primary Examiner(s)
EDOUARD, PATRICK NESTOR

Application Number

US09/523,070
Time in Patent Office

970 Days
Field of Search

704/9, 704/1, 704/10, 704/251, 704/256, 704/257, 704/255, 707/530, 707/531, 706/55, 706/56
US Class Current

704/9
CPC Class Codes

G06F 40/216   using statistical methods

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/30   Semantic analysis

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/197   Probabilistic grammars, e.g...

Method for dynamic context scope selection in hybrid n-gram+LSA language modeling

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

251 Citations

44 Claims

Specification

Solutions

Use Cases

Quick Links

Method for dynamic context scope selection in hybrid n-gram+LSA language modeling

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

251 Citations

44 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links