Method and system for deriving a large-span semantic language model for large-vocabulary recognition systems
Abstract
A system and method for deriving a large-span semantic language model for a large vocabulary recognition system is disclosed. The method and system map words from a vocabulary into a vector space, where each word is represented by a vector. After the words are mapped to the space, the vectors are clustered into a set of clusters, where each cluster represents a semantic event. After clustering, the probability that a first word will occur given a history of prior words is computed by (i) calculating the probability that the vector representing the first word belongs to each of the clusters; (ii) calculating the probability of each cluster occurring in the history of prior words; and (iii) weighting (i) by (ii) to provide the probability.
32 Claims
1. A method for deriving a large-span semantic language model for a large vocabulary recognition system, the method comprising the steps of:
(a) mapping words into a vector space, where each word is represented by a vector;

(b) clustering the vectors into a set of clusters, where each cluster represents a semantic event;

(c) computing a first probability that a first word will occur given a history of prior words by,

(i) calculating a second probability that a vector representing the first word belongs to each of the clusters, the second probability capable of being independent of a location of the first word in a sentence;

(ii) calculating a third probability of each cluster occurring in a history of prior words; and

(iii) weighting the second probability by the third probability.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 22, 23, 24, 25, 26, 27.
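The computation in step (c) is, in effect, a mixture model: the probability of the next word combines a cluster-membership distribution for the candidate word with a distribution over clusters derived from the history. A minimal sketch, assuming an exponential-of-negative-distance soft membership and a nearest-centroid assignment for the history words (the function name, the membership formula, and the `temperature` parameter are illustrative choices, not taken from the patent):

```python
import numpy as np

def word_probability(word_vec, history_vecs, centroids, temperature=1.0):
    """Sketch of step (c): weight cluster membership of the candidate
    word (step i) by cluster frequency in the history (step ii)."""
    # (i) soft cluster membership of the candidate word, from its
    #     distance to each centroid; note this depends only on the
    #     word vector, not on the word's position in the sentence
    d = np.linalg.norm(centroids - word_vec, axis=1)
    p_word_in_cluster = np.exp(-d / temperature)
    p_word_in_cluster /= p_word_in_cluster.sum()

    # (ii) how often each cluster occurs in the history: assign each
    #      history word to its nearest centroid and count
    assign = np.argmin(
        np.linalg.norm(history_vecs[:, None, :] - centroids[None, :, :],
                       axis=2),
        axis=1)
    counts = np.bincount(assign, minlength=len(centroids)).astype(float)
    p_cluster_in_history = counts / counts.sum()

    # (iii) weight (i) by (ii) and sum over clusters
    return float(p_word_in_cluster @ p_cluster_in_history)
```

Both factors are normalized distributions over the clusters, so the weighted sum in step (iii) always lies between 0 and 1.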
10. A system for deriving a large-span semantic language model comprising:
a training database;

a memory;

a processor coupled to the memory; and

a pattern recognition system executed by the processor, the pattern recognition system including,

means for mapping words from the training database into a vector space, where each word is represented by a vector,

means for clustering the vectors into a set of clusters, where each cluster represents a semantic event, and

means for computing a first probability that a first word will occur given a history of prior words by calculating a second probability that a vector representing the first word belongs to each of the clusters, calculating a third probability of each cluster occurring in a history of prior words, and by weighting the second probability by the third probability;

wherein the second probability is capable of being independent of a location of the first word in a sentence.

Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
19. A computer-readable medium containing program instructions for deriving a large-span semantic language model for a large vocabulary recognition system, the program instructions for:
(a) mapping words into a vector space, where each word is represented by a vector;

(b) clustering the vectors into a set of clusters, where each cluster represents a semantic event;

(c) computing a first probability that a first word will occur given a history of prior words by,

(i) calculating a second probability that a vector representing the first word belongs to each of the clusters, the second probability capable of being independent of a location of the first word in a sentence;

(ii) calculating a third probability of each cluster occurring in a history of prior words; and

(iii) providing the first probability by weighting the second probability by the third probability.

Dependent claims: 20, 21.
28. A method for deriving a large-span semantic language model for a large vocabulary recognition system, the method comprising the steps of:
(a) tabulating the number of times each word occurs in a set of N training documents using a word-document matrix, wherein entries from the matrix form word vectors of dimension N;

(b) reducing the vectors to a dimension R, where R is significantly less than N;

(c) clustering the vectors into a set of clusters, where each cluster represents a semantic event;

(d) calculating a distance between a vector representing a first word and each of the clusters to provide a probability that the vector representing the first word belongs to each of the clusters, the probability that the vector representing the first word belongs to each of the clusters capable of being independent of a location of the first word in a sentence;

(e) calculating how frequently each cluster occurs in a history of prior words to provide a probability of each cluster occurring in the history of prior words; and

(f) computing a probability that the vector representing the first word will occur given the history of prior words by weighting the probability that the vector representing the first word belongs to each of the clusters by the probability of each cluster occurring in the history of prior words.

Dependent claims: 29, 30, 31, 32.
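Steps (a) through (c) of this claim describe an LSA-style pipeline: build a word-document count matrix, reduce each word vector to rank R, and cluster in the reduced space. A minimal sketch on a toy corpus, assuming a truncated SVD for the dimension reduction and plain k-means for the clustering (the corpus, the choices of R and K, and the k-means details are illustrative, not specified by the claim):

```python
import numpy as np

# Hypothetical toy corpus: N = 4 training documents
docs = [
    "stocks fell as markets closed lower",
    "the team won the final game",
    "markets rallied and stocks rose",
    "players scored in the second game",
]

# (a) tabulate word counts in a word-document matrix W (V x N);
#     each row of W is an N-dimensional word vector
vocab = sorted({w for d in docs for w in d.split()})
W = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        W[vocab.index(w), j] += 1

# (b) reduce the word vectors to dimension R << N via truncated SVD
R = 2
U, S, Vt = np.linalg.svd(W, full_matrices=False)
word_vecs = U[:, :R] * S[:R]        # R-dimensional word representations

# (c) cluster the reduced vectors into K "semantic event" clusters
#     (plain k-means; K and the iteration count are arbitrary here)
K, rng = 2, np.random.default_rng(0)
centroids = word_vecs[rng.choice(len(word_vecs), K, replace=False)]
for _ in range(20):
    assign = np.argmin(
        np.linalg.norm(word_vecs[:, None] - centroids[None], axis=2),
        axis=1)
    centroids = np.array([
        word_vecs[assign == k].mean(axis=0) if np.any(assign == k)
        else centroids[k]
        for k in range(K)])
```

The resulting `word_vecs` and `centroids` are the inputs steps (d) through (f) would consume: distances from a word vector to each centroid yield the membership probabilities, and centroid assignments over the history yield the cluster-frequency weights.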