Large-vocabulary speech recognition using an integrated syntactic and semantic statistical language model
Abstract
Methods and apparatus for performing large-vocabulary speech recognition employing an integrated syntactic and semantic statistical language model. In an exemplary embodiment, a stochastic language model is developed using a hybrid paradigm in which latent semantic analysis is combined with, and subordinated to, a conventional n-gram paradigm. The hybrid paradigm provides an estimate of the likelihood that a particular word, chosen from an underlying vocabulary, will occur given a prevailing contextual history. The estimate is computed as a conditional probability that a word will occur given an "integrated" history combining an n-word, syntactic-type history with a semantic-type history based on a much larger contextual framework. Thus, the exemplary embodiment seamlessly blends local language structures with global usage patterns to provide, in a single language model, the proficiency of a short-horizon, syntactic model with the large-span effectiveness of semantic analysis.
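The integrated estimate described in the abstract can be sketched in code. The following is an illustrative toy, not the patent's formulation: the function names, the multiplicative combination of the n-gram and LSA terms, the renormalization over the vocabulary, and all numbers are assumptions chosen for brevity.

```python
def integrated_probability(word, local_history, doc_vector,
                           ngram_prob, lsa_prob, vocabulary):
    """Toy 'integrated' estimate: an n-gram term modulated by a
    semantic (LSA-style) term, renormalized over the vocabulary."""
    scores = {w: ngram_prob(w, local_history) * lsa_prob(w, doc_vector)
              for w in vocabulary}
    total = sum(scores.values())
    return scores[word] / total

# Toy stand-ins for trained models (illustrative values only).
VOCAB = ["stock", "river", "bank"]

def toy_ngram_prob(w, history):
    # Local span: a hand-written bigram table for the context "the".
    table = {("the",): {"stock": 0.2, "river": 0.3, "bank": 0.5}}
    return table[tuple(history)][w]

def toy_lsa_prob(w, doc_vector):
    # Global span: semantic weight of the word in the running document.
    return doc_vector.get(w, 0.1)

# A finance-flavored global context boosts "stock" above its
# bigram-only probability of 0.2.
finance_context = {"stock": 0.8, "bank": 0.6, "river": 0.05}
p = integrated_probability("stock", ["the"], finance_context,
                           toy_ngram_prob, toy_lsa_prob, VOCAB)
```

The point of the sketch is the interplay the abstract describes: the local (syntactic) n-gram term dominates word order, while the global (semantic) term shifts mass toward words consistent with the larger discourse.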
45 Claims
1. A speech recognition system, comprising:
a pre-processor receiving an acoustic signal and processing the acoustic signal to produce an acoustic feature sequence; and
a recognition processor receiving the acoustic feature sequence and processing the acoustic feature sequence using a multiple-span stochastic language model to form a linguistic message, wherein the multiple-span stochastic language model includes a local span providing an immediate word context and a large span providing a global word context.
(Dependent claims 2-18)
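The two-stage system of claim 1 can be outlined as a minimal skeleton: a pre-processor turns the acoustic signal into a feature sequence, and a recognition processor scores candidate messages with a language-model term. Everything below is an illustrative stand-in; the "feature extraction" and "acoustic score" are toy placeholders, not the patent's signal processing.

```python
def pre_processor(signal, frame_size=4):
    # Toy stand-in for feature extraction: one average-energy
    # value per fixed-size frame of the acoustic signal.
    return [sum(signal[i:i + frame_size]) / frame_size
            for i in range(0, len(signal), frame_size)]

def recognition_processor(features, hypotheses, lm_score):
    # Pick the hypothesis maximizing a combined acoustic + language
    # model score. The acoustic proxy below is purely illustrative.
    def acoustic_score(h):
        return -abs(len(h.split()) - len(features))
    return max(hypotheses, key=lambda h: acoustic_score(h) + lm_score(h))
```

In the claimed system, `lm_score` would be supplied by the multiple-span stochastic model, so both the immediate word context and the global discourse context influence which hypothesis wins.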
19. A method for performing speech recognition, comprising the steps of:
receiving an acoustic signal;
processing the acoustic signal to form an acoustic feature sequence; and
processing the acoustic feature sequence using a multiple-span stochastic language model to form a linguistic message, wherein the multiple-span stochastic language model includes a local span providing an immediate word context and a large span providing a global word context.
(Dependent claims 20-36)
37. A method for deriving a multiple-span stochastic language model, comprising the steps of:
reading at least one training document;
using an n-gram paradigm to compute a set of local a priori probabilities based on said at least one training document;
using a latent semantic paradigm to compute a set of global a priori probabilities based on said at least one training document; and
combining the local a priori probabilities and the global a priori probabilities to provide the multiple-span stochastic language model.
(Dependent claims 38-39)
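The derivation steps above can be sketched end to end. This is a minimal sketch under stated assumptions: bigrams stand in for the n-gram paradigm, and a truncated SVD of a word-by-document count matrix stands in for the latent semantic paradigm; the function name and the choice of two latent dimensions are illustrative, not the patent's specification.

```python
from collections import Counter

import numpy as np

def derive_multispan_model(training_docs, lsa_dims=2):
    # Local span: bigram relative-frequency counts over the documents.
    bigrams, unigrams = Counter(), Counter()
    for doc in training_docs:
        words = doc.split()
        unigrams.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    local = {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

    # Global span: word-by-document count matrix factored by a
    # truncated SVD, as in latent semantic analysis.
    vocab = sorted({w for doc in training_docs for w in doc.split()})
    counts = np.array([[doc.split().count(w) for doc in training_docs]
                       for w in vocab], dtype=float)
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    word_vectors = U[:, :lsa_dims] * s[:lsa_dims]

    # Combine both spans into one multiple-span model object.
    return {"local": local, "vocab": vocab, "word_vectors": word_vectors}
```

The returned object keeps both spans side by side, so a recognizer can consult the bigram table for immediate context and the LSA word vectors for global context, as the claim's combining step requires.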
40. A computer readable medium containing program instructions for deriving a multiple-span stochastic language model, the program instructions including instructions for:
reading at least one training document;
computing a set of local a priori probabilities, based on said at least one training document, using an n-gram paradigm;
computing a set of global a priori probabilities, based on said at least one training document, using a latent semantic paradigm; and
combining the local a priori probabilities and the global a priori probabilities to provide the multiple-span stochastic language model.
(Dependent claims 41-42)
43. A system for deriving a multiple-span stochastic language model, comprising:
at least one training document; and
a processor receiving the at least one training document as input and
(a) counting occurrences of local and global word sequences in said at least one training document,
(b) computing a set of local a priori probabilities using an n-gram paradigm based on relative frequency counts of local word sequences in said at least one training document,
(c) computing a set of global a priori probabilities using a latent semantic paradigm based on relative frequency counts of global word sequences in said at least one training document, and
(d) combining the local a priori probabilities and the global a priori probabilities to provide the multiple-span stochastic language model.
(Dependent claims 44-45)
Specification