Method for automatically punctuating a speech utterance in a continuous speech recognition system

US 6,067,514 A
Filed: 06/23/1998
Issued: 05/23/2000
Est. Priority Date: 06/23/1998
Status: Expired due to Fees

Abstract

In a speech recognition system which recognizes a spoken utterance consisting of a sequence of spoken words and, in response, outputs a sequence of decoded words, a method for automatically punctuating the sequence of decoded words is provided. The method includes the step of, in a vocabulary of items including words, silences, and punctuation marks, assigning at least one baseform to each punctuation mark corresponding to one of silence and a non-word noise. Additionally, the method includes the step of automatically inserting a subject punctuation mark at a given point in the sequence of decoded words when an acoustic score and a language model score associated with the subject punctuation mark produce a higher combined likelihood than the acoustic score and the language model score associated with any other item in the vocabulary for the given point in the sequence of decoded words.
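
The insertion rule in the abstract can be illustrated with a minimal sketch. It assumes the two scores are log-probabilities that are simply added. The function name `best_item`, the `<sil>` token, and all numeric scores are hypothetical and do not come from the patent.

```python
import math

def best_item(acoustic_logp, lm_logp):
    """Return the vocabulary item with the highest combined log-likelihood.

    acoustic_logp: item -> log P(acoustic data | item) at the current point
    lm_logp:       item -> log P(item | neighbouring words)
    """
    return max(acoustic_logp, key=lambda item: acoustic_logp[item] + lm_logp[item])

# Hypothetical scores for one pause-like stretch of audio after the word "however".
acoustic = {",": math.log(0.40), "<sil>": math.log(0.45), "the": math.log(0.01)}
lm = {",": math.log(0.60), "<sil>": math.log(0.20), "the": math.log(0.05)}

choice = best_item(acoustic, lm)  # the comma wins once both scores are combined
```

In this toy case plain silence has the better acoustic score, but the language model score tips the combined likelihood toward the comma, so the comma is the item inserted at that point.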

29 Claims

1. In a speech recognition system that outputs a sequence of decoded words, a method for automatically punctuating the sequence of decoded words comprises the steps of:
- in a vocabulary that defines items in a language model, the items including words and punctuation marks, assigning at least one baseform to each of the punctuation marks, the at least one baseform corresponding to at least one of silence and a non-word noise;
- in the language model, defining conditional probabilities for each of the punctuation marks based upon at least one of at least one preceding word and at least one succeeding word; and
- automatically inserting a subject punctuation mark at a given point in the sequence of decoded words when an acoustic score and a language model score associated with the subject punctuation mark produce a higher combined likelihood than the acoustic score and the language model score associated with any other item in the vocabulary for the given point in the sequence of decoded words.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein the baseform corresponding to the silence is represented by at least one silence phone.
  - 3. The method of claim 1, wherein the baseform corresponding to the non-word noise is represented by one of at least one consonant phone preceded and followed by the silence phone, at least one vowel phone preceded and followed by the silence phone, and at least one consonant phone and at least one vowel phone preceded and followed by the silence phone.
  - 4. The method of claim 1, wherein the language model is an n-gram language model, and said method further comprises the step of:
    - in a training phase of the n-gram language model, segmenting a corresponding text corpus into paragraphs to obtain word frequency counts that include each of the punctuation marks in a position other than a last position.
  - 5. The method of claim 4, wherein each paragraph includes about 200 to 500 words.
  - 6. The method of claim 4, wherein the n-gram language model is one of a bigram language model and a trigram language model.
  - 7. The method of claim 1, wherein the language model is an n-gram language model, and said method further comprises the step of:
    - in a training phase of the n-gram language model, segmenting a corresponding text corpus into paragraphs to obtain the conditional probabilities for the punctuation marks.
  - 8. The method of claim 7, wherein each paragraph includes about 200 to 500 words.
  - 9. The method of claim 7, wherein the n-gram language model is one of a bigram language model and a trigram language model.
  - 10. The method of claim 1, wherein at least one punctuation mark is also assigned at least one baseform corresponding to a pronunciation of a word representing the punctuation mark.
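
The training step in claims 4 to 9 can be sketched as follows. Counting over paragraphs rather than single sentences lets a punctuation mark appear somewhere other than the last position, so it gains succeeding-word statistics. This is a toy sketch with assumptions: fixed-size chunking, unsmoothed bigram estimates, and a paragraph size of 7 tokens chosen only to keep the example small (the claims recite about 200 to 500 words). All function names are hypothetical.

```python
from collections import Counter

def segment(tokens, size):
    """Split a token stream into consecutive paragraphs of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def count_bigrams(paragraphs):
    """Count bigrams inside each paragraph; no pair spans a paragraph boundary."""
    contexts, bigrams = Counter(), Counter()
    for para in paragraphs:
        contexts.update(para[:-1])           # every token that has a successor
        bigrams.update(zip(para, para[1:]))  # adjacent pairs within the paragraph
    return contexts, bigrams

def cond_prob(contexts, bigrams, prev, item):
    """Unsmoothed maximum-likelihood estimate of P(item | prev)."""
    return bigrams[(prev, item)] / contexts[prev] if contexts[prev] else 0.0

# Punctuation marks are ordinary tokens in the training text.
tokens = "we left . then it rained , so we stayed . the end .".split()
paragraphs = segment(tokens, 7)
contexts, bigrams = count_bigrams(paragraphs)

p_then_after_period = cond_prob(contexts, bigrams, ".", "then")  # period as preceding item
p_period_after_left = cond_prob(contexts, bigrams, "left", ".")  # period as predicted item
```

Had the text been cut at every period instead, no bigram would have a period as its first element, and the model could not condition a following word on it.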

11. In a speech recognition system which recognizes a spoken utterance and, in response, outputs a sequence of decoded words, a method for automatically punctuating the sequence of decoded words comprises the steps of:
- assigning at least one baseform to a subject punctuation mark in a vocabulary of words and punctuation marks, the at least one baseform including at least one of a silence phone, two consecutive silence phones, at least one consonant phone preceded and followed by the silence phone, at least one vowel phone preceded and followed by the silence phone, and at least one consonant phone and at least one vowel phone preceded and followed by the silence phone, the vocabulary defining items for which language model scores exist;
- defining at least one conditional probability for the subject punctuation mark in a language model based upon at least one of at least one preceding word and at least one succeeding word; and
- generating a word match score corresponding to the probability that the subject punctuation mark corresponds to acoustic data generated by said utterance;
- generating a language model score corresponding to the probability that the subject punctuation mark corresponds to the acoustic data generated by said utterance; and
- automatically inserting the subject punctuation mark at a given position in the sequence of decoded words when the combination of the word match score and the language model score produces the highest combined likelihood over any other word and punctuation mark in the vocabulary for the given position.
- Dependent claims (12, 13, 14, 15, 16, 17, 18):
  - 12. The method of claim 11, wherein each word, silence and punctuation mark in the vocabulary is represented by at least one baseform.
  - 13. The method of claim 11, wherein the language model is an n-gram language model, and said method further comprises the step of:
    - in a training phase of the n-gram language model, segmenting a corresponding text corpus into paragraphs to obtain word frequency counts that include the subject punctuation mark in a position other than a last position.
  - 14. The method of claim 13, wherein each paragraph includes about 200 to 500 words.
  - 15. The method of claim 13, wherein the n-gram language model is one of a bigram language model and a trigram language model.
  - 16. The method of claim 11, wherein the language model is an n-gram language model, and said method further comprises the step of:
    - in a training phase of the n-gram language model, segmenting a corresponding text corpus into paragraphs to obtain the at least one conditional probability for the subject punctuation mark.
  - 17. The method of claim 16, wherein each paragraph includes about 200 to 500 words.
  - 18. The method of claim 16, wherein the n-gram language model is one of a bigram language model and a trigram language model.
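
The baseform shapes enumerated in claim 11 can be made concrete with a small classifier. This is only an illustrative sketch: the phone symbols (`SIL` and the ARPAbet-style vowel and consonant sets) and the function `baseform_kind` are assumptions, not part of the patent text.

```python
SIL = "SIL"                          # hypothetical symbol for the silence phone
VOWELS = {"AA", "AH", "ER", "UH"}    # hypothetical vowel phones
CONSONANTS = {"M", "HH", "K", "T"}   # hypothetical consonant phones

def baseform_kind(phones):
    """Classify a phone sequence against the shapes listed in claim 11.

    Returns a short label, or None when the sequence matches none of them.
    """
    if phones == [SIL]:
        return "silence"
    if phones == [SIL, SIL]:
        return "double silence"
    if len(phones) >= 3 and phones[0] == SIL and phones[-1] == SIL:
        inner = phones[1:-1]
        if all(p in CONSONANTS | VOWELS for p in inner):
            has_c = any(p in CONSONANTS for p in inner)
            has_v = any(p in VOWELS for p in inner)
            if has_c and has_v:
                return "consonant+vowel noise"
            return "consonant noise" if has_c else "vowel noise"
    return None

kinds = [
    baseform_kind([SIL]),
    baseform_kind([SIL, SIL]),
    baseform_kind([SIL, "M", SIL]),             # e.g. a brief "mm" between pauses
    baseform_kind([SIL, "AH", "M", SIL]),       # e.g. "um" between pauses
    baseform_kind(["K", "AH", "M", "AH"]),      # not bracketed by silence
]
```

A lexicon builder could use such a check to confirm that every baseform assigned to a punctuation mark is a pause or a silence-bracketed non-word noise.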

19. In a speech recognition system which recognizes a spoken utterance consisting of an input sequence of spoken words and outputs a sequence of decoded words, a method for automatically punctuating the decoded words comprises the steps of:
- training an acoustic model;
- training an n-gram language model;
- building a vocabulary of items, the items including words and punctuation marks, said building step including assigning at least one baseform to each of the punctuation marks in the vocabulary, the at least one baseform corresponding to at least one of silence and a non-word noise, the vocabulary defining the items for which language model scores exist;
- defining conditional probabilities for the punctuation marks in the language model based upon at least one of at least one preceding word and at least one succeeding word; and
- generating a word match score corresponding to the probability that a subject punctuation mark in the vocabulary corresponds to acoustic data generated by said utterance;
- generating a language model score corresponding to the probability that the subject punctuation mark corresponds to acoustic data generated by said utterance; and
- automatically inserting the subject punctuation mark at a given point in the sequence of decoded words when the combination of the word match score and the language model score produces the highest combined likelihood over any other item in the vocabulary for the given point.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29):
  - 20. The method of claim 19, wherein the baseform corresponding to the silence is represented by at least one silence phone.
  - 21. The method of claim 19, wherein the baseform corresponding to the silence is represented by two consecutive silence phones.
  - 22. The method of claim 19, wherein the baseform corresponding to the non-word noise is represented by one of at least one consonant phone preceded and followed by the silence phone, at least one vowel phone preceded and followed by the silence phone, and at least one consonant phone and at least one vowel phone preceded and followed by the silence phone.
  - 23. The method of claim 19, wherein the step of training the n-gram language model further comprises the step of segmenting a training text corpus into a plurality of paragraphs to obtain word frequency counts that include the subject punctuation mark in a position other than a last position.
  - 24. The method of claim 23, wherein each paragraph includes about 200 to 500 words.
  - 25. The method of claim 23, wherein the n-gram language model is one of a bigram language model and a trigram language model.
  - 26. The method of claim 19, wherein the step of training the n-gram language model further comprises the step of segmenting a corresponding text corpus into paragraphs to obtain the conditional probabilities for the punctuation marks.
  - 27. The method of claim 26, wherein each paragraph includes about 200 to 500 words.
  - 28. The method of claim 26, wherein the n-gram language model is one of a bigram language model and a trigram language model.
  - 29. The method of claim 19, wherein at least one punctuation mark is also assigned at least one baseform corresponding to a pronunciation of a word representing the punctuation mark.
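
Claims 10 and 29 let a punctuation mark carry an additional baseform for the spoken word that names it, alongside its silence baseform. The sketch below shows one way a vocabulary with several baseforms per item might be matched. The lexicon entries, the phone spellings, and the per-phone scoring in `match_logp` are all hypothetical stand-ins; a real system would use a trained acoustic model instead.

```python
import math

# Hypothetical lexicon: each item maps to one or more baseforms (phone sequences).
LEXICON = {
    ",": [["SIL"], ["K", "AA", "M", "AH"]],                     # pause, or spoken "comma"
    ".": [["SIL", "SIL"], ["P", "IH", "R", "IY", "AH", "D"]],   # longer pause, or spoken "period"
    "come": [["K", "AH", "M"]],
}

def match_logp(baseform, observed, hit=0.9, miss=0.1):
    """Toy acoustic match: reward agreeing phones, penalise mismatches and length gaps."""
    score = 0.0
    for i in range(max(len(baseform), len(observed))):
        same = i < len(baseform) and i < len(observed) and baseform[i] == observed[i]
        score += math.log(hit if same else miss)
    return score

def word_match_logp(item, observed):
    """Score an item by its best-matching baseform."""
    return max(match_logp(b, observed) for b in LEXICON[item])

def best_match(observed):
    """Return the item whose best baseform matches the observed phones most closely."""
    return max(LEXICON, key=lambda item: word_match_logp(item, observed))

from_pause = best_match(["SIL"])                  # a short pause matches the comma
from_speech = best_match(["K", "AA", "M", "AH"])  # the dictated word also matches the comma
from_long_pause = best_match(["SIL", "SIL"])      # a longer pause matches the period
```

This sketch covers only the word match score; in the claimed method that score is still combined with the language model score before a mark is inserted.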

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Chen, Chengjun Julian
Primary Examiner(s): Dorvil, Richemond
Application Number: US09/102,920
Time in Patent Office: 700 Days
Field of Search: 704/231, 704/251, 704/240, 704/246, 704/255, 704/256, 704/257, 704/270, 704/235, 704/200, 704/239, 704/277
US Class Current: 704/235
CPC Class Codes: G06F 40/216 using statistical methods; G10L 15/18 using natural language mode...
