SPEECH RECOGNITION SYSTEM AND PROGRAM THEREFOR

US 20100057457A1
Filed: 11/30/2007
Published: 03/04/2010
Est. Priority Date: 11/30/2006
Status: Active Grant

Abstract

An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9 and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word. An additional registration section 17 combines the corrected word with the pronunciation determined by a pronunciation determining section 16 and additionally registers the combination as new word pronunciation data in the speech recognition dictionary 5 if it is determined that a word obtained after correction has not been registered in the speech recognition dictionary 5. The additional registration section 17 additionally registers the pronunciation determined by the pronunciation determining section 16 as another pronunciation of the corrected word if it is determined that the corrected word has been registered.
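The extraction and registration steps described in the abstract can be illustrated with a minimal Python sketch. Everything named here (`TimedPhoneme`, `extract_pronunciation`, `register`, the dictionary layout, and the return labels) is hypothetical and chosen for illustration only. The sketch also assumes that a phoneme "exists in" the word segment when its whole time span lies inside the segment; the source does not specify how boundary overlaps are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedPhoneme:
    """One phoneme with the start and finish time added by the converter."""
    symbol: str
    start: float  # seconds
    end: float    # seconds


def extract_pronunciation(phonemes, word_start, word_end):
    """Return the symbols of phonemes lying fully inside the word segment.

    Assumption: full containment is the membership rule.
    """
    return tuple(
        p.symbol
        for p in phonemes
        if p.start >= word_start and p.end <= word_end
    )


def register(dictionary, word, pronunciation):
    """Register a corrected word and its pronunciation.

    `dictionary` maps a word to a list of pronunciations (tuples of symbols).
    Returns a label describing what happened.
    """
    if not pronunciation:
        return "skipped"  # nothing was extracted for this segment
    if word not in dictionary:
        # Unknown word: register word and pronunciation together.
        dictionary[word] = [pronunciation]
        return "new_word"
    if pronunciation not in dictionary[word]:
        # Known word: register another pronunciation for it.
        dictionary[word].append(pronunciation)
        return "new_pronunciation"
    return "unchanged"
```

A caller would pass the phoneme sequence together with the start and finish time of the corrected word, then hand the resulting tuple to `register`. Skipping empty pronunciations and ignoring exact duplicates are design choices of this sketch, not requirements stated in the source.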

13 Claims

1. A speech recognition system comprising:
- a speech recognition section that converts speech data into text data by using a speech recognition dictionary containing a large volume of word pronunciation data each constituted by a combination of a word and one or more corresponding pronunciations, each pronunciation including one or more phonemes, and that has a function of adding to the text data a start time and a finish time of a word segment in the speech data corresponding to each word included in text data;
  
  a word correcting section that presents competitive candidates for each word in the text data acquired from the speech recognition section, allows each word to be corrected by selecting a correct word from among the competitive candidates for correction if the correct word is included in the competitive candidates, or by manually inputting a correct word if no correct word is included in the competitive candidates;
  
  a phoneme sequence converting section that recognizes the speech data in units of phoneme, converts the recognized speech data into a phoneme sequence composed of a plurality of phonemes, and that has a function of adding to the phoneme sequence a start time and a finish time of each phoneme unit in the speech data corresponding to each phoneme included in the phoneme sequence;
  
  a phoneme sequence extracting section that extracts from the phoneme sequence a phoneme sequence portion composed of one or more phonemes existing in a segment corresponding to a period of the start time and finish time of the word segment of a word corrected by the word correcting section;
  
  a pronunciation determining section that determines the phoneme sequence portion as the pronunciation of the word corrected by the word correcting section; and
  
  an additional registration section that combines the corrected word with the pronunciation determined by the pronunciation determining section as new word pronunciation data and additionally registers the new word pronunciation data in the speech recognition dictionary if it is determined that the corrected word has not been registered in the speech recognition dictionary, or additionally registers the pronunciation determined by the pronunciation determining section in the speech recognition dictionary as another pronunciation of the corrected word if it is determined that the corrected word is a registered word that has already been registered in the speech recognition dictionary.
- Dependent claims (2, 3, 4, 5, 10, 11):
  - 2. The speech recognition system according to claim 1, wherein the speech recognition section performs speech recognition once again for speech data corresponding to an uncorrected portion of the text data that has not been subjected to correction when the additional registration section makes a new additional registration.
  - 3. The speech recognition system according to claim 1, further comprising:
    - a speaker recognition section that identifies the speaker type based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the speaker type identified by the speaker recognition section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the speaker type as a speech recognition dictionary to be used in the speech recognition section.
  - 4. The speech recognition system according to claim 1, further comprising:
    - a topic field identifying section that identifies a topic field of spoken content based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the topic field identified by the topic field identifying section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the plurality of topic fields as a speech recognition dictionary to be used in the speech recognition section.
  - 5. The speech recognition system according to claim 1, wherein the phoneme sequence converting section is a phonemic typewriter.
  - 10. The speech recognition system according to claim 2, further comprising:
    - a speaker recognition section that identifies the speaker type based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the speaker type identified by the speaker recognition section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the speaker type as a speech recognition dictionary to be used in the speech recognition section.
  - 11. The speech recognition system according to claim 2, further comprising:
    - a topic field identifying section that identifies a topic field of spoken content based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the topic field identified by the topic field identifying section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the plurality of topic fields as a speech recognition dictionary to be used in the speech recognition section.
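The re-recognition behavior of claim 2 can be sketched as follows: after a new registration, only the speech corresponding to uncorrected words is recognized again, while corrected words are kept. This is a rough illustration under stated assumptions. The `Word` class, the `corrected` flag, and the `recognize_span` callable (standing in for the speech recognition section) are hypothetical names, and merging consecutive uncorrected words into a single time span is a choice of this sketch rather than something the claim requires.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    corrected: bool = False


def uncorrected_spans(words):
    """Merge runs of consecutive uncorrected words into (start, end) spans."""
    spans = []
    run_start = None
    run_end = None
    for w in words:
        if w.corrected:
            # A corrected word closes any open run of uncorrected words.
            if run_start is not None:
                spans.append((run_start, run_end))
                run_start = None
        else:
            if run_start is None:
                run_start = w.start
            run_end = w.end
    if run_start is not None:
        spans.append((run_start, run_end))
    return spans


def rerecognize(words, recognize_span):
    """Re-run recognition on uncorrected spans and keep corrected words.

    `recognize_span(start, end)` is a hypothetical callable that returns a
    list of Word objects for that stretch of speech data.
    """
    result = [w for w in words if w.corrected]
    for start, end in uncorrected_spans(words):
        result.extend(recognize_span(start, end))
    return sorted(result, key=lambda w: w.start)
```

In use, `rerecognize` would be called each time the dictionary gains a new entry, so that the updated dictionary is applied to the portions the user has not yet confirmed.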

6. A program for speech recognition system that is stored in a computer-readable recording medium, the program causing a computer to function as:
- a speech recognition section that converts speech data into text data by using a speech recognition dictionary containing a large volume of word pronunciation data each constituted by a combination of a word and one or more corresponding pronunciations, each pronunciation including one or more phonemes, and that has a function of adding to the text data a start time and a finish time of a word segment in the speech data corresponding to each word included in text data;
  
  a word correcting section that presents competitive candidates for each word in the text data acquired from the speech recognition section, allows each word to be corrected by selecting a correct word from among the competitive candidates for correction if the correct word is included in the competitive candidates, or by manually inputting a correct word if no correct word is included in the competitive candidates;
  
  a phoneme sequence converting section that recognizes the speech data in units of phoneme, converts the recognized speech data into a phoneme sequence composed of a plurality of phonemes, and that has a function of adding to the phoneme sequence a start time and a finish time of each phoneme unit in the speech data corresponding to each phoneme included in the phoneme sequence;
  
  a phoneme sequence extracting section that extracts from the phoneme sequence a phoneme sequence portion composed of one or more phonemes existing in a segment corresponding to a period of the start time and finish time of the word segment of a word corrected by the word correcting section;
  
  a pronunciation determining section that determines the phoneme sequence portion as the pronunciation of the word corrected by the word correcting section; and
  
  an additional registration section that combines the corrected word with the pronunciation determined by the pronunciation determining section as new word pronunciation data and additionally registers the new word pronunciation data in the speech recognition dictionary if it is determined that the corrected word has not been registered in the speech recognition dictionary, or additionally registers the pronunciation determined by the pronunciation determining section in the speech recognition dictionary as another pronunciation of the corrected word if it is determined that the corrected word is a registered word that has already been registered in the speech recognition dictionary.
- Dependent claims (7, 8, 9, 12, 13):
  - 7. The program for speech recognition system according to claim 6, wherein the speech recognition section performs speech recognition once again for speech data corresponding to an uncorrected portion of the text data that has not been subjected to correction when the additional registration section makes a new additional registration.
  - 8. The program for speech recognition system according to claim 6, further causing the computer to function as:
    - a speaker recognition section that identifies the speaker type based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the speaker type identified by the speaker recognition section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the speaker type as a speech recognition dictionary to be used in the speech recognition section.
  - 9. The program for speech recognition system according to claim 6, further causing the computer to function as:
    - a topic field identifying section that identifies a topic field of spoken content based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the topic field identified by the topic field identifying section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the plurality of topic fields as a speech recognition dictionary to be used in the speech recognition section.
  - 12. The program for speech recognition system according to claim 7, further causing the computer to function as:
    - a speaker recognition section that identifies the speaker type based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the speaker type identified by the speaker recognition section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the speaker type as a speech recognition dictionary to be used in the speech recognition section.
  - 13. The program for speech recognition system according to claim 7, further causing the computer to function as:
    - a topic field identifying section that identifies a topic field of spoken content based on the speech data; and
      
      a dictionary selecting section that selects a speech recognition dictionary corresponding to the topic field identified by the topic field identifying section from among a plurality of speech recognition dictionaries that have been previously prepared corresponding to the plurality of topic fields as a speech recognition dictionary to be used in the speech recognition section.
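The dictionary selecting section recited in the dependent claims above (by speaker type or by topic field) amounts to a lookup among previously prepared dictionaries. The following is a minimal sketch, assuming the dictionaries are keyed by a string label. The function name, the type aliases, and the fallback to a `"general"` dictionary when the label is unknown are assumptions of this example and do not appear in the source.

```python
from typing import Dict, List, Tuple

Pronunciation = Tuple[str, ...]
Dictionary = Dict[str, List[Pronunciation]]


def select_dictionary(
    prepared: Dict[str, Dictionary],
    label: str,
    fallback: str = "general",
) -> Dictionary:
    """Pick the prepared dictionary matching the identified label.

    `label` is the speaker type or topic field reported by the identifying
    section. The fallback entry is an assumption of this sketch.
    """
    if label in prepared:
        return prepared[label]
    return prepared[fallback]
```

The same function serves both the speaker-type and the topic-field variants, since each variant only changes where the label comes from.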

Current Assignee
National Institute of Advanced Industrial Science and Technology (Government of Japan)
Original Assignee
National Institute of Advanced Industrial Science and Technology (Government of Japan)
Inventors
Ogata, Jun; Goto, Masataka

Granted Patent

US 8,401,847 B2
US Class Current

704/235
CPC Class Codes

G10L 15/04 Segmentation; Word boundary...

G10L 15/065 Adaptation
