Voice recognition system and voice processing system

US 20070050190A1
Filed: 01/04/2006
Published: 03/01/2007
Est. Priority Date: 08/24/2005
Status: Active Grant

Abstract

A voice recognition system and a voice processing system are provided in which, when a user makes a self-repair utterance, the utterance can be inputted and recognized accurately, as in a conversation between humans. The system includes a signal processing unit for converting speech voice data into a feature, a voice section detecting unit for detecting voice sections in the speech voice data, a priority determining unit for selecting a voice section to be given priority from among the voice sections detected by the voice section detecting unit according to a predetermined priority criterion, and a decoder for calculating a degree of matching with a recognition vocabulary using the feature of the voice section selected by the priority determining unit and an acoustic model. The priority determining unit uses as the predetermined priority criterion at least one selected from the group consisting of (1) a length of the voice section, (2) a power or an S/N ratio of the voice section, and (3) a chronological order of the voice section.

17 Claims

1. A voice recognition system comprising:
- a signal processing unit for converting inputted speech voice data into a feature;
  
  an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance;
  
  a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance;
  
  a voice section detecting unit for detecting voice sections in the speech voice data according to a predetermined voice section criterion;
  
  a priority determining unit for selecting a voice section to be given priority from among the voice sections detected by the voice section detecting unit according to a predetermined priority criterion;
  
  a decoder for calculating a degree of matching with the recognition vocabulary using the feature of the voice section selected by the priority determining unit and the acoustic model; and
  
  a result output unit for outputting a word sequence having the best score in the matching by the decoder as a recognition result;
  
  wherein the priority determining unit uses as the predetermined priority criterion at least one selected from the group consisting of (1) a length of the voice section, (2) a power or an S/N ratio of the voice section, and (3) a chronological order of the voice section.
- Dependent Claims (6, 7, 8, 9, 10, 11, 12)
  - 6. The voice recognition system according to claim 1, wherein (1) the length of the voice section is used as the predetermined priority criterion, and the priority determining unit selects a predetermined number of voice sections in decreasing order of their length.
  - 7. The voice recognition system according to claim 1, wherein (1) the length of the voice section is used as the predetermined priority criterion, and the priority determining unit selects a predetermined number of voice sections in decreasing order of their length's proximity to a predetermined utterance length.
  - 8. The voice recognition system according to claim 1, wherein (1) the length of the voice section is used as the predetermined priority criterion, and the priority determining unit selects a predetermined number of voice sections in decreasing order of their length under a condition that a sum of the lengths of the voice sections does not exceed a predetermined time period.
  - 9. The voice recognition system according to claim 1, wherein (2) the power or the S/N ratio of the voice section is used as the predetermined priority criterion, and the priority determining unit selects a predetermined number of voice sections in decreasing order of their power or S/N ratio.
  - 10. The voice recognition system according to claim 1, wherein (3) the chronological order of the voice section is used as the predetermined priority criterion, and the priority determining unit selects a predetermined number of voice sections from a last voice section on the time series from among a plurality of the voice sections contained in the speech voice data.
  - 11. The voice recognition system according to claim 1, wherein (3) the chronological order of the voice section is used as the predetermined priority criterion, and when a grammar used in the decoder is of M-level hierarchical structure (M is a natural number), the priority determining unit selects M voice sections from a last voice section on the time series from among a plurality of voice sections contained in an inputted voice.
  - 12. The voice recognition system according to claim 1, wherein (1) the length of the voice section and (3) the chronological order of the voice section are used in combination as the predetermined priority criterion, and the priority determining unit selects a plurality of voice sections that are continuous on the time series in reverse chronological order from a last voice section from among a plurality of voice sections contained in an inputted voice so that a sum of lengths of the selected voice sections falls within a predetermined range.
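The selection rules recited in claims 6, 7, 9, 10 and 12 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `VoiceSection` record, its field names, the function names, and the choice to return selected sections in time order are all assumptions made for illustration, and the handling of the "predetermined range" in claim 12 is only one possible reading.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoiceSection:
    """One detected voice section; field names are assumptions of this sketch."""
    start: float  # seconds
    end: float    # seconds
    power: float  # mean power in arbitrary units (could equally be an S/N ratio)

    @property
    def length(self) -> float:
        return self.end - self.start


def _by_time(sections: List[VoiceSection]) -> List[VoiceSection]:
    """Return the sections sorted by start time."""
    return sorted(sections, key=lambda s: s.start)


def _take(ranked: List[VoiceSection], n: int) -> List[VoiceSection]:
    """Keep the first n ranked sections and restore time order."""
    return _by_time(ranked[:max(n, 0)])


def select_longest(sections: List[VoiceSection], n: int) -> List[VoiceSection]:
    """Claim 6 (as read here): n sections in decreasing order of length."""
    return _take(sorted(sections, key=lambda s: s.length, reverse=True), n)


def select_closest_to_length(sections: List[VoiceSection], n: int,
                             target: float) -> List[VoiceSection]:
    """Claim 7 (as read here): n sections whose length is nearest to target."""
    return _take(sorted(sections, key=lambda s: abs(s.length - target)), n)


def select_strongest(sections: List[VoiceSection], n: int) -> List[VoiceSection]:
    """Claim 9 (as read here): n sections in decreasing order of power."""
    return _take(sorted(sections, key=lambda s: s.power, reverse=True), n)


def select_last(sections: List[VoiceSection], n: int) -> List[VoiceSection]:
    """Claim 10 (as read here): the n sections counted back from the last one."""
    ordered = _by_time(sections)
    return ordered[-n:] if n > 0 else []


def select_trailing_within(sections: List[VoiceSection], min_total: float,
                           max_total: float) -> List[VoiceSection]:
    """Claim 12 (one reading): walk back from the last section, adding
    consecutive sections until the summed length reaches min_total, and
    never letting it exceed max_total. The result can fall short of
    min_total if the next section would overshoot max_total."""
    chosen: List[VoiceSection] = []
    total = 0.0
    for section in reversed(_by_time(sections)):
        if total >= min_total or total + section.length > max_total:
            break
        chosen.append(section)
        total += section.length
    return _by_time(chosen)
```

Claims 8 and 11 are not covered; they would add a cap on the summed length and a count tied to the grammar depth M, respectively.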

2. A voice recognition system comprising:
- a signal processing unit for converting inputted speech voice data into a feature;
  
  an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance;
  
  a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance;
  
  a decoder for calculating a degree of matching with the recognition vocabulary using the feature and the acoustic model;
  
  a voice section detecting unit for detecting sections corresponding to a word detected by the decoder to be voice sections;
  
  a priority determining unit for selecting a voice section containing a recognition vocabulary to be used preferentially as a recognition result from among the voice sections detected by the voice section detecting unit according to a predetermined priority criterion; and
  
  a result output unit for outputting a recognition word sequence having the best score in the matching by the decoder as the recognition result;
  
  wherein the priority determining unit uses as the predetermined priority criterion at least one selected from the group consisting of (1) a chronological order with respect to a voice section in which a pre-registered specific vocabulary is detected by the decoder, (2) a chronological order with respect to a voice section in which a pre-registered long vowel is detected by the decoder, and (3) a chronological order with respect to a voice section in which an amount of change in the feature obtained by the signal processing unit continues within a predetermined range.
- Dependent Claims (3)
  - 3. The voice recognition system according to claim 2, wherein the priority determining unit also uses as the predetermined priority criterion at least one selected from the group consisting of (4) a chronological order with respect to a voice section in which the degree of matching calculated by the decoder is lower than a predetermined threshold value, and (5) the degree of matching calculated by the decoder.
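Criterion (1) of claim 2, a chronological order with respect to a section in which a pre-registered specific vocabulary is detected, can be sketched as follows. The claim does not say which side of the marker is preferred; this sketch assumes the specific vocabulary is a correction cue (such as a spoken "no") and that the words after the last cue are preferred. `WordSection`, `prefer_after_cue`, and the cue set are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class WordSection:
    """A word spotted by the decoder, with its position (hypothetical record)."""
    word: str
    start: float  # seconds
    end: float    # seconds


def prefer_after_cue(sections: List[WordSection],
                     cue_words: Set[str]) -> List[WordSection]:
    """Return the word sections that follow the last cue word.

    Assumption of this sketch: words spoken after a correction cue
    supersede the words before it. If no cue is present, every section
    is kept, in time order.
    """
    ordered = sorted(sections, key=lambda s: s.start)
    last_cue = -1
    for index, section in enumerate(ordered):
        if section.word in cue_words:
            last_cue = index
    if last_cue == -1:
        return ordered
    return ordered[last_cue + 1:]
```

For an utterance spotted as "Osaka", "no", "Kyoto" with cue set `{"no"}`, only the "Kyoto" section is returned. A practical system would likely need to keep unaffected earlier words as well; that is outside this sketch.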

4. A voice processing system comprising:
- a voice recognition unit for recognizing a speech vocabulary sequence from inputted speech voice data; and
  
  a voice input unit for performing an input from a user using a recognition result of the speech voice data generated by the voice recognition unit;
  
  wherein the voice recognition unit comprises a signal processing unit for converting the speech voice data into a feature, an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance, a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance, a voice cut-out unit for detecting speech sections in the speech voice data according to a predetermined speech section criterion, a decoder for matching the feature and the acoustic model and calculating a degree of matching between the result of matching and the recognition vocabulary so as to determine a recognition result candidate based on the calculated degree of matching and generate positional information indicating a position of the recognition result candidate within the speech section, and a result output unit for outputting the recognition result candidate determined by the decoder and the positional information to the voice input unit, and the voice input unit comprises a specific vocabulary dictionary storing unit in which information of a specific vocabulary is stored in advance, a specific vocabulary determining unit for determining whether or not the recognition result candidate corresponds to the specific vocabulary by referring to the specific vocabulary dictionary storing unit, and a recognition result selecting unit for selecting a recognition result candidate to be adopted as the recognition result based on the positional information using as a criterion a chronological order with respect to the recognition result candidate corresponding to the specific vocabulary.

5. A voice processing system comprising:
- a voice recognition unit for recognizing a speech vocabulary sequence from inputted speech voice data; and
  
  a voice input unit for performing an input from a user using a recognition result of the speech voice data generated by the voice recognition unit;
  
  wherein the voice recognition unit comprises a signal processing unit for converting the speech voice data into a feature, an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance, a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance, a voice cut-out unit for detecting speech sections in the speech voice data, a decoder for matching the feature and the acoustic model and calculating a degree of matching between the result of matching and the recognition vocabulary so as to determine a recognition result candidate based on the calculated degree of matching and generate positional information indicating a position of the recognition result candidate within the speech section, and a result output unit for outputting the recognition result candidate determined by the decoder and the positional information to the voice input unit, and the voice input unit comprises a speech speed calculating unit for calculating a speech speed of the recognition result candidate based on the positional information, and a recognition result selecting unit for selecting a recognition result candidate to be adopted as the recognition result using the speech speed as a criterion.
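The speech speed criterion of claim 5 can be sketched as below. The claim states only that a speech speed is calculated from the positional information and used as a criterion; the unit of speed (here, a count of syllable-like units per second), the `Candidate` record, and the default preference for the slowest candidate are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A recognition result candidate with positional information (hypothetical)."""
    text: str
    start: float     # seconds, position within the speech section
    end: float       # seconds
    unit_count: int  # number of syllable-like units in the candidate


def speech_speed(candidate: Candidate) -> float:
    """Units per second, derived from the candidate's start and end positions."""
    duration = candidate.end - candidate.start
    if duration <= 0:
        raise ValueError("candidate must have a positive duration")
    return candidate.unit_count / duration


def select_by_speed(candidates: List[Candidate],
                    prefer_slow: bool = True) -> Candidate:
    """Pick one candidate using speech speed as the criterion.

    Which direction is preferred is an assumption here; pass
    prefer_slow=False to pick the fastest candidate instead.
    """
    if not candidates:
        raise ValueError("no candidates to select from")
    pick = min if prefer_slow else max
    return pick(candidates, key=speech_speed)
```

With candidates "Osaka" (3 units over 0.5 s) and "Kyoto" (2 units over 1.0 s), the default setting selects "Kyoto", the more slowly spoken one.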

13. A recording medium storing a program allowing a computer to execute:
- a signal processing operation of converting inputted speech voice data into a feature;
  
  a voice section detecting operation of detecting voice sections in the speech voice data according to a predetermined voice section criterion;
  
  a priority determining operation of selecting a voice section to be given priority from among the voice sections detected in the voice section detecting operation according to a predetermined priority criterion;
  
  a matching operation of referring to an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance and a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance and using the feature of the voice section selected in the priority determining operation and the acoustic model, thus calculating a degree of matching with the recognition vocabulary; and
  
  a result output operation of outputting a word sequence having the best score in the matching operation as a recognition result;
  
  wherein in the priority determining operation, the program uses as the predetermined priority criterion at least one selected from the group consisting of (1) a length of the voice section, (2) a power or an S/N ratio of the voice section, and (3) a chronological order of the voice section.

14. A recording medium storing a program allowing a computer to execute:
- a signal processing operation of converting inputted speech voice data into a feature;
  
  a matching operation of referring to an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance and a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance and using the feature and the acoustic model, thus calculating a degree of matching with the recognition vocabulary;
  
  a voice section detecting operation of detecting voice sections from the speech voice data based on the degree of matching calculated in the matching operation;
  
  a priority determining operation of selecting a voice section containing a recognition vocabulary to be used preferentially as a recognition result from among the voice sections detected in the voice section detecting operation according to a predetermined priority criterion; and
  
  a result output operation of outputting a word sequence having the best score in the matching operation as the recognition result;
  
  wherein in the priority determining operation, at least one selected from the group consisting of (1) a chronological order with respect to a voice section in which a pre-registered specific vocabulary is detected in the matching operation, (2) a chronological order with respect to a voice section in which a pre-registered long vowel is detected in the matching operation, and (3) a chronological order with respect to a voice section in which an amount of change in the feature obtained in the signal processing operation continues within a predetermined range is used as the predetermined priority criterion.
- Dependent Claims (15)
  - 15. The recording medium according to claim 14, wherein the priority determining operation also uses as the predetermined priority criterion at least one selected from the group consisting of (4) a chronological order with respect to a voice section in which the degree of matching calculated in the matching operation is lower than a predetermined threshold value, and (5) the degree of matching calculated in the matching operation.

16. A recording medium storing a program allowing a computer to realize a function of a voice input unit for performing an input from a user using a recognition result generated by a voice recognition unit for recognizing a speech vocabulary sequence from inputted speech voice data, wherein the voice recognition unit comprises a signal processing unit for converting the speech voice data into a feature, an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance, a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance, a voice cutout unit for detecting speech sections in the speech voice data according to a predetermined speech section criterion, a decoder for matching the feature and the acoustic model and calculating a degree of matching between the result of matching and the recognition vocabulary so as to determine a recognition result candidate based on the calculated degree of matching and generate positional information indicating a position of the recognition result candidate within the speech section, and a result output unit for outputting the recognition result candidate determined by the decoder and the positional information as the recognition result, and the program allows a computer to execute a specific vocabulary determining operation of determining whether or not the recognition result candidate corresponds to a specific vocabulary by referring to a specific vocabulary dictionary storing unit in which information of the specific vocabulary is stored in advance, and a recognition result selecting operation of selecting a recognition result candidate to be adopted as the recognition result based on the positional information using as a criterion a chronological order with respect to the recognition result candidate corresponding to the specific vocabulary.

17. A recording medium storing a program allowing a computer to realize a function of a voice input unit for performing an input from a user using a recognition result generated by a voice recognition unit for recognizing a speech vocabulary sequence from inputted speech voice data, wherein the voice recognition unit comprises a signal processing unit for converting the speech voice data into a feature, an acoustic model storing unit in which an acoustic model obtained by modeling what kind of feature a voice tends to become is stored in advance, a vocabulary dictionary storing unit in which information of a recognition vocabulary is stored in advance, a voice cut-out unit for detecting speech sections in the speech voice data according to a predetermined speech section criterion, a decoder for matching the feature and the acoustic model and calculating a degree of matching between the result of matching and the recognition vocabulary so as to determine a recognition result candidate based on the calculated degree of matching and generate positional information indicating a position of the recognition result candidate within the speech section, and a result output unit for outputting the recognition result candidate determined by the decoder and the positional information as the recognition result, and the program allows a computer to execute a speech speed calculating operation of calculating a speech speed of the recognition result candidate based on the positional information, and a recognition result selecting operation of selecting a recognition result candidate to be adopted as the recognition result using the speech speed as a criterion.

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Harada, Shouji; Washio, Nobuyuki
Granted Patent: US 7,672,846 B2
US Class Current: 704/249
CPC Class Codes:
G10L 15/22 Procedures used during a sp...
G10L 2015/088 Word spotting
