Similar word discrimination method and its apparatus
Abstract
A method is provided that performs word recognition using the dynamic recurrent neural networks (DRNN) model and that can discriminate, with high precision, similar words that are often misrecognized. When the vocal sounds of some words are input, the word detection signal output component uses the DRNN word model to generate the DRNN output corresponding to the input word vocal data, and the input word vocal data is encoded into code data by using a code book. When the DRNN output from the word detection signal output component shows a level of correctness equal to or greater than a predetermined level, a processor establishes a fixed period in the DRNN output that includes the characteristic components of the input words. The processor then examines the code data in the established fixed period and, on the basis of the examination results, discriminates between the input words and words that are similar to the input words.
33 Claims
1. A similar word discrimination method for discriminating words that may be misrecognized because of their similarity, comprising the steps of:
receiving voice data of input words;
using a learning voice model to obtain a specified output that shows a level of correctness in response to the voice data of the input words;
processing the output to establish a specified period in which the characteristic components of the input words are included in the output, when the output shows a level of correctness of a predetermined amount or greater;
examining the characteristics of the voice data of said input words during the specified period; and
discriminating between the input words and words that are similar to the input words on the basis of the examination.
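Claim 1 does not say how the specified period is located within the output. One possible reading is sketched below in Python, assuming the voice model yields one correctness score per frame, that the period is the contiguous run of frames around the peak score that stays at or above the predetermined level, and that a hypothetical `margin` widens it by a few frames on each side. The function name and parameters are illustrative only, not taken from the patent.

```python
from typing import Optional, Sequence, Tuple


def find_specified_period(
    scores: Sequence[float],
    threshold: float,
    margin: int = 0,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, or None.

    Assumption (not from the source): the period is the contiguous run of
    frames containing the peak score whose scores are >= threshold,
    widened by `margin` frames per side and clipped to the utterance.
    """
    if not scores:
        return None
    peak = max(range(len(scores)), key=lambda i: scores[i])
    if scores[peak] < threshold:
        return None  # correctness never reaches the predetermined level
    start = peak
    while start > 0 and scores[start - 1] >= threshold:
        start -= 1
    end = peak + 1
    while end < len(scores) and scores[end] >= threshold:
        end += 1
    return max(0, start - margin), min(len(scores), end + margin)


frame_scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.1]
print(find_specified_period(frame_scores, threshold=0.6, margin=1))  # (1, 6)
```

Returning `None` when the peak stays below the threshold mirrors the claim's condition that the period is only established when the output reaches the predetermined level of correctness.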
2. A similar word discrimination method for discriminating words that may be misrecognized because of their similarity, comprising the steps of:
receiving voice data of input words;
using a learning dynamic recurrent neural networks (DRNN) voice model to obtain a specified DRNN output showing a level of correctness in response to the voice data of input words;
processing the DRNN output to establish a specified period in which the characteristic components of the input words are included in the DRNN output, when the DRNN output shows a level of correctness of a predetermined amount or greater;
encoding the input word voice data into code data by using a code book;
examining the characteristics of the code data of said input words during the specified period; and
discriminating between the input words and words that are similar to the input words on the basis of the examination.
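The encoding step of claim 2 amounts to vector quantization against a code book. The sketch below assumes each frame of the input word voice data has already been turned into a feature vector and that each frame maps to the nearest codeword by squared Euclidean distance; the claim specifies neither the features nor the distance. The helper names (`nearest_code`, `encode`, `codes_in_period`) are hypothetical, and the examination here stops at isolating the code data that falls inside the specified period.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `frame` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Vector-quantize each feature frame into a code-book index."""
    return [nearest_code(frame, codebook) for frame in frames]


def codes_in_period(codes: Sequence[int], period: Tuple[int, int]) -> List[int]:
    """Code data restricted to the specified period [start, end)."""
    start, end = period
    return list(codes[start:end])


codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 1.2), (0.1, 0.8), (1.0, 0.9)]
codes = encode(frames, codebook)
print(codes)                           # [0, 1, 2, 1]
print(codes_in_period(codes, (1, 3)))  # [1, 2]
```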
6. A similar word discrimination method for discriminating words that may be misrecognized because of their similarity, comprising the steps of:
successively receiving voice data of input words from the speech of multiple speakers;
using a learning dynamic recurrent neural networks (DRNN) voice model to obtain a specified DRNN output showing a level of correctness in response to the voice data of each input word;
processing the DRNN output to establish a specified period in which the characteristic components of each input word are included in the DRNN output, when the DRNN output shows a level of correctness of a predetermined amount or greater;
encoding each input word voice data into code data by using a code book;
creating histogram data, including a code histogram, from the coded data that includes characteristics for the specified period of each input word;
accumulating standard histogram data by storing histogram data for each input word;
comparing the histogram data of each input word with the standard histogram data; and
discriminating between the input words and words that are similar to the input words on the basis of the comparison.
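Claim 6 compares a code histogram taken over the specified period with standard histograms accumulated from multiple speakers. A minimal sketch follows, assuming normalized histograms (relative code frequencies), a standard histogram formed by averaging each word's per-speaker histograms, and an L1 distance for the comparison; none of these choices is stated in the claim. The word labels `word_a` and `word_b` and all function names are placeholders.

```python
from collections import Counter
from typing import Dict, List, Sequence


def code_histogram(period_codes: Sequence[int], codebook_size: int) -> List[float]:
    """Relative frequency of each code index within the specified period."""
    counts = Counter(period_codes)
    total = len(period_codes) or 1  # avoid division by zero on an empty period
    return [counts[c] / total for c in range(codebook_size)]


def build_standard_histograms(
    samples: Dict[str, List[Sequence[int]]], codebook_size: int
) -> Dict[str, List[float]]:
    """Average each word's per-utterance histograms across speakers."""
    standards: Dict[str, List[float]] = {}
    for word, utterances in samples.items():
        if not utterances:
            continue  # no speaker data for this word
        hists = [code_histogram(u, codebook_size) for u in utterances]
        standards[word] = [sum(col) / len(hists) for col in zip(*hists)]
    return standards


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(h1, h2))


def discriminate(
    period_codes: Sequence[int],
    standards: Dict[str, List[float]],
    codebook_size: int,
) -> str:
    """Pick the similar word whose standard histogram is closest (L1)."""
    hist = code_histogram(period_codes, codebook_size)
    return min(standards, key=lambda w: l1_distance(hist, standards[w]))


# Period code data from two speakers per word (placeholder values).
samples = {
    "word_a": [[0, 0, 1], [0, 1, 1]],
    "word_b": [[2, 2, 1], [2, 2, 2]],
}
standards = build_standard_histograms(samples, codebook_size=3)
print(discriminate([0, 1, 0], standards, codebook_size=3))  # word_a
```

Normalizing the histograms keeps periods of different lengths comparable; a different distance or a per-code weighting could be swapped in without changing the overall flow.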
9. A similar word discrimination method for discriminating words that may be misrecognized because of their similarity, comprising the steps of:
receiving voice data of input words;
creating a learning dynamic recurrent neural networks (DRNN) sub-voice model, that uses a DRNN voice model, to obtain a specified DRNN output for the characteristic components of respective similar words showing a level of correctness in response to the voice data of input words;
processing the DRNN output to establish a specified period in which the characteristic components of the input words are included in the DRNN output, when the DRNN output shows a level of correctness of a predetermined amount or greater;
examining the characteristics of the voice data of said input words during the specified period; and
discriminating between the input words and words that are similar to the input words on the basis of the examination.
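Claim 9 instead relies on DRNN sub-voice models that react to the characteristic components of each similar word. Since no trained model is available here, the sketch below treats each sub-model's per-frame output as a given list of scores and assumes the word whose sub-model reaches the highest score inside the specified period wins, provided that score meets a hypothetical `min_score` threshold. Both the decision rule and the names are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple


def discriminate_by_sub_models(
    sub_model_scores: Dict[str, Sequence[float]],
    period: Tuple[int, int],
    min_score: float = 0.0,
) -> Optional[str]:
    """Choose the similar word whose sub-model output peaks highest in the period.

    `sub_model_scores` maps each similar word to the per-frame correctness
    output of its (hypothetical) DRNN sub-voice model for one utterance.
    Returns None if no sub-model reaches `min_score` inside the period.
    """
    start, end = period
    best_word: Optional[str] = None
    best_peak = float("-inf")
    for word, scores in sub_model_scores.items():
        window = scores[start:end]
        if not window:
            continue  # no output for this word inside the period
        peak = max(window)
        if peak >= min_score and peak > best_peak:
            best_word, best_peak = word, peak
    return best_word


scores = {
    "word_a": [0.1, 0.2, 0.3, 0.2, 0.1],
    "word_b": [0.1, 0.6, 0.8, 0.4, 0.1],
}
print(discriminate_by_sub_models(scores, (1, 4), min_score=0.5))  # word_b
```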
12. A similar word discrimination apparatus for discriminating words that may be misrecognized because of their similarity having a learning voice model that performs recognition processing to obtain specified output showing a level of correctness in response to voice data of input words, comprising:
a word detection signal output means that outputs the level of correctness above a predetermined level by means of the voice model that reacts to the vocal data of the input words, when there is vocal input of some words; and
a processing means that, when the word detection signal output means generates an output showing a level of correctness above a predetermined level, establishes a specified period that includes characteristic components of the vocal data of the input words, examines the characteristics of the vocal data of the input words during the specified period, and performs discrimination of the input words and the words that are similar to the input words on the basis of the examination results.
13. A similar word discrimination apparatus for discriminating words that may be misrecognized because of their similarity having a learning dynamic recurrent neural networks (DRNN) voice model that performs recognition processing to obtain specified output showing a level of correctness in response to voice data of input words, comprising:
a word detection signal output means that generates a DRNN output corresponding to the input word vocal data using the DRNN voice model, at the time of the vocal input of some words;
a codification means that codifies the input word vocal data using a code book; and
a processing means that, when the word detection signal output means generates a DRNN output showing a level of correctness above a predetermined level, establishes a specified period that includes characteristic components of the vocal data of the input words, examines the data encoded by the codification means during the specified period, and performs discrimination of the input words and words that are similar to the input words on the basis of the examination results.
17. A similar word discrimination apparatus for discriminating words that may be misrecognized because of their similarity having a learning dynamic recurrent neural networks (DRNN) voice model that performs recognition processing to obtain specified output showing a level of correctness in response to voice data of input words, comprising:
a word detection signal output means that generates a DRNN output corresponding to the input word vocal data using the DRNN voice model, at the time of the vocal input of some words;
a codification means that codifies the input word vocal data using a code book;
a standard histogram storage means that preserves the histogram data created for each word from the code data during a specified period as standard histogram data, the histogram data including the characteristic components of the respective similar words from among the code data obtained from the speech of multiple speakers for the respective similar words; and
a processing means that, when the word detection signal output means generates a DRNN output showing a level of correctness above a predetermined level, establishes the specified period that includes characteristic components of the vocal data of the input words, creates a code histogram for the specified period using code data encoded by the codification means during the specified period, and performs discrimination of the input words and words that are similar to the input words by comparing the histogram data for each word with the standard histogram data.
20. A similar word discrimination apparatus for discriminating words that may be misrecognized because of their similarity having a learning dynamic recurrent neural networks (DRNN) voice model that performs recognition processing to obtain specified output showing a level of correctness in response to voice data of input words, comprising:
a DRNN sub-voice storage means for storing a learning DRNN sub-voice model that generates DRNN output showing a level of correctness for characteristic components of the respective similar words that may be misrecognized;
a word detection signal output means that outputs the level of correctness at a predetermined level or greater from the DRNN voice model and from the DRNN sub-voice model in response to the voice data of the input words, when there is vocal input of some words; and
a processing means that, when the word detection signal output means generates a DRNN output showing a level of correctness above a predetermined level, establishes a specified period that includes characteristic components of the vocal data of the input words, uses the DRNN sub-voice model to examine the DRNN output characteristics of the vocal data of the input words during the specified period, and performs discrimination of the input words and the words that are similar to the input words on the basis of the examination results.
23. A similar word discrimination apparatus, comprising:
means for receiving voice data of input words;
means for using a learning voice model to obtain a specified output that shows a level of correctness in response to the voice data of the input words;
means for processing the output to establish a specified period in which the characteristic components of the input words are included in the output, when the output shows a level of correctness of a predetermined amount or greater;
means for examining the characteristics of the voice data of said input words during the specified period; and
means for discriminating between the input words and words that are similar to the input words on the basis of the examination.
24. A similar word discrimination apparatus, comprising:
means for receiving voice data of input words;
means for using a learning dynamic recurrent neural networks (DRNN) voice model to obtain a specified DRNN output showing a level of correctness in response to the voice data of input words;
means for processing the DRNN output to establish a specified period in which the characteristic components of the input words are included in the DRNN output, when the DRNN output shows a level of correctness of a predetermined amount or greater;
means for encoding the input word voice data into code data by using a code book;
means for examining the characteristics of the code data of said input words during the specified period; and
means for discriminating between the input words and words that are similar to the input words on the basis of the examination.
28. A similar word discrimination apparatus, comprising:
means for successively receiving voice data of input words from the speech of multiple speakers;
means for using a learning dynamic recurrent neural networks (DRNN) voice model to obtain a specified DRNN output showing a level of correctness in response to the voice data of each input word;
means for processing the DRNN output to establish a specified period in which the characteristic components of each input word are included in the DRNN output, when the DRNN output shows a level of correctness of a predetermined amount or greater;
means for encoding each input word voice data into code data by using a code book;
means for creating histogram data, including a code histogram, from the coded data that includes characteristics for the specified period of each input word;
means for accumulating standard histogram data by storing histogram data for each input word;
means for comparing the histogram data of each input word with the standard histogram data; and
means for discriminating between the input words and words that are similar to the input words on the basis of the comparison.
31. A similar word discrimination apparatus, comprising:
means for receiving voice data of input words;
means for creating a learning dynamic recurrent neural networks (DRNN) sub-voice model, that uses a DRNN voice model, to obtain a specified DRNN output for the characteristic components of respective similar words showing a level of correctness in response to the voice data of input words;
means for processing the DRNN output to establish a specified period in which the characteristic components of the input words are included in the DRNN output, when the DRNN output shows a level of correctness of a predetermined amount or greater;
means for examining the characteristics of the voice data of said input words during the specified period; and
means for discriminating between the input words and words that are similar to the input words on the basis of the examination.
Specification