Speech recognition method

US 6,321,195 B1
Filed: 04/21/1999
Issued: 11/20/2001
Est. Priority Date: 04/28/1998
Status: Active Grant

First Claim

1. In a telephone modulating an input speech and having a built-in vocoder for encoding a modulated speech signal, a speech recognition method comprising:

a training step of, if a user enters a telephone number and a speech corresponding to said telephone number, performing the encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, and extracting and storing a feature of the detected speech section;

a recognition step of, if an input speech is received, performing encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, extracting a feature of the detected speech section, comparing the extracted feature with features of registered words stored during said training step, and selecting a registered word having a feature most similar to that of the input speech; and

a step of determining a result of the recognition to be right if a similarity of the registered word selected at said recognition step does not exceed a predetermined threshold and automatically dialing a telephone number corresponding to the recognized word, wherein said recognition step comprises extracting LSP parameters that have been encoded at said vocoder and transforming the extracted LSP parameters into pseudo-cepstrums.

Abstract

The present invention relates to an automated dialing method for mobile telephones. According to the method, a user enters a telephone number via the keypad of the mobile phone and then speaks a corresponding codeword into the handset. The voice signal is encoded using the CODEC and vocoder already on board the mobile phone. The speech is divided into frames, and each frame is analyzed to ascertain its primary spectral features. These features are stored in memory in association with the numeric keypad sequence. In recognition mode, the user speaks the codeword into the handset, and the speech is analyzed in the same manner as in training mode. The primary spectral features are compared with those stored in memory. When a match is declared according to preset criteria, the telephone number is automatically dialed by the mobile phone. Time warping techniques may be applied in the analysis to reduce timing variations.
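The train-then-recognize flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `VoiceDialer`, the `distance` callable, and the threshold value are hypothetical and do not come from the patent, and the score is treated as a distance, so a match is accepted when the best score does not exceed the threshold.

```python
class VoiceDialer:
    """Hypothetical sketch: map stored codeword features to telephone numbers."""

    def __init__(self, distance, threshold):
        self.distance = distance      # assumed callable(features_a, features_b) -> number
        self.threshold = threshold    # assumed preset acceptance criterion
        self.entries = {}             # telephone number -> stored feature sequence

    def train(self, number, features):
        # Training mode: store the spoken codeword's features under the keyed-in number.
        self.entries[number] = features

    def recognize(self, features):
        # Recognition mode: pick the closest stored entry, then apply the threshold.
        if not self.entries:
            return None
        scored = ((num, self.distance(features, ref)) for num, ref in self.entries.items())
        number, score = min(scored, key=lambda pair: pair[1])
        return number if score <= self.threshold else None


def city_block(a, b):
    # Placeholder distance used only for this example.
    return sum(abs(x - y) for x, y in zip(a, b))


dialer = VoiceDialer(city_block, threshold=2.0)
dialer.train("555-0100", [1, 2, 3])
dialer.train("555-0199", [9, 9, 9])
```

Here `dialer.recognize([1, 2, 4])` returns the first number, while an utterance far from every stored entry returns `None` and nothing would be dialed.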

31 Citations

27 Claims

1. In a telephone modulating an input speech and having a built-in vocoder for encoding a modulated speech signal, a speech recognition method comprising:
- a training step of, if a user enters a telephone number and a speech corresponding to said telephone number, performing the encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, and extracting and storing a feature of the detected speech section;
  
  a recognition step of, if an input speech is received, performing encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, extracting a feature of the detected speech section, comparing the extracted feature with features of registered words stored during said training step, and selecting a registered word having a feature most similar to that of the input speech; and
  
  a step of determining a result of the recognition to be right if a similarity of the registered word selected at said recognition step does not exceed a predetermined threshold and automatically dialing a telephone number corresponding to the recognized word, wherein said recognition step comprises extracting LSP parameters that have been encoded at said vocoder and transforming the extracted LSP parameters into pseudo-cepstrums.
2. The speech recognition method according to claim 1, wherein said training step comprises:
3. The speech recognition method according to claim 2, wherein, in said third step, a line spectrum pair (LSP) coefficient output from said vocoder is used as the feature.
4. The speech recognition method according to claim 2, wherein said third step comprises the step of storing all encoded data of frames corresponding to the speech section for information of a result of the recognition with voice.
5. The speech recognition method according to claim 1, wherein said pseudo-cepstrum transforming step is defined as the following formula:
- $C_{i} = \frac{1}{i} \sum_{j = 1}^{M} \cos (2 \pi i \, l_{j}), \quad i = 1, \dots, N$
  
  where N is the cepstrum order and M is the LSP order.
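As a rough illustration of the pseudo-cepstrum transform in claim 5, the sketch below evaluates the formula directly, taking the summation to run over the M LSP values. It assumes the LSP values are given as normalized frequencies in cycles, so that `2 * pi * i * l` is an angle in radians; that scaling and the function name `pseudo_cepstrum` are assumptions for the example and are not stated in the claim.

```python
import math


def pseudo_cepstrum(lsp, order):
    """Return [C_1, ..., C_N] for LSP values `lsp` (length M) and cepstrum order N."""
    coeffs = []
    for i in range(1, order + 1):
        # Sum the cosine term over all M LSP values, then divide by the index i.
        total = sum(math.cos(2.0 * math.pi * i * l) for l in lsp)
        coeffs.append(total / i)
    return coeffs


example = pseudo_cepstrum([0.25], order=2)
```

With a single LSP value of 0.25, the first coefficient is numerically zero and the second is -0.5, which matches evaluating the cosine at pi/2 and pi.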
6. The speech recognition method according to claim 1, wherein said recognition step comprises:
- a first step of, if the user enters a destination to be called with voice, modulating the input speech to provide an output to said vocoder, dividing the speech signal into frames, and performing the encoding by the frame;
  
  a second step of detecting only the actually voiced speech section from the input signal, using codebook gain as energy information, said codebook gain being output as the result of the encoding at said first step; and
  
  a third step of, if the speech section is detected at said second step, extracting as features spectrum coefficients of frames corresponding to the speech section output as the result of the encoding, comparing the extracted features with the features of the registered words stored during said training step, and selecting the registered word having the feature most similar to that of the input speech.
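The second step of claim 6 uses the per-frame codebook gain as energy information to find the voiced section. One possible sketch is a simple threshold with a minimum run length. The threshold, the `min_frames` parameter, and the function name are hypothetical; the claim does not specify a particular detection rule.

```python
def detect_speech_section(gains, threshold, min_frames=3):
    """Return (start, end) frame indices of the detected speech section, or None.

    gains: per-frame codebook gain values taken from the vocoder output.
    A frame counts as speech only once `min_frames` consecutive gains exceed
    `threshold` (an assumed rule to ignore short bursts).
    """
    start = end = None
    run = 0
    for idx, gain in enumerate(gains):
        if gain > threshold:
            run += 1
            if run >= min_frames:
                if start is None:
                    start = idx - run + 1   # first frame of the qualifying run
                end = idx
        else:
            run = 0
    return None if start is None else (start, end)
```

Only the frames between `start` and `end` would then be passed on to feature extraction and matching.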
7. The speech recognition method according to claim 6, wherein, in said third step, dynamic time warping (DTW) is used in comparing spectrum coefficients extracted from the input speech with spectrum coefficients of each word registered during said training step.
8. The speech recognition method according to claim 7, wherein said dynamic time warping comprises the steps of:
- forming a two-dimensional quadrature coordinate plane having M×N trellis points (M is the number of frames of the input speech and N is the number of frames of a registered word) in order to match two sequences of feature sets of the input speech and the stored registered word;
  
  respectively drawing slant lines having a slope of 1 starting from a start trellis point (1, 1) and an end trellis point (M, N) on said two-dimensional quadrature coordinate plane and horizontally moving the two slant lines as much as a predetermined value ($N/2^{n}$, wherein N is the number of frames and n is a natural number) to establish a search section for matching;
  
  calculating a distance between two features at each trellis point in a row within said search section and selecting a path through which a minimum distance between the two features is implemented;
  
  repeating said minimum path selection step with respect to all the rows within said search section; and
  
  dividing a minimum cumulative distance at said end trellis point (M, N) by the sum (M+N) of the lengths of the two sequences to calculate a final matching score.
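The search section of claim 8 can be pictured as the band between two slope-1 lines, one through (1, 1) and one through (M, N), each pushed outward by the predetermined value. The sketch below is one reading of that geometry, not the patent's implementation: it assumes the shift is the integer part of N divided by 2 to the power n, that the lines are moved in opposite directions so the band widens, and it uses a hypothetical function name and default `n`.

```python
def search_band(num_input_frames, num_ref_frames, n=2):
    """For each input frame m = 1..M, return the inclusive range (lo, hi) of
    reference frames that fall inside the search section."""
    M, N = num_input_frames, num_ref_frames
    shift = N // (2 ** n)      # assumed integer form of the predetermined value
    offset = N - M             # vertical gap between the two slope-1 lines
    band = []
    for m in range(1, M + 1):
        lo = max(1, m + min(0, offset) - shift)   # lower line, clipped to the plane
        hi = min(N, m + max(0, offset) + shift)   # upper line, clipped to the plane
        band.append((lo, hi))
    return band
```

Under these assumptions the band always contains both the start point (1, 1) and the end point (M, N), so a warping path restricted to it can still connect them.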
9. The speech recognition method according to claim 8, wherein said distance between the two features at each trellis point is calculated such that differences of values corresponding to respective orders of the two features are all summed up and defined as the following equations:
- Initial state: $D_{1,1} = 2 d_{1,1}$
  
  Others: $D_{m,n} = \min \begin{pmatrix} D_{m-1,n-1} + 2 d_{m,n} \\ D_{m-1,n} + d_{m,n} \\ D_{m,n-1} + d_{m,n} \end{pmatrix}, \quad 1 \leq m \leq M, \ 1 \leq n \leq N$
  
  $D_{m,n}$: minimum cumulative distance at the trellis point (m, n)
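A minimal sketch of the cumulative-distance recurrence in claim 9, combined with the (M+N) normalization from claim 8, might look like the following. It assumes the local distance is the sum of absolute differences across feature orders (the claim says only that the differences are summed), and for brevity it fills the whole M×N plane rather than restricting itself to a search section. The function names are hypothetical.

```python
def local_distance(a, b):
    # Assumed city-block distance: absolute differences summed over feature orders.
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_score(test, ref):
    """Normalized DTW score between two sequences of feature vectors."""
    M, N = len(test), len(ref)
    INF = float("inf")
    D = [[INF] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            d = local_distance(test[m], ref[n])
            if m == 0 and n == 0:
                D[m][n] = 2 * d                      # initial state
                continue
            best = INF
            if m > 0 and n > 0:
                best = min(best, D[m - 1][n - 1] + 2 * d)   # diagonal move, weight 2
            if m > 0:
                best = min(best, D[m - 1][n] + d)           # move along the input axis
            if n > 0:
                best = min(best, D[m][n - 1] + d)           # move along the reference axis
            D[m][n] = best
    return D[M - 1][N - 1] / (M + N)                 # final matching score
```

The weight of 2 on the diagonal move keeps the total path weight equal to M+N regardless of the path taken, which is why dividing by M+N gives a per-step average.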
10. The speech recognition method according to claim 9, wherein a value of the minimum cumulative distance at each trellis point (m, n) is substituted with a maximum integer value if the minimum cumulative distance value goes beyond a range of the integer.
11. The speech recognition method according to claim 10, wherein said trellis point (m, n) in each row within said search section has the minimum cumulative distance value of mth and nth features of the two sequences of a test pattern and reference pattern.
12. The speech recognition method according to claim 11, wherein a new path value of said trellis point (m, n) in each row within said search section is repeatedly generated by way of at least one function of a distance value directly shifting from a previous trellis point (m−1, n−1) to the present trellis point (m, n) and distance values indirectly shifting from two neighboring trellis points (m−1, n) and (m, n−1) to the present trellis point (m, n).
13. The speech recognition method according to claim 12, wherein a minimum cumulative distance value in a very previous row is stored to obtain a minimum cumulative distance value in the present row.
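Claims 10 and 13 together suggest a memory-light integer variant: cumulative distances are clamped at a maximum integer value, and only the immediately preceding row is kept. The sketch below illustrates that idea under assumptions of its own: a 16-bit signed maximum, integer features, each input frame treated as one row, and no search-section restriction. None of these details are fixed by the claims.

```python
INT_MAX = 32767  # assumed 16-bit signed maximum; the claims do not fix a word size


def sat_add(a, b):
    # Saturating addition: substitute the maximum integer value on overflow.
    return min(a + b, INT_MAX)


def dtw_rolling(test, ref):
    """Raw (unnormalized) cumulative DTW distance using only two rows of storage."""
    N = len(ref)
    prev = None
    for m, test_frame in enumerate(test):
        cur = [INT_MAX] * N
        for n, ref_frame in enumerate(ref):
            d = sum(abs(x - y) for x, y in zip(test_frame, ref_frame))
            if m == 0 and n == 0:
                cur[n] = min(2 * d, INT_MAX)
                continue
            candidates = []
            if prev is not None and n > 0:
                candidates.append(sat_add(prev[n - 1], 2 * d))
            if prev is not None:
                candidates.append(sat_add(prev[n], d))
            if n > 0:
                candidates.append(sat_add(cur[n - 1], d))
            cur[n] = min(candidates)
        prev = cur   # the previous row is all that the next row needs
    return prev[N - 1]
```

Storage is proportional to N rather than M×N, which matters on a handset with little memory.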
14. The speech recognition method according to claim 1, wherein the recognition step further comprises applying a different pre-selection process to reduce a number of candidate codewords in the recognition step.
15. The speech recognition method according to claim 14, wherein said pre-selection step comprises the step of performing dynamic time warping (DTW) using only a part of spectrum information extracted from each frame to select a predetermined number of registered words having relatively high similarities and subsequently performing the DTW with respect to the selected registered words to finally select a registered word having the highest similarity to the input speech.
16. The speech recognition method according to claim 15, wherein said pre-selection step comprises the step of decreasing orders of the spectrum coefficient extracted from each frame and performing the DTW to select the predetermined number of registered words having relatively high similarities.
17. The speech recognition method according to claim 15, wherein said pre-selection step comprises the step of sub-sampling the frames to reduce the number of frames and performing the DTW to select the predetermined number of registered words having relatively high similarities.
18. The speech recognition method according to claim 15, wherein said pre-selection step comprises the step of decreasing orders of the spectrum coefficient extracted from each frame, sub-sampling the frames, and performing the DTW to select the predetermined number of registered words having relatively high similarities.
19. The speech recognition method according to claim 14, wherein said pre-selection step comprises the step of selecting a predetermined number of registered words having relatively high similarities using a linear matching method and subsequently performing dynamic time warping with respect to the selected registered words to finally select a registered word having the highest similarity to the input speech.
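The pre-selection idea in claims 14 to 18 can be sketched as a two-pass search: a cheap DTW over reduced-order, sub-sampled features picks a shortlist, and a full DTW then chooses among the shortlisted words. The parameter names (`keep_orders`, `frame_step`, `shortlist`), their defaults, and the compact `dtw` helper are hypothetical choices for illustration only.

```python
def dtw(a, b):
    """Compact DTW (weights 2, 1, 1; city-block local distance; normalized by M+N)."""
    inf = float("inf")
    prev = None
    for m, fa in enumerate(a):
        cur = []
        for n, fb in enumerate(b):
            d = sum(abs(x - y) for x, y in zip(fa, fb))
            if m == 0 and n == 0:
                cur.append(2 * d)
                continue
            diag = prev[n - 1] + 2 * d if prev is not None and n > 0 else inf
            up = prev[n] + d if prev is not None else inf
            left = cur[n - 1] + d if n > 0 else inf
            cur.append(min(diag, up, left))
        prev = cur
    return prev[-1] / (len(a) + len(b))


def preselect_and_match(query, templates, keep_orders=4, frame_step=2, shortlist=3):
    """Return the name of the best-matching registered word.

    templates: dict mapping a registered word's name to its feature sequence.
    """
    def coarse(seq):
        # Keep every `frame_step`-th frame and only the lowest `keep_orders` coefficients.
        return [frame[:keep_orders] for frame in seq[::frame_step]]

    q = coarse(query)
    ranked = sorted(templates, key=lambda name: dtw(q, coarse(templates[name])))
    candidates = ranked[:shortlist]                      # pre-selected words
    return min(candidates, key=lambda name: dtw(query, templates[name]))
```

The coarse pass costs a fraction of the full comparison per word, so the expensive full DTW runs only on the few candidates that survive.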

20. In a telephone modulating an input speech and having a built-in vocoder for encoding a modulated speech signal, a speech recognition method comprising:
- a training step of, if a user enters a telephone number and a speech corresponding to said telephone number, performing the encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, and extracting and storing a feature of the detected speech section;
  
  a recognition step of, if an input speech is received, performing encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, extracting a feature of the detected speech section, comparing the extracted feature with features of registered words stored during said training step, and selecting a registered word having a feature most similar to that of the input speech; and
  
  a step of determining a result of the recognition to be right if a similarity of the registered word selected at said recognition step does not exceed a predetermined threshold and automatically dialing a telephone number corresponding to the recognized word, wherein said recognition step comprises extracting stored representations of audio signals encoded by the vocoder and transforming said stored representation of audio signals into pseudo-cepstrums.
21. The speech recognition method according to claim 20, wherein said stored representations of audio signals are primary coefficients of a spectrum analysis performed by the vocoder.
22. The speech recognition method according to claim 20, wherein said stored representations of audio signals are LSP parameters that have been encoded at said vocoder and transforming the extracted LSP parameters into pseudo-cepstrums.
23. The speech recognition method according to claim 20, wherein said recognition step comprises:

24. In a telephone modulating an input speech and having a built-in vocoder for encoding a modulated speech signal, a speech recognition method comprising:
- a training step of, if a user enters a telephone number and a speech corresponding to said telephone number, performing the encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, and extracting and storing a feature of the detected speech section;
  
  a recognition step of, if an input speech is received, performing encoding at said vocoder, detecting only a speech section using information output as a result of the encoding, extracting a feature of the detected speech section, comparing the extracted feature with features of registered words stored during said training step, and selecting a registered word having a feature most similar to that of the input speech; and
  
  a step of determining a result of the recognition to be right if a similarity of the registered word selected at said recognition step does not exceed a predetermined threshold and automatically dialing a telephone number corresponding to the recognized word, wherein the recognition step further comprises applying a different comparing and selecting criteria in a separate pre-selection process prior to selection of the registered word having the feature most similar to that of the input speech.
25. The speech recognition method of claim 24, wherein the preselection process further comprises performing DTW using only part of the spectrum information.
26. The speech recognition method of claim 24, wherein the preselection process further comprises using a linear matching method prior to the application of the DTW.
27. The speech recognition method of claim 24, wherein the preselection process further comprises eliminating a number of high order cepstrum coefficients.

Current Assignee
LG Electronics, Inc. (LG Corporation)
Original Assignee
LG Electronics, Inc. (LG Corporation)
Inventors
Lee, Yun Keun; Kim, Gi Bak; Lee, Byoung Soo; Lee, Jong Seok
Primary Examiner(s)
Dorvil, Richemond

Application Number

US09/295,523
Time in Patent Office

944 Days
Field of Search

704/200, 704/231, 704/241, 704/243, 704/201, 704/203, 704/204, 704/205, 704/206, 704/207, 704/214, 704/208, 704/236
US Class Current

704/241
CPC Class Codes

G10L 15/10   using distance or distortio...

G10L 15/12   using dynamic programming t...

G10L 2015/223   Execution procedure of a sp...

H04M 1/271   controlled by voice recogni...
