Apparatus and method for normalizing input data of acoustic model and speech recognition apparatus

US 9,972,305 B2
Filed: 10/06/2016
Issued: 05/15/2018
Est. Priority Date: 10/16/2015
Status: Active Grant

First Claim

1. An apparatus for normalizing input data of an acoustic model, the apparatus comprising:

a window extractor configured to extract windows of frame data to be input to the acoustic model from frame data of a speech to be recognized; and

a normalizer configured to normalize the frame data to be input to the acoustic model in units of the extracted windows, wherein the normalizer is configured to normalize frames belonging to a current window in consideration of frames belonging to preceding windows of the current window.

Abstract

An apparatus for normalizing input data of an acoustic model includes a window extractor configured to extract windows of frame data to be input to an acoustic model from frame data of a speech to be recognized, and a normalizer configured to normalize the frame data to be input to the acoustic model in units of the extracted windows.

23 Claims

1. An apparatus for normalizing input data of an acoustic model, the apparatus comprising:
- a window extractor configured to extract windows of frame data to be input to the acoustic model from frame data of a speech to be recognized; and
  
  a normalizer configured to normalize the frame data to be input to the acoustic model in units of the extracted windows, wherein the normalizer is configured to normalize frames belonging to a current window in consideration of frames belonging to preceding windows of the current window.
- - 2. The apparatus of claim 1, wherein the window extractor is further configured to consecutively extract the windows in units of a predetermined number of frames of the frame data of the speech to be recognized while the frame data of the speech to be recognized is being input.
  - 3. The apparatus of claim 1, wherein the normalizer is further configured to normalize frames belonging to the current window together with padding frames added to both sides of the current window.
  - 4. The apparatus of claim 1, wherein the normalizer is further configured to normalize the frames belonging to the current window in consideration of the frames belonging to the preceding windows and frames of training data in response to a total number of the frames belonging to the current window and of the frames belonging to the preceding windows being insufficient for speech recognition.
  - 5. The apparatus of claim 4, wherein the normalizer is further configured to acquire a number of frames corresponding to a difference between the total number of the frames and a reference value from the training data in response to the total number of the frames being less than the reference value.
  - 6. The apparatus of claim 1, wherein the normalizer is further configured to normalize the frame data belonging to the extracted windows so that the frame data belonging to the extracted windows has an average of 0 and a standard deviation of 1.
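
Claims 1, 2, and 6 together describe a streaming scheme: windows are cut from the incoming frames, and each window is normalized using the frames of the preceding windows as well as its own. The following is a minimal sketch of one possible reading, not the patented implementation. The names `extract_windows` and `RunningNormalizer`, the list-of-floats frame layout, and the `eps` guard against division by zero are all assumptions made for illustration.

```python
from math import sqrt
from typing import Iterable, Iterator, List

Frame = List[float]  # one feature vector per frame (hypothetical layout)


def extract_windows(frames: Iterable[Frame], window_size: int) -> Iterator[List[Frame]]:
    """Yield consecutive windows of `window_size` frames while frames arrive."""
    window: List[Frame] = []
    for frame in frames:
        window.append(frame)
        if len(window) == window_size:
            yield window
            window = []
    if window:  # trailing, shorter window at the end of the speech
        yield window


class RunningNormalizer:
    """Normalizes each window with statistics of all frames seen so far."""

    def __init__(self, dim: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.total = [0.0] * dim     # per-dimension running sum
        self.total_sq = [0.0] * dim  # per-dimension running sum of squares
        self.eps = eps               # assumed guard against zero deviation

    def normalize(self, window: List[Frame]) -> List[Frame]:
        # Fold the current window into the running statistics first, so they
        # cover the preceding windows plus the current one.
        for frame in window:
            self.count += 1
            for d, x in enumerate(frame):
                self.total[d] += x
                self.total_sq[d] += x * x
        mean = [s / self.count for s in self.total]
        std = [
            sqrt(max(sq / self.count - m * m, 0.0))
            for sq, m in zip(self.total_sq, mean)
        ]
        # Shift to zero mean and scale to unit deviation per dimension.
        return [
            [(x - m) / (sd + self.eps) for x, m, sd in zip(frame, mean, std)]
            for frame in window
        ]
```

Because the statistics accumulate across windows, the first window is normalized on its own frames only, while later windows draw on a growing history. The zero-mean, unit-deviation property then holds over all frames seen so far rather than over each window in isolation.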

7. A method of normalizing input data of an acoustic model, the method comprising:
- extracting windows of frame data to be input to the acoustic model from frame data of a speech to be recognized; and
  
  normalizing the frame data to be input to the acoustic model in units of the extracted windows, wherein the normalizing of the frame data comprises normalizing frames belonging to a current window in consideration of frames belonging to preceding windows of the current window.
- - 8. The method of claim 7, wherein the extracting of the windows comprises consecutively extracting the windows in units of a predetermined number of frames of the frame data of the speech to be recognized while the frame data of the speech to be recognized is being input.
  - 9. The method of claim 7, wherein the normalizing of the frame data comprises normalizing frames belonging to the current window together with padding frames added to both sides of the current window.
  - 10. The method of claim 7, wherein the normalizing of the frame data comprises normalizing the frames belonging to the current window in consideration of the frames belonging to the preceding windows and frames of training data in response to a total number of the frames belonging to the current window and of the frames belonging to the preceding windows being insufficient for speech recognition.
  - 11. The method of claim 10, wherein the normalizing of the frame data comprises:
    - comparing the total number of the frames belonging to the current window and the preceding windows with a reference value in response to the current window being extracted; and
      
      acquiring a number of frames corresponding to a difference between the total number of the frames and the reference value from the training data in response to the total number of the frames being less than the reference value.
  - 12. The method of claim 7, wherein the normalizing of the frame data comprises normalizing the frame data belonging to the extracted windows so that the frame data belonging to the extracted windows has an average of 0 and a standard deviation of 1.
  - 13. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 7.
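
Claims 10 and 11 describe topping up the available frames with training data when too few frames have been seen. The sketch below illustrates only the counting step: compare the total with a reference value and borrow exactly the shortfall. The function name `frames_for_statistics` and the choice to take the first frames of the training set are assumptions; the claims do not say which training frames are selected.

```python
from typing import List, Sequence

Frame = List[float]  # one feature vector per frame (hypothetical layout)


def frames_for_statistics(
    seen: Sequence[Frame],
    training: Sequence[Frame],
    reference: int,
) -> List[Frame]:
    """Return the frames over which normalization statistics are computed.

    `seen` holds the frames of the current and preceding windows, and
    `reference` is the minimum frame count treated as sufficient.
    """
    shortfall = reference - len(seen)
    if shortfall <= 0:
        # Enough speech frames are available, so no training data is needed.
        return list(seen)
    # Borrow exactly `shortfall` frames from the training data
    # (taking the first ones is an assumption of this sketch).
    return list(seen) + list(training[:shortfall])
```

For example, with 3 frames seen and a reference value of 5, two training frames are added, and once the number of seen frames reaches the reference value the training data is no longer consulted.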

14. A speech recognition apparatus comprising:
- a preprocessor configured to:
  
  extract windows of frame data to be input to an acoustic model from frame data of a speech to be recognized; and
  
  normalize the frame data to be input to the acoustic model in units of the extracted windows;
  
  an acoustic score calculator configured to calculate acoustic scores in units of the normalized windows using the acoustic model based on a deep neural network (DNN); and
  
  an interpreter configured to:
  
  interpret the acoustic scores calculated in units of the normalized windows; and
  
  output a recognition result of the speech to be recognized based on the interpreted scores, wherein the preprocessor is further configured to normalize frames belonging to a current window in consideration of frames belonging to preceding windows of the current window.
- - 15. The speech recognition apparatus of claim 14, wherein the preprocessor is further configured to normalize the frames belonging to the current window in consideration of the frames belonging to the preceding windows and frames of training data in response to a total number of the frames belonging to the current window and of the frames belonging to the preceding windows being insufficient for speech recognition.
  - 16. The speech recognition apparatus of claim 14, wherein the interpreter is further configured to output a recognition result of the current window as a final recognition result of a whole speech to be recognized in response to a predetermined condition being satisfied or an input of a user while input of the speech to be recognized is under way.
  - 17. The speech recognition apparatus of claim 14, wherein the DNN is a bidirectional recurrent deep neural network (BRDNN).
  - 18. The speech recognition apparatus of claim 14, further comprising a language score calculator configured to calculate language scores using a language model;
    - wherein the interpreter is further configured to output the recognition result based on the interpreted scores and the language scores.
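
The data flow of claim 14 (preprocessor, acoustic score calculator, interpreter, all operating per window) can be pictured as a simple streaming loop. The sketch below shows only that flow. `recognize_stream` and its three callables are hypothetical stand-ins: window extraction is assumed to have happened upstream, and no real DNN or decoder is implied.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]   # one feature vector per frame (hypothetical layout)
Window = List[Frame]  # a group of consecutive frames


def recognize_stream(
    windows: Iterable[Window],
    preprocess: Callable[[Window], Window],
    score: Callable[[Window], List[float]],
    interpret: Callable[[List[float]], str],
) -> Iterator[str]:
    """Yield one recognition result per window as the speech is still arriving."""
    for window in windows:
        normalized = preprocess(window)      # preprocessor: window-level normalization
        acoustic_scores = score(normalized)  # acoustic score calculator (a DNN in the claim)
        yield interpret(acoustic_scores)     # interpreter: scores to a recognition result
```

Yielding a result per window reflects the incremental character of the claim: a caller could stop consuming the generator early and treat the latest result as final, in the spirit of claim 16.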

19. An apparatus for normalizing input data of an acoustic model, the apparatus comprising:
- a window extractor configured to extract windows of frame data to be input to the acoustic model from frame data of a speech to be recognized; and
  
  a normalizer configured to normalize the frame data to be input to the acoustic model based on results of a determination that an amount of frame data to enable speech recognition is determined sufficient.
- - 20. The apparatus of claim 19, wherein the normalizer is further configured to normalize the frame data based on frames of all of the extracted windows from a first extracted window to a current extracted window.
  - 21. The apparatus of claim 19, wherein the normalizer is further configured to normalize the frame data based on frames of all of the extracted windows from a first extracted window to a current extracted window and frames of training data.
  - 22. The apparatus of claim 21, wherein a number of the frames of the training data is equal to a difference between a total number of the frames of all of the extracted windows from the first extracted window to the current extracted window and a reference value denoting a minimum number of frames to enable speech recognition.
  - 23. The apparatus of claim 19, wherein the normalizer is further configured to normalize frames of a current extracted window each time a window is extracted.

Current Assignee
Samsung Electronics Co. Ltd.
Original Assignee
Samsung Electronics Co. Ltd.
Inventors
Song, In Chul; Choi, Young Sang; Na, Hwi Dong
Primary Examiner(s)
PULLIAS, JESSE SCOTT

Application Number

US15/286,999
Publication Number

US 20170110115A1
Time in Patent Office

586 Days
Field of Search

704/231-257, 704/270-275
US Class Current
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/16   using artificial neural net...

G10L 21/04   Time compression or expansion

G10L 25/45   characterised by the type o...
