Sound identification utilizing periodic indications

US 10,062,378 B1
Filed: 02/24/2017
Issued: 08/28/2018
Est. Priority Date: 02/24/2017
Status: Active Grant

Abstract

A computer-implemented method and an apparatus are provided. The method includes obtaining, by a processor, a frequency spectrum of an audio signal data. The method further includes extracting, by the processor, periodic indications from the frequency spectrum. The method also includes inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network. The method additionally includes estimating, by the processor, sound identification information from the neural network.
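The first two steps of the abstract (obtaining a frequency spectrum, then extracting periodic indications from it) can be illustrated with a short NumPy sketch. The abstract does not say how the periodic indications are computed; the cepstral liftering used here is only one plausible choice and is an assumption, as are the function names, the 512-point FFT, and the `low_quefrency` cutoff.

```python
import numpy as np

def frame_power_spectrum(frame, n_fft=512):
    """Power spectrum of one audio frame (Hann window, zero-padded FFT)."""
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2

def periodic_indications(power_spectrum, low_quefrency=20, eps=1e-10):
    """Keep only the rapidly fluctuating (harmonic-like) part of the log spectrum.

    Assumption: low-quefrency cepstral coefficients describe the smooth
    spectral envelope, so zeroing them leaves the periodic ripples.
    """
    log_spec = np.log(power_spectrum + eps)
    cepstrum = np.fft.irfft(log_spec)        # real, symmetric sequence
    m = len(cepstrum)
    liftered = cepstrum.copy()
    liftered[:low_quefrency] = 0.0           # remove envelope terms
    liftered[m - low_quefrency + 1:] = 0.0   # and their mirrored counterparts
    return np.fft.rfft(liftered).real        # back to the frequency axis

# Hypothetical input: a 25 ms frame at 16 kHz with a 200 Hz harmonic series.
sr = 16000
t = np.arange(400) / sr
frame = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 6))
spectrum = frame_power_spectrum(frame)
indications = periodic_indications(spectrum)
```

Both arrays share the same frequency axis, so they can be fed to a network side by side as the abstract describes.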

25 Claims

1. A computer-implemented method performed by a speech recognition system having at least a processor, the method comprising:
- obtaining, by the processor, a frequency spectrum of an audio signal data;
  
  extracting, by the processor, periodic indications from the frequency spectrum;
  
  inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network;
  
  estimating, by the processor, sound identification information from the neural network; and
  
  performing, by the processor, a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information, wherein the neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes, and wherein the method further comprises training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.
  - 2. The computer-implemented method of claim 1, wherein the estimating sound identification includes identifying phoneme information.
  - 3. The computer-implemented method of claim 1, wherein the periodic indications represent fluctuations in the frequency spectrum that periodically appear in the frequency spectrum.
  - 4. The computer-implemented method of claim 1, wherein the periodic indications represent harmonic structure of the audio signal data.
  - 5. The computer-implemented method of claim 1, further comprising normalizing the periodic indications before the inputting into the neural network.
  - 6. The computer-implemented method of claim 5, wherein the normalizing the periodic indications includes maintaining an ordinal scale among a plurality of bands in the periodic indications.
  - 7. The computer-implemented method of claim 6, wherein the normalizing the periodic indications is based on sigmoid normalization or max-variance normalization.
  - 8. The computer-implemented method of claim 1, wherein the components of the frequency spectrum include values relating to powers of the audio signal data in a plurality of frequency bands in the frequency spectrum.
  - 9. The computer-implemented method of claim 8, wherein the inputting the periodic indications and components of the frequency spectrum into a neural network comprises further inputting the first derivation and the second derivation with respect to time of the values relating to powers of the audio signal data in the plurality of frequency bands in the frequency spectrum.
  - 10. The computer-implemented method of claim 1, wherein the neural network is a convolutional neural network or a deep neural network.
  - 11. The computer-implemented method of claim 10, wherein the inputting into the neural network includes inputting the periodic indications, and, the components of the frequency spectrum into a first layer of the neural network.
  - 12. The computer-implemented method of claim 10, further comprising Mel-filtering the periodic indications and the frequency spectrum before the inputting into the neural network.
  - 13. The computer-implemented method of claim 10, wherein the inputting into the neural network includes inputting the periodic indications into a second layer or a subsequent layer of the neural network.
  - 14. The computer-implemented method of claim 13, wherein the neural network is the convolutional neural network, and the convolutional neural network includes one or more convolutional neural network layers, and wherein the inputting into the neural network further includes inputting the periodic indications into a layer that is downstream of the one or more convolutional neural network layers.
  - 15. The computer-implemented method of claim 13, further comprising compressing the periodic indications by reducing a number of dimensions of the periodic indications before the inputting into the neural network.
  - 16. The computer-implemented method of claim 13, further comprising Mel-filtering the periodic indications before the inputting into the neural network.
  - 17. The computer-implemented method of claim 1, further comprising integrating the periodic indications and the components of the frequency spectrum in a subsequent layer with respect to the first layer, from among the plurality of fully-connected network layers, after abstracting the periodic indications and the components of the frequency spectrum.
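Claims 5 to 9 describe preparing the network input: normalizing the periodic indications while keeping their ordinal scale across bands, and appending first and second time derivatives of the band powers. The sketch below is one possible reading under stated assumptions. The claims name "sigmoid normalization" without giving a formula, so the mean-centered logistic function here is an assumption, as are the function names and the use of `np.gradient` for the derivatives.

```python
import numpy as np

def sigmoid_normalize(periodic, scale=1.0):
    """Squash each frame's periodic indications into (0, 1).

    The logistic function is strictly increasing, so the ranking of bands
    within a frame (the ordinal scale of claim 6) is unchanged.
    """
    centered = periodic - periodic.mean(axis=-1, keepdims=True)
    return 1.0 / (1.0 + np.exp(-scale * centered))

def time_derivative(features):
    """Finite-difference derivative along the frame (time) axis."""
    return np.gradient(features, axis=0)

def build_input(band_power, periodic):
    """Concatenate band powers, their first and second derivatives, and
    normalized periodic indications into one vector per frame."""
    delta = time_derivative(band_power)
    delta2 = time_derivative(delta)
    return np.concatenate(
        [band_power, delta, delta2, sigmoid_normalize(periodic)], axis=1
    )

# Hypothetical shapes: 10 frames, 40 frequency bands.
rng = np.random.default_rng(0)
band_power = rng.normal(size=(10, 40))
periodic = rng.normal(size=(10, 40))
features = build_input(band_power, periodic)
```

Each frame ends up with 160 values here: three blocks of 40 derived from the band powers and one block of 40 normalized periodic indications.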

18. A non-transitory computer program product having instructions embodied therewith, the instructions executable by a speech recognition system that includes a processor or programmable circuitry to cause the processor or programmable circuitry to perform a method comprising:
- obtaining a frequency spectrum of an audio signal data;
  
  extracting periodic indications from the frequency spectrum;
  
  inputting the periodic indications and components of the frequency spectrum into a neural network; and
  
  estimating sound identification information from the neural network; and
  
  performing a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information, wherein the neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes, and wherein the method further comprises training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.
  - 19. The non-transitory computer program product of claim 18, wherein the estimating sound identification includes identifying phoneme information.
  - 20. The non-transitory computer program product of claim 18, wherein the periodic indications represent fluctuations in the frequency spectrum that periodically appear in the frequency spectrum.
  - 21. The non-transitory computer program product of claim 18, wherein the periodic indications represent harmonic structure of the audio signal data.

22. A speech recognition system, comprising:
- a processor; and
  
  one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to:
  
  obtain frequency spectrum of an audio signal data;
  
  extract periodic indications from the frequency spectrum;
  
  input the periodic indications and components of the frequency spectrum into a neural network, wherein the neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes; and
  
  estimate sound identification information from the neural network; and
  
  perform a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information, wherein the neural network is trained by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.
  - 23. The speech recognition system of claim 22, wherein the estimating sound identification includes identifying phoneme information.
  - 24. The speech recognition system of claim 22, wherein the periodic indications represent fluctuations in the frequency spectrum that periodically appear in the frequency spectrum.
  - 25. The speech recognition system of claim 22, wherein the periodic indications represent harmonic structure of the audio signal data.
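All three independent claims share one training detail: in the first fully-connected layer, the weights between the "first nodes" and the input nodes carrying periodic indications are initially set to 0. A minimal NumPy sketch of such an initialization follows. The layer sizes, the input ordering (spectrum components first, periodic indications last), the Gaussian initializer, and the function name are all assumptions made for illustration; only the zeroed block comes from the claim language.

```python
import numpy as np

def init_first_layer(n_spectrum, n_periodic, n_first, n_second, seed=0):
    """Weight matrix of shape (n_first + n_second, n_spectrum + n_periodic).

    Rows 0..n_first-1 are the 'first nodes'; the last n_periodic columns
    are the input nodes for the periodic indications.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(
        0.0, 0.01, size=(n_first + n_second, n_spectrum + n_periodic)
    )
    # Initial isolation: first nodes start with no connection to the
    # periodic-indication inputs. Training may later move these weights.
    weights[:n_first, n_spectrum:] = 0.0
    return weights

# Hypothetical sizes: 40 spectrum inputs, 40 periodic inputs, 8 + 8 nodes.
W = init_first_layer(n_spectrum=40, n_periodic=40, n_first=8, n_second=8)
x = np.ones(80)
x_changed = x.copy()
x_changed[40:] = 5.0   # alter only the periodic-indication inputs
```

At initialization the first nodes give the same pre-activations for `x` and `x_changed`, while the second nodes do not, which is the isolation effect the claims describe.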

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Fukuda, Takashi; Ichikawa, Osamu; Ramabhadran, Bhuvana
Primary Examiner(s): Colucci, Michael
Application Number: US15/441,973
Publication Number: US 20180247641A1
Time in Patent Office: 550 Days
Field of Search: 704/201, 704/232, 704/202, 706/15, 706/14

CPC Class Codes
G10L 15/02   Feature extraction for spee...
G10L 15/063   Training
G10L 15/16   using artificial neural net...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 25/24   the extracted parameters be...
