Speech recognition using speech characteristic probabilities

US 9,202,470 B2
Filed: 01/31/2013
Issued: 12/01/2015
Est. Priority Date: 09/28/2009
Status: Active Grant

7 Assignments

0 Petitions

Abstract

A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
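
The first step the abstract describes, turning a digital audio signal into frames, can be illustrated with a minimal sketch. The patent text here does not specify a frame length, overlap, or data representation, so the function name `parse_frames` and the parameters `frame_len` and `hop` are assumptions made purely for illustration.

```python
from typing import List, Sequence


def parse_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a sampled signal into fixed-length frames.

    frame_len and hop are hypothetical parameters: consecutive frames
    start `hop` samples apart, so hop < frame_len gives overlapping frames.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(list(signal[start:start + frame_len]))
    return frames


if __name__ == "__main__":
    samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    for frame in parse_frames(samples, frame_len=4, hop=2):
        print(frame)
```

Overlapping frames are a common choice in speech front ends, but whether this patent's frame parser overlaps frames is not stated in the text above.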

26 Citations

19 Claims

1. An acoustic front-end device comprising:
- a frame parser receiving an acoustic signal and parsing the received acoustic signal into a plurality of frames;
  
  a plurality of correlators, each of the correlators correlating each of the plurality of frames with one or more acoustic property sets to produce a first set of acoustic property correlations;
  
  a controller retrieving one or more speech characteristic samples from one or more speech codebooks based on the first set of acoustic property correlations; and
  
  a speech characteristic probability generator configured to:
  
  generate a plurality of speech characteristic probabilities over one or more subsequent frames of the plurality of frames by individually correlating a digital signal component of the subsequent frame with the one or more acoustic property sets to produce a second set of acoustic property correlations; and
  
  a processor configured to interpret the plurality of speech characteristic probabilities to generate at least a language probability and a language syntax bias; and
  
  select a plurality of words from a series of plurality of words based on the language probability and the language syntax bias; and
  
  output the plurality of words through an interface.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. The device of claim 1, wherein the controller retrieving one or more speech characteristic samples from the one or more speech codebooks is further configured to:
    - interpret a set of the acoustical property set correlations to produce a first index;
      
      and address the one or more speech codebooks based on the first index to retrieve the one or more speech characteristic samples.
  - 3. The device of claim 1, wherein the one or more speech codebooks store a plurality of the speech characteristic samples comprising a plurality of speech characteristics.
  - 4. The device of claim 3, wherein the plurality of speech characteristics comprises any of:
    - gender, age, nationality, dialect or accent.
  - 5. The device of claim 1, wherein at least one of the one or more speech codebooks comprises any of:
    - sound tables, word tables or syntax tables.
  - 6. The device of claim 1, wherein at least one of the one or more speech codebooks is based on a hidden Markov model.
  - 7. The device of claim 6, wherein the one or more codebooks based on the hidden Markov model provide, in sequence, a plurality of features vectors representing spectral characteristics of speech for a given frame for the plurality of frames.
  - 8. The device of claim 1, wherein the controller analyzes each of the acoustic property set correlations to determine one or more likely speech characteristics of the frame.
  - 9. The device of claim 8, wherein, if the controller when analyzing each of the acoustic property set correlations, for a specified speech characteristic of the one or more likely speech characteristics, cannot determine conclusively that the specified speech characteristic is present, it ignores the specified speech characteristic.
  - 10. The device of claim 2 further comprising:
    - interpreting the second set of acoustical property correlations with respect to the speech characteristic probability for the subsequent frame to produce a second index;
      
      addressing the one or more speech codebooks based on the second index to retrieve a second speech characteristic sample; and
      
      correlating the digital signal component with the second speech characteristic sample to produce the speech characteristic probability for the subsequent frame.
  - 11. The device of claim 10, wherein the processor is further configured to:
    - interpret the plurality of speech characteristic probabilities to further generate a word bias and select a plurality of words from the series of plurality of words based on the word bias.
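
Claims 1, 2, 8 and 9 describe correlating a frame against acoustic property sets, interpreting the correlations as an index, and using that index to address a speech codebook, while ignoring a characteristic that cannot be determined conclusively. The sketch below is one possible reading, not the patented implementation: cosine similarity as the correlation measure, an arg-max as the "index", the `threshold` cut-off, and all function names are assumptions.

```python
import math
from typing import List, Optional, Sequence


def correlate(frame: Sequence[float], template: Sequence[float]) -> float:
    """Normalized correlation (cosine similarity) between a frame and a template."""
    dot = sum(a * b for a, b in zip(frame, template))
    norm_f = math.sqrt(sum(a * a for a in frame))
    norm_t = math.sqrt(sum(b * b for b in template))
    if norm_f == 0.0 or norm_t == 0.0:
        return 0.0
    return dot / (norm_f * norm_t)


def correlations_to_index(correlations: Sequence[float], threshold: float) -> Optional[int]:
    """Pick the best-matching property set, or None if the match is inconclusive."""
    if not correlations:
        return None
    best = max(range(len(correlations)), key=lambda i: correlations[i])
    return best if correlations[best] >= threshold else None


def retrieve_sample(frame: Sequence[float],
                    property_sets: List[Sequence[float]],
                    codebook: Sequence[str],
                    threshold: float = 0.9) -> Optional[str]:
    """Correlate, derive an index, and address the (hypothetical) codebook."""
    correlations = [correlate(frame, p) for p in property_sets]
    index = correlations_to_index(correlations, threshold)
    return None if index is None else codebook[index]


if __name__ == "__main__":
    property_sets = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    codebook = ["sample_a", "sample_b"]  # placeholder codebook entries
    print(retrieve_sample([0.0, 2.0, 0.0, 0.0], property_sets, codebook))
    print(retrieve_sample([1.0, 1.0, 1.0, 1.0], property_sets, codebook))
```

Returning `None` when no correlation clears the threshold mirrors the behaviour in claim 9, where an inconclusive characteristic is ignored rather than guessed.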

12. A method for frame-by-frame analyzing of an acoustic signal to determine speech characteristics comprising:
- configuring a processor to:
  
  parse the acoustic signal into a plurality of frames;
  
  correlate each of the plurality of frames with one or more acoustic property sets to produce a first set of acoustic property correlations and first index;
  
  retrieve one or more first speech characteristic samples from one or more speech codebooks based on the first index; and
  
  generate a plurality of speech characteristic probabilities for a subsequent frame of the plurality of frames by:
  
  individually correlating a digital signal component of the subsequent frame with the one or more acoustical property sets to produce a second set of acoustic property correlations and a second index;
  
  addressing the one or more speech codebooks based on the second index to retrieve a second speech characteristic sample; and
  
  correlating the digital signal component with the second speech characteristic sample to produce the speech characteristic probability for the subsequent frame; and
  
  retrieving and outputting words based on a language probability and a language syntax bias of the plurality of speech characteristic probabilities.
- View Dependent Claims (13, 14, 15, 16, 17)
- - 13. The method of claim 12, wherein the retrieving one or more speech characteristic samples from the one or more speech codebooks further comprises:
    - interpreting a third set of acoustic property correlations to produce a third index; and
      
      addressing the one or more speech codebooks based on the third index to retrieve one or more third speech characteristic samples.
  - 14. The method of claim 12 further configuring the processor to analyze each of the acoustic property set correlations to determine one or more likely speech characteristics of the frame.
  - 15. The method of claim 12, wherein the first and second speech characteristic samples comprise a plurality of speech characteristics comprising any of:
    - gender, age, nationality, dialect or accent.
  - 16. The method of claim 12, wherein at least one of the one or more speech codebooks comprises any of:
    - sound tables, word tables or syntax tables.
  - 17. The method of claim 12 further comprising configuring the processor to:
    - interpret the plurality of speech characteristic probabilities to generate a word bias;
      
      select the plurality of words from a series of plurality of words based on the word bias; and
      
      output the one or more words through an interface.
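
The last step of claim 12 before word output, correlating a subsequent frame's digital signal component with retrieved speech characteristic samples to produce speech characteristic probabilities, could look like the sketch below. The claim does not say how a correlation becomes a probability. Using a plain dot product as the correlation and a softmax to normalize the scores is an assumption, as are the labels and function names.

```python
import math
from typing import Dict, Sequence


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Map raw scores to probabilities that sum to 1 (numerically stable)."""
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def characteristic_probabilities(frame: Sequence[float],
                                 samples: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Correlate one frame with each retrieved sample and normalize.

    `samples` maps a hypothetical characteristic label (for example an
    accent) to the speech characteristic sample retrieved from a codebook.
    """
    scores = {
        label: sum(a * b for a, b in zip(frame, sample))
        for label, sample in samples.items()
    }
    return softmax(scores)


if __name__ == "__main__":
    samples = {"accent_a": [1.0, 0.0], "accent_b": [0.0, 1.0]}
    probs = characteristic_probabilities([1.0, 0.0], samples)
    for label, p in probs.items():
        print(label, round(p, 3))
```

Running this per subsequent frame yields one probability per characteristic per frame, which is one way to read "a plurality of speech characteristic probabilities".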

18. An acoustic front-end device comprising:
- a frame parser receiving an acoustic signal and parsing the received acoustic signal into a plurality of frames;
  
  a plurality of correlators, each of the correlators correlating each of the plurality of frames with the one or more acoustic property sets to produce a first set of acoustic property correlations;
  
  a controller, the controller retrieving one or more first speech characteristic samples from one or more speech codebooks based on the acoustic property set correlations;
  
  a speech characteristic probability generator generating speech characteristic probabilities for a subsequent frame of the plurality of frames by:
  
  individually correlating a digital signal component of the subsequent frame with the one or more acoustical property sets to produce a second set of acoustic property correlations;
  
  addressing the one or more speech codebooks based on the second set of acoustic property correlations to retrieve a second speech characteristic sample; and
  
  correlating the digital signal component with the second speech characteristic sample to produce the speech characteristic probability for the subsequent frame; and
  
  a processor, the processor:
  
  interpreting the plurality of speech characteristic probabilities to generate at least a language probability, word bias, and language syntax bias;
  
  selecting a plurality of words from a series of plurality of words based on the word bias and language probability; and
  
  selecting a plurality of language syntaxes from a series of plurality of language syntaxes based on the language probability.
- View Dependent Claims (19)
- - 19. The device of claim 18, wherein at least one of the one or more speech codebooks is based on a hidden Markov model.
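
Claims 1, 11, 17 and 18 have the processor select words using a language probability and a word bias derived from the speech characteristic probabilities. The claims do not define how these quantities are combined, so the scoring rule below (acoustic score weighted by language probability, plus an additive per-word bias) is only an assumed example. `Candidate`, `select_words`, and the language tags are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    words: Tuple[str, ...]   # one candidate word sequence
    language: str            # hypothetical language or dialect tag
    acoustic_score: float    # how well the audio matched these words


def select_words(candidates: List[Candidate],
                 language_prob: Dict[str, float],
                 word_bias: Dict[str, float]) -> Tuple[str, ...]:
    """Return the candidate with the highest combined score."""
    def score(c: Candidate) -> float:
        bias = sum(word_bias.get(w, 0.0) for w in c.words)
        return c.acoustic_score * language_prob.get(c.language, 0.0) + bias

    return max(candidates, key=score).words


if __name__ == "__main__":
    candidates = [
        Candidate(("color",), "en-US", 0.6),
        Candidate(("colour",), "en-GB", 0.6),
    ]
    language_prob = {"en-US": 0.2, "en-GB": 0.8}
    print(select_words(candidates, language_prob, word_bias={}))
    print(select_words(candidates, language_prob, word_bias={"color": 0.5}))
```

With equal acoustic scores, the language probability breaks the tie. A large enough word bias can override it, which is the kind of interaction the claims appear to leave open.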

Current Assignee: Avago Technologies International Sales Pte Limited (Broadcom, Inc.)
Original Assignee: Broadcom Corporation (Broadcom, Inc.)
Inventors: Seshadri, Nambirajan
Primary Examiner(s): Lerner, Martin
Application Number: US13/755,975
Publication Number: US 20130151254A1
Time in Patent Office: 1,034 Days
Field of Search: 704/8, 704/240, 704/246, 704/247, 704/249, 704/250, 704/257
US Class Current: 1/1
CPC Class Codes:
G10L 15/005   Language recognition
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/183   using context dependencies,...
G10L 15/28   Constructional details of s...
G10L 25/00   Speech or voice analysis te...
