Systems and methods for identifying speech sound features

US 8,983,832 B2
Filed: 07/02/2009
Issued: 03/17/2015
Est. Priority Date: 07/03/2008
Status: Active Grant

Abstract

Systems and methods for detecting features in spoken speech and processing speech sounds based on the features are provided. One or more features may be identified in a speech sound. The speech sound may be modified to enhance or reduce the degree to which the feature affects the sound ultimately heard by a listener. Systems and methods according to embodiments of the invention may allow for automatic speech recognition devices that enhance detection and recognition of spoken sounds, such as by a user of a hearing aid or other device.

25 Claims

1. A method for enhancing a speech sound, said method comprising:
- identifying a first consonant-vowel (CV) speech sound from among a plurality of CV sounds;
  
  identifying a second CV speech sound, that is different than the first CV speech sound, from among the plurality of CV sounds;
  
  locating a first feature within the first speech sound, the first feature at least partially encoding the first speech sound, wherein the first feature includes a first time value and a first frequency value that together locate the first feature within the first speech sound;
  
  locating a second feature within the second speech sound, the second feature at least partially encoding the second speech sound, wherein the second feature includes a second time value and a second frequency value that together locate the second feature within the second speech sound and that are different than the first time value and the first frequency value, respectively;
  
  in an electronic device, increasing, based at least in part on the first time value and based at least in part on the first frequency value, the contribution of the first feature to the first speech sound; and
  
  in the electronic device, increasing, based at least in part on the second time value and based at least in part on the second frequency value, the contribution of the second feature to the second speech sound.
  - 2. The method of claim 1, said step of locating said first feature further comprising:
    - generating an importance function for the first speech sound; and
      
      identifying, based on a portion of the importance function, a time at which said first feature occurs in said first speech sound, wherein the portion of the importance function corresponds to the first feature.
  - 3. The method of claim 2, wherein the importance function is at least one of a frequency importance function and a time importance function.
  - 4. The method of claim 1, said step of locating said first feature in the first speech sound further comprising:
    - isolating, within at least one of a certain time range and a certain frequency range, a section of a reference speech sound, wherein the section of the reference speech sound corresponds to one of the first speech sound or the second speech sound;
      
      based on a degree of recognition among a plurality of listeners to the isolated section, constructing an importance function describing a contribution of the isolated section to recognition of one of the first speech sound and the second speech sound; and
      
      using the importance function to identify the first feature as encoding the first speech sound or to identify the second feature as encoding the second speech sound.
  - 5. The method of claim 4, wherein the importance function is at least one of a time importance function and a frequency importance function.
  - 6. The method of claim 1, said step of locating the first feature in the first speech sound further comprising:
    - iteratively truncating the first speech sound to identify a time at which the first feature occurs in the first speech sound;
      
      applying at least one frequency filter to identify a frequency range in which the first feature occurs in the first speech sound;
      
      masking the first speech sound to identify a relative intensity at which the first feature occurs in the first speech sound; and
      
      using at least two of the identified time, the identified frequency range, and the identified intensity, to locate the first feature within the first speech sound.
  - 7. The method of claim 1, wherein each of the first speech sound and the second speech sound comprises at least one of /pa, ta, ka, ba, da, ga, fa, θa, sa, ∫a, δa, va, ca/.
  - 8. The method of claim 6, said step of iteratively truncating the first speech sound further comprising:
    - iteratively truncating the first speech sound at a plurality of step sizes from an onset of the first speech sound;
      
      measuring listener recognition after each truncation; and
      
      upon finding a truncation step size at which the first speech sound is not distinguishable by the listener, identifying the found step size as indicating the location, in time, of the first sound feature.
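
The truncation procedure of claims 6 and 8 can be sketched as a simple loop. This is only an illustration under stated assumptions: the function name `locate_feature_time`, the `recognize` callback (standing in for a listening test), the toy signal, and the step sizes are all hypothetical and do not come from the patent.

```python
from typing import Callable, Optional, Sequence


def locate_feature_time(
    sound: Sequence[float],
    sample_rate: int,
    step_ms_values: Sequence[float],
    recognize: Callable[[Sequence[float]], bool],
) -> Optional[float]:
    """Return the smallest truncation (in ms from onset) at which the
    sound is no longer recognized, or None if it is always recognized."""
    for step_ms in sorted(step_ms_values):
        cut = int(round(step_ms * sample_rate / 1000.0))
        truncated = sound[cut:]  # drop everything before the cut point
        if not recognize(truncated):
            return step_ms
    return None


# Toy data (assumption): 1 kHz sampling, so one sample per millisecond,
# with a burst occupying samples 100..119.
toy_sound = [0.0] * 100 + [1.0] * 20 + [0.1] * 80


def toy_listener(samples: Sequence[float]) -> bool:
    # Hypothetical listener: recognizes the sound only while the burst survives.
    return any(abs(x) > 0.5 for x in samples)


feature_time_ms = locate_feature_time(
    toy_sound, 1000, [50, 100, 110, 120, 150], toy_listener
)
print(feature_time_ms)
```

In this toy run the sound is still recognized at a 110 ms truncation and is lost at 120 ms, so the feature would be placed between those two step sizes. A real measurement would replace `toy_listener` with scores pooled over several listeners.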

9. A system for enhancing a speech sound, said system comprising:
- a feature detector configured to:
  
  identify a first consonant-vowel (CV) speech sound from among a plurality of CV sounds;
  
  identify a second CV speech sound, that is different than the first CV speech sound, from among the plurality of CV sounds;
  
  locate, in a speech signal, a first feature that at least partially encodes the first speech sound, wherein the first feature includes a first time value and a first frequency value that together locate the first feature within the first speech sound;
  
  locate a second feature within the second speech sound, the second feature at least partially encoding the second speech sound, wherein the second feature includes a second time value and a second frequency value that together locate the second feature within the second speech sound and that are different than the first time value and the first frequency value, respectively;
  
  a speech enhancer configured to enhance said speech signal by modifying, based on the first time value and the first frequency value, a contribution of the first feature to the first speech sound, and modifying, based on the second time value and the second frequency value, a contribution of the second feature to the second speech sound based on the second time value and the second frequency value; and
  
  an output to provide the enhanced speech signal to a listener.
  - 10. The system of claim 9, wherein modifying the contribution of the first feature to the first speech sound comprises increasing the contribution of the first feature.
  - 11. The system of claim 10, wherein said feature detector is further configured to locate another feature in the first speech sound, and the speech enhancer is further configured to enhance the speech signal by decreasing the contribution of the another feature to the first speech sound, wherein the another feature interferes with recognition of the first speech sound.
  - 12. The system of claim 9, wherein the speech enhancer is configured to enhance, based on a hearing profile of the listener, the speech signal based on a hearing profile of the listener.
  - 13. The system of claim 9, wherein the feature detector is configured to identify, based on a hearing profile of the listener, the first feature based on a hearing profile of the listener.
  - 14. The system of claim 9, said system being implemented in at least one of an automatic speech recognition device, a cochlear implant, a portable electronic device, and a hearing aid.
  - 15. The system of claim 9, said feature detector storing speech feature data generated by a method comprising:
    - iteratively truncating the first speech sound to identify a time at which the first feature occurs in the first speech sound;
      
      applying at least one frequency filter to identify a frequency range in which the first feature occurs in the first speech sound;
      
      masking the first speech sound to identify a relative intensity at which the first feature occurs in the first speech sound; and
      
      using at least two of the identified time, the identified frequency range, and the identified intensity, to locate the first feature within the first speech sound.
  - 16. The system of claim 9, wherein each of the first speech sound and the second speech sound comprises at least one of /pa, ta, ka, ba, da, ga, fa, θa, sa, ∫a, δa, va, ca/.
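
The speech enhancer of claims 9 to 11 modifies the contribution of a feature located by a time value and a frequency value. The claims do not say how the modification is done, so the following is only a minimal sketch under assumptions: the signal is represented as a magnitude spectrogram (a list of time frames, each a list of frequency bins), the feature is treated as a rectangular region around the located frame and bin, and the gain is applied in decibels. The name `modify_feature` and all parameters are hypothetical.

```python
from typing import List


def modify_feature(
    spec: List[List[float]],
    t_idx: int,
    f_idx: int,
    t_half: int,
    f_half: int,
    gain_db: float,
) -> List[List[float]]:
    """Scale the region centered on (t_idx, f_idx) by gain_db.

    A positive gain increases the feature's contribution (claim 10);
    a negative gain decreases it (claim 11). The input is not modified.
    """
    gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor
    out = [row[:] for row in spec]   # copy so the caller keeps the original
    t_lo, t_hi = max(0, t_idx - t_half), min(len(spec), t_idx + t_half + 1)
    for t in range(t_lo, t_hi):
        f_lo, f_hi = max(0, f_idx - f_half), min(len(spec[t]), f_idx + f_half + 1)
        for f in range(f_lo, f_hi):
            out[t][f] *= gain
    return out


# Toy spectrogram (assumption): 5 frames x 4 bins, all magnitudes 1.0.
toy_spec = [[1.0] * 4 for _ in range(5)]
boosted = modify_feature(toy_spec, t_idx=2, f_idx=1, t_half=1, f_half=0, gain_db=20.0)
```

Here frames 1 to 3 of bin 1 are scaled by a factor of 10 and every other cell is left unchanged. Applying the same function with a negative `gain_db` to a second region would correspond to attenuating an interfering feature.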

17. A method comprising:
- isolating, in time, a section of a speech sound, wherein the speech sound is within a certain frequency range;
  
  measuring recognition, by a plurality of listeners, of the isolated section of the speech sound;
  
  based on a degree of recognition among the plurality of listeners, constructing a time importance function and a frequency importance function that describe a contribution of the time-isolated section to recognition of the speech sound; and
  
  in an electronic device, identifying the speech sound from among a plurality of speech sounds, and, based at least in part on the identification of the identified speech sound, using the time importance function and the frequency importance function to identify a first feature that encodes the identified speech sound, wherein the first feature includes a first time value; and
  
  in the electronic device, modifying, based on the first time value, the identified speech sound to increase a contribution of said first feature to the identified speech sound, wherein the plurality of speech sounds comprises /pa, ta, ka, ba, da, ga, fa, θa, sa, ∫a, δa, va, ca/.
  - 18. The method of claim 17 further comprising the steps of:
    - isolating a second section of the identified speech sound within a certain time range;
      
      measuring recognition, by the plurality of listeners, of the second isolated section of the identified speech sound;
      
      based on a degree of recognition among the plurality of listeners, constructing a second time importance function that describes a contribution of the second section to recognition of the identified speech sound; and
      
      in the electronic device, using the second time importance function to identify a second feature that encodes the identified speech sound.
  - 19. The method of claim 18 further comprising:
    - in the electronic device, modifying said speech sound to decrease a contribution of said second feature to the speech sound.
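
Claims 17 to 19 construct importance functions from the degree of recognition among listeners, without fixing a formula. One simple possibility, offered here only as an assumption, is to take the drop in recognition score between adjacent truncation points (or adjacent filter cutoffs) as the importance of that interval; the interval with the largest drop then marks the feature. The function names and the scores below are invented for illustration and are not data from the patent.

```python
from typing import List, Sequence, Tuple


def importance_function(
    positions: Sequence[float], scores: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (interval midpoint, importance) pairs.

    Importance is the loss in recognition score across each interval,
    clamped at zero so that small score increases count as no importance.
    """
    return [
        ((positions[i] + positions[i + 1]) / 2.0, max(0.0, scores[i] - scores[i + 1]))
        for i in range(len(scores) - 1)
    ]


def peak_location(importance: Sequence[Tuple[float, float]]) -> float:
    """Return the position whose interval carries the most importance."""
    return max(importance, key=lambda pair: pair[1])[0]


# Invented example: fraction of listeners recognizing the sound after
# truncating 0, 20, 40, 60 and 80 ms from the onset.
truncation_ms = [0, 20, 40, 60, 80]
fraction_correct = [0.95, 0.94, 0.90, 0.30, 0.28]

time_importance = importance_function(truncation_ms, fraction_correct)
feature_time = peak_location(time_importance)
print(feature_time)
```

With these invented scores, recognition collapses between 40 ms and 60 ms, so the time importance function peaks at the 50 ms midpoint. The same two functions could be reused for a frequency importance function by passing filter cutoff frequencies as `positions`.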

20. A system for phone detection, the system comprising:
- an acoustic transducer configured to receive a speech signal, wherein the speech signal is generated in an acoustic domain;
  
  a feature detector configured to receive the speech signal and to generate a feature signal indicating a temporal location, wherein the temporal location is in the speech signal and is where a speech sound feature occurs; and
  
  a phone detector configured to receive the feature signal and, based on the feature signal, identify, in the acoustic domain, a consonant-vowel (CV) speech sound included in the speech signal, wherein the CV speech sound is identified, by the system, from among a set of CV speech sounds comprising the identified CV speech sound and a plurality of other CV speech sounds, wherein the identified CV speech sound has at least one of a time value and a frequency value, and wherein each of the plurality of other CV speech sounds has a time value or a frequency value which is different than that of the identified CV speech sound, wherein the plurality of CV speech sounds comprise /pa, ta, ka, ba, da, ga, fa, θa, sa, ∫a, δa, va, ca/.
  - 21. The system of claim 20, further comprising:
    - a speech enhancer configured to receive the feature signal and, based on the temporal location of the speech sound feature, modify a contribution of the speech sound feature to the speech signal received by said feature detector.
  - 22. The system of claim 21, said speech enhancer configured to modify the contribution of the speech sound feature to the speech signal by increasing the contribution of the speech sound feature to the speech signal.
  - 23. The system of claim 21, said speech enhancer configured to modify the contribution of the speech sound feature to the speech signal by decreasing the contribution of the speech sound feature to the speech signal.
  - 24. The system of claim 20, said system being implemented in at least one of a cochlear implant, a portable electronic device, an automatic speech recognition device, and a hearing aid.
  - 25. The system of claim 20, wherein the location of the speech sound feature is defined by feature location data generated by an analysis of at least two dimensions of the identified speech sound, the at least two dimensions including at least two of time, frequency, and intensity.
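
The phone detector of claim 20 identifies a CV sound from a set in which each sound has a distinct time value or frequency value. A minimal sketch of that idea, assuming a nearest-template rule that the claims do not themselves specify, is shown below. The template values are HYPOTHETICAL placeholders chosen only to make the example run; they are not measurements from the patent, and `detect_phone`, `TEMPLATES`, and the scale parameters are likewise illustrative names.

```python
import math
from typing import Dict, Tuple

# HYPOTHETICAL placeholder templates, not values from the patent:
# (feature time in ms relative to vowel onset, feature frequency in kHz).
TEMPLATES: Dict[str, Tuple[float, float]] = {
    "ta": (-30.0, 4.0),
    "ka": (-40.0, 1.5),
    "pa": (-50.0, 0.8),
}


def detect_phone(
    feature_time_ms: float,
    feature_freq_khz: float,
    templates: Dict[str, Tuple[float, float]] = TEMPLATES,
    time_scale_ms: float = 10.0,
    freq_scale_khz: float = 1.0,
) -> str:
    """Return the CV label whose template is closest to the observed feature.

    Time and frequency differences are divided by assumed scale factors
    so that the two dimensions are comparable before taking the distance.
    """

    def distance(template: Tuple[float, float]) -> float:
        dt = (feature_time_ms - template[0]) / time_scale_ms
        df = (feature_freq_khz - template[1]) / freq_scale_khz
        return math.hypot(dt, df)

    return min(templates, key=lambda name: distance(templates[name]))


label = detect_phone(-32.0, 3.8)
print(label)
```

With the placeholder table, a feature observed at -32 ms and 3.8 kHz is closest to the "ta" template. A practical detector would populate the table from measured feature locations, for example those obtained by the truncation, filtering, and masking analysis of claim 25.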

Current Assignee: Board of Trustees of The University of Illinois
Original Assignee: Board of Trustees of The University of Illinois
Inventors: Allen, Jont B.; Li, Feipeng
Primary Examiner(s): YEN, ERIC L
Application Number: US13/001,856
Publication Number: US 20110153321A1
Time in Patent Office: 2,084 Days
Field of Search: 704/200.1, 704/233, 704/225
US Class Current: 704/225
CPC Class Codes:
G10L 21/0264 characterised by the type o...
G10L 21/0364 for improving intelligibility
