Systems and methods for identifying speech sound features
Abstract
Systems and methods for detecting features in spoken speech and processing speech sounds based on the features are provided. One or more features may be identified in a speech sound. The speech sound may be modified to enhance or reduce the degree to which the feature affects the sound ultimately heard by a listener. Systems and methods according to embodiments of the invention may allow for automatic speech recognition devices that enhance detection and recognition of spoken sounds, such as by a user of a hearing aid or other device.
25 Claims
1. A method for enhancing a speech sound, said method comprising:
identifying a first consonant-vowel (CV) speech sound from among a plurality of CV sounds;
identifying a second CV speech sound, that is different than the first CV speech sound, from among the plurality of CV sounds;
locating a first feature within the first speech sound, the first feature at least partially encoding the first speech sound, wherein the first feature includes a first time value and a first frequency value that together locate the first feature within the first speech sound;
locating a second feature within the second speech sound, the second feature at least partially encoding the second speech sound, wherein the second feature includes a second time value and a second frequency value that together locate the second feature within the second speech sound and that are different than the first time value and the first frequency value, respectively;
in an electronic device, increasing, based at least in part on the first time value and based at least in part on the first frequency value, the contribution of the first feature to the first speech sound; and
in the electronic device, increasing, based at least in part on the second time value and based at least in part on the second frequency value, the contribution of the second feature to the second speech sound.

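The enhancement steps of claim 1 can be pictured with a small sketch. It is only an illustration under stated assumptions, not the patented implementation: the speech sound is assumed to be a time-frequency grid of magnitudes, and the `Feature` class, the `enhance` function, the half-width parameters, and the gain of 2.0 are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical container for a feature located by a time and a frequency value."""
    time_s: float                 # time value locating the feature (seconds)
    freq_hz: float                # frequency value locating the feature (Hz)
    half_width_s: float = 0.01    # assumed extent of the feature in time
    half_width_hz: float = 600.0  # assumed extent of the feature in frequency


def enhance(spectrogram, frame_times, bin_freqs, feature, gain=2.0):
    """Return a copy of the grid with cells near the feature scaled by `gain`."""
    out = []
    for t, row in zip(frame_times, spectrogram):
        in_time = abs(t - feature.time_s) <= feature.half_width_s
        new_row = []
        for f, magnitude in zip(bin_freqs, row):
            in_freq = abs(f - feature.freq_hz) <= feature.half_width_hz
            new_row.append(magnitude * gain if in_time and in_freq else magnitude)
        out.append(new_row)
    return out


# Toy input: a flat 4x4 grid standing in for two different CV sounds.
frame_times = [0.00, 0.02, 0.04, 0.06]
bin_freqs = [500.0, 1500.0, 2500.0, 3500.0]
flat = [[1.0] * len(bin_freqs) for _ in frame_times]

# Two features with different time values and different frequency values.
first = Feature(time_s=0.02, freq_hz=1500.0)
second = Feature(time_s=0.04, freq_hz=3500.0)

enhanced_first = enhance(flat, frame_times, bin_freqs, first)
enhanced_second = enhance(flat, frame_times, bin_freqs, second)
```

Each call raises only the cells around its own feature, so the two sounds are boosted at different places in the time-frequency plane. A real device would operate on a measured spectrogram and would need a resynthesis step, which this sketch leaves out.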
9. A system for enhancing a speech sound, said system comprising:
a feature detector configured to:
  identify a first consonant-vowel (CV) speech sound from among a plurality of CV sounds;
  identify a second CV speech sound, that is different than the first CV speech sound, from among the plurality of CV sounds;
  locate, in a speech signal, a first feature that at least partially encodes the first speech sound, wherein the first feature includes a first time value and a first frequency value that together locate the first feature within the first speech sound;
  locate a second feature within the second speech sound, the second feature at least partially encoding the second speech sound, wherein the second feature includes a second time value and a second frequency value that together locate the second feature within the second speech sound and that are different than the first time value and the first frequency value, respectively;
a speech enhancer configured to enhance said speech signal by modifying, based on the first time value and the first frequency value, a contribution of the first feature to the first speech sound, and modifying, based on the second time value and the second frequency value, a contribution of the second feature to the second speech sound based on the second time value and the second frequency value; and
an output to provide the enhanced speech signal to a listener.

17. A method comprising:
isolating, in time, a section of a speech sound, wherein the speech sound is within a certain frequency range;
measuring recognition, by a plurality of listeners, of the isolated section of the speech sound;
based on a degree of recognition among the plurality of listeners, constructing a time importance function and a frequency importance function that describe a contribution of the time-isolated section to recognition of the speech sound; and
in an electronic device, identifying the speech sound from among a plurality of speech sounds, and, based at least in part on the identification of the identified speech sound, using the time importance function and the frequency importance function to identify a first feature that encodes the identified speech sound, wherein the first feature includes a first time value; and
in the electronic device, modifying, based on the first time value, the identified speech sound to increase a contribution of said first feature to the identified speech sound, wherein the plurality of speech sounds comprises /pa, ta, ka, ba, da, ga, fa, θa, sa, ∫a, δa, va, ca/.

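One way to picture the importance functions of claim 17 is sketched below. The claim does not say how the functions are computed, so the rule used here (importance of an interval equals the absolute change in listener recognition score across it) is an assumption, as are the function names. The scores are made-up toy numbers, not measurements.

```python
def importance_from_scores(positions, scores):
    """Assumed rule: importance of each interval is the absolute change in
    recognition score between its two end points."""
    return [
        ((positions[i], positions[i + 1]), abs(scores[i + 1] - scores[i]))
        for i in range(len(positions) - 1)
    ]


def peak(importance):
    """Midpoint of the interval with the largest importance."""
    (low, high), _ = max(importance, key=lambda item: item[1])
    return (low + high) / 2


# Made-up toy data: fraction of listeners recognising the sound when it is
# truncated at each time (ms) or filtered at each cutoff frequency (Hz).
truncation_ms = [0, 50, 100, 150]
scores_by_time = [1.0, 0.9, 0.3, 0.25]
cutoff_hz = [500, 1000, 2000, 4000]
scores_by_freq = [0.95, 0.9, 0.4, 0.35]

time_importance = importance_from_scores(truncation_ms, scores_by_time)
freq_importance = importance_from_scores(cutoff_hz, scores_by_freq)

first_time_ms = peak(time_importance)   # candidate first time value
first_freq_hz = peak(freq_importance)   # candidate frequency value
```

With these toy scores the largest drop in recognition falls between 50 ms and 100 ms and between 1000 Hz and 2000 Hz, so the sketch places the feature at the midpoints of those intervals.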
20. A system for phone detection, the system comprising:
an acoustic transducer configured to receive a speech signal, wherein the speech signal is generated in an acoustic domain;
a feature detector configured to receive the speech signal and to generate a feature signal indicating a temporal location, wherein the temporal location is in the speech signal and is where a speech sound feature occurs; and
a phone detector configured to receive the feature signal and, based on the feature signal, identify, in the acoustic domain, a consonant-vowel (CV) speech sound included in the speech signal, wherein the CV speech sound is identified, by the system, from among a set of CV speech sounds comprising the identified CV speech sound and a plurality of other CV speech sounds, wherein the identified CV speech sound has at least one of a time value and a frequency value, and wherein each of the plurality of other CV speech sounds has a time value or a frequency value which is different than that of the identified CV speech sound, wherein the plurality of CV speech sounds comprise /pa, ta, ka, ba, da, ga, fa, θa, sa, ∫a, δa, va, ca/.

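The phone detector of claim 20 can be sketched as a lookup that matches a detected feature against per-sound time and frequency values. This is a minimal illustration under assumptions, not the claimed system: nearest-template matching is a technique chosen here for simplicity, the `detect_phone` function and its scale parameters are hypothetical, and the numbers in `TEMPLATES` are placeholders picked only so the example runs.

```python
import math

# HYPOTHETICAL placeholder templates, not values from the patent:
# (feature time in ms relative to vowel onset, feature frequency in kHz).
TEMPLATES = {
    "ta": (-30.0, 4.0),
    "ka": (-30.0, 1.5),
    "sa": (-100.0, 6.0),
}


def detect_phone(feature_time_ms, feature_freq_khz, templates=TEMPLATES,
                 time_scale_ms=50.0, freq_scale_khz=1.0):
    """Return the CV label whose template lies closest to the detected feature.

    The scale parameters are assumed normalisers that make time and
    frequency differences comparable.
    """
    def distance(template):
        t, f = template
        return math.hypot((feature_time_ms - t) / time_scale_ms,
                          (feature_freq_khz - f) / freq_scale_khz)

    return min(templates, key=lambda name: distance(templates[name]))


# A feature detected 28 ms before vowel onset at 3.8 kHz.
label = detect_phone(-28.0, 3.8)
```

Because every sound in the set has a time value or a frequency value that differs from the others, a single detected feature is enough for this toy matcher to pick one label.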
Specification