Computer system and computer-implemented process for phonology-based automatic speech recognition
First Claim
1. A machine-implemented process for recognition of a spoken word in an utterance, comprising the steps of:
receiving a speech signal containing the utterance;
detecting acoustic cues in the speech signal;
detecting, using the detected acoustic cues, the presence of one or more of a set of a small number of phonological elements, wherein a phonological element is a language independent atomic unit derived from one or more acoustic cues;
identifying, using the detected acoustic cues, a location of one or more sub-word units in the speech signal, each sub-word unit consisting of a pair of structural units, each structural unit having at most two positions to which a combination of one or more phonological elements may be associated;
associating each of the detected phonological elements with a position in one of the identified sub-word units and generating a representation of the spoken word in the speech signal indicating the sub-word units and the combination of phonological elements associated with each position in the sub-word units; and
comparing the representation to a lexicon of predetermined representations of words to identify a best match, thereby recognizing the spoken word.
Abstract
The present invention is based on the use of linguistic, especially phonological, knowledge to guide the speech recognition process. A speech signal containing an utterance is received and linguistic cues in the speech signal are detected. From these detected linguistic cues, a symbolic representation of the contents of the speech signal is generated. This symbolic representation comprises at least one word division, wherein each word division consists of an onset-rhyme pair and associated phonological elements. These phonological elements are univalent, may appear in all languages, and are distinguishable from each other and directly interpretable in the speech signal. A lexicon of predetermined symbolic representations is provided for words in a particular language. A best match to the generated symbolic representation is found in the lexicon, thereby recognizing the spoken word.
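The symbolic representation and lexicon lookup described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the element labels ("A", "I", "U", "H", "?"), the placeholder lexicon entries `word_a` and `word_b`, the helper names, and in particular the distance measure (a count of elements that differ position by position), which is only one of many ways a "best match" could be scored.

```python
from itertools import zip_longest

MAX_POSITIONS = 2  # each structural unit (onset or rhyme) has at most two positions


def constituent(*positions):
    """Build an onset or a rhyme as a tuple of at most two element sets."""
    if len(positions) > MAX_POSITIONS:
        raise ValueError("a structural unit has at most two positions")
    return tuple(frozenset(p) for p in positions)


def pair(onset, rhyme):
    """One sub-word unit: an onset-rhyme pair."""
    return (onset, rhyme)


def constituent_distance(c1, c2):
    """Count elements that differ, position by position.

    Assumption: a missing position is treated like an empty one.
    """
    return sum(len(p ^ q) for p, q in zip_longest(c1, c2, fillvalue=frozenset()))


EMPTY_PAIR = ((), ())  # stands in for a missing onset-rhyme pair


def word_distance(w1, w2):
    """Sum the constituent distances over aligned onset-rhyme pairs."""
    total = 0
    for (o1, r1), (o2, r2) in zip_longest(w1, w2, fillvalue=EMPTY_PAIR):
        total += constituent_distance(o1, o2) + constituent_distance(r1, r2)
    return total


def best_match(observed, lexicon):
    """Return the lexicon key whose stored representation is closest."""
    return min(lexicon, key=lambda word: word_distance(observed, lexicon[word]))


# Hypothetical lexicon: placeholder entries, not real phonological analyses.
LEXICON = {
    "word_a": (pair(constituent({"H", "?"}), constituent({"A"})),),
    "word_b": (pair(constituent({"H"}), constituent({"I"}, {"U"})),),
}

# Hypothetical detector output: one onset-rhyme pair with a spurious "I".
observed = (pair(constituent({"H", "?"}), constituent({"A", "I"})),)

print(best_match(observed, LEXICON))
```

Because elements are univalent (present or absent), each position is modelled as a plain set, and a detection error shows up as one extra or one missing member rather than as a wrong feature value.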
16 Claims
1. A machine-implemented process for recognition of a spoken word in an utterance, comprising the steps of:
receiving a speech signal containing the utterance;
detecting acoustic cues in the speech signal;
detecting, using the detected acoustic cues, the presence of one or more of a set of a small number of phonological elements, wherein a phonological element is a language independent atomic unit derived from one or more acoustic cues;
identifying, using the detected acoustic cues, a location of one or more sub-word units in the speech signal, each sub-word unit consisting of a pair of structural units, each structural unit having at most two positions to which a combination of one or more phonological elements may be associated;
associating each of the detected phonological elements with a position in one of the identified sub-word units and generating a representation of the spoken word in the speech signal indicating the sub-word units and the combination of phonological elements associated with each position in the sub-word units; and
comparing the representation to a lexicon of predetermined representations of words to identify a best match, thereby recognizing the spoken word.
Dependent claims: 2, 3, 4, 5, 6, 7
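The associating step of claim 1 can be sketched as follows, under assumptions the claim does not state: that each detected element and each position carries a start and end time, and that an element is attached to the position it overlaps most in time. The function names, the `(unit, structural unit, slot)` keys, the element labels, and the timings are all hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the time interval shared by [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def associate(elements, positions):
    """Attach each detected element to the position it overlaps most.

    elements:  list of (label, start, end) tuples
    positions: dict mapping a position key to its (start, end) interval
    returns:   dict mapping each position key to a set of element labels
    """
    assigned = {key: set() for key in positions}
    for label, start, end in elements:
        best_key, best_overlap = None, 0.0
        for key, (p_start, p_end) in positions.items():
            shared = overlap(start, end, p_start, p_end)
            if shared > best_overlap:
                best_key, best_overlap = key, shared
        if best_key is not None:  # elements overlapping no position are dropped
            assigned[best_key].add(label)
    return assigned


# Hypothetical timings in seconds: one sub-word unit (index 0) with a
# one-position onset and a two-position rhyme.
positions = {
    (0, "onset", 0): (0.00, 0.08),
    (0, "rhyme", 0): (0.08, 0.20),
    (0, "rhyme", 1): (0.20, 0.30),
}
elements = [("H", 0.01, 0.07), ("A", 0.09, 0.19), ("I", 0.18, 0.29)]

representation = associate(elements, positions)
print(representation)
```

The resulting mapping from positions to element sets is one possible form of the "representation of the spoken word" that the final step compares against the lexicon.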
8. An apparatus for recognition of a spoken word in an utterance, comprising:
means for receiving a speech signal containing the utterance;
means for detecting acoustic cues in the speech signal;
means for detecting, using the detected acoustic cues, the presence of one or more of a set of a small number of phonological elements, wherein a phonological element is a language independent atomic unit derived from one or more acoustic cues;
means for identifying, using the detected acoustic cues, a location of one or more sub-word units in the speech signal, each sub-word unit consisting of a pair of structural units, each structural unit having at most two positions to which a combination of one or more phonological elements may be associated;
means for associating each of the detected phonological elements with a position in one of the identified sub-word units and generating a representation of the spoken word in the speech signal indicating the sub-word units and the combination of phonological elements associated with each position in the sub-word units; and
means for comparing the representation to a lexicon of predetermined representations of words to identify a best match, thereby recognizing the spoken word.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An apparatus for recognition of a spoken word in an utterance, comprising:
a phonological element and structure detector having an input for receiving a speech signal containing the utterance and an output providing a representation of the spoken word detected in the speech signal, wherein the representation comprises indications of at least one sub-word unit, each sub-word unit consisting of a pair of structural units, each structural unit having at most two positions to which a combination of phonological elements may be associated, and indications of the combination of only phonological elements present in and associated with each position, wherein a phonological element is a language independent atomic unit derived from one or more acoustic cues and is either present or not at a point in time in the speech signal;
a lexicon of predetermined representations of words; and
a lexical matching system having a first input for receiving the representation from the output of the phonological element and structure detector, a second input for receiving predetermined representations from the lexicon, and an output providing an indication of the predetermined representation which best matches the representation output by the phonological element and structure detector.
Dependent claims: 16
Specification