Apparatus, method and system for cross-speaker speech recognition for telecommunication applications
Abstract
The apparatus, method and system of the present invention provide for cross-speaker speech recognition and are particularly suited for telecommunication applications such as automatic name (voice) dialing, message management, call return management, and incoming call screening. The method of the present invention includes receiving incoming speech, such as an incoming caller name, and generating a phonetic transcription of the incoming speech with a speaker-independent hidden Markov model having an unconstrained grammar, in which any phoneme may follow any other phoneme, and then determining a transcription parameter as a likelihood of fit of the incoming speech to the speaker-independent model. The method further selects, from a plurality of phoneme patterns, the first phoneme pattern having the highest likelihood of fit to the incoming speech, utilizing a speaker-independent hidden Markov model having a grammar constrained by these phoneme patterns, and then determines a recognition parameter as a likelihood of fit of the incoming speech to the selected first phoneme pattern. The method then determines whether the input speech matches, or collides with, the first phoneme pattern based upon a correspondence of the transcription parameter with the recognition parameter in accordance with a predetermined criterion. In the preferred embodiment, this match or collision determination is made as a function of a confidence ratio, the ratio of the transcription parameter to the recognition parameter: the input speech matches when the confidence ratio is within, or less than, a predetermined threshold value.
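The confidence-ratio test described in the abstract can be written out as follows. The notation is ours, not the patent's: $X$ is the incoming speech, $\mathcal{Q}_U$ is the set of state paths through the unconstrained phoneme loop, $\mathcal{Q}_k$ is the set of paths allowed by stored phoneme pattern $k$, and $\tau$ is the predetermined threshold. The inequality $C \ge 1$ assumes Viterbi (best-path) scoring with the same phoneme models in both passes and no additional grammar weights, so every path allowed by a pattern is also a path through the phoneme loop.

```latex
\begin{gathered}
P_T = \max_{q \in \mathcal{Q}_U} P(X, q), \qquad
\hat{k} = \arg\max_{k} \max_{q \in \mathcal{Q}_k} P(X, q), \qquad
P_R = \max_{q \in \mathcal{Q}_{\hat{k}}} P(X, q) \\[4pt]
C = \frac{P_T}{P_R} \;\ge\; 1 \quad \text{because } \mathcal{Q}_{\hat{k}} \subseteq \mathcal{Q}_U,
\qquad \text{match} \iff C < \tau
\end{gathered}
```

A ratio close to 1 means the best stored pattern explains the speech almost as well as an unconstrained transcription does, which is why a small ratio signals a match (or, during enrollment, a collision). Under these assumptions a threshold $\tau \le 1$ could never produce a match.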
Claims (58 in total)
1. A method for cross-speaker speech recognition for telecommunication systems, the method comprising:
(a) receiving incoming speech;
(b) generating a phonetic representation of the incoming speech with a first speaker-independent model having an unconstrained grammar with a plurality of phonemes, in which any second phoneme of the plurality of phonemes may occur following any first phoneme of the plurality of phonemes;
(c) determining a transcription parameter as a first correspondence of the incoming speech to the first speaker-independent model;
(d) selecting a first phoneme pattern, from a plurality of phoneme patterns, utilizing a second speaker-independent model having a grammar constrained by the plurality of phoneme patterns;
(e) determining a recognition parameter as a second correspondence of the incoming speech to the first phoneme pattern; and
(f) determining whether the input speech matches the first phoneme pattern based upon a third correspondence of the transcription parameter with the recognition parameter in accordance with a predetermined criterion.
determining that the input speech matches the first phoneme pattern when the transcription parameter compares with the recognition parameter in accordance with the predetermined criterion; and
determining that the input speech does not match the first phoneme pattern when the transcription parameter does not compare with the recognition parameter in accordance with the predetermined criterion.
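Steps (b) through (e) of claim 1 can be illustrated with a small decoding sketch. It makes several hypothetical assumptions. A speaker-independent acoustic model has already scored every phoneme in every frame (the `Frame` dictionaries below). There are no transition or grammar penalties. Scoring is Viterbi best-path in the log domain. The function names `transcribe_free`, `align_score` and `recognize` are illustrative only and do not come from the patent. A real system would decode with trained hidden Markov models rather than precomputed frame scores.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: one dict per audio frame, mapping each phoneme
# symbol to its acoustic log-likelihood under a speaker-independent model.
Frame = Dict[str, float]


def transcribe_free(frames: Sequence[Frame]) -> Tuple[List[str], float]:
    """Steps (b) and (c): decode with an unconstrained phoneme loop.

    With no transition penalties any phoneme may follow any other, so the
    best path takes the top-scoring phoneme in every frame. Returns the
    phoneme string (adjacent repeats merged) and its total log-likelihood,
    which serves as the transcription parameter.
    """
    phones: List[str] = []
    total = 0.0
    for frame in frames:
        best = max(frame, key=frame.get)
        total += frame[best]
        if not phones or phones[-1] != best:
            phones.append(best)
    return phones, total


def align_score(frames: Sequence[Frame], pattern: Sequence[str]) -> float:
    """Viterbi log-likelihood of the frames under one left-to-right pattern.

    Each phoneme of the pattern covers one or more consecutive frames, in
    order. Returns -inf when the pattern cannot be aligned.
    """
    n, m = len(frames), len(pattern)
    if m == 0 or n < m:
        return -math.inf
    prev = [-math.inf] * m
    prev[0] = frames[0][pattern[0]]
    for t in range(1, n):
        cur = [-math.inf] * m
        for j in range(m):
            stay = prev[j]                                  # remain in phoneme j
            advance = prev[j - 1] if j > 0 else -math.inf  # enter from j - 1
            cur[j] = max(stay, advance) + frames[t][pattern[j]]
        prev = cur
    return prev[-1]


def recognize(frames: Sequence[Frame],
              patterns: Dict[str, Sequence[str]]) -> Tuple[str, float]:
    """Steps (d) and (e): choose the stored pattern with the best fit.

    Returns the pattern's key and its log-likelihood, which serves as the
    recognition parameter.
    """
    scores = {key: align_score(frames, p) for key, p in patterns.items()}
    best_key = max(scores, key=scores.get)
    return best_key, scores[best_key]
```

Every left-to-right alignment of a stored pattern is also a path through the phoneme loop. Under these assumptions, `transcribe_free` therefore never scores lower than `recognize`, and step (f) examines the size of that gap.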
11. The method of claim 1, wherein step (f) further comprises:
comparing the transcription parameter to the recognition parameter to form a confidence ratio;
when the confidence ratio is less than a predetermined threshold, determining that the input speech matches the first phoneme pattern; and
when the confidence ratio is not less than the predetermined threshold, determining that the input speech does not match the first phoneme pattern.
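The following is a minimal sketch of the claim 11 decision. It assumes the two parameters are natural-log likelihoods, such as those a Viterbi decoder produces. The name `is_match` is hypothetical. Any particular threshold value is also hypothetical, because the patent leaves the threshold as a predetermined value.

```python
import math


def is_match(transcription_loglik: float, recognition_loglik: float,
             threshold: float) -> bool:
    """Claim 11 test: match when the confidence ratio is below the threshold.

    Assumes both parameters are natural-log likelihoods, so the ratio
    P_transcription / P_recognition is computed as a difference of logs,
    which avoids underflow for long utterances.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    log_ratio = transcription_loglik - recognition_loglik
    return log_ratio < math.log(threshold)
```

Under best-path scoring with shared phoneme models, the transcription likelihood is never below the recognition likelihood. The ratio is therefore at least 1, and only thresholds above 1 can ever report a match.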
12. The method of claim 1, further comprising generating a name list, wherein generating the name list includes:
receiving as incoming speech a first sample of a name and performing steps (b) through (f), inclusive, on the first sample; and
when the first sample does not match the first phoneme pattern, including the phonetic representation of the first sample within the plurality of phoneme patterns.
13. The method of claim 1, further comprising generating a name list, wherein generating the name list includes:
receiving as incoming speech a first sample of a name and performing steps (b) through (f), inclusive, on the first sample;
when the first sample does not match the first phoneme pattern, initially including a phonetic representation of the first sample within the plurality of phoneme patterns, receiving as incoming speech a second sample of the name, and performing steps (b) through (f), inclusive, on the second sample; and
determining whether the second sample matches the first sample and, when the second sample does match the first sample, including the name in the name list and including corresponding phonetic representations of both the first sample and the second sample in the plurality of phoneme patterns.
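The name-list enrollment of claims 12 and 13 can be sketched as a small routine. To stay self-contained, it represents speech samples as plain strings. It also takes two hypothetical hooks: `transcribe` stands in for step (b), and `find_match` stands in for steps (b) through (f). Two further details are assumptions, because the claims do not specify them: the `name#1`/`name#2` key scheme, and the removal of the tentative entry when the second sample fails to match.

```python
from typing import Callable, Dict, List, Optional, Tuple

Phones = Tuple[str, ...]
# Hypothetical hooks standing in for steps (b)-(f) of claim 1:
#   transcribe(sample) -> phonetic representation of the sample
#   find_match(sample, patterns) -> key of the matching pattern, or None
Transcribe = Callable[[str], Phones]
FindMatch = Callable[[str, Dict[str, Phones]], Optional[str]]


def enroll_name(name: str, first: str, second: str,
                patterns: Dict[str, Phones], name_list: List[str],
                transcribe: Transcribe, find_match: FindMatch) -> bool:
    """Two-sample enrollment along the lines of claim 13.

    Returns True if the name was added to the name list.
    """
    # Claim 12: the first sample must not collide with a stored pattern.
    if find_match(first, patterns) is not None:
        return False
    first_key = f"{name}#1"
    patterns[first_key] = transcribe(first)  # tentative inclusion

    # Claim 13: the second sample must match the first sample.
    if find_match(second, patterns) != first_key:
        del patterns[first_key]  # assumption: undo the tentative entry
        return False
    patterns[f"{name}#2"] = transcribe(second)
    name_list.append(name)
    return True
```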
14. The method of claim 1, further comprising generating a message list, wherein generating the message list includes:
receiving as incoming speech a caller name and performing steps (b) through (f), inclusive, on the caller name;
when the caller name does not match the first phoneme pattern, including the caller name in the message list and indicating that one call has been received from the caller name; and
when the caller name does match the first phoneme pattern, incrementing a count of calls received from the caller name.
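The following is a sketch of the claim 14 message-list update. It uses the same hypothetical conventions: string samples and injected `transcribe` and `find_match` hooks. It also introduces a made-up `caller-N` key for callers heard for the first time.

```python
from typing import Callable, Dict, Optional, Tuple

Phones = Tuple[str, ...]


def record_call(caller: str, patterns: Dict[str, Phones],
                call_counts: Dict[str, int],
                transcribe: Callable[[str], Phones],
                find_match: Callable[[str, Dict[str, Phones]], Optional[str]]) -> str:
    """Update the message list for one incoming caller name (claim 14).

    Returns the key under which the call was counted.
    """
    key = find_match(caller, patterns)
    if key is None:
        # New caller: store its phonetic representation, count one call.
        n = len(patterns) + 1
        while f"caller-{n}" in patterns:  # hypothetical key scheme
            n += 1
        key = f"caller-{n}"
        patterns[key] = transcribe(caller)
        call_counts[key] = 1
    else:
        # Known caller: increment the count of calls received.
        call_counts[key] = call_counts.get(key, 0) + 1
    return key
```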
15. The method of claim 14, further comprising performing message playback, wherein performing message playback includes:
receiving incoming speech;
selecting the first phoneme pattern, from a subset of the plurality of phoneme patterns corresponding to the message list, as the highest likelihood of fit to the incoming speech; and
playing a first message associated with the first phoneme pattern.
16. The method of claim 15, further comprising:
when a plurality of messages are associated with the first phoneme pattern, sequentially playing the plurality of messages.
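Claims 15 and 16 can be sketched as follows. The hook `best_pattern` is a hypothetical stand-in for selecting the pattern with the highest likelihood of fit, and `play` is a hypothetical stand-in for the audio playback path. The sketch searches only callers with at least one stored message, which is one reading of "a subset ... corresponding to the message list".

```python
from typing import Callable, Dict, List, Optional, Tuple

Phones = Tuple[str, ...]


def play_messages(request: str, patterns: Dict[str, Phones],
                  messages: Dict[str, List[str]],
                  best_pattern: Callable[[str, Dict[str, Phones]], Optional[str]],
                  play: Callable[[str], None]) -> int:
    """Claims 15 and 16: play every message left by the requested caller.

    Only patterns that have messages are searched. Returns the number of
    messages played.
    """
    subset = {k: p for k, p in patterns.items() if messages.get(k)}
    key = best_pattern(request, subset)
    if key is None:
        return 0
    for message in messages[key]:  # sequential playback (claim 16)
        play(message)
    return len(messages[key])
```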
17. The method of claim 1, further comprising performing call return, wherein performing call return includes:
receiving incoming speech;
selecting the first phoneme pattern, from a subset of the plurality of phoneme patterns corresponding to a name list and a message list, as the highest likelihood of fit to the incoming speech; and
transmitting a telecommunication number associated with the first phoneme pattern.
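A comparable sketch for claim 17 restricts the search to patterns on the name list or the message list. It returns the number that would be transmitted. `numbers` and `best_pattern` are hypothetical. How the number is actually signalled to the network is outside the scope of this sketch.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Phones = Tuple[str, ...]


def call_return(request: str, patterns: Dict[str, Phones],
                numbers: Dict[str, str],
                name_list: Iterable[str], message_list: Iterable[str],
                best_pattern: Callable[[str, Dict[str, Phones]], Optional[str]]
                ) -> Optional[str]:
    """Claim 17: return the number to transmit for the spoken name, if any."""
    allowed = set(name_list) | set(message_list)
    subset = {k: p for k, p in patterns.items() if k in allowed}
    key = best_pattern(request, subset)
    return numbers.get(key) if key is not None else None
```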
18. The method of claim 1, further comprising performing incoming call screening, wherein the plurality of phoneme patterns are predetermined to correspond to a plurality of names on a call screening list of a subscriber, and performing incoming call screening includes:
receiving an incoming call leg;
receiving as incoming speech a caller name and performing steps (b) through (f), inclusive, on the caller name;
when the caller name does not match the first phoneme pattern, transferring the incoming call leg to a message system; and
when the caller name does match the first phoneme pattern, transferring the incoming call leg to the subscriber.
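Once steps (b) through (f) are available, the claim 18 routing decision reduces to a single branch. In this sketch, `matches` is a hypothetical hook that wraps those steps and runs them against the subscriber's call screening list. The `Route` values merely name the two destinations given in the claim.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

Phones = Tuple[str, ...]


class Route(Enum):
    SUBSCRIBER = "subscriber"
    MESSAGE_SYSTEM = "message system"


def screen_call(caller_name: str, screening_patterns: Dict[str, Phones],
                matches: Callable[[str, Dict[str, Phones]], bool]) -> Route:
    """Claim 18: route the incoming call leg based on the spoken caller name.

    `matches` stands in for steps (b)-(f) run against the subscriber's
    call screening list.
    """
    if matches(caller_name, screening_patterns):
        return Route.SUBSCRIBER
    return Route.MESSAGE_SYSTEM
```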
19. An apparatus for cross-speaker speech recognition for telecommunication systems, the apparatus comprising:
a network interface to receive incoming speech;
a memory, the memory storing a plurality of phoneme patterns; and
a processor coupled to the network interface and to the memory, wherein the processor, when operative, includes instructions to generate a phonetic representation of the incoming speech with a first speaker-independent model having an unconstrained grammar having a plurality of phonemes, in which any second phoneme of the plurality of phonemes may occur following any first phoneme of the plurality of phonemes and determine a transcription parameter as a first correspondence of the incoming speech to the first speaker-independent model;
the processor including further instructions to select a first phoneme pattern, from the plurality of phoneme patterns, utilizing a second speaker-independent model having a grammar constrained by the plurality of phoneme patterns, and to determine a recognition parameter as a second correspondence of the incoming speech to the first phoneme pattern; and
the processor including further instructions to determine whether the input speech matches the first phoneme pattern based upon a third correspondence of the transcription parameter with the recognition parameter in accordance with a predetermined criterion.
37. A system for cross-speaker speech recognition for telecommunication systems, the system comprising:
a switch to receive an incoming call leg; and
an adjunct network entity coupled to the switch, wherein the adjunct network entity, when operative, includes instructions to receive incoming speech, generate a phonetic representation of the incoming speech with a first speaker-independent model having an unconstrained grammar having a plurality of phonemes, in which any second phoneme of the plurality of phonemes may occur following any first phoneme of the plurality of phonemes, and determine a transcription parameter as a first correspondence of the incoming speech to the first speaker-independent model;
the adjunct network entity including further instructions to select a first phoneme pattern, from a plurality of phoneme patterns, utilizing a second speaker-independent model having a grammar constrained by the plurality of phoneme patterns, and to determine a recognition parameter as a second correspondence of the incoming speech to the first phoneme pattern; and
the adjunct network entity including further instructions to determine whether the input speech matches the first phoneme pattern based upon a third correspondence of the transcription parameter with the recognition parameter in accordance with a predetermined criterion.
Specification