Multi-phoneme streamer and knowledge representation speech recognition system and method
3 Assignments
0 Petitions
Abstract
A system and method implementing a new approach to speech recognition that reacts to concepts conveyed through speech. In its fullest implementation, the system and method shift the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach that determines and addresses conveyed concepts. This is done in four stages: a probabilistically unbiased multi-phoneme recognition process; a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes; a permutation analysis process that produces sequences of candidate words with a high potential of being syntactically valid; and a conceptual analysis process that takes targeted syntactic sequences and generates the utterance's conceptual representation, which can be used to produce an adequate response. The invention can be employed for a myriad of applications, such as improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command/response scenarios.
124 Citations
401 Claims
1. A method of processing phonemes in speech, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate phonemes based on a plurality of reference cluster sets, each cluster set representing reference phonemes for a cluster type; and
outputting a phoneme stream of identified candidate phonemes based on the analysis, wherein at least some time-slices are represented by alternative candidate phonemes based on said analyzing step. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 400, 401)
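As an illustration only (not part of the claim language), the steps of claim 1 can be sketched in Python. The feature vector, the two cluster sets, their centroids, and the rule that each cluster set nominates its nearest reference phoneme are all assumptions made for this sketch; the claim does not specify how a time-slice is compared with a cluster set.

```python
from math import dist

# Hypothetical reference data: two cluster sets, each mapping a reference
# phoneme to a 2-D centroid. A real system would use richer acoustic features.
CLUSTER_SETS = {
    "set_a": {"s": (0.0, 1.0), "ah": (0.5, 0.5)},
    "set_b": {"z": (0.0, 0.9), "ah": (0.4, 0.6)},
}

def segment(samples, slice_len):
    """Split the digitized input into consecutive fixed-length time-slices."""
    return [samples[i:i + slice_len] for i in range(0, len(samples), slice_len)]

def features(time_slice):
    """Toy feature vector (an assumption): mean and peak absolute amplitude."""
    mean = sum(time_slice) / len(time_slice)
    peak = max(abs(s) for s in time_slice)
    return (mean, peak)

def candidates(time_slice, cluster_sets):
    """Let every cluster set nominate its nearest reference phoneme."""
    vec = features(time_slice)
    found = []
    for references in cluster_sets.values():
        best = min(references, key=lambda p: dist(vec, references[p]))
        if best not in found:  # keep alternatives, drop exact repeats
            found.append(best)
    return found

def phoneme_stream(samples, slice_len, cluster_sets):
    """One list of candidate phonemes per time-slice."""
    return [candidates(ts, cluster_sets) for ts in segment(samples, slice_len)]

stream = phoneme_stream([1, -1, 1, -1, 0.5, 0.5, 0.5, 0.5], 4, CLUSTER_SETS)
print(stream)  # [['s', 'z'], ['ah']]
```

In this toy run the first time-slice keeps two alternative candidates because the two cluster sets disagree, while the second slice yields a single candidate.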
17. A method of processing and recognizing speech, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify a candidate phoneme based on a plurality of reference cluster sets, each cluster set representing reference phonemes for that cluster type;
wherein the step of analyzing includes determining a score or probability of each identified candidate phoneme; and
outputting a phoneme stream of identified candidate phonemes based on the analysis, wherein at least some time-slices are represented by alternative candidate phonemes based on said analyzing step, and wherein the phoneme stream includes or is associated with the determined score or probability of each identified candidate phoneme. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51)
52. A system for processing an acoustic input of speech, comprising:
an input device for inputting an acoustic input comprising digitized speech;
a phoneme recognition processor for processing said digitized acoustic input based on a plurality of reference cluster sets to generate a plurality of candidate phonemes;
wherein the phoneme recognition processor identifies a score or probability for each candidate phoneme; and
wherein at least some of the candidate phonemes are alternative candidate phonemes corresponding to the portion of the acoustic input. - View Dependent Claims (53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
85. A method of processing and recognizing speech, comprising:
generating a phoneme stream by processing a digitized speech sample to identify candidate phonemes including at least some alternative candidate phonemes;
permuting candidate phonemes between different time-slices to generate potential words represented by the speech sample; and
generating a list of candidate words for the phoneme stream based on the potential words. - View Dependent Claims (86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116)
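As an illustration only, the permutation step of claim 85 might look like the Python sketch below. The phoneme symbols, the tiny pronunciation lexicon, and the exhaustive search over every contiguous span of time-slices are assumptions for this sketch, not details taken from the claim.

```python
from itertools import product

# Hypothetical pronunciation lexicon: phoneme tuple -> word.
LEXICON = {
    ("s", "ih", "t"): "sit",
    ("s", "iy", "t"): "seat",
    ("ih", "t"): "it",
    ("iy", "t"): "eat",
}

def candidate_words(stream, lexicon):
    """Return (start, end, word) for every span of time-slices whose
    alternatives can be permuted into a phoneme sequence in the lexicon."""
    found = []
    for start in range(len(stream)):
        for end in range(start + 1, len(stream) + 1):
            # product() picks one alternative from each time-slice in the span.
            for combo in product(*stream[start:end]):
                word = lexicon.get(combo)
                if word is not None:
                    found.append((start, end, word))
    return found

# A phoneme stream with alternatives in the first two time-slices.
stream = [["s", "z"], ["ih", "iy"], ["t"]]
words = candidate_words(stream, LEXICON)
print(words)
# [(0, 3, 'sit'), (0, 3, 'seat'), (1, 3, 'it'), (1, 3, 'eat')]
```

The exhaustive product grows exponentially with the number of alternatives per slice, so a practical system would presumably prune the search, for example with a lexicon prefix tree.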
117. A system for processing speech, comprising:
means for generating a phoneme stream by processing a digitized speech sample to identify candidate phonemes including at least some alternative candidate phonemes;
a processor for (a) permuting the candidate phonemes to generate potential words represented by the speech sample; and
(b) generating a list of candidate words based on the potential words. - View Dependent Claims (118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142)
143. A method of processing speech, comprising:
inputting an acoustic input comprising digitized speech;
processing said digitized acoustic input to identify a plurality of candidate phonemes;
computing for each candidate phoneme a score or probability;
aggregating at least some of said plurality of candidate phonemes into potential words; and
processing the computed scores or probabilities of the candidate phonemes. - View Dependent Claims (144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159)
160. A speech processing system, comprising:
input means for inputting a digitized speech input;
phoneme recognition means for identifying a plurality of candidate phonemes in said digitized speech input and providing a score or probability for each candidate phoneme;
wherein at least some of the candidate phonemes are alternative candidate phonemes; and
phoneme analysis means for processing said plurality of candidate phonemes into potential words. - View Dependent Claims (161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 398, 399)
184. A method of processing speech, comprising:
processing a speech sample to identify a list of candidate words, wherein at least some of the candidate words are alternative candidate words corresponding to the same or an overlapping portion of the speech sample, permuting at least some of the candidate words to create a plurality of potential syntactic structures; and
selecting one of the potential syntactic structures as corresponding to the speech sample. - View Dependent Claims (185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200)
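As an illustration only, the word permutation and selection steps of claim 184 can be sketched as follows. The candidate word list, the part-of-speech tags, the rule that two words may be neighbors only when one ends where the next begins, and the use of a fixed set of valid part-of-speech patterns as the selection criterion are all assumptions for this sketch.

```python
# Hypothetical candidate words: (start_slice, end_slice, word, part_of_speech).
# "I"/"ice" and "scream"/"cream" are alternatives for overlapping portions.
WORDS = [
    (0, 2, "I", "PRON"),
    (0, 3, "ice", "NOUN"),
    (2, 5, "scream", "VERB"),
    (3, 5, "cream", "NOUN"),
]

def sequences(words, total_slices):
    """Enumerate word sequences that cover slices [0, total_slices) with no
    gaps or overlaps. Assumes end > start for every candidate word."""
    def extend(position, prefix):
        if position == total_slices:
            yield list(prefix)
            return
        for start, end, word, pos in words:
            if start == position:
                prefix.append((word, pos))
                yield from extend(end, prefix)
                prefix.pop()
    yield from extend(0, [])

def select(words, total_slices, valid_patterns):
    """Return the first sequence whose part-of-speech pattern is valid."""
    for seq in sequences(words, total_slices):
        if tuple(pos for _, pos in seq) in valid_patterns:
            return [word for word, _ in seq]
    return None

all_sequences = [[w for w, _ in seq] for seq in sequences(WORDS, 5)]
print(all_sequences)  # [['I', 'scream'], ['ice', 'cream']]
print(select(WORDS, 5, {("PRON", "VERB")}))  # ['I', 'scream']
```

A lookup table of part-of-speech patterns stands in here for whatever syntactic validation the specification describes; it is only meant to show where the selection happens.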
201. A speech processing system, comprising:
a phoneme recognition unit for identifying candidate phonemes, wherein at least some of the candidate phonemes are alternative candidate phonemes;
a phoneme stream analyzer for identifying a list of candidate words constructed from the candidate phonemes, wherein at least some of the candidate words are alternative candidate words corresponding to the same portion or an overlapping portion of a speech input;
a word permutation unit for permuting the candidate words to create a plurality of potential syntactic structures;
wherein one of the plurality of potential syntactic structures is selected as corresponding to the speech input. - View Dependent Claims (202, 203, 204, 205, 206, 207, 208)
209. A method of processing speech, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate triphones based on a plurality of reference cluster sets, each cluster set representing reference triphones for a cluster type; and
outputting the identified candidate triphones. - View Dependent Claims (210)
211. A method of processing speech, comprising:
processing a speech input to identify a plurality of syntactic sequences of words, the syntactic sequences of words comprising candidate words, the candidate words and the syntactic sequences of words having at least one associated part of speech;
deriving one or more conceptual representations from at least one of the syntactic sequences of words; and
formulating one or more responses to the speech input based on at least one conceptual representation. - View Dependent Claims (212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245)
246. A system for processing speech, comprising:
means for identifying a plurality of syntactic sequences of words corresponding to a speech input, the syntactic sequences of words comprising candidate words, the candidate words and the syntactic sequences of words having at least one associated part of speech;
means for deriving one or more conceptual representations from at least one of the syntactic sequences of words; and
means for formulating one or more responses to the speech input based on one or more of the conceptual representations. - View Dependent Claims (247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282)
283. A method of processing speech, comprising:
processing a speech input to identify a plurality of syntactic sequences of words, the syntactic sequences of words comprising candidate words, the candidate words and the syntactic sequences of words having at least one associated part of speech;
deriving one or more conceptual representations from at least one of the syntactic sequences of words;
processing at least one of the conceptual representations of at least one of the syntactic sequences of words according to a database of reference conceptual representations; and
formulating one or more responses to the speech input. - View Dependent Claims (284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316)
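As an illustration only, the last two steps of claim 283 might be sketched as below. Representing a conceptual representation as a flat dictionary, treating a reference entry as matched when all of its fields appear in the derived concept, and pairing each reference entry with a canned response are assumptions for this sketch; the claim does not define the structure of the database.

```python
# Hypothetical database: (reference conceptual representation, response).
DATABASE = [
    ({"action": "QUERY", "topic": "flight_status"}, "Which flight number?"),
    ({"action": "GREET"}, "Hello."),
]

def matches(concept, reference):
    """True when every field of the reference appears in the concept."""
    return all(concept.get(key) == value for key, value in reference.items())

def respond(concept, database):
    """Return the response of the first matching reference entry, if any."""
    for reference, response in database:
        if matches(concept, reference):
            return response
    return None

# A concept assumed to have been derived from a syntactic sequence of words.
concept = {"action": "QUERY", "topic": "flight_status", "actor": "caller"}
print(respond(concept, DATABASE))  # Which flight number?
```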
317. A system for processing speech, comprising:
means for identifying a plurality of syntactic sequences of words corresponding to a speech input, the syntactic sequences of words comprising candidate words, the syntactic sequences of words and candidate words having at least one associated part of speech;
means for deriving one or more conceptual representations from at least one of the syntactic sequences of words;
means for processing at least one of the conceptual representations of the syntactic sequences of words according to a database of reference conceptual representations; and
means for formulating one or more responses to the speech input based on one or more of the conceptual representations. - View Dependent Claims (318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353)
354. A method for improving accuracy in dictation or transcription, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate phonemes based on a plurality of reference cluster sets, each cluster set representing reference phonemes for a cluster type;
outputting a phoneme stream of identified candidate phonemes based on the analysis, wherein at least some time-slices are represented by alternative candidate phonemes based on said analyzing step;
permuting the candidate phonemes to generate potential words represented by the speech input;
generating a list of candidate words based on the potential words;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures; and
communicating the syntactic sequences of words. - View Dependent Claims (355, 356, 357)
358. A method for improving accuracy in dictation or transcription, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate words derived from an N-best list of potential words from an application of the HMM technique;
further identifying additional candidate words based on combinations of two or more consecutive N-best list potential words;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures; and
communicating the syntactic sequences of words. - View Dependent Claims (359, 360, 361)
362. A method for generating punctuation in dictation or transcription, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate phonemes based on a plurality of reference cluster sets, each cluster set representing reference phonemes for a cluster type;
outputting a phoneme stream of identified candidate phonemes based on the analysis, wherein at least some time-slices are represented by alternative candidate phonemes based on said analyzing step;
permuting the candidate phonemes to generate potential words represented by the speech input;
generating a list of candidate words based on the potential words;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures;
generating punctuation based on the syntactic sequences of words; and
communicating the syntactic sequences of words. - View Dependent Claims (363, 364, 365)
366. A method for generating punctuation in dictation or transcription, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate words derived from an N-best list of potential words from an application of the HMM technique;
further identifying additional candidate words based on combinations of two or more consecutive N-best list potential words;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures;
generating punctuation based on the syntactic sequences of words; and
communicating the syntactic sequences of words. - View Dependent Claims (367, 368, 369)
370. A method for improving accuracy in dictation or transcription, comprising:
inputting an acoustic input of digital speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate words based on the application of the HMM technique;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures;
calculating a conceptual representation for each syntactic sequence of words; and
communicating the syntactic sequence of words related to the first valid calculated conceptual representation. - View Dependent Claims (371, 372, 373)
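As an illustration only, the final two steps of claim 370 can be sketched as a search for the first syntactic sequence that yields a valid conceptual representation. The `toy_conceptualize` function is hypothetical and stands in for the conceptual analysis process; the convention that it returns `None` for an invalid representation is an assumption for this sketch.

```python
def toy_conceptualize(sequence):
    """Hypothetical analyzer: understands only '<actor> <known verb>'."""
    known_actions = {"scream": "SHOUT"}
    if len(sequence) == 2 and sequence[1] in known_actions:
        return {"actor": sequence[0], "action": known_actions[sequence[1]]}
    return None  # no valid conceptual representation

def first_valid(sequences, conceptualize):
    """Return (sequence, concept) for the first sequence whose conceptual
    representation is valid, or None when no sequence qualifies."""
    for sequence in sequences:
        concept = conceptualize(sequence)
        if concept is not None:
            return sequence, concept
    return None

result = first_valid([["ice", "cream"], ["I", "scream"]], toy_conceptualize)
print(result)  # (['I', 'scream'], {'actor': 'I', 'action': 'SHOUT'})
```

Under this reading, the conceptual analysis acts as a filter on the syntactic sequences, so the transcription that is communicated is the first one that can also be understood.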
374. A method for generating punctuation in dictation or transcription, comprising:
inputting an acoustic input of digital speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate words based on the application of the HMM technique;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures;
calculating a conceptual representation for each syntactic sequence of words;
generating punctuation based on the syntactic sequences of words; and
communicating the syntactic sequence of words and punctuation related to the first valid calculated conceptual representation. - View Dependent Claims (375, 376, 377)
378. A method for improving accuracy in dictation or transcription, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate phonemes based on a plurality of reference cluster sets, each cluster set representing reference phonemes for a cluster type;
outputting a phoneme stream of identified candidate phonemes based on the analysis, wherein at least some time-slices are represented by alternative candidate phonemes based on said analyzing step;
permuting the candidate phonemes to generate potential words represented by the speech input;
generating a list of candidate words based on the potential words;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures;
calculating a conceptual representation for each of the syntactic sequences of words; and
communicating the syntactic sequence of words related to the first valid conceptual representation. - View Dependent Claims (379, 380, 381)
382. A method for generating punctuation in dictation or transcription, comprising:
inputting an acoustic input of digitized speech;
segmenting said digitized acoustic input into a plurality of time-slices;
analyzing each time-slice to identify one or more candidate phonemes based on a plurality of reference cluster sets, each cluster set representing reference phonemes for a cluster type;
outputting a phoneme stream of identified candidate phonemes based on the analysis, wherein at least some time-slices are represented by alternative candidate phonemes based on said analyzing step;
permuting the candidate phonemes to generate potential words represented by the speech input;
generating a list of candidate words based on the potential words;
permuting the candidate words to generate potential syntactic structures while respecting word boundaries of the candidate words;
permuting at least two or more of the candidate words and potential syntactic structures while respecting word boundaries of the candidate words and potential syntactic structures;
generating syntactic sequences of words from the permuted candidate words and potential syntactic structures;
calculating a conceptual representation for each of the syntactic sequences of words;
generating punctuation based on the syntactic sequences of words; and
communicating the syntactic sequence of words and punctuation related to the first valid conceptual representation. - View Dependent Claims (383, 384, 385)
386. A system for recognizing concepts in speech, comprising:
a phoneme recognition unit for identifying candidate phonemes in a digitized input, wherein at least some of the candidate phonemes are alternative candidate phonemes;
a phoneme stream analyzer for identifying a list of candidate words constructed from the candidate phonemes, wherein at least some of the candidate words are alternative candidate words corresponding to the same portion or an overlapping portion of the input;
a word permutation unit for permuting the candidate words to create a plurality of potential syntactic structures, wherein at least one of the plurality of potential syntactic structures is selected as corresponding to the input, wherein further the word permutation unit is further adapted for syntactically validating the potential syntactic structures to render syntactically valid sequences of words;
means for extracting conceptual representations of syntactically valid sequences of words;
means for comparing the conceptual representations to reference data; and
means for communicating one or more successful comparisons of the conceptual representations in relation to the reference data. - View Dependent Claims (387, 388, 389, 390, 391)
392. A method for recognizing concepts in speech, comprising:
identifying candidate phonemes in a digitized input, wherein at least some of the candidate phonemes are alternative candidate phonemes;
identifying a list of candidate words constructed from the candidate phonemes, wherein at least some of the candidate words are alternative candidate words corresponding to the same portion or an overlapping portion of the input;
permuting the candidate words to create a plurality of potential syntactic structures, wherein at least one of the plurality of potential syntactic structures is selected as corresponding to the input;
syntactically validating the potential syntactic structures to render syntactically valid sequences of words;
extracting conceptual representations of syntactically valid sequences of words;
comparing the conceptual representations to reference data; and
communicating one or more successful comparisons of the conceptual representations in relation to the reference data. - View Dependent Claims (393, 394, 395, 396, 397)
Specification