System and method for speech recognition using an enhanced phone set
Abstract
A system and method for speech recognition using an enhanced phone set comprises speech data, an enhanced phone set, and a transcription generated by a transcription process. The transcription process selects appropriate phones from the enhanced phone set to represent acoustic-phonetic content of the speech data. The enhanced phone set includes base-phones and composite-phones. A phone dataset includes the speech data and the transcription. The present invention also comprises a transformer that applies transformation rules to the phone dataset to produce a transformed phone dataset. The transformed phone dataset may be utilized in training a speech recognizer, such as a Hidden Markov Model. Various types of transformation rules may be applied to the phone dataset of the present invention to find an optimum transformed phone dataset for training a particular speech recognizer.
24 Citations
51 Claims
1. A system for speech processing, comprising:
speech data generated from one or more speech sources;
an enhanced phone set that includes acoustic-phonetic symbols and connectors for extending said enhanced phone set, said enhanced phone set including a TIMIT base-phone set and an extended base-phone set, said extended base-phone set including a base phone for representing an articulator noise;
a transcription generated by a transcription process that selects appropriate phones from said enhanced phone set to represent said speech data; and
transformation rules applied to said enhanced phone set to produce a transformed phone dataset, said transformed phone dataset being used in building a phonetic dictionary.
Dependent claims: 2-20.

21. A system for speech processing, comprising:
speech data generated from one or more speech sources;
an enhanced phone set that includes acoustic-phonetic symbols and connectors for extending said enhanced phone set, said enhanced phone set including a TIMIT base-phone set and an extended base-phone set;
a transcription generated by a transcription process that selects appropriate phones from said enhanced phone set to represent said speech data, said acoustic-phonetic symbols being utilized in said transcription process to represent acoustic-phonetic processes of said speech data, said acoustic-phonetic processes represented by said acoustic-phonetic symbols including an epenthetic vowel; and
transformation rules applied to said enhanced phone set to produce a transformed phone dataset, said transformed phone dataset being used in building a phonetic dictionary.

22. A method for speech processing, comprising:
generating speech data from one or more speech sources;
providing an enhanced phone set that includes acoustic-phonetic symbols and connectors for extending said enhanced phone set, said enhanced phone set including a TIMIT base-phone set and an extended base-phone set, said extended base-phone set including a base phone for representing an articulator noise;
producing a transcription using a transcription process that selects appropriate phones from said enhanced phone set to represent said speech data; and
applying transformation rules to said enhanced phone set to produce a transformed phone dataset, said transformed phone dataset being used in building a phonetic dictionary.
Dependent claims: 23-41.

42. A method for speech processing, comprising:
generating speech data from one or more speech sources;
providing an enhanced phone set that includes acoustic-phonetic symbols and connectors for extending said enhanced phone set, said enhanced phone set including a TIMIT base-phone set and an extended base-phone set;
producing a transcription using a transcription process that selects appropriate phones from said enhanced phone set to represent said speech data, said acoustic-phonetic symbols being utilized in said transcription process to represent acoustic-phonetic processes of said speech data, said acoustic-phonetic processes represented by said acoustic-phonetic symbols including an epenthetic vowel; and
applying transformation rules to said enhanced phone set to produce a transformed phone dataset, said transformed phone dataset being used in building a phonetic dictionary.

43. A method for speech processing, comprising:
providing an enhanced phone set that includes enhanced base-phones for representing input speech data, acoustic-phonetic symbols that represent acoustic-phonetic content of said input speech data, and connectors for extending said enhanced base-phones by selectively connecting said acoustic-phonetic symbols to said enhanced base-phones to create composite enhanced phones, said enhanced base-phones including articulator noises l#, b#, hh#, w#, g#, ly#, ll#, and lq#;
producing a detailed transcription of said input speech data using a transcription process that selects appropriate phones from said enhanced phone set to represent said input speech data; and
applying transformation rules to said transcription to produce a transformed transcription, said transformed transcription being used to create a phonetic dictionary for a speech recognition process.

44. A method for speech processing, comprising:
providing an enhanced phone set that includes enhanced base-phones for representing input speech data, acoustic-phonetic symbols that represent acoustic-phonetic content of said input speech data, and connectors for extending said enhanced base-phones by selectively connecting said acoustic-phonetic symbols to said enhanced base-phones to create composite enhanced phones;
producing a detailed transcription of said input speech data using a transcription process that selects appropriate phones from said enhanced phone set to represent said input speech data, said acoustic-phonetic processes represented by said acoustic-phonetic symbols including an epenthetic vowel; and
applying transformation rules to said transcription to produce a transformed transcription, said transformed transcription being used to create a phonetic dictionary for a speech recognition process.
Dependent claims: 45-47.

48. A method for speech processing, comprising:
providing an enhanced phone set that includes enhanced base-phones for representing input speech data, acoustic-phonetic symbols that represent acoustic-phonetic content of said input speech data, and connectors for extending said enhanced base-phones by selectively connecting said acoustic-phonetic symbols to said enhanced base-phones to create composite enhanced phones;
producing a detailed transcription of said input speech data using a transcription process that selects appropriate phones from said enhanced phone set to represent said input speech data; and
applying transformation rules to said transcription to produce a transformed transcription, said transformed transcription being used to create a phonetic dictionary for a speech recognition process,
said transformation rules including a first merge rule that merges an original phone bcl and an original phone b into a merged phone b,
said transformation rules including a second merge rule that merges an original phone tcl and an original phone t into a merged phone t,
said transformation rules including a third merge rule that merges an original phone kcl and an original phone k into a merged phone k.
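The three merge rules recited in claim 48 can be illustrated with a short sketch. It assumes that a transcription is a flat list of phone labels and that a merge applies only when the two original phones are adjacent and in the stated order; the claim does not specify either point, and the names `MERGE_RULES` and `apply_merge_rules` are hypothetical.

```python
# Illustrative sketch only, not the patented implementation.
# Merge rules from claim 48: an adjacent (first, second) phone pair
# maps to a single merged phone.
MERGE_RULES = {
    ("bcl", "b"): "b",
    ("tcl", "t"): "t",
    ("kcl", "k"): "k",
}


def apply_merge_rules(phones, rules=MERGE_RULES):
    """Return a new phone list with each matching adjacent pair merged.

    Assumption: the two original phones must be adjacent and in order.
    """
    merged = []
    i = 0
    while i < len(phones):
        pair = tuple(phones[i:i + 2])
        if pair in rules:
            # Consume both original phones and emit the merged phone.
            merged.append(rules[pair])
            i += 2
        else:
            # No rule matches here; copy the phone through unchanged.
            merged.append(phones[i])
            i += 1
    return merged
```

Under these assumptions, a closure phone that is not followed by its matching release (for example a lone `kcl`) is passed through unchanged.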

49. A method for speech processing, comprising:
providing an enhanced phone set that includes enhanced base-phones for representing input speech data, acoustic-phonetic symbols that represent acoustic-phonetic content of said input speech data, and connectors for extending said enhanced base-phones by selectively connecting said acoustic-phonetic symbols to said enhanced base-phones to create composite enhanced phones;
producing a detailed transcription of said input speech data using a transcription process that selects appropriate phones from said enhanced phone set to represent said input speech data; and
applying transformation rules to said transcription to produce a transformed transcription, said transformed transcription being used to create a phonetic dictionary for a speech recognition process,
said transformation rules including a first split rule that splits an original phone em into a first split phone ah and a second split phone m,
said transformation rules including a second split rule that splits an original phone or into a first split phone ao and a second split phone r,
said transformation rules including a third split rule that splits an original phone al into a first split phone aa and a second split phone l,
said transformation rules including a fourth split rule that splits an original phone aa=n into a first split phone aa and a second split phone n.
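The four split rules recited in claim 49 can be sketched as a table lookup. The sketch assumes a transcription is a flat list of phone labels and that composite phones such as `aa=n` are stored as single string labels; the names `SPLIT_RULES` and `apply_split_rules` are hypothetical and do not come from the patent.

```python
# Illustrative sketch only, not the patented implementation.
# Split rules from claim 49: one original phone maps to an ordered
# pair of split phones.
SPLIT_RULES = {
    "em": ("ah", "m"),
    "or": ("ao", "r"),
    "al": ("aa", "l"),
    "aa=n": ("aa", "n"),
}


def apply_split_rules(phones, rules=SPLIT_RULES):
    """Return a new phone list with each rule phone replaced by its two parts."""
    split = []
    for phone in phones:
        # Phones without a split rule are copied through as a 1-tuple.
        split.extend(rules.get(phone, (phone,)))
    return split
```

Because each rule depends only on a single phone label, the split needs no look-ahead and the output is never shorter than the input.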

50. A method for speech processing, comprising:
providing an enhanced phone set that includes enhanced base-phones for representing input speech data, acoustic-phonetic symbols that represent acoustic-phonetic content of said input speech data, and connectors for extending said enhanced base-phones by selectively connecting said acoustic-phonetic symbols to said enhanced base-phones to create composite enhanced phones;
producing a detailed transcription of said input speech data using a transcription process that selects appropriate phones from said enhanced phone set to represent said input speech data; and
applying transformation rules to said transcription to produce a transformed transcription, said transformed transcription being used to create a phonetic dictionary for a speech recognition process,
said transformation rules including a first replace rule that replaces an original phone gg with a replacement phone g,
said transformation rules including a second replace rule that replaces an original phone qclq with a replacement phone q,
said transformation rules include a third replace rule that replaces an original phone p=v with a replacement phone b.
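The three replace rules recited in claim 50 amount to a one-to-one relabeling, which can be sketched as follows. The sketch assumes a transcription is a flat list of phone labels, with the composite phone `p=v` stored as a single string; the names `REPLACE_RULES` and `apply_replace_rules` are hypothetical.

```python
# Illustrative sketch only, not the patented implementation.
# Replace rules from claim 50: original phone -> replacement phone.
REPLACE_RULES = {
    "gg": "g",
    "qclq": "q",
    "p=v": "b",
}


def apply_replace_rules(phones, rules=REPLACE_RULES):
    """Return a new phone list with each rule phone relabeled."""
    # Phones without a replace rule keep their original label.
    return [rules.get(phone, phone) for phone in phones]
```

Unlike a merge or a split, a replace rule leaves the number of phones in the transcription unchanged.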

51. A method for speech processing, comprising:
providing an enhanced phone set that includes enhanced base-phones for representing input speech data, acoustic-phonetic symbols that represent acoustic-phonetic content of said input speech data, and connectors for extending said enhanced base-phones by selectively connecting said acoustic-phonetic symbols to said enhanced base-phones to create composite enhanced phones;
producing a detailed transcription of said input speech data using a transcription process that selects appropriate phones from said enhanced phone set to represent said input speech data; and
applying transformation rules to said transcription to produce a transformed transcription, said transformed transcription being used to create a phonetic dictionary for a speech recognition process,
said transformation rules including a change-in-context rule that replaces an original phone aa=n with a changed-context phone aa< n m ng.
Specification