System and method for Cantonese speech recognition using an optimized phone set

US 7,353,172 B2
Filed: 03/24/2003
Issued: 04/01/2008
Est. Priority Date: 03/24/2003
Status: Expired due to Fees

Abstract

The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
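
As a rough illustration of comparing input against phone strings from a vocabulary dictionary, the sketch below matches an already-decoded phone sequence against a toy dictionary by edit distance. This is a deliberate simplification and not the patented method: the claims describe comparing speech data against models such as Hidden Markov Models, whereas the `VOCABULARY` entries, the `edit_distance` scoring, and the `recognize` function here are all hypothetical stand-ins.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy dictionary: romanized word label -> tone-free phone string.
VOCABULARY: Dict[str, Tuple[str, ...]] = {
    "saam": ("s", "aa", "m"),
    "gwong": ("g", "w", "o", "ng"),
    "sei": ("s", "ei"),
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, 1):
        cur = [i]
        for j, phone_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                        # delete phone_a
                cur[j - 1] + 1,                     # insert phone_b
                prev[j - 1] + (phone_a != phone_b)  # substitute or match
            ))
        prev = cur
    return prev[-1]


def recognize(observed: Sequence[str],
              vocabulary: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return the word(s) whose phone string is closest to the observed one."""
    scores = {w: edit_distance(observed, p) for w, p in vocabulary.items()}
    best = min(scores.values())
    return sorted(w for w, s in scores.items() if s == best)
```

For example, `recognize(("s", "aa", "n"), VOCABULARY)` selects `"saam"`, since its phone string differs from the observed sequence by a single substitution.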

43 Claims

1. A system for performing a Cantonese speech recognition procedure with an electronic device, comprising:
- a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said recognizer thus performing said Cantonese speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  a processor configured to control said recognizer to thereby perform said Cantonese speech recognition procedure.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 43):
- - 2. The system of claim 1 wherein said input speech data includes Cantonese language data, said optimized phone set being compactly configured to accurately represent said Cantonese language data.
  - 3. The system of claim 1 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
  - 4. The system of claim 1 wherein said optimized phone set conserves processing resources and memory resources while performing said speech recognition procedure.
  - 5. The system of claim 1 wherein said optimized phone set reduces training requirements for performing a recognizer training procedure to initially implement said recognizer.
  - 6. The system of claim 1 wherein said phone strings each include a different series of phones from said optimized phone set, each of said phone strings corresponding to a different word from said vocabulary dictionary.
  - 7. The system of claim 6 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone strings from said vocabulary dictionary to thereby select said one or more recognized words during said speech recognition procedure.
  - 8. The system of claim 1 wherein said optimized phone set includes phones b, d, g, p, t, k, m, n, ng, f, l, h, z, c, s, w, j, cl, sil, aa, i, u, e, o, yu, oe, eo, a, eu, aai, aau, ai, au, ei, oi, ou, eoi, ui, and iu.
  - 9. The system of claim 1 wherein said optimized phone set includes consonantal phones b, d, g, p, t, k, m, n, ng, f, l, h, z, c, s, w, and j.
  - 10. The system of claim 1 wherein said optimized phone set includes a closure phone “cl” and a silence phone “sil”.
  - 11. The system of claim 1 wherein said optimized phone set includes vocalic phones aa, i, u, e, o, yu, oe, eo, a, eu, aai, aau, ai, au, ei, oi, ou, eoi, ui, and iu.
  - 12. The system of claim 1 wherein said optimized phone set represents certain diphthongs by utilizing unified diphthong phones to thereby conserve processing resources and memory resources while providing greater accuracy characteristics for said speech recognition procedure.
  - 13. The system of claim 12 wherein said optimized phone set includes unified diphthong phones eu, aai, aau, ai, au, ei, oi, ou, eoi, ui, and iu.
  - 14. The system of claim 1 wherein said optimized phone set represents a certain lip rounding by utilizing a separate lip rounding phone “w” after a consonantal phone “g”.
  - 15. The system of claim 1 wherein said optimized phone set represents a certain lip rounding by utilizing a separate lip rounding phone “w” after a consonantal phone “k”.
  - 20. The system of claim 1 wherein said consonantal phones and said vocalic phones from said optimized phone set are combined to represent syllables from a Cantonese language system.
  - 43. The system of claim 1 wherein said optimized phone set includes only phones b, d, g, p, t, k, m, n, ng, f, l, h, z, c, s, w, j, cl, sil, aa, i, u, e, o, yu, oe, eo, a, eu, aai, aau, ai, au, ei, oi, ou, eoi, ui, and iu.
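
The phone inventory recited in claims 8 through 11 and 43 can be written down as plain data. The sketch below does that and adds a hypothetical helper, `to_phones`, that splits one romanized syllable into phones by greedy longest match after dropping a trailing tone digit. It assumes the input is spelled in a Jyutping-like romanization that lines up with the phone labels, which the source does not state; it also ignores the context-dependent handling of stops described in claims 16 through 19.

```python
import re
from typing import List

# Phone labels as listed in claims 8-11 and 43.
CONSONANTS = ["b", "d", "g", "p", "t", "k", "m", "n", "ng",
              "f", "l", "h", "z", "c", "s", "w", "j"]
SPECIALS = ["cl", "sil"]  # closure and silence phones
VOWELS = ["aa", "i", "u", "e", "o", "yu", "oe", "eo", "a", "eu",
          "aai", "aau", "ai", "au", "ei", "oi", "ou", "eoi", "ui", "iu"]
PHONE_SET = CONSONANTS + SPECIALS + VOWELS

# Only phones that correspond to spelled letters take part in segmentation;
# "cl" and "sil" are not spelled, so they are left out here (assumption).
_SPELLED = sorted(CONSONANTS + VOWELS, key=len, reverse=True)


def to_phones(syllable: str) -> List[str]:
    """Split a romanized syllable into tone-free phones (greedy longest match)."""
    text = re.sub(r"[1-6]$", "", syllable.lower())  # discard any tone digit
    phones: List[str] = []
    pos = 0
    while pos < len(text):
        for phone in _SPELLED:
            if text.startswith(phone, pos):
                phones.append(phone)
                pos += len(phone)
                break
        else:
            raise ValueError(f"cannot segment {syllable!r} at position {pos}")
    return phones
```

Under these assumptions, `to_phones("gwong2")` yields `["g", "w", "o", "ng"]`: the tone digit is dropped, and the lip rounding after "g" comes out as the separate phone "w", in line with claim 14. The full inventory contains 39 phones.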

16. A system for performing a Cantonese speech recognition procedure with an electronic device, comprising:
- a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-initial context in which a stop is located at a beginning of a syllable, said optimized phone set responsively utilizing an appropriate consonant phone “p”, “t”, or “k” in said syllable-initial context to represent a corresponding consonant and a preceding closure; and
  
  a processor configured to control said recognizer to thereby perform said Cantonese speech recognition procedure.

17. A system for performing a Cantonese speech recognition procedure with an electronic device, comprising:
- a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-final/midphrase context in which a stop is located at an end of a word in a middle of a phrase, said optimized phone set responsively utilizing an appropriate consonant phone “p”, “t”, or “k” in said syllable-final/midphrase context to represent a corresponding consonant and a preceding closure; and
  
  a processor configured to control said recognizer to thereby perform said Cantonese speech recognition procedure.

18. A system for performing a Cantonese speech recognition procedure with an electronic device, comprising:
- a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-final/phrase-end context in which a stop is located at an end of a word at an end of a phrase, said optimized phone set responsively utilizing a same identical closure phone “cl” in said syllable-final/phrase-end context to represent either “p”, “t”, or “k” consonants as a closure only without any subsequent releasing consonant sound; and
  
  a processor configured to control said recognizer to thereby perform said Cantonese speech recognition procedure.

19. A system for performing a Cantonese speech recognition procedure with an electronic device, comprising:
- a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-initial context in which a first stop is located at a beginning of a syllable, a syllable-final/midphrase context in which a second stop is located at an end of a first word in a middle of a phrase, and a syllable-final/phrase-end context in which a third stop is located at an end of a second word at an end of said phrase, said optimized phone set utilizing an appropriate consonant phone “b”, “d”, “p”, “t”, or “k” in said syllable-initial context to represent a corresponding consonant and a preceding closure, said optimized phone set responsively utilizing said appropriate consonant phone “p”, “t”, or “k” in said syllable-final/midphrase context to represent said corresponding consonant and said preceding closure, said optimized phone set responsively utilizing a same identical closure phone “cl” in said syllable-final/phrase-end context to represent either “p”, “t”, or “k” as a closure only without any subsequent releasing consonant; and
  
  a processor configured to control said recognizer to thereby perform said Cantonese speech recognition procedure.

21. A method for performing a Cantonese speech recognition procedure with an electronic device, comprising the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said recognizer thus performing said Cantonese speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  controlling said recognizer with a processor to thereby perform said Cantonese speech recognition procedure.
- Dependent claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40):
- - 22. The method of claim 21 wherein said input speech data includes Cantonese language data, said optimized phone set being compactly configured to accurately represent said Cantonese language data.
  - 23. The method of claim 21 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
  - 24. The method of claim 21 wherein said optimized phone set conserves processing resources and memory resources while performing said speech recognition procedure.
  - 25. The method of claim 21 wherein said optimized phone set reduces training requirements for performing a recognizer training procedure to initially implement said recognizer.
  - 26. The method of claim 21 wherein said phone strings each include a different series of phones from said optimized phone set, each of said phone strings corresponding to a different word from said vocabulary dictionary.
  - 27. The method of claim 26 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone strings from said vocabulary dictionary to thereby select said one or more recognized words during said speech recognition procedure.
  - 28. The method of claim 21 wherein said optimized phone set includes phones b, d, g, p, t, k, m, n, ng, f, l, h, z, c, s, w, j, cl, sil, aa, i, u, e, o, yu, oe, eo, a, eu, aai, aau, ai, au, ei, oi, ou, eoi, ui, and iu.
  - 29. The method of claim 21 wherein said optimized phone set includes consonantal phones b, d, g, p, t, k, m, n, ng, f, l, h, z, c, s, w, and j.
  - 30. The method of claim 21 wherein said optimized phone set includes a closure phone “cl” and a silence phone “sil”.
  - 31. The method of claim 21 wherein said optimized phone set includes vocalic phones aa, i, u, e, o, yu, oe, eo, a, eu, aai, aau, ai, au, ei, oi, ou, eoi, ui, and iu.
  - 32. The method of claim 21 wherein said optimized phone set represents certain diphthongs by utilizing unified diphthong phones to thereby conserve processing resources and memory resources while providing greater accuracy characteristics for said speech recognition procedure.
  - 33. The method of claim 32 wherein said optimized phone set includes unified diphthong phones eu, aai, aau, ai, au, ei, oi, ou, eoi, ui, and iu.
  - 34. The method of claim 21 wherein said optimized phone set represents a certain lip rounding by utilizing a separate lip rounding phone “w” after a consonantal phone “g”.
  - 35. The method of claim 21 wherein said optimized phone set represents a certain lip rounding by utilizing a separate lip rounding phone “w” after a consonantal phone “k”.
  - 40. The method of claim 21 wherein said consonantal phones and said vocalic phones from said optimized phone set are combined to represent syllables from a Cantonese language system.
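
Claim 26 recites that each phone string is a different series of phones corresponding to a different word. Because the phone set carries no tone information, two entries that differ only by tone would end up with the same phone string. The sketch below is one hypothetical way to check a dictionary for that situation and for phones outside the set; `validate_dictionary` and the sample labels are illustrative and not taken from the source.

```python
from typing import Dict, List, Tuple

# Phone labels as listed in claim 28.
PHONE_SET = frozenset(
    "b d g p t k m n ng f l h z c s w j cl sil "
    "aa i u e o yu oe eo a eu aai aau ai au ei oi ou eoi ui iu".split()
)


def validate_dictionary(vocabulary: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Report entries that use unknown phones or duplicate a phone string."""
    problems: List[str] = []
    seen: Dict[Tuple[str, ...], str] = {}  # phone string -> first word using it
    for word, phones in vocabulary.items():
        unknown = [p for p in phones if p not in PHONE_SET]
        if unknown:
            problems.append(f"{word}: unknown phones {unknown}")
        if phones in seen:
            problems.append(f"{word}: same phone string as {seen[phones]}")
        else:
            seen[phones] = word
    return problems
```

An empty result means every entry uses only phones from the set and no two entries share a phone string.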

36. A method for performing a Cantonese speech recognition procedure with an electronic device, comprising the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-initial context in which a stop is located at a beginning of a syllable, said optimized phone set responsively utilizing an appropriate consonant phone “b”, “d”, “g”, “p”, “t”, or “k” in said syllable-initial context to represent a corresponding consonant and a preceding closure; and
  
  controlling said recognizer with a processor to thereby perform said Cantonese speech recognition procedure.

37. A method for performing a Cantonese speech recognition procedure with an electronic device, comprising the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-final/midphrase context in which a stop is located at an end of a word in a middle of a phrase, said optimized phone set responsively utilizing an appropriate consonant phone “p”, “t”, or “k” in said syllable-final/midphrase context to represent a corresponding consonant and a preceding closure; and
  
  controlling said recognizer with a processor to thereby perform said Cantonese speech recognition procedure.

38. A method for performing a Cantonese speech recognition procedure with an electronic device, comprising the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-final/phrase-end context in which a stop is located at an end of a word at an end of a phrase, said optimized phone set responsively utilizing a same identical closure phone “cl” in said syllable-final/phrase-end context to represent either “p”, “t”, or “k” as a closure only without any subsequent releasing consonant sound; and
  
  controlling said recognizer with a processor to thereby perform said Cantonese speech recognition procedure.

39. A method for performing a Cantonese speech recognition procedure with an electronic device, comprising the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said input speech data including a syllable-initial context in which a first stop is located at a beginning of a syllable, a syllable-final/midphrase context in which a second stop is located at an end of a first word in a middle of a phrase, and a syllable-final/phrase-end context in which a third stop is located at an end of a second word at an end of said phrase, said optimized phone set utilizing an appropriate consonant phone “b”, “d”, “g”, “p”, “t”, or “k” in said syllable-initial context to represent a corresponding consonant and a preceding closure, said optimized phone set responsively utilizing an appropriate consonant phone “p”, “t”, or “k” in said syllable-final/midphrase context to represent said corresponding consonant and a preceding closure, said optimized phone set responsively utilizing a same identical closure phone “cl” in said syllable-final/phrase-end context to represent either “p”, “t”, or “k” as a closure only without any subsequent releasing consonant sound; and
  
  controlling said recognizer with a processor to thereby perform said Cantonese speech recognition procedure.
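
The three stop contexts recited in this claim amount to a small decision rule, sketched below. The names `StopContext` and `stop_phone` are hypothetical; the set of syllable-initial stops follows the six phones named in this claim, and how a context would be detected from real speech or text is outside the scope of the sketch.

```python
from enum import Enum, auto


class StopContext(Enum):
    SYLLABLE_INITIAL = auto()   # stop at the beginning of a syllable
    FINAL_MIDPHRASE = auto()    # stop ending a word in the middle of a phrase
    FINAL_PHRASE_END = auto()   # stop ending a word at the end of a phrase


INITIAL_STOPS = {"b", "d", "g", "p", "t", "k"}
FINAL_STOPS = {"p", "t", "k"}


def stop_phone(stop: str, context: StopContext) -> str:
    """Pick the phone used to represent a stop in the given context."""
    if context is StopContext.SYLLABLE_INITIAL:
        if stop not in INITIAL_STOPS:
            raise ValueError(f"{stop!r} is not a syllable-initial stop")
        return stop  # one phone covers the consonant and its preceding closure
    if stop not in FINAL_STOPS:
        raise ValueError(f"{stop!r} is not a syllable-final stop")
    if context is StopContext.FINAL_MIDPHRASE:
        return stop  # consonant and preceding closure, as in initial position
    return "cl"      # phrase end: closure only, no releasing consonant sound
```

With this rule, a final "k" maps to `"k"` in the middle of a phrase but to the shared closure phone `"cl"` at the end of a phrase, so "p", "t", and "k" are not distinguished there.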

41. A computer-readable medium encoded with a computer program for performing a Cantonese speech recognition procedure, by performing the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said recognizer thus performing said Cantonese speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  controlling said recognizer with a processor to thereby perform said Cantonese speech recognition procedure.

42. A system for performing a Cantonese speech recognition procedure with an electronic device, comprising the steps of:
- means for comparing input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, one or more of said phone strings including more than two phones from said consonantal phones and said vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones, said optimized phone set representing sounds of a Cantonese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said means for comparing thus performing said Cantonese speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  means for controlling said means for comparing to thereby perform said Cantonese speech recognition procedure.

Current Assignee: Ironworks Patents LLC, Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors: Emonts, Michael; Menendez-Pidal, Xavier; Olorenshaw, Lex
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Siedler, Dorothy S
Application Number: US10/395,352
Publication Number: US 20040193418A1
Time in Patent Office: 1,835 Days
Field of Search: 704/251, 704/253, 704/254, 704/256.1, 704/256.2
US Class Current: 704/254
CPC Class Codes: G10L 15/187 Phonemic context, e.g. pron...; G10L 2015/025 Phonemes, fenemes or fenone...
