System and method for effectively implementing a Mandarin Chinese speech recognition dictionary

US 7,353,174 B2
Filed: 03/31/2003
Issued: 04/01/2008
Est. Priority Date: 03/31/2003
Status: Expired due to Fees



Abstract

The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.
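As a rough illustration of the tone-free dictionary described in the abstract, the sketch below strips tone digits from numbered-pinyin syllables and points every pronunciation variant of a word at the same entry. The function names, the English word labels, and the use of numbered pinyin as input are assumptions made for this example; the patent does not specify a data format.

```python
import re


def strip_tone(syllable: str) -> str:
    """Remove a trailing tone digit (1-5) from a numbered-pinyin syllable."""
    return re.sub(r"[1-5]$", "", syllable)


def build_dictionary(entries):
    """Map each tone-free pronunciation to the word it represents.

    `entries` maps a word label to a list of numbered-pinyin variants.
    All variants of one word resolve to the same dictionary entry.
    """
    lookup = {}
    for word, variants in entries.items():
        for variant in variants:
            key = tuple(strip_tone(s) for s in variant.split())
            lookup[key] = word
    return lookup


# Hypothetical entries; the variant pairs themselves appear in the claims.
entries = {
    "that": ["na4", "nei4"],    # free phonemic variation
    "dot": ["dian3", "dianr3"],  # final r pronounced or omitted
}
lookup = build_dictionary(entries)
```

With this structure, `lookup[("nei",)]` and `lookup[("na",)]` both resolve to the same word, and no tone information is stored in any key.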


41 Claims

1. A system for performing a speech recognition procedure with an electronic device, comprising:
- a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, each of said phone strings being implemented as a sequence of phonemes that are serially configured, said optimized phone set being implemented in a compact manner by utilizing an allophone variation technique that maps different pronunciations of said input speech data to a respective one of said phone strings, said vocabulary dictionary being implemented by utilizing one or more dictionary optimization techniques, said optimized phone set representing sounds of a Mandarin Chinese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said recognizer thus performing said speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  a processor configured to control said recognizer to thereby perform said speech recognition procedure.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 41)
- - 2. The system of claim 1 wherein said input speech data includes Mandarin Chinese language data, said optimized phone set being compactly configured to accurately represent said Mandarin Chinese language data.
  - 3. The system of claim 1 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
  - 4. The system of claim 1 wherein said optimized phone set conserves processing resources and memory resources while performing said speech recognition procedure.
  - 5. The system of claim 1 wherein said phone strings each include a different series of phones from said optimized phone set, each of said phone strings corresponding to a different respective word from said vocabulary dictionary.
  - 6. The system of claim 5 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone strings from said vocabulary dictionary to thereby select said one or more recognized words during said speech recognition procedure.
  - 7. The system of claim 1 wherein said allophonic technique maps a plurality of allophones or phonemes for said different pronunciations to a single phone string of a corresponding dictionary entry.
  - 8. The system of claim 7 wherein said plurality of allophones or phonemes includes pronunciation variations for said dictionary entry based upon geographic pronunciation variations.
  - 9. The system of claim 7 wherein said optimized phone set is implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones.
  - 10. The system of claim 1 wherein said regional variation technique maps regional variations of said input speech data to a corresponding entry in said vocabulary dictionary.
  - 11. The system of claim 10 wherein each of said regional variations of said input speech data exhibits a significant pronunciation variation depending upon a geographical region, said significant pronunciation variation being determined to exceed a pre-defined acceptable variation threshold.
  - 12. The system of claim 11 wherein said regional variations include Mandarin Chinese pronunciation variations from Northern Mandarin Chinese and from Southern Mandarin Chinese.
  - 13. The system of claim 1 wherein said vocabulary dictionary is implemented as a unified dictionary set that includes first dictionary entries that correspond to a first specific regional pronunciation variation of a particular spoken language, and second dictionary entries that correspond to a second specific regional pronunciation variation of said particular spoken language, said first specific regional pronunciation including a Northern Mandarin Chinese pronunciation variation, said second specific regional pronunciation including a Southern Mandarin Chinese pronunciation variation.
  - 14. The system of claim 1 wherein said vocabulary dictionary includes and merges separate entries for free phonemic or allophonic variations that have alternative pronunciations which are not due to regional variations.
  - 15. The system of claim 14 wherein said free phonemic variations comprise a series of pronunciation variation pairs that include a na4~nei4 pair, a zhe4~zhei4 pair, a shui2~shei2 pair, a he2~han2 pair, and a he2~huo2 pair.
  - 16. The system of claim 1 wherein said vocabulary dictionary includes and merges separate dictionary entries for South-North Mandarin dialectal variation pairs in which a final r may be pronounced in Northern China, while said final r may not be utilized in Southern China.
  - 17. The system of claim 16 wherein said North-South Mandarin dialectal variation pairs include a shi4~shir4 pair, a bian1~bianr1 pair, a pian4~pianr4 pair, a ge1~ger1 pair, a dian3~dianr3 pair, a tian1~tianr1 pair, a gou3~gour3 pair, a ban4~banr4 pair, a qiu2~qiur2 pair, a wan2~wanr2 pair, and a zhao1~zhaor1 pair.
  - 18. The system of claim 1 wherein an affricate technique is employed for implementing said vocabulary dictionary to include and merge an alternative Southern Mandarin Chinese pronunciation of an affricate s^{grave over ( )} with a phoneme t^, because, in Southern China, said affricate s^{grave over ( )} is pronounced closer to said phone t^, said affricate technique thus handling both a Northern Mandarin pronunciation and a Southern Mandarin pronunciation of said affricate s^{grave over ( )}.
  - 19. The system of claim 1 wherein a fricative technique is employed for implementing said vocabulary dictionary to include and merge an alternative Southern Mandarin Chinese pronunciation of a fricative s^ with a phoneme s, because, in Southern China, said fricative s^ is pronounced closer to said phone s, said fricative technique thus handling both a Northern Mandarin pronunciation and a Southern Mandarin pronunciation of said fricative s^.
  - 41. The system of claim 1 wherein said optimized phone set includes only phones b, p, m, f, d, t, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s, n, ng, y, w, yu, a, o, e, i, u, and yu.
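To make the phone strings of claims 5 and 41 concrete, the following sketch splits a toneless pinyin syllable into phones drawn from the set listed in claim 41, using greedy longest-match segmentation. The segmentation rule and the `segment` function are assumptions chosen for illustration; the claims do not state how phone strings are derived from spellings.

```python
# Phone set as listed in claim 41 ("yu" appears twice there; listed once here).
PHONES = [
    "b", "p", "m", "f", "d", "t", "l", "g", "k", "h", "j", "q", "x",
    "zh", "ch", "sh", "r", "z", "c", "s", "n", "ng", "y", "w", "yu",
    "a", "o", "e", "i", "u",
]

# Try two-letter phones before one-letter phones (greedy longest match).
_BY_LENGTH = sorted(PHONES, key=len, reverse=True)


def segment(syllable: str) -> list:
    """Split one toneless pinyin syllable into phones from PHONES."""
    phones = []
    i = 0
    while i < len(syllable):
        for phone in _BY_LENGTH:
            if syllable.startswith(phone, i):
                phones.append(phone)
                i += len(phone)
                break
        else:
            raise ValueError(f"no phone matches {syllable[i:]!r}")
    return phones


# The shi~shir pair from claim 17 differs only by the final "r" phone.
northern = segment("shir")
southern = segment("shi")
```

Because tone is never encoded in a phone, syllables that differ only in tone share one phone string, which is what keeps the phone set compact.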

20. A method for performing a speech recognition procedure with an electronic device, comprising the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, each of said phone strings being implemented as a sequence of phonemes that are serially configured, said optimized phone set being implemented in a compact manner by utilizing a phonemic and allophonic variation technique that maps different pronunciations of said input speech data to a respective one of said phone strings, said vocabulary dictionary being implemented by utilizing one or more dictionary optimization techniques, said optimized phone set representing sounds of a Mandarin Chinese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said recognizer thus performing said speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  controlling said recognizer with a processor to thereby perform said speech recognition procedure.
- View Dependent Claims (21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
- - 21. The method of claim 20 wherein said input speech data includes Mandarin Chinese language data, said optimized phone set being compactly configured to accurately represent said Mandarin Chinese language data.
  - 22. The method of claim 20 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
  - 23. The method of claim 20 wherein said optimized phone set conserves processing resources and memory resources while performing said speech recognition procedure.
  - 24. The method of claim 20 wherein said phone strings each include a different series of phones from said optimized phone set, each of said phone strings corresponding to a different respective word from said vocabulary dictionary.
  - 25. The method of claim 24 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone strings from said vocabulary dictionary to thereby select said one or more recognized words during said speech recognition procedure.
  - 26. The method of claim 20 wherein said allophonic variation technique maps a plurality of allophones or phonemes for said different pronunciations to a single phone string of a corresponding dictionary entry.
  - 27. The method of claim 26 wherein said plurality of allophones or phonemes include pronunciation variations for said corresponding dictionary entry based upon geographic pronunciation variations.
  - 28. The method of claim 26 wherein said optimized phone set is implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones.
  - 29. The method of claim 20 wherein said regional variation technique maps regional variations of said input speech data to a corresponding entry in said vocabulary dictionary.
  - 30. The method of claim 29 wherein each of said regional variations of said input speech data exhibits a significant pronunciation variation depending upon a geographical region, said significant pronunciation variation being determined to exceed a pre-defined acceptable variation threshold.
  - 31. The method of claim 30 wherein said regional variations include Mandarin Chinese pronunciation variations from Northern Mandarin Chinese and from Southern Mandarin Chinese.
  - 32. The method of claim 20 wherein said vocabulary dictionary is implemented as a unified dictionary set that includes first dictionary entries that correspond to a first specific regional pronunciation variation of a particular spoken language, and second dictionary entries that correspond to a second specific regional pronunciation variation of said particular spoken language, said first specific regional pronunciation including a Northern Mandarin Chinese pronunciation variation, said second specific regional pronunciation including a Southern Mandarin Chinese pronunciation variation.
  - 33. The method of claim 20 wherein said vocabulary dictionary includes and merges separate entries for free phonemic or allophonic variations that have alternative pronunciations, which are not due to regional variations.
  - 34. The method of claim 33 wherein said free phonemic variations comprise a series of pronunciation variation pairs that include a na4~nei4 pair, a zhe4~zhei4 pair, a shui2~shei2 pair, a he2~han2 pair, and a he2~huo2 pair.
  - 35. The method of claim 20 wherein said vocabulary dictionary includes and merges separate dictionary entries for South-North Mandarin dialectal variation pairs in which a final r may be pronounced in Northern China, while said final r may not be utilized in Southern China.
  - 36. The method of claim 35 wherein said North-South Mandarin dialectal variation pairs include a shi4~shir4 pair, a bian1~bianr1 pair, a pian4~pianr4 pair, a ge1~ger1 pair, a dian3~dianr3 pair, a tian1~tianr1 pair, a gou3~gour3 pair, a ban4~banr4 pair, a qiu2~qiur2 pair, a wan2~wanr2 pair, and a zhao1~zhaor1 pair.
  - 37. The method of claim 20 wherein an affricate technique is employed for implementing said vocabulary dictionary to include and merge an alternative Southern Mandarin Chinese pronunciation of an affricate s^{grave over ( )} with a phone t^, because, in Southern China, said affricate s^{grave over ( )} is pronounced closer to said phone t^, said affricate technique thus handling both a Northern Mandarin pronunciation and a Southern Mandarin pronunciation of said affricate s^{grave over ( )}.
  - 38. The method of claim 20 wherein a fricative technique is employed for implementing said vocabulary dictionary to include and merge an alternative Southern Mandarin Chinese pronunciation of a fricative s^ with a phone s, because, in Southern China, said fricative s^ is pronounced closer to said phone s, said fricative technique thus handling both a Northern Mandarin pronunciation and a Southern Mandarin pronunciation of said fricative s^.
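The comparison step recited in claim 20 can be pictured with a toy matcher. Claim 25 names Hidden Markov Models for scoring; the sketch below deliberately substitutes a plain Levenshtein distance over already-decoded phone sequences, purely to show how several phone strings per entry let Northern and Southern pronunciations reach the same word. The dictionary contents, English labels, and function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def recognize(observed, dictionary):
    """Return (word, cost) for the entry whose phone string is closest."""
    best_word, best_cost = None, None
    for word, phone_strings in dictionary.items():
        for phone_string in phone_strings:
            cost = edit_distance(observed, phone_string)
            if best_cost is None or cost < best_cost:
                best_word, best_cost = word, cost
    return best_word, best_cost


# Hypothetical entries: each word keeps both variants of a claim 36 pair.
dictionary = {
    "dot": [("d", "i", "a", "n"), ("d", "i", "a", "n", "r")],
    "day": [("t", "i", "a", "n"), ("t", "i", "a", "n", "r")],
}

with_final_r = recognize(("d", "i", "a", "n", "r"), dictionary)
without_final_r = recognize(("d", "i", "a", "n"), dictionary)
```

Both calls return the same word at zero cost, so the speaker's regional variant does not affect the recognition result, and no tone data is consulted at any point.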

39. A computer-readable medium encoded with a computer program for performing a speech recognition, by performing the steps of:
- configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, each of said phone strings being implemented as a sequence of phonemes that are serially configured, said optimized phone set being implemented in a compact manner by utilizing a phonemic and allophonic variation technique that maps different pronunciations of said input speech data to a respective one of said phone strings, said vocabulary dictionary being implemented by utilizing one or more dictionary optimization techniques, said optimized phone set representing sounds of a Mandarin Chinese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said recognizer thus performing said speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  controlling said recognizer with a processor to thereby perform said speech recognition procedure.

40. A system for performing a speech recognition procedure with an electronic device, comprising:
- means for comparing input speech data to phone strings from a vocabulary dictionary to thereby generate and output one or more recognized words from said vocabulary dictionary, said vocabulary dictionary being implemented according to an optimized phone set, each of said phone strings being implemented as a sequence of phonemes that are serially configured, said optimized phone set being implemented in a compact manner by utilizing a phonemic and allophonic variation technique that maps different pronunciations of said input speech data to a respective one of said phone strings, said vocabulary dictionary being implemented by utilizing one or more dictionary optimization techniques, said optimized phone set representing sounds of a Mandarin Chinese language without utilizing corresponding tonal information as part of different phones in said optimized phone set, said means for comparing thus performing said speech recognition procedure without utilizing any type of tone data to thereby output said one or more recognized words as a final speech recognition result; and
  
  means for controlling said means for comparing to thereby perform said speech recognition procedure.


Current Assignee
Ironworks Patents LLC, Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors
Menendez-Pidal, Xavier; Duan, Lei; Lu, Jingwen; Olorenshaw, Lex
Primary Examiner(s)
Dorvil, Richemond
Assistant Examiner(s)
Siedler, Dorothy S

Application Number
US10/403,747
Publication Number
US 20040193417A1
Time in Patent Office
1,828 Days
Field of Search
704/251, 704/253, 704/254, 704/256.1, 704/256.2
US Class Current
704/254
CPC Class Codes
G10L 15/187 Phonemic context, e.g. pron...
G10L 25/15 the extracted parameters be...
