System and method for speech recognition utilizing a merged dictionary

US 7,181,396 B2
Filed: 03/24/2003
Issued: 02/20/2007
Est. Priority Date: 03/24/2003
Status: Active Grant

Abstract

The present invention comprises a system and method for speech recognition utilizing a merged dictionary, and may include a recognizer that is configured to compare input speech data to a series of dictionary entries from the merged dictionary to detect a recognized phrase or command. The merged dictionary may be implemented by utilizing a merging technique that maps two or more related phrases or commands with similar meanings to a single one of the dictionary entries. The recognizer may thus achieve more accurate speech recognition by merging phrases or commands which might otherwise be mistaken for each other.
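
As a minimal sketch of the merging technique described above, the following Python fragment maps phrases that differ only in a final particle to one dictionary entry. The particle list is the one recited in claims 16 and 36; the function names (`merge_key`, `build_merged_dictionary`) and the English placeholder commands are hypothetical and do not come from the patent.

```python
from collections import defaultdict

# Final particles recited in claims 16 and 36 (tone numbers omitted).
MERGEABLE_PARTICLES = {"a", "aa", "laa", "lo", "o", "ga", "ge"}


def merge_key(phrase: str) -> str:
    """Return the command with one trailing mergeable particle removed."""
    words = phrase.split()
    if len(words) > 1 and words[-1] in MERGEABLE_PARTICLES:
        words = words[:-1]
    return " ".join(words)


def build_merged_dictionary(phrases):
    """Group related phrases under a single merged dictionary entry."""
    merged = defaultdict(list)
    for phrase in phrases:
        merged[merge_key(phrase)].append(phrase)
    return dict(merged)


# Placeholder commands ("walk", "sit") stand in for real vocabulary items.
merged = build_merged_dictionary(["walk", "walk aa", "walk laa", "sit laa"])
print(merged)
```

Under these assumptions, a recognizer that scores an utterance against either "walk aa" or "walk laa" would report the same entry, so confusing the two variants no longer counts as a recognition error.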

38 Claims

1. A system for performing a speech recognition procedure, comprising:
- a sound sensor that converts a spoken utterance into input speech data;
  
  a recognizer configured to compare said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented by utilizing a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merging technique being based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data, each of said two or more related phrases including a command followed by said particle context, one of said two or more related phrases having an assertive particle context to indicate said intended mood of said initial speaker of said input speech data; and
  
  a processor configured to control said recognizer to perform said speech recognition procedure.
- - 2. The system of claim 1 wherein said assertive particle context includes no final particle after said one of said two or more related phrases.

3. A system for performing a speech recognition procedure, comprising:
- a sound sensor that converts a spoken utterance into input speech data;
  
  a recognizer configured to compare said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented by utilizing a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merging technique being based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data, each of said two or more related phrases including a command followed by said particle context, one of said two or more related phrases having a neutral particle context to indicate said intended mood of said initial speaker of said input speech data; and
  
  a processor configured to control said recognizer to perform said speech recognition procedure.
- - 4. The system of claim 3 wherein said neutral particle context includes a final particle “aa3” after said one of said two or more related phrases.

5. A system for performing a speech recognition procedure, comprising:
- a sound sensor that converts a spoken utterance into input speech data;
  
  a recognizer configured to compare said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented by utilizing a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merging technique being based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data, each of said two or more related phrases including a command followed by said particle context, one of said two or more related phrases having a polite particle context to indicate said intended mood of said initial speaker of said input speech data; and
  
  a processor configured to control said recognizer to perform said speech recognition procedure.
- - 6. The system of claim 5 wherein said polite particle context includes a final particle “laa1” after said one of said two or more related phrases.

7. A system for performing a speech recognition procedure, comprising:
- a sound sensor that converts a spoken utterance into input speech data;
  
  a recognizer configured to compare said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented by utilizing a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merged dictionary being implemented to include dictionary entries that represent phone strings of a Cantonese language without utilizing corresponding tonal information as part of said phone strings; and
  
  a processor configured to control said recognizer to perform said speech recognition procedure.
- - 8. The system of claim 7 wherein said input speech data includes Cantonese language data, said merged dictionary being configured to accurately represent a pre-determined recognition vocabulary for analyzing said Cantonese language data.
  - 9. The system of claim 7 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
  - 10. The system of claim 7 wherein said merging technique of said merged dictionary prevents said recognizer from mistaking one of said related phrases for another of said related phrases during said speech recognition procedure.
  - 11. The system of claim 7 wherein each of said dictionary entries includes a command and an associated phone string that indicates pronunciation characteristics of said command.
  - 12. The system of claim 11 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone string from each of said commands in said vocabulary dictionary to thereby select a recognized word.
  - 13. The system of claim 7 wherein said merging technique is based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data.
  - 14. The system of claim 13 wherein each of said two or more related phrases includes a command followed by said particle context.
  - 15. The system of claim 14 wherein said merged dictionary maps each of said two or more related phrases to a merged dictionary entry corresponding to a polite particle context.
  - 16. The system of claim 14 wherein said particle context includes individual particles “a”, “aa”, “laa”, “lo”, “o”, “ga”, and “ge”.
  - 17. The system of claim 7 wherein said merging technique is not utilized with a command that is followed by an altering particle because said altering particle substantially changes what said command means.
  - 18. The system of claim 17 wherein said altering particle includes at least one of a “maa” particle and a “ne” particle that follow said command.
  - 19. The system of claim 7 wherein said merging technique is utilized to map two or more related phrases with non-similar pronunciations but similar meanings to a single one of said dictionary entries.

20. A method for performing a speech recognition procedure, comprising:
- converting a spoken utterance into input speech data by using a sound sensor;
  
  utilizing a recognizer for comparing said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented with a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merging technique being based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data, each of said two or more related phrases including a command followed by said particle context, one of said two or more related phrases having an assertive particle context to indicate said intended mood of said initial speaker of said input speech data.
- - 21. The method of claim 20 wherein said assertive particle context includes no final particle after said one of said two or more related phrases.

22. A method for performing a speech recognition procedure, comprising:
- converting a spoken utterance into input speech data by using a sound sensor;
  
  utilizing a recognizer for comparing said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented with a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merging technique being based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data, each of said two or more related phrases including a command followed by said particle context, one of said two or more related phrases having a neutral particle context to indicate said intended mood of said initial speaker of said input speech data.
- - 23. The method of claim 22 wherein said neutral particle context includes a final particle “aa3” after said one of said two or more related phrases.

24. A method for performing a speech recognition procedure, comprising:
- converting a spoken utterance into input speech data by using a sound sensor;
  
  utilizing a recognizer for comparing said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented with a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merging technique being based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data, each of said two or more related phrases including a command followed by said particle context, one of said two or more related phrases having a polite particle context to indicate said intended mood of said initial speaker of said input speech data.
- - 25. The method of claim 24 wherein said polite particle context includes a final particle “laa1” after said one of said two or more related phrases.

26. A method for performing a speech recognition procedure, comprising:
- converting a spoken utterance into input speech data by using a sound sensor;
  
  utilizing a recognizer for comparing said input speech data to dictionary entries from a merged dictionary, said merged dictionary being implemented with a merging technique that maps two or more related phrases with similar meanings to a single one of said dictionary entries, said two or more related phrases each having a different final particle that does not alter a basic shared meaning of said two or more related phrases, said merged dictionary being implemented to include dictionary entries that represent phone strings of a Cantonese language without utilizing corresponding tonal information as part of said phone strings.
- - 27. The method of claim 26 wherein said merging technique is utilized to map two or more related phrases with non-similar pronunciations but similar meanings to a single one of said dictionary entries.
  - 28. The method of claim 26 wherein said input speech data includes Cantonese language data, said merged dictionary being configured to accurately represent a pre-determined recognition vocabulary for analyzing said Cantonese language data.
  - 29. The method of claim 26 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
  - 30. The method of claim 26 wherein said merging technique of said merged dictionary prevents mistaking one of said related phrases for another of said related phrases during said speech recognition procedure.
  - 31. The method of claim 26 wherein each of said dictionary entries includes a command and an associated phone string that indicates pronunciation characteristics of said command.
  - 32. The method of claim 31 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone string from each of said commands in said vocabulary dictionary to thereby select a recognized word.
  - 33. The method of claim 26 wherein said merging technique is based upon a particle context from each of said two or more related phrases, said particle context indicating an intended mood of an initial speaker of said input speech data.
  - 34. The method of claim 33 wherein each of said two or more related phrases includes a command followed by said particle context.
  - 35. The method of claim 34 wherein said merged dictionary maps each of said two or more related phrases to a merged dictionary entry corresponding to a polite particle context.
  - 36. The method of claim 34 wherein said particle context includes individual particles “a”, “aa”, “laa”, “lo”, “o”, “ga”, and “ge”.
  - 37. The method of claim 26 wherein said merging technique is not utilized with a command that is followed by an altering particle because said altering particle substantially changes what said command means.
  - 38. The method of claim 37 wherein said altering particle includes at least one of a “maa” particle and a “ne” particle that follow said command.
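
Two further limitations in the claims can be illustrated with a short Python sketch: dictionary entries that omit tonal information (claims 7 and 26), and the exclusion of altering particles from merging (claims 17, 18, 37 and 38). This is only an illustration under stated assumptions: it treats tone as a trailing digit on each syllable, as in “aa3” and “laa1”, and the helper names `strip_tones` and `is_mergeable` and the placeholder command "walk" are hypothetical.

```python
import re

# Altering particles named in claims 18 and 38; these change what the
# command means, so such phrases are kept out of the merge.
ALTERING_PARTICLES = {"maa", "ne"}


def strip_tones(phone_string: str) -> str:
    """Drop a trailing tone digit from each syllable, e.g. 'laa1' -> 'laa'."""
    return " ".join(re.sub(r"\d+$", "", syl) for syl in phone_string.split())


def is_mergeable(phrase: str) -> bool:
    """Return False when the phrase ends in an altering particle."""
    words = strip_tones(phrase).split()
    return bool(words) and words[-1] not in ALTERING_PARTICLES


print(strip_tones("walk laa1"))
print(is_mergeable("walk laa1"), is_mergeable("walk maa"))
```

A phrase with no final particle at all is treated as mergeable here, which matches the assertive particle context of claims 2 and 21.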

Current Assignee
Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors
Emonts, Michael, Menendez-Pidal, Xavier, Olorenshaw, Lex
Primary Examiner(s)
ARMSTRONG, ANGELA A

Application Number

US10/395,492
Publication Number

US 20040193416A1
Time in Patent Office

1,429 Days
Field of Search

704/9, 704/10, 704/243-244, 704/251, 704/255, 704/256
US Class Current

704/251
CPC Class Codes

G10L 15/187 Phonemic context, e.g. pron...
