Robot apparatus, method and device for recognition of letters or characters, control program and recording medium

US 7,088,853 B2
Filed: 12/31/2002
Issued: 08/08/2006
Est. Priority Date: 05/02/2001
Status: Expired due to Fees

Abstract

A pronunciation information generating unit (150) generates a plurality of letters or characters inferred from the results of letter/character recognition of an image photographed by a CCD camera (20), a plurality of kana readings inferred from those letters or characters, and the pronunciation corresponding to each kana reading. The readings obtained are then matched against the user's pronunciation, acquired by a microphone (23), to select one kana reading and its pronunciation from among the generated candidates.
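
The matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each pronunciation is a list of phoneme symbols and uses Levenshtein distance as the measure of closeness, which the source does not specify. The names `edit_distance` and `pick_reading`, the romanized readings and the phoneme lists are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pick_reading(candidates: Dict[str, List[str]],
                 heard: Sequence[str]) -> Tuple[str, List[str]]:
    """Return the candidate (reading, phonemes) closest to the heard phonemes."""
    reading = min(candidates, key=lambda r: edit_distance(candidates[r], heard))
    return reading, candidates[reading]


# Hypothetical candidate readings generated for one recognized character.
candidates = {
    "kita": ["k", "i", "t", "a"],
    "hoku": ["h", "o", "k", "u"],
}
# Phonemes assumed to come from the speech recognizer.
heard = ["k", "i", "t", "a"]
best_reading, best_phonemes = pick_reading(candidates, heard)
```

Any other distance or likelihood score could replace `edit_distance` here; the point is only that one candidate is selected by comparison with what the user actually said.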


17 Claims

1. A robot apparatus acting autonomously based on an inner state of the robot apparatus, comprising:
- storage means for speech recognition, as a dictionary for speech recognition, having stored therein the relationship of correspondence between a word and the pronunciation information thereof;
  
  word sound expression storage means, as a table for word sound expressions, having stored therein the relationship of correspondence between the word and the word reading expressing letters thereof;
  
  imaging means for photographing an object;
  
  image recognition means for extracting the predetermined patterns of images from the image photographed by said imaging means;
  
  sound collecting means for acquiring the surrounding sound;
  
  speech recognition means for recognizing the speech from the sound collected by said sound collecting means;
  
  reading information generating means for conferring plural word reading expressing letters, inferred from the predetermined patterns of images extracted by said image recognition means, based on said table for word sound expressions, and for generating the pronunciation information corresponding to the reading for each of the plural word reading expressing letters or characters thus conferred; and
  
  storage controlling means for comparing the pronunciation information generated by said reading information generating means to the pronunciation information of the speech recognized by said speech recognition means and newly storing the closest information of pronunciation in said dictionary for speech recognition as being the pronunciation information corresponding to the pattern recognition result extracted by said image recognition means.
  - 2. The robot apparatus as recited in claim 1 further comprising:
    - transient storage means, as a transient dictionary, for transiently storing the correspondence between a plurality of letters extracted from said image and a plurality of information of pronunciation conferred to said letters or characters.
  - 3. The robot apparatus as recited in claim 1 further comprising:
    - word information storage means, as word attribute table, having stored therein the word information including a word, word reading expressing letters for said word and an attribute of said word, said storage controlling means causing said word attribute to be stored in said dictionary for speech recognition in association with said letter or character being newly registered and the pronunciation information of said letter or character.
  - 4. The robot apparatus as recited in claim 3 further comprising:
    - dialog management means for generating a reply to the speech recognized by said speech recognition means;
      
      said dialog management means employing said word attribute in reply rules to the speech.
  - 5. The robot apparatus as recited in claim 1 wherein said speech recognition means recognizes the speech based on the hidden Markov model method.

6. A letter/character recognition device comprising:
- storage means for speech recognition, as a dictionary for speech recognition, having stored therein the relationship of correspondence between a word and the pronunciation information thereof;
  
  word sound expression storage means, as a table for word sound expressions, having stored therein the relationship of correspondence between the word and the word reading expressing letters thereof;
  
  imaging means for photographing an object;
  
  image recognition means for extracting the predetermined patterns of images from the image photographed by said imaging means;
  
  sound collecting means for acquiring the surrounding sound;
  
  speech recognition means for recognizing the speech from the sound collected by said sound collecting means;
  
  reading information generating means for conferring plural word reading expressing letters, inferred from the predetermined patterns of images extracted by said image recognition means, based on said table for word sound expressions, and for generating the pronunciation information for each of the plural word reading expressing letters or characters thus conferred; and
  
  storage controlling means for comparing the pronunciation information generated by said reading information generating means to the speech information of the speech recognized by said speech recognition means and newly storing the closest information of pronunciation in said dictionary for speech recognition as being the pronunciation information corresponding to the pattern recognition result extracted by said image recognition means.
  - 7. The letter/character recognition device as recited in claim 6 further comprising:
    - transient storage means, as a transient dictionary, for transiently storing the correspondence between a plurality of letters extracted from said image and a plurality of information of pronunciation conferred to said letters or characters.
  - 8. The letter/character recognition device as recited in claim 6 further comprising:
    - word information storage means, as word attribute table, having stored therein the word information including a word, word reading expressing letters for said word and an attribute of said word, said storage controlling means causing said word attribute to be stored in said dictionary for speech recognition in association with said letter or character being newly registered and the pronunciation information of said letter or character.
  - 9. The letter/character recognition device as recited in claim 8 further comprising:
    - dialog management means for generating a reply to the speech recognized by said speech recognition means;
      
      said dialog management means employing said word attribute in reply rules to the speech.
  - 10. The letter/character recognition device as recited in claim 6 wherein said speech recognition means recognizes the speech based on the hidden Markov model method.

11. A letter/character recognition method comprising:
- an imaging step of imaging an object;
  
  an image recognition step of extracting predetermined patterns of images from an image photographed by said imaging step;
  
  a sound collecting step of collecting the surrounding sound;
  
  a speech recognition step of recognizing the speech from the sound acquired by said sound collecting step;
  
  a reading information generating step of conferring plural word reading expressing letters, inferred from the predetermined patterns of images extracted by said image recognition step, based on a table for word sound expressions, having stored therein the relationship of correspondence between a word and a sound expressing letter/character for said word, and for generating the pronunciation information for each of the plural word reading expressing letters or characters thus conferred; and
  
  a storage controlling step of comparing the pronunciation information generated by said reading information generating means to the speech information of the speech recognized by said speech recognition step and newly storing the closest information of pronunciation in said dictionary for speech recognition as being the pronunciation information corresponding to the pattern recognition result extracted by said image recognition step.
  - 12. The letter/character recognition method as recited in claim 11 further comprising:
    - a transient storage step of transiently storing the correspondence between a plurality of letters extracted from said image and a plurality of information of pronunciation conferred to said letters or characters as a transient dictionary.
  - 13. The letter/character recognition method as recited in claim 11 wherein, in said storage controlling step, a word attribute is stored in said dictionary for speech recognition in association with said letter or character being newly registered and the pronunciation information of said letter or character.
  - 14. The letter/character recognition method as recited in claim 13 further comprising:
    - a dialog management step for generating a reply to the speech recognized by said speech recognition step;
      
      said dialog management step employing said word attribute in reply rules to the speech.
  - 15. The letter/character recognition method as recited in claim 11 wherein said speech recognition step recognizes the speech based on the hidden Markov model method.

16. A computer readable medium having stored therein a control program for having a robot apparatus execute:
- an imaging step of imaging an object;
  
  an image recognition step of extracting predetermined patterns of images from an image photographed by said imaging step;
  
  a sound collecting step of collecting the surrounding sound;
  
  a speech recognition step of recognizing the speech from the sound acquired by said sound collecting step;
  
  a pronunciation information generating step of conferring plural word reading expressing letters, inferred from the predetermined patterns of images extracted by said image recognition step, based on a table for word sound expressions, having stored therein the relationship of correspondence between a word and a sound expressing letter/character for said word, and for generating, for each of the plural word reading expressing letters or characters thus conferred, the pronunciation information; and
  
  a storage step of comparing the pronunciation information generated by said reading information generating means to the speech information of the speech recognized by said speech recognition step and newly storing the closest information of pronunciation in said dictionary for speech recognition as being the pronunciation information corresponding to the pattern recognition result extracted by said image recognition step.

17. A computer readable medium having recorded therein a control program for having a robot apparatus execute:
- an imaging step of imaging an object;
  
  an image recognition step of extracting predetermined patterns of images from an image photographed by said imaging step;
  
  a sound collecting step of collecting the surrounding sound;
  
  a speech recognition step of recognizing the speech from the sound acquired by said sound collecting step;
  
  a pronunciation information generating step of conferring plural word reading expressing letters, inferred from the predetermined patterns of images extracted by said image recognition step, based on a table for word sound expressions, having stored therein the relationship of correspondence between a word and a sound expressing letter/character for said word, and for generating the pronunciation information for each of the plural word reading expressing letters or characters thus conferred; and
  
  a storage step of comparing the pronunciation information generated by said pronunciation information generating step to the speech information of the speech recognized by said speech recognition step and newly storing the closest information of pronunciation in said dictionary for speech recognition as being the pronunciation information corresponding to the pattern recognition result extracted by said image recognition step.
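
The roles of the transient dictionary, the word attribute table and the attribute-based reply rules recited in the dependent claims (2 to 4 and their device and method counterparts) can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: the class and function names (`Entry`, `Lexicon`, `hold_candidates`, `register`, `reply`), the phoneme-list representation, the `"direction"` attribute, the reply strings, and the choice to clear the transient entry after registration are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One speech-dictionary entry: pronunciation plus optional word attribute."""
    pronunciation: List[str]
    attribute: Optional[str] = None


@dataclass
class Lexicon:
    speech_dictionary: Dict[str, Entry] = field(default_factory=dict)
    # Transient dictionary: candidate pronunciations held per recognized word.
    transient: Dict[str, List[List[str]]] = field(default_factory=dict)
    # Word attribute table: word -> attribute.
    word_attributes: Dict[str, str] = field(default_factory=dict)

    def hold_candidates(self, word: str, pronunciations: List[List[str]]) -> None:
        """Keep the generated candidates until one is confirmed by speech."""
        self.transient[word] = pronunciations

    def register(self, word: str, chosen: List[str]) -> Entry:
        """Store the chosen pronunciation, with the word's attribute if known."""
        if chosen not in self.transient.get(word, []):
            raise ValueError(f"{chosen} is not a held candidate for {word!r}")
        entry = Entry(chosen, self.word_attributes.get(word))
        self.speech_dictionary[word] = entry
        del self.transient[word]  # assumption: candidates are discarded afterwards
        return entry


def reply(lexicon: Lexicon, word: str) -> str:
    """Toy reply rule that branches on the stored word attribute."""
    entry = lexicon.speech_dictionary.get(word)
    if entry is None:
        return "I do not know that word yet."
    if entry.attribute == "direction":
        return f"{word} is a direction."
    return f"I have learned {word}."


lexicon = Lexicon(word_attributes={"北": "direction"})
lexicon.hold_candidates("北", [["k", "i", "t", "a"], ["h", "o", "k", "u"]])
entry = lexicon.register("北", ["k", "i", "t", "a"])
```

Calling `reply(lexicon, "北")` after registration takes the attribute branch, while a word that was never registered falls through to the "do not know" reply.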


Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Minamino, Katsuki; Sabe, Kohtaro; Kawamoto, Kenta; Hiroe, Atsuo; Ohashi, Takeshi
Primary Examiner(s): Chen, Wenpeng
Assistant Examiner(s): Shah, Utpal D
Application Number: US10/336,201
Publication Number: US 20030152261A1
Time in Patent Office: 1,316 Days
Field of Search: 382/153, 382/114, 704/251, 704/260
US Class Current: 382/153
CPC Class Codes

G10L 13/00   Speech synthesis; Text to s...

G10L 13/047   Architecture of speech synt...

G10L 15/06   Creation of reference templ...

G10L 15/24   Speech recognition using no...

G10L 2015/0631   Creating reference template...
