Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method

US 10,242,666 B2
Filed: 04/17/2015
Issued: 03/26/2019
Est. Priority Date: 04/17/2014
Status: Active Grant

First Claim

Patent Images

1. A method of performing a dialogue between a humanoid robot and at least one user comprising the following steps, carried out iteratively by said humanoid robot:

i) acquiring a plurality of input signals from respective sensors, at least one said sensor being a sound sensor and at least one other sensor being a motion or image sensor;

ii) interpreting the acquired signals to recognize a plurality of events generated by said user, selected from a group comprising;

the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression;

iii) determining a response of said humanoid robot, comprising at least one event selected from a group comprising;

the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression, said determining being performed by applying a set of rules, each said rule associating a set of input events to a response of the robot;

iv) generating said or each said event;

wherein at least some of said rules applied at said step iii) associate a response to a combination of at least two events jointly generated by said user and recognized at said step ii), of which at least one is not a word or sentence uttered by said user, andif the response determined during step iii) is or comprises at least the utterance of a word or sentence, executing a step iii-a) of performing a syntactic analysis of a sentence to be uttered by the robot to determine at least one word to be animated depending on a function of the at least one word within a structure of said sentence and determining an animation accompanying said response as a function of said analysis.

View all claims

0 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method of performing dialog between a humanoid robot and user comprises: i) acquiring input signals from respective sensors, at least one being a sound sensor and another being a motion or image sensor; ii) interpreting the signals to recognize events generated by the user, including: the utterance of a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression; iii) determining a response of the humanoid robot, comprising an event such as: the utterance of a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression; iv) generating, an event by the humanoid robot; wherein step iii) comprises determining the response from events jointly generated by the user and recognized at step ii), of which at least one is not words uttered by the user. A computer program product and humanoid robot for carrying out the method is provided.

76 Citations

View as Search Results

25 Claims

1. A method of performing a dialogue between a humanoid robot and at least one user comprising the following steps, carried out iteratively by said humanoid robot:
- i) acquiring a plurality of input signals from respective sensors, at least one said sensor being a sound sensor and at least one other sensor being a motion or image sensor;
  
  ii) interpreting the acquired signals to recognize a plurality of events generated by said user, selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression;
  
  iii) determining a response of said humanoid robot, comprising at least one event selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression, said determining being performed by applying a set of rules, each said rule associating a set of input events to a response of the robot;
  
  iv) generating said or each said event;
  
  wherein at least some of said rules applied at said step iii) associate a response to a combination of at least two events jointly generated by said user and recognized at said step ii), of which at least one is not a word or sentence uttered by said user, andif the response determined during step iii) is or comprises at least the utterance of a word or sentence, executing a step iii-a) of performing a syntactic analysis of a sentence to be uttered by the robot to determine at least one word to be animated depending on a function of the at least one word within a structure of said sentence and determining an animation accompanying said response as a function of said analysis.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 10, 11, 23, 24, 25)
- - 2. The method according to claim 1, wherein at least some of said rules applied at said step iii) determine a response comprising at least two events generated jointly by said humanoid robot, of which at least one is not the utterance of a word or sentence.
  - 3. The method according to claim 1, wherein, at said step iii, said response of humanoid robot is determined based on at least one parameter selected from:
    - a dialogue context, the identity of the user, an internal state of said humanoid robot.
  - 4. The method according to claim 3, further comprising a step of modifying the value of said or of at least one said parameter according to said at least one event recognized at said step ii) or determined in said step iii).
  - 5. The method according to claim 1, wherein said step ii) comprises searching a match between an acquired signal and an event belonging to a list of expected events stored in a memory of said humanoid robot, or accessible by it, said searching being carried out by successively using a plurality of matching methods with increasing complexity until an event is recognized with a confidence score greater than a predetermined value, or after the highest complexity recognition method has been used.
  - 6. The method according to claim 5, wherein the used matching methods are selected depending on a context of dialogue.
  - 7. The method according to claim 5, wherein said matching methods include, by order of increasing complexity:
    - the search for an exact match, the search for an approximate match, the search for a phonetic correspondence—
      
      only in the case of voice recognition—
      
      and the search for a semantic correspondence.
  - 10. The method according to claim 5, wherein said list of expected events is selected, among a plurality of said lists, depending on a dialogue context.
  - 11. The method according to claim 1 wherein said step iii) comprises determining a response to a set of events, including the absence of words uttered by said user or identified gestures, by applying rules belonging to a predefined subset, called proactive rules.
  - 23. A computer program product comprising program code instructions for executing the method according to claim 1 when said program is executed by at least one processor embedded on a humanoid robot, said robot comprising:
    - a plurality of sensors operatively connected to said or at least one processor and comprising at least one sound sensor and at least one image or movement sensor, to acquire respective input signals;
      
      a speech synthesis module controlled by said or at least one said processor to utter words or sentence; and
      
      a set of actuators driven by said or at least one said processor enabling said robot to perform a plurality of movements or gestures.
  - 24. A humanoid robot comprising:
    - at least one embedded processor;
      
      a sensor assembly operatively connected to said or at least one said processor and comprising at least one sound sensor and at least one image or movement sensor, to acquire respective input signals;
      
      a speech synthesis module driven by said or at least one said processor to utter words or sentences, anda set of actuators driven by said or at least one said processor enabling said robot to perform a plurality of movements or gestures;
25. The humanoid robot comprising:
- at least one embedded processor;
  
  a sensor assembly operatively connected to said or at least one said processor and comprising at least one sound sensor and at least one image or movement sensor, to acquire respective input signals;
  
  a speech synthesis module driven by said or at least one said processor to utter words or sentences, anda set of actuators driven by said or at least one said processor enabling said robot to perform a plurality of movements or gestures, further comprising a device for connection to at least one remote server, said or at least one said processor being programmed or configured to cooperate with said or at least one said remote server to carry out the method according to claim 1.

8. A method of performing a dialogue between a humanoid robot and at least one user comprising the following steps, carried out iteratively by said humanoid robot:
- i) acquiring a plurality of input signals from respective sensors, at least one said sensor being a sound sensor and at least one other sensor being a motion or image sensor;
  
  ii) interpreting the acquired signals to recognize a plurality of events generated by said user, selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression;
  
  iii) determining a response of said humanoid robot, comprising at least one event selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression, said determining being performed by applying a set of rules, each said rule associating a set of input events to a response of the robot;
  
  iv) generating said or each said event;
  
  a step of phonetic transcription of a set of sounds acquired by a sound sensor;
  
  a step of simplifying and smoothing the resulting phonetic transcription;
  
  calculating an edit distance between said simplified and smoothed phonetic transcription and a plurality of entries, obtained by simplifying and smoothing a predefined set of words in natural language; and
  
  choosing a natural language word of said predefined set, corresponding to the entry with the lowest edit distance from said simplified and smoothed phonetic transcription, whereinat least some of said rules applied at said step iii) associate a response to a combination of at least two events jointly generated by said user and recognized at said step ii), of which at least one is not a word or sentence uttered by said user,said step ii) comprises searching a match between an acquired signal and an event belonging to a list of expected events stored in a memory of said humanoid robot, or accessible by it, said searching being carried out by successively using a plurality of matching methods with increasing complexity until an event is recognized with a confidence score greater than a predetermined value, or after the highest complexity recognition method has been used, andsaid matching methods include, by order of increasing complexity;
  
  the search for an exact match, the search for an approximate match, the search for a phonetic correspondence—
  
  only in the case of voice recognition—
  
  and the search for a semantic correspondence.
- View Dependent Claims (9)
- - 9. The method according to claim 8 wherein said simplifying and smoothing comprises:
    - replacing phonemes prone to confusion by a single phoneme;
      
      removing vowels other than vowels at the beginning of words and nasal vowels, andremoving breaks between words.

12. A method of performing a dialogue between a humanoid robot and at least one user comprising the following steps, carried out iteratively by said humanoid robot:
- i) acquiring a plurality of input signals from respective sensors, at least one said sensor being a sound sensor and at least one other sensor being a motion or image sensor;
  
  ii) interpreting the acquired signals to recognize a plurality of events generated by said user, selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression;
  
  iii) determining a response of said humanoid robot, comprising at least one event selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression, said determining being performed by applying a set of rules, each said rule associating a set of input events to a response of the robot;
  
  iv) generating said or each said event; and
  
  if the response determined during step iii) is or comprises at least the utterance of a word or sentence, the execution of a step iii-a) of performing linguistic analysis of the words or sentences to be uttered and determining an animation accompanying said response as a function of said analysis, said step iii-a comprises the substeps of;
  
  α
  
  ) identifying at least one word of the response to be animated;
  
  β
  
  ) determining a concept and expressiveness, called one-off expressiveness, associated with said or each said word to be animated; and
  
  γ
  
  ) choosing from a list of animations stored in a memory of said humanoid robot, or accessible by it, an animation based on said concept and said one-off expressiveness, whereinat least some of said rules applied at said step iii) associate a response to a combination of at least two events jointly generated by said user and recognized at said step ii), of which at least one is not a word or sentence uttered by said user.
- View Dependent Claims (13, 14, 15, 16, 17)
- - 13. The method according to claim 12, wherein said substep α
    - comprises performing a syntactic analysis of a sentence to be uttered to determine each or said word to be animated depending on its function within a structure of said sentence.
  - 14. The method according to claim 12, wherein, in said substep β
    - , said one-off expressiveness is determined based on at least one parameter selected from;
      
      an expressiveness of the word, an expressiveness of one or more other words related to it, and an overall expressiveness of the entire response.
  - 15. The method according to claim 12, wherein each animation of said list is associated with one or more concepts and has a specific expressiveness, said substep γ
    - including choosing within said list the animation associated with the concept determined in said substep β and
      
      having a specific expressiveness closest to said one-off expressiveness.
  - 16. The method according to claim 15 further comprising the following sub step:
    - δ
      
      ) determining an expressiveness, called final expressiveness, based on said specific expressiveness and said one-off expressiveness.
  - 17. The method according to claim 12, wherein either said one-off or said final expressiveness determines at least one parameter chosen among a speed and an amplitude of at least one gesture of said animation.

18. A method of performing a dialogue between a humanoid robot and at least one user comprising the following steps, carried out iteratively by said humanoid robot:
- i) acquiring a plurality of input signals from respective sensors, at least one said sensor being a sound sensor and at least one other sensor being a motion or image sensor;
  
  ii) interpreting the acquired signals to recognize a plurality of events generated by said user, selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression;
  
  iii) determining a response of said humanoid robot, comprising at least one event selected from a group comprising;
  
  the utterance of at least a word or sentence, an intonation of voice, a gesture, a body posture, a facial expression, said determining being performed by applying a set of rules, each said rule associating a set of input events to a response of the robot;
  
  iv) generating said or each said event; and
  
  the following steps, implemented iteratively by said robot simultaneously with said steps i) to iv);
  
  A) determining the position of at least a portion of the body of said user relative to a reference frame fixed to the said robot; and
  
  B) driving at least one actuator of said robot to maintain the distance between said robot or an element thereof and said at least one or said body part of said user within a predefined range of values, whereinat least some of said rules applied at said step iii) associate a response to a combination of at least two events jointly generated by said user and recognized at said step ii), of which at least one is not a word or sentence uttered by said user.
- View Dependent Claims (19, 20, 21, 22)
- - 19. The method according to claim 18 wherein said step B) further comprises driving at least one actuator of said robot to maintain an orientation of the robot with respect to said user in a predetermined angular range.
  - 20. The method according to claim 18 wherein the steps implemented iteratively by said robot simultaneously with said steps i to iv) further comprising the step of:
    - C) driving said or at least one said actuator to cause said pseudo-random displacements of the robot while maintaining said distance in said predetermined range of values and, where appropriate, said orientation in said predetermined angular range.
  - 21. The method according to claim 18 wherein the steps implemented iteratively by said robot simultaneously with said steps i to iv) further comprising the step of:
    - D) performing a semantic analysis of an ongoing dialogue between said user and said humanoid robot and, in accordance with said analysis, changing said predetermined range of distance values and, where appropriate, said predetermined angular range.
  - 22. The method according to claim 18 wherein said step A) comprises determining the position of a lower body of said user relative to said reference frame fixed to the said robot.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
SoftBank Robotics Europe SAS
Original Assignee
SoftBank Robotics Europe SAS
Inventors
Monceaux, Jerome, Gate, Gwennael, Houssin, David, Barbieri, Gabriele, Martin, Jocelyn, Testard, Jean, Gourdin, Ilmo
Primary Examiner(s)
Sample, Jonathan L

Application Number

US15/300,226
Publication Number

US 20170148434A1
Time in Patent Office

1,439 Days
Field of Search

700245-264
US Class Current
CPC Class Codes

B25J 11/0015   Face robots, animated artif...

B25J 11/005   Manipulators for mechanical...

B25J 13/003   by means of an audio-respon...

B25J 9/1694   characterised by use of sen...

G06F 40/211   Syntactic parsing, e.g. bas...

G06N 3/008   based on physical entities ...

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 15/32   Multiple recognisers used i...

G10L 2015/025   Phonemes, fenemes or fenone...

Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method

First Claim

0 Assignments

0 Petitions

Accused Products

Abstract

76 Citations

25 Claims

Specification

Solutions

Use Cases

Quick Links

Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method

First Claim

0 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

76 Citations

25 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links