Voice interaction device, voice interaction method, voice interaction program, and robot

US 10,650,815 B2
Filed: 12/06/2017
Issued: 05/12/2020
Est. Priority Date: 12/14/2016
Status: Active Grant


Abstract

A topic providing device includes a candidate topic extractor, a provided topic determiner, a voice synthesizer, and a speaker. When a determination is made that a parent and child are conversing and that there is a need to provide a new topic to the parent and child, based on a conversation history database and a child activity database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, the candidate topic extractor extracts at least one candidate topic that corresponds to the at least one activity name in the child activity database and does not correspond to an activity name included in text data recorded in a first database. From the at least one candidate topic, the provided topic determiner selects one topic to provide to the parent and the child. The voice synthesizer generates voice data containing the one topic. The speaker outputs the voice data.
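The candidate extraction described in the abstract can be sketched as a simple filter: keep the activity names from the child activity database that do not already appear in the recorded conversation text. This is a minimal illustration only. The function name `extract_candidate_topics`, the sample data, and the case-insensitive substring match are assumptions; the abstract does not specify how an activity name is matched against the text.

```python
def extract_candidate_topics(activity_names, conversation_texts):
    """Return activity names that are not mentioned in the conversation text.

    Matching is a case-insensitive substring test, which is an assumption
    made for this sketch rather than the method stated in the patent.
    """
    spoken = " ".join(conversation_texts).lower()
    return [name for name in activity_names if name.lower() not in spoken]


# Hypothetical contents of the child activity database and conversation history.
activities = ["drawing", "dancing", "reading"]
recent_talk = ["We did some drawing today", "Let's change the topic"]

candidates = extract_candidate_topics(activities, recent_talk)
print(candidates)  # ['dancing', 'reading']
```

Here "drawing" is dropped because it was already discussed, so only activities that would be new to the conversation remain as candidates.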

10 Citations

13 Claims

1. A device performing voice interaction with a plurality of users, the device comprising:
- a sensor obtaining image data of an area around the device;
  
  a microphone obtaining audio of the area around the device;
  
  a speaker;
  
  a processor; and
  
  a non-transitory memory storing thereon a computer program, which when executed by the processor, causes the processor to perform operations including:
  
  storing a plurality of image data corresponding to the plurality of users, the plurality of users including an adult and a child;
  
  identifying a person contained in the obtained image data based on the obtained image data and the stored plurality of image data, and outputting user information indicating the identified person;
  
  extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database;
  
  first determining, based on the user information and the first database, whether the adult and the child are conversing, and determining that the adult and the child are conversing when the adult and the child are the identified persons and the feature value contains a plurality of mutually dissimilar feature values;
  
  second determining, based on the first database, whether there is a need to provide a new topic to the adult and the child when the adult and the child are determined to be conversing, and determining that there is a need to provide a new topic to the adult and the child when a first key phrase is contained in the text data indicating the conversation between the adult and the child during a current predetermined period of time;
  
  extracting at least one candidate topic based on the first database and a second database when providing the topic is determined to be necessary, the second database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, which is earlier than the current predetermined period of time, the at least one candidate topic corresponding to the at least one activity name in the second database and not corresponding to the at least one activity name included in the text data indicating the conversation between the adult and the child during the current predetermined period of time recorded in the first database;
  
  selecting from the at least one candidate topic one topic to provide to the adult and the child;
  
  generating voice data containing the one topic; and
  
  outputting the generated voice data via the speaker.
- - 2. The device according to claim 1, wherein the second database further stores movement amount information indicating an amount of movement corresponding to the activity name, audio level information indicating an audio level corresponding to the activity name, and date information indicating a date corresponding to the activity name, in the extracting, specifying the newest activity name based on the second database and extracting, as the at least one candidate topic, at least one second activity name different from the newest activity name and the at least one activity name included in the text data, and in the selecting, selecting, as the one topic, a third activity name from the at least one second activity name based on a first movement amount corresponding to the newest activity name, a first audio level corresponding to the newest activity name, a second movement amount corresponding to the at least one second activity name among the activity names, and a second audio level corresponding to the at least one second activity name.
  - 3. The device according to claim 2, wherein in the selecting, selecting, as the third activity name, the second activity name having the largest sum calculated according to the following formula:
    - (A − B)² + (C − D)²
      
      where A represents the first movement amount, B represents the second movement amount, C represents the first audio level, and D represents the second audio level.
  - 4. The device according to claim 2, wherein in the extracting, extracting, as the at least one candidate topic, at least one second activity name different from the newest activity name and the at least one activity name included in the text data, the at least one second activity name being recorded in a second predetermined period of time.
  - 5. The device according to claim 2, wherein the movement amount information is a value obtained by multiplying a first coefficient by the movement amount, and the audio level information is a value obtained by multiplying a second coefficient by the audio level.
  - 6. The device according to claim 2, wherein in the generating, based on the second database, when a third movement amount corresponding to the third activity name is equal to or greater than a first threshold value, generating the voice data containing a second key phrase and, based on the second database, when the third movement amount corresponding to the third activity name is less than the first threshold value, generating the voice data containing a third key phrase.
  - 7. The device according to claim 6, wherein the second key phrase and the third key phrase contain phrasing providing feedback on the child's engagement level in the third activity name, and a meaning indicated by the second key phrase is the opposite of a meaning indicated by the third key phrase.
  - 8. The device according to claim 2, wherein in the generating, based on the second database, when a third audio level corresponding to the third activity name is equal to or greater than a first threshold value, generating the voice data containing a second key phrase and, based on the second database, when the third audio level corresponding to the third activity name is less than the first threshold value, generating the voice data containing a third key phrase.
  - 9. The device according to claim 8, wherein the second key phrase and the third key phrase contain phrasing providing feedback on the child's engagement level in the third activity name, and a meaning indicated by the second key phrase is the opposite of a meaning indicated by the third key phrase.
  - 10. The device according to claim 1, wherein the feature value contains a voice-print of a speaker from whom a voice issues.
  - 11. The device according to claim 1, wherein the first key phrase includes wording that indicates the topic.
  - 12. A robot comprising:
    - the device according to claim 1;
      
      a casing incorporating the device; and
      
      a displacement mechanism displacing the casing.
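The selection logic of claims 2, 3, and 6 can be illustrated with a short sketch. It finds the newest activity, excludes it and any activity already mentioned in the conversation, then picks the remaining activity that maximizes (A − B)² + (C − D)², that is, the one least similar to the newest activity in movement amount and audio level. The `ActivityRecord` type, the field names, the sample values, the threshold of 5.0, and the two feedback phrases are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActivityRecord:
    """One row of the second database (field names are assumptions)."""
    name: str
    movement: float     # movement amount information
    audio_level: float  # audio level information
    day: date           # date information


def select_topic(records, mentioned):
    """Pick the candidate with the largest (A - B)^2 + (C - D)^2 (claim 3)."""
    newest = max(records, key=lambda r: r.day)
    candidates = [
        r for r in records
        if r.name != newest.name and r.name not in mentioned
    ]
    if not candidates:
        return None

    def score(r):
        # A, C: newest activity; B, D: candidate activity.
        return ((newest.movement - r.movement) ** 2
                + (newest.audio_level - r.audio_level) ** 2)

    return max(candidates, key=score)


def feedback_phrase(record, threshold=5.0):
    """Choose between two opposite-meaning phrases by movement amount (claims 6, 7)."""
    if record.movement >= threshold:
        return f"You were really active during {record.name}, weren't you?"
    return f"You took it easy during {record.name}, didn't you?"


records = [
    ActivityRecord("reading", 1.0, 2.0, date(2024, 1, 4)),  # newest
    ActivityRecord("dancing", 9.0, 8.0, date(2024, 1, 3)),  # score 100
    ActivityRecord("drawing", 2.0, 3.0, date(2024, 1, 2)),  # score 2
    ActivityRecord("singing", 3.0, 9.0, date(2024, 1, 1)),  # already mentioned
]
chosen = select_topic(records, mentioned={"singing"})
print(chosen.name)              # dancing
print(feedback_phrase(chosen))
```

With these sample values, "dancing" wins because it differs most from the newest activity, "reading", so the suggested topic contrasts with what the child did most recently.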

13. A method in a device performing voice interaction with a plurality of users, wherein the device includes a processor and a non-transitory memory, the method comprising:
- obtaining image data of an area around the device via a sensor;
  
  obtaining audio of the area around the device via a microphone;
  
  identifying a person contained in the obtained image data based on the obtained image data and a plurality of image data stored in a memory storing a plurality of image data corresponding to the plurality of users, and outputting user information indicating the identified person, the plurality of users including an adult and a child;
  
  extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database;
  
  first determining, based on the user information and the first database, whether the adult and the child are conversing, and when the adult and the child are the identified persons and the feature value contains a plurality of mutually dissimilar feature values, determining that the adult and the child are conversing;
  
  second determining, based on the first database, whether there is a need to provide a new topic to the adult and the child when the adult and the child are determined to be conversing, and determining that there is a need to provide a new topic to the adult and the child when a first key phrase is contained in the text data indicating the conversation between the adult and the child during a current predetermined period of time;
  
  extracting at least one candidate topic based on the first database and a second database when providing the topic is determined to be necessary, the second database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, which is earlier than the current predetermined period of time, the at least one candidate topic corresponding to the at least one activity name in the second database and not corresponding to the at least one activity name included in the text data indicating the conversation between the adult and the child during the current predetermined period of time recorded in the first database;
  
  selecting from the at least one candidate topic one topic to provide to the adult and the child;
  
  generating voice data containing the one topic; and
  
  outputting the generated voice data via a speaker.
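The two determining steps of the method can be sketched as follows. The first check requires that both the adult and the child were identified and that the recorded voice feature values include at least two mutually dissimilar values. The second check looks for a key phrase in the recorded text. The claims do not say how dissimilarity is measured, so the cosine similarity test, the 0.8 cutoff, the dictionary layout of each utterance, and the key phrase "topic" are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def are_conversing(identified, utterances, max_similarity=0.8):
    """First determining: adult and child present, and dissimilar voices recorded."""
    if not {"adult", "child"} <= set(identified):
        return False
    features = [u["feature"] for u in utterances]
    # Any pair of feature values below the cutoff counts as mutually dissimilar.
    return any(
        cosine_similarity(a, b) < max_similarity
        for i, a in enumerate(features)
        for b in features[i + 1:]
    )


def needs_new_topic(utterances, key_phrases=("topic",)):
    """Second determining: a key phrase appears in the recent text data."""
    return any(
        phrase in u["text"].lower()
        for u in utterances
        for phrase in key_phrases
    )


# Hypothetical first-database entries for the current period.
utterances = [
    {"text": "What did you do today?", "feature": [1.0, 0.0]},
    {"text": "Let's change the topic", "feature": [0.0, 1.0]},
]

if are_conversing(["adult", "child"], utterances) and needs_new_topic(utterances):
    print("A new topic should be provided.")
```

Only when both checks pass would the later steps (candidate extraction, selection, and voice output) run.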

Current Assignee
Panasonic Intellectual Property Management Co., Ltd. (Panasonic Holdings Corporation)
Original Assignee
Panasonic Intellectual Property Management Co., Ltd. (Panasonic Holdings Corporation)
Inventors
Higuchi, Seiya; Kunitake, Yuji; Ota, Yusaku; Miyazaki, Ryouta
Primary Examiner(s)
Shin, Seong-Ah A

Application Number
US 15/834,030
Publication Number
US 20180166076A1
Time in Patent Office
888 Days
Field of Search
704/9, 704/235
US Class Current
CPC Class Codes

B25J 11/0005   Manipulators having means f...

B25J 13/003   by means of an audio-respon...

B25J 9/0003   Home robots, i.e. small rob...

G06F 16/00   Information retrieval; Data...

G06F 16/3329   Natural language query form...

G06F 18/254   of classification results, ...

G06F 3/167   Audio in a user interface, ...

G06V 10/809   of classification results, ...

G06V 20/10   Terrestrial scenes scenes u...

G06V 40/171   Local features and componen...

G06V 40/172   Classification, e.g. identi...

G10L 13/00   Speech synthesis; Text to s...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 15/26   Speech to text systems G10L...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 25/21   the extracted parameters be...

G10L 25/48   specially adapted for parti...

Y10S 901/01 : Mobile robot

Y10S 901/46 : Sensing device
