Voice interaction device, voice interaction method, voice interaction program, and robot
First Claim
1. A device performing voice interaction with a plurality of users, the device comprising:
a sensor obtaining image data of an area around the device;
a microphone obtaining audio of the area around the device;
a speaker;
a processor; and
a non-transitory memory storing thereon a computer program, which when executed by the processor, causes the processor to perform operations including:
storing a plurality of image data corresponding to the plurality of users, the plurality of users including an adult and a child;
identifying a person contained in the obtained image data based on the obtained image data and the stored plurality of image data, and outputting user information indicating the identified person;
extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database;
first determining, based on the user information and the first database, whether the adult and the child are conversing, and determining that the adult and the child are conversing when the adult and the child are the identified persons and the feature value contains a plurality of mutually dissimilar feature values;
second determining, based on the first database, whether there is a need to provide a new topic to the adult and the child when the adult and the child are determined to be conversing, and determining that there is a need to provide a new topic to the adult and the child when a first key phrase is contained in the text data indicating the conversation between the adult and the child during a current predetermined period of time;
extracting at least one candidate topic based on the first database and a second database when providing the topic is determined to be necessary, the second database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, which is earlier than the current predetermined period of time, the at least one candidate topic corresponding to the at least one activity name in the second database and not corresponding to the at least one activity name included in the text data indicating the conversation between the adult and the child during the current predetermined period of time recorded in the first database;
selecting from the at least one candidate topic one topic to provide to the adult and the child;
generating voice data containing the one topic; and
outputting the generated voice data via the speaker.
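The two determining steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Utterance` record, the use of cosine similarity as the dissimilarity measure, the 0.8 threshold, the "adult"/"child" labels, and the example key phrase are all assumptions made for illustration; the claim itself only requires "mutually dissimilar feature values" and a "first key phrase".

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Utterance:
    """One record of the first database: text tied to a voice feature value."""
    text: str
    feature: tuple      # assumed: feature value represented as a numeric vector
    timestamp: float    # assumed: seconds on some common clock


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def has_dissimilar_features(utterances, threshold=0.8):
    """True if at least one pair of recorded feature values is dissimilar."""
    features = [u.feature for u in utterances]
    return any(
        cosine_similarity(a, b) < threshold
        for i, a in enumerate(features)
        for b in features[i + 1:]
    )


def are_conversing(identified_people, utterances, threshold=0.8):
    """First determining: both users identified and more than one voice heard."""
    both_present = {"adult", "child"} <= set(identified_people)
    return both_present and has_dissimilar_features(utterances, threshold)


def needs_new_topic(utterances, key_phrases, now, window):
    """Second determining: a key phrase occurs within the current period."""
    recent = [u for u in utterances if now - window <= u.timestamp <= now]
    return any(p.lower() in u.text.lower() for u in recent for p in key_phrases)
```

For example, with two utterances whose feature vectors are orthogonal and a hypothetical key phrase "what else", `are_conversing(["adult", "child"], log)` holds and `needs_new_topic(log, ["what else"], now, window)` holds only while the phrase falls inside the current window.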
Abstract
A topic providing device includes a candidate topic extractor, a provided topic determiner, a voice synthesizer, and a speaker. When a determination is made that a parent and child are conversing and that there is a need to provide a new topic to the parent and child, based on a conversation history database and a child activity database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, the candidate topic extractor extracts at least one candidate topic that corresponds to the at least one activity name in the child activity database and does not correspond to an activity name included in text data recorded in a first database. From the at least one candidate topic, the provided topic determiner selects one topic to provide to the parent and the child. The voice synthesizer generates voice data containing the one topic. The speaker outputs the voice data.
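The candidate extraction described in the abstract amounts to a filter: keep activity names from the child activity database that do not appear in the recorded conversation text. The following Python sketch shows that idea under stated assumptions. The function names are hypothetical, matching by case-insensitive substring is an assumption, and picking the first candidate is only a placeholder, since the selection criterion is not given in this section.

```python
def extract_candidate_topics(activity_names, conversation_texts):
    """Return activity names not yet mentioned in the conversation text.

    activity_names: names from the child activity database (earlier period).
    conversation_texts: text data recorded for the current period.
    """
    lowered = [text.lower() for text in conversation_texts]
    return [
        name
        for name in activity_names
        if not any(name.lower() in text for text in lowered)
    ]


def select_topic(candidates):
    """Placeholder selection: take the first candidate, or None if empty."""
    return candidates[0] if candidates else None
```

With activities `["drawing", "dance", "book reading"]` and a conversation that already mentions dance, the candidates are `["drawing", "book reading"]`, and the placeholder selection returns `"drawing"`.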
13 Claims
1. A device performing voice interaction with a plurality of users, the device comprising:
a sensor obtaining image data of an area around the device;
a microphone obtaining audio of the area around the device;
a speaker;
a processor; and
a non-transitory memory storing thereon a computer program, which when executed by the processor, causes the processor to perform operations including:
storing a plurality of image data corresponding to the plurality of users, the plurality of users including an adult and a child;
identifying a person contained in the obtained image data based on the obtained image data and the stored plurality of image data, and outputting user information indicating the identified person;
extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database;
first determining, based on the user information and the first database, whether the adult and the child are conversing, and determining that the adult and the child are conversing when the adult and the child are the identified persons and the feature value contains a plurality of mutually dissimilar feature values;
second determining, based on the first database, whether there is a need to provide a new topic to the adult and the child when the adult and the child are determined to be conversing, and determining that there is a need to provide a new topic to the adult and the child when a first key phrase is contained in the text data indicating the conversation between the adult and the child during a current predetermined period of time;
extracting at least one candidate topic based on the first database and a second database when providing the topic is determined to be necessary, the second database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, which is earlier than the current predetermined period of time, the at least one candidate topic corresponding to the at least one activity name in the second database and not corresponding to the at least one activity name included in the text data indicating the conversation between the adult and the child during the current predetermined period of time recorded in the first database;
selecting from the at least one candidate topic one topic to provide to the adult and the child;
generating voice data containing the one topic; and
outputting the generated voice data via the speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method in a device performing voice interaction with a plurality of users, wherein the device includes a processor and a non-transitory memory, the method comprising:
obtaining image data of an area around the device via a sensor;
obtaining audio of the area around the device via a microphone;
identifying a person contained in the obtained image data based on the obtained image data and a plurality of image data stored in a memory storing a plurality of image data corresponding to the plurality of users, and outputting user information indicating the identified person, the plurality of users including an adult and a child;
extracting a voice from the obtained audio, extracting a feature value of the voice and text data corresponding to the voice, and associating the text data with the feature value and recording the associated text data and feature value in a first database;
first determining, based on the user information and the first database, whether the adult and the child are conversing, and when the adult and the child are the identified persons and the feature value contains a plurality of mutually dissimilar feature values, determining that the adult and the child are conversing;
second determining, based on the first database, whether there is a need to provide a new topic to the adult and the child when the adult and the child are determined to be conversing, and determining that there is a need to provide a new topic to the adult and the child when a first key phrase is contained in the text data indicating the conversation between the adult and the child during a current predetermined period of time;
extracting at least one candidate topic based on the first database and a second database when providing the topic is determined to be necessary, the second database storing at least one activity name indicating an activity the child was engaged in for a first predetermined period of time, which is earlier than the current predetermined period of time, the at least one candidate topic corresponding to the at least one activity name in the second database and not corresponding to the at least one activity name included in the text data indicating the conversation between the adult and the child during the current predetermined period of time recorded in the first database;
selecting from the at least one candidate topic one topic to provide to the adult and the child;
generating voice data containing the one topic; and
outputting the generated voice data via a speaker.
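The order of the final method steps matters: a topic is generated and output only after both determinations succeed and at least one candidate exists. The short Python sketch below illustrates that gating only. The function name, the sentence template, and the `speak` callback standing in for voice synthesis and the speaker are assumptions, and the first-candidate choice is a placeholder for the selecting step.

```python
def provide_topic(is_conversing, topic_needed, candidates, speak):
    """Gate topic output on the earlier determinations, in method order.

    is_conversing: result of the first determining step.
    topic_needed: result of the second determining step.
    candidates: candidate topics already extracted from the two databases.
    speak: callable receiving the generated sentence (stand-in for the
           voice synthesis and speaker output, which are not modeled here).
    """
    if not is_conversing or not topic_needed or not candidates:
        return None                      # nothing is output in these cases
    topic = candidates[0]                # placeholder for the selecting step
    sentence = f"How about talking about {topic}?"  # assumed template
    speak(sentence)
    return sentence
```

Passing `list.append` of an empty list as `speak` is a simple way to observe what would have been sent to the speaker.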
Specification