Dialogue act estimation method, dialogue act estimation apparatus, and storage medium
First Claim
1. A dialogue act estimation method, in a dialogue act estimation system, comprising:
acquiring sounds by a microphone in a terminal;
determining, by a processor in the terminal, whether the acquired sounds are uttered sentences of one or more speakers or noise;
outputting the uttered sentences to a communication transmitter only when the processor determines that the acquired sounds are uttered sentences of the one or more speakers and are not noise;
converting the uttered sentences of the one or more speakers to one or more formatted communication signals when the processor determines that the acquired sounds are uttered sentences of the one or more speakers;
transmitting the one or more formatted communication signals from the terminal over a communication network to a server;
receiving the one or more formatted communication signals by the server;
converting the received one or more formatted communication signals by a processor in the server to the uttered sentences of the one or more speakers;
acquiring first training data by the server from the converted uttered sentences of the one or more speakers indicating, in a mutually associated manner, text data of a first sentence that can be a current uttered sentence, text data of a second sentence that can be an uttered sentence immediately previous to the first sentence, first speaker change information indicating whether a speaker of the first sentence is the same as a speaker of the second sentence, and dialogue act information indicating a class of the first sentence;
learning an association between the current uttered sentence and the dialogue act information by applying the first training data to a model;
storing a result of the learning as learning result information in a memory in the server;
acquiring dialogue data including text data of a third sentence of a current uttered sentence uttered by a user, text data of a fourth sentence of an uttered sentence immediately previous to the third sentence, and second speaker change information indicating whether the speaker of the third sentence is the same as a speaker of the fourth sentence;
estimating a dialogue act to which the third sentence is classified by applying the dialogue data to the model based on the learning result information; and
generating a correct response to the uttered sentences of the one or more speakers,
wherein the model includes
a first model that outputs a first feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker identification information, the second speaker identification information, and a first weight parameter, and
a second model that outputs a second feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker change information, and a second weight parameter,
wherein the first model determines the first feature vector from the first sentence and the second sentence according to a first RNN-LSTM (Recurrent Neural Network-Long Short Term Memory) having the first weight parameter dependent on the first speaker identification information and the second speaker identification information, and
wherein the second model determines the second feature vector from the first sentence and the second sentence according to a second RNN-LSTM having the second weight parameter dependent on the first speaker change information.
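The two-model structure recited in the claim can be illustrated with a minimal sketch. The claim specifies two RNN-LSTM encoders; here each encoder is replaced by a tiny deterministic stand-in so the sketch runs without any machine-learning framework. All function names, weight values, and class labels below are hypothetical, not taken from the patent.

```python
# Illustrative sketch of the claimed two-model structure: a first model whose
# weight parameter depends on speaker identification information, and a second
# model whose weight parameter depends on speaker change information. The two
# feature vectors are combined to select a dialogue act class.

def toy_encoder(tokens, weights):
    """Stand-in for an RNN-LSTM: folds a token sequence into a
    fixed-size feature vector under the given weight parameter."""
    vec = [0.0] * len(weights)
    for tok in tokens:
        code = sum(ord(c) for c in tok)  # deterministic token feature
        for i, w in enumerate(weights):
            vec[i] += w * ((code >> i) % 7)
    return vec

def first_feature(first_sent, second_sent, speaker1_id, speaker2_id):
    # First model: the weight parameter depends on the two speaker IDs.
    weights = [0.10 + 0.01 * speaker1_id, 0.20 + 0.01 * speaker2_id]
    return toy_encoder(first_sent.split() + second_sent.split(), weights)

def second_feature(first_sent, second_sent, speaker_changed):
    # Second model: the weight parameter depends on the speaker change flag.
    weights = [0.30, 0.15] if speaker_changed else [0.05, 0.25]
    return toy_encoder(first_sent.split() + second_sent.split(), weights)

DIALOGUE_ACTS = ["question", "answer", "backchannel", "other"]

def estimate_act(first_sent, second_sent, spk1_id, spk2_id, changed):
    # Concatenate the two feature vectors and map them to a class label.
    combined = (first_feature(first_sent, second_sent, spk1_id, spk2_id)
                + second_feature(first_sent, second_sent, changed))
    return DIALOGUE_ACTS[int(sum(combined)) % len(DIALOGUE_ACTS)]
```

In a real implementation the two encoders would be trained jointly and the class would be chosen by a softmax over the concatenated features; the point of the sketch is only the data flow the claim describes.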
Abstract
A dialogue act estimation method, in a dialogue act estimation apparatus, includes acquiring first training data indicating, in a mutually associated manner, text data of a first sentence that can be a current uttered sentence, text data of a second sentence that can be an uttered sentence immediately previous to the first sentence, speaker change information indicating whether a speaker of the first sentence is the same as a speaker of the second sentence, and dialogue act information indicating a class of the first sentence. The method further includes learning an association between the current uttered sentence and the dialogue act information by applying the first training data to a model, and storing a result of the learning as learning result information in a memory.
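The "mutually associated" training record described in the abstract can be sketched as a single data structure. The field names below are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingExample:
    """One mutually associated training record: the current sentence, the
    immediately previous sentence, whether the speaker changed between
    them, and the dialogue act class of the current sentence."""
    first_sentence: str    # text data of the current uttered sentence
    second_sentence: str   # text data of the immediately previous sentence
    speaker_changed: bool  # True if the two sentences have different speakers
    dialogue_act: str      # class label of the first sentence

# Hypothetical training data illustrating the association.
examples = [
    TrainingExample("What time is it?", "Hello.", True, "question"),
    TrainingExample("It is noon.", "What time is it?", True, "answer"),
]
```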
6 Claims
1. A dialogue act estimation method, in a dialogue act estimation system, comprising:
acquiring sounds by a microphone in a terminal;
determining, by a processor in the terminal, whether the acquired sounds are uttered sentences of one or more speakers or noise;
outputting the uttered sentences to a communication transmitter only when the processor determines that the acquired sounds are uttered sentences of the one or more speakers and are not noise;
converting the uttered sentences of the one or more speakers to one or more formatted communication signals when the processor determines that the acquired sounds are uttered sentences of the one or more speakers;
transmitting the one or more formatted communication signals from the terminal over a communication network to a server;
receiving the one or more formatted communication signals by the server;
converting the received one or more formatted communication signals by a processor in the server to the uttered sentences of the one or more speakers;
acquiring first training data by the server from the converted uttered sentences of the one or more speakers indicating, in a mutually associated manner, text data of a first sentence that can be a current uttered sentence, text data of a second sentence that can be an uttered sentence immediately previous to the first sentence, first speaker change information indicating whether a speaker of the first sentence is the same as a speaker of the second sentence, and dialogue act information indicating a class of the first sentence;
learning an association between the current uttered sentence and the dialogue act information by applying the first training data to a model;
storing a result of the learning as learning result information in a memory in the server;
acquiring dialogue data including text data of a third sentence of a current uttered sentence uttered by a user, text data of a fourth sentence of an uttered sentence immediately previous to the third sentence, and second speaker change information indicating whether the speaker of the third sentence is the same as a speaker of the fourth sentence;
estimating a dialogue act to which the third sentence is classified by applying the dialogue data to the model based on the learning result information; and
generating a correct response to the uttered sentences of the one or more speakers,
wherein the model includes a first model that outputs a first feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker identification information, the second speaker identification information, and a first weight parameter, and a second model that outputs a second feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker change information, and a second weight parameter,
wherein the first model determines the first feature vector from the first sentence and the second sentence according to a first RNN-LSTM (Recurrent Neural Network-Long Short Term Memory) having the first weight parameter dependent on the first speaker identification information and the second speaker identification information, and
wherein the second model determines the second feature vector from the first sentence and the second sentence according to a second RNN-LSTM having the second weight parameter dependent on the first speaker change information.
- View Dependent Claims (2, 3, 4)
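The terminal-side gating step recited above (outputting sentences only when the sounds are determined to be speech rather than noise) can be sketched as follows. The claim does not fix how the speech/noise determination is made; the energy-threshold test here is purely illustrative, and all names and the threshold value are assumptions.

```python
def is_uttered_speech(samples, threshold=0.02):
    """Crude energy-based gate: treat the frame as speech only if its
    mean absolute amplitude exceeds a threshold. A real system would use
    a proper voice-activity detector; this test is illustrative only."""
    if not samples:
        return False
    energy = sum(abs(s) for s in samples) / len(samples)
    return energy > threshold

def maybe_transmit(samples, transmit):
    # Forward to the communication transmitter only for speech, never for noise.
    if is_uttered_speech(samples):
        transmit(samples)
        return True
    return False

sent = []
maybe_transmit([0.2, -0.3, 0.25], sent.append)     # speech-like frame
maybe_transmit([0.001, -0.002, 0.0], sent.append)  # noise-like frame
# only the first frame reaches the transmitter
```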
5. A dialogue act estimation system, comprising:
a microphone in a terminal that acquires sounds;
a processor in the terminal that determines whether the acquired sounds are uttered sentences of one or more speakers or noise, outputs the uttered sentences only when the processor determines that the acquired sounds are uttered sentences of the one or more speakers and are not noise, converts the uttered sentences of the one or more speakers to one or more formatted communication signals when the processor determines that the acquired sounds are uttered sentences of the one or more speakers, and transmits the one or more formatted communication signals from the terminal over a communication network; and
a server that receives the one or more formatted communication signals;
converts the received one or more formatted communication signals to the uttered sentences of the one or more speakers;
acquires first training data from the converted uttered sentences of the one or more speakers indicating, in a mutually associated manner, text data of a first sentence that can be a current uttered sentence, text data of a second sentence that can be an uttered sentence immediately previous to the first sentence, first speaker change information indicating whether a speaker of the first sentence is the same as a speaker of the second sentence, and dialogue act information indicating a class of the first sentence;
learns an association between the current uttered sentence and the dialogue act information by applying the first training data to a model;
stores a result of the learning as learning result information in a memory;
acquires dialogue data including text data of a third sentence of a current uttered sentence uttered by a user, text data of a fourth sentence of an uttered sentence immediately previous to the third sentence, and second speaker change information indicating whether the speaker of the third sentence is the same as a speaker of the fourth sentence;
estimates a dialogue act to which the third sentence is classified by applying the dialogue data to the model based on the learning result information; and
generates a correct response to the uttered sentences of the one or more speakers,
wherein the model includes a first model that outputs a first feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker identification information, the second speaker identification information, and a first weight parameter, and a second model that outputs a second feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker change information, and a second weight parameter,
wherein the first model determines the first feature vector from the first sentence and the second sentence according to a first RNN-LSTM (Recurrent Neural Network-Long Short Term Memory) having the first weight parameter dependent on the first speaker identification information and the second speaker identification information, and
wherein the second model determines the second feature vector from the first sentence and the second sentence according to a second RNN-LSTM having the second weight parameter dependent on the first speaker change information.
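The server-side estimation step (classifying the third sentence by applying the dialogue data to the model based on the stored learning result information) can be sketched with a toy lookup in place of the learned model. The table contents, function names, and fallback label below are hypothetical.

```python
# Hypothetical learning-result table: maps a (speaker-changed, cue-word)
# pair to the most frequent dialogue act seen for it in training. A real
# system would store learned RNN-LSTM weights instead.
learning_result = {
    (True, "what"): "question",
    (True, "yes"): "answer",
    (False, "so"): "statement",
}

def estimate_dialogue_act(third_sentence, fourth_sentence, speaker_changed):
    """Classify the current (third) sentence using the stored learning
    result and the second speaker change information. Falls back to
    'other' when no cue word matches."""
    for word in third_sentence.lower().split():
        act = learning_result.get((speaker_changed, word.strip("?.,!")))
        if act:
            return act
    return "other"

print(estimate_dialogue_act("What is your name?", "Hello there.", True))
# → question
```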
6. A plurality of non-transitory storage mediums storing computer-readable programs, the programs causing a plurality of computers to execute a process including:
acquiring sounds by a microphone in a terminal;
determining, by a processor in the terminal, whether the acquired sounds are uttered sentences of one or more speakers or noise;
outputting the uttered sentences to a communication transmitter only when the processor determines that the acquired sounds are uttered sentences of the one or more speakers and are not noise;
converting the uttered sentences of the one or more speakers to one or more formatted communication signals when the processor determines that the acquired sounds are uttered sentences of the one or more speakers;
transmitting the one or more formatted communication signals from the terminal over a communication network to a server;
receiving the one or more formatted communication signals by the server;
converting the received one or more formatted communication signals by the server to the uttered sentences of the one or more speakers;
acquiring first training data by the server from the converted uttered sentences of the one or more speakers indicating, in a mutually associated manner, text data of a first sentence that can be a current uttered sentence, text data of a second sentence that can be an uttered sentence immediately previous to the first sentence, first speaker change information indicating whether a speaker of the first sentence is the same as a speaker of the second sentence, and dialogue act information indicating a class of the first sentence;
learning an association between the current uttered sentence and the dialogue act information by applying the first training data to a model;
storing a result of the learning as learning result information in a memory in the server;
acquiring dialogue data including text data of a third sentence of a current uttered sentence uttered by a user, text data of a fourth sentence of an uttered sentence immediately previous to the third sentence, and second speaker change information indicating whether the speaker of the third sentence is the same as a speaker of the fourth sentence;
estimating a dialogue act to which the third sentence is classified by applying the dialogue data to the model based on the learning result information; and
generating a correct response to the uttered sentences of the one or more speakers,
wherein the model includes a first model that outputs a first feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker identification information, the second speaker identification information, and a first weight parameter, and a second model that outputs a second feature vector based on the text data of the first sentence, the text data of the second sentence, the first speaker change information, and a second weight parameter,
wherein the first model determines the first feature vector from the first sentence and the second sentence according to a first RNN-LSTM (Recurrent Neural Network-Long Short Term Memory) having the first weight parameter dependent on the first speaker identification information and the second speaker identification information, and
wherein the second model determines the second feature vector from the first sentence and the second sentence according to a second RNN-LSTM having the second weight parameter dependent on the first speaker change information.
Specification