Learning of dialogue states and language model of spoken information system
First Claim
1. A method of classifying a plurality of sequences of symbols to form a plurality of sets of sequences of symbols, the method comprising:
a) determining a distance between each sequence and each other sequence in said plurality of sequences in dependence upon a set of semantically insignificant symbol sequences and a set of equivalent symbol sequence pairs; and
b) grouping the plurality of sequences into a plurality of sets in dependence upon said distances;
wherein the symbols are words transcribed from call center operator speech signals spoken to a caller during an inquiry by the caller to the call center.
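Step a) can be illustrated with a simplified Python sketch. It is an assumption-laden simplification, not the patented method. Every listed insignificant phrase and every listed equivalent pair is treated as free (zero penalty). Each sequence is first normalised: equivalent variants are rewritten to a canonical form, and insignificant phrases are deleted. A plain word-level Levenshtein distance is then computed for every pair of sequences. The names `PairwiseDistances`, `normalise`, `replace_phrase` and `levenshtein`, and the (variant, canonical) layout of the pairs, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Phrase = Tuple[str, ...]


def levenshtein(a: List[str], b: List[str]) -> int:
    """Unit-cost word-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


def replace_phrase(words: List[str], phrase: Phrase, replacement: Phrase) -> List[str]:
    """Replace non-overlapping occurrences of phrase, scanning left to right."""
    if not phrase:
        return list(words)
    out: List[str] = []
    i, k = 0, len(phrase)
    while i < len(words):
        if tuple(words[i:i + k]) == phrase:
            out.extend(replacement)
            i += k
        else:
            out.append(words[i])
            i += 1
    return out


@dataclass
class PairwiseDistances:
    # Hypothetical store of semantically insignificant symbol sequences.
    insignificant: Set[Phrase] = field(default_factory=set)
    # Hypothetical store of equivalent pairs, laid out as (variant, canonical).
    equivalent_pairs: List[Tuple[Phrase, Phrase]] = field(default_factory=list)

    def normalise(self, words: List[str]) -> List[str]:
        # Rewrite each equivalent variant to its canonical form.
        for variant, canonical in self.equivalent_pairs:
            words = replace_phrase(words, variant, canonical)
        # Delete insignificant phrases, longest first.
        for phrase in sorted(self.insignificant, key=len, reverse=True):
            words = replace_phrase(words, phrase, ())
        return words

    def distance_matrix(self, sequences: List[List[str]]) -> List[List[int]]:
        """Distance between each sequence and each other sequence."""
        norm = [self.normalise(s) for s in sequences]
        n = len(norm)
        dist = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dist[i][j] = dist[j][i] = levenshtein(norm[i], norm[j])
        return dist


if __name__ == "__main__":
    pd = PairwiseDistances({("erm",)}, [(("i", "would", "like", "to"), ("can", "i"))])
    seqs = [s.split() for s in ("erm i would like to pay my bill",
                                "can i pay my bill",
                                "what is your postcode")]
    print(pd.distance_matrix(seqs))  # [[0, 0, 5], [0, 0, 5], [5, 5, 0]]
```

Normalising first keeps the pairwise step a plain edit distance. The trade-off is that it cannot express the non-zero substitution and deletion penalties that the abstract allows.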
1 Assignment
0 Petitions
Abstract
In this invention, dialogue states for a dialogue model are created using a training corpus of example human-human dialogues. Dialogue states are modelled at the turn level rather than at the move level, and they are derived from the training corpus. In many services the range of operator dialogue utterances is quite small, so the utterances can be categorized into a set of predetermined meanings. This is an important assumption. It does not hold for general conversation, but it often holds for conversations between telephone operators and people. Phrases can be specified with their own substitution and deletion penalties. For example, the two phrases “I would like to” and “can I” may be specified as a possible substitution with a low or zero penalty. This allows common equivalent phrases to be given low substitution penalties. Insignificant phrases such as “erm” are given low or zero deletion penalties.
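The penalty scheme described in the abstract can be sketched as a weighted edit distance over word sequences. This is a minimal Python sketch under several assumptions. Ordinary word insertions, deletions and substitutions each cost a unit amount. A deletion penalty may be attached to any insignificant phrase, and a substitution penalty to any pair of equivalent phrases, with each pair treated as symmetric. The function name `phrase_edit_distance` and the dictionary layout of the penalty tables are hypothetical, not taken from the patent.

```python
from typing import Dict, Sequence, Tuple

Phrase = Tuple[str, ...]


def phrase_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    deletion_penalties: Dict[Phrase, float],
    substitution_penalties: Dict[Tuple[Phrase, Phrase], float],
    word_cost: float = 1.0,
) -> float:
    """Edit distance between word sequences with phrase-level penalties."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            # Ordinary word-level deletion, insertion and substitution.
            if i > 0:
                best = min(best, d[i - 1][j] + word_cost)
            if j > 0:
                best = min(best, d[i][j - 1] + word_cost)
            if i > 0 and j > 0:
                sub = 0.0 if a[i - 1] == b[j - 1] else word_cost
                best = min(best, d[i - 1][j - 1] + sub)
            # Deleting an insignificant phrase ending at position i of a or j of b.
            for phrase, cost in deletion_penalties.items():
                k = len(phrase)
                if k and i >= k and tuple(a[i - k:i]) == phrase:
                    best = min(best, d[i - k][j] + cost)
                if k and j >= k and tuple(b[j - k:j]) == phrase:
                    best = min(best, d[i][j - k] + cost)
            # Substituting one phrase for an equivalent one, in either direction.
            for (p, q), cost in substitution_penalties.items():
                for x, y in ((p, q), (q, p)):
                    kx, ky = len(x), len(y)
                    if (i >= kx and j >= ky and kx + ky > 0
                            and tuple(a[i - kx:i]) == x
                            and tuple(b[j - ky:j]) == y):
                        best = min(best, d[i - kx][j - ky] + cost)
            d[i][j] = best
    return d[n][m]


if __name__ == "__main__":
    insignificant = {("erm",): 0.0}
    equivalent = {(("i", "would", "like", "to"), ("can", "i")): 0.0}
    s1 = "erm i would like to pay my bill".split()
    s2 = "can i pay my bill".split()
    print(phrase_edit_distance(s1, s2, insignificant, equivalent))  # 0.0
```

With these penalties, “erm” is deleted for free and “i would like to” is swapped for “can i” for free, so the two utterances end up at distance 0.0. With empty penalty tables, the function reduces to an ordinary word-level edit distance.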
95 Citations
19 Claims
1. A method of classifying a plurality of sequences of symbols to form a plurality of sets of sequences of symbols, the method comprising:
a) determining a distance between each sequence and each other sequence in said plurality of sequences in dependence upon a set of semantically insignificant symbol sequences and a set of equivalent symbol sequence pairs; and
b) grouping the plurality of sequences into a plurality of sets in dependence upon said distances;
wherein the symbols are words transcribed from call center operator speech signals spoken to a caller during an inquiry by the caller to the call center.
Dependent claims: 2, 3, 4, 5, 6, 12.
7. An apparatus for classifying a plurality of sequences of symbols to form a plurality of sets of sequences of symbols, the symbols being words transcribed from call center operator speech signals spoken to a caller during an inquiry from the caller to the call center, the apparatus comprising:
a store for storing a set of semantically insignificant symbol sequences;
a store for storing a set of equivalent symbol sequence pairs;
determining means connected to receive the transcribed call center operator speech signals and further arranged to determine a distance between each sequence and each other sequence in said plurality of sequences in dependence upon the set of semantically insignificant symbol sequences and the set of equivalent symbol sequence pairs; and
means for grouping the plurality of sequences into a plurality of sets in dependence upon said distances.
Dependent claims: 8, 9, 10, 11, 13.
14. A method of classifying a plurality of sequences of words to form a plurality of sets of sequences of words, the method comprising:
transcribing the plurality of sequences of words from call center operator speech signals spoken to a caller during an inquiry by the caller to the call center;
determining a distance between each sequence of words and each other sequence of words in said plurality of sequences; and
grouping the plurality of sequences of words into a plurality of sets in dependence upon said distances.
Dependent claims: 15, 16, 17.
18. An apparatus for classifying a plurality of sequences of words to form a plurality of sets of sequences of words, the apparatus comprising:
transcribing means for transcribing the plurality of sequences of words from call center operator speech signals spoken to a caller during an inquiry by the caller to the call center;
determining means connected to receive the transcribed call center operator speech signals and further arranged to determine a distance between each sequence and each other sequence in said plurality of sequences in dependence upon the set of semantically insignificant symbol sequences and the set of equivalent symbol sequence pairs; and
means for grouping the plurality of sequences into a plurality of sets in dependence upon said distances.
Dependent claim: 19.
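Every independent claim ends by grouping the sequences into sets "in dependence upon said distances" but does not name a grouping algorithm. The Python sketch below uses threshold-based single-linkage grouping (union-find) as one possible choice. This is an assumption, not the patented procedure. Two sequences land in the same set if a chain of pairs, each at distance at most `threshold`, connects them. The function name `group_by_threshold` and the threshold value are hypothetical. The input is any symmetric matrix of pairwise distances.

```python
from typing import Dict, List, Sequence


def group_by_threshold(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Single-linkage grouping of indices 0..n-1 from a pairwise distance matrix."""
    n = len(dist)
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the sets of any two sequences that are close enough.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Collect the members of each set; indices are appended in ascending order.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    dist = [[0, 0, 5, 6],
            [0, 0, 5, 6],
            [5, 5, 0, 1],
            [6, 6, 1, 0]]
    print(group_by_threshold(dist, 1))  # [[0, 1], [2, 3]]
```

Single linkage needs only one threshold and no preset number of sets. It can chain dissimilar utterances together through intermediate ones, though, so average-linkage or other clustering methods are equally plausible readings of the claims.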
Specification