Conversational data mining
First Claim
1. A method for collecting, in a data warehouse, data associated with a voice of a voice system user, said method comprising the steps of:
(a) conducting a conversation with the voice system user via at least one of a human operator and a voice-enabled machine system;
(b) capturing a speech waveform associated with utterances spoken by the voice system user during said conversation;
(c) digitizing said speech waveform to provide a digitized speech waveform;
(d) extracting, from said digitized speech waveform, at least one acoustic feature which is correlated with at least one user attribute, said at least one user attribute including at least one of:
(d-1) gender of the user;
(d-2) age of the user;
(d-3) accent of the user;
(d-4) native language of the user;
(d-5) dialect of the user;
(d-6) socioeconomic classification of the user;
(d-7) educational level of the user; and
(d-8) emotional state of the user;
(e) storing attribute data corresponding to said acoustic feature which is correlated with said at least one user attribute, together with at least one identifying indicia, in the data warehouse in a form to facilitate subsequent data mining thereon;
(f) repeating steps (a)-(e) for a plurality of additional conversations, with additional users, to provide a collection of stored data including the attribute data and identifying indicia; and
(g) mining the collection of stored data to provide information for modifying underlying business logic of the voice system.
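Steps (a) through (g) describe a feature-extraction and warehousing pipeline. The following is a minimal sketch of that pipeline, not the patented implementation: SQLite stands in for the data warehouse, synthetic noise stands in for captured audio, and the threshold classifier in `infer_attributes` is an illustrative placeholder for the trained models a real system would use. All function and column names are hypothetical.

```python
import math
import random
import sqlite3

def extract_acoustic_features(waveform):
    """Step (d): derive simple acoustic features from a digitized waveform.
    Zero-crossing rate and RMS energy stand in for the richer features
    (pitch, MFCCs) a production acoustic front end would compute."""
    crossings = sum(1 for a, b in zip(waveform, waveform[1:]) if (a >= 0) != (b >= 0))
    return {
        "zcr": crossings / (len(waveform) - 1),
        "energy": math.sqrt(sum(s * s for s in waveform) / len(waveform)),
    }

def infer_attributes(features):
    """Toy correlation of a feature with one user attribute (emotional
    state); a real system would apply trained classifiers here."""
    return {"emotional_state": "excited" if features["zcr"] > 0.1 else "calm"}

def store(conn, indicia, features, attributes):
    """Step (e): store attribute data together with an identifying indicia."""
    conn.execute(
        "INSERT INTO warehouse VALUES (?, ?, ?, ?)",
        (indicia, features["zcr"], features["energy"], attributes["emotional_state"]),
    )

def mine(conn):
    """Step (g): aggregate stored attributes to inform business logic."""
    cur = conn.execute(
        "SELECT emotional_state, COUNT(*) FROM warehouse GROUP BY emotional_state")
    return dict(cur.fetchall())

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse (indicia TEXT, zcr REAL, energy REAL, emotional_state TEXT)")

random.seed(0)
for indicia in ("call-001", "call-002", "call-003"):       # step (f): repeat per conversation
    waveform = [random.gauss(0, 1) for _ in range(16000)]  # steps (b)-(c): captured, digitized audio
    features = extract_acoustic_features(waveform)
    store(conn, indicia, features, infer_attributes(features))

print(mine(conn))  # counts of each detected emotional state across all calls
```

The mining step here is a single aggregate query; the claim covers any analysis of the stored attribute data that feeds back into the voice system's business logic.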
Abstract
A method for collecting data associated with the voice of a voice system user includes conducting a plurality of conversations with a plurality of voice system users. For each conversation, a speech waveform is captured and digitized, and at least one acoustic feature is extracted. The features are correlated with at least one attribute such as gender, age, accent, native language, dialect, socioeconomic classification, educational level and emotional state. Attribute data and at least one identifying indicia are stored for each user in a data warehouse, in a form to facilitate subsequent data mining thereon. The resulting collection of stored data is then mined to provide information for modifying underlying business logic of the voice system. An apparatus suitable for carrying out the method includes a dialog management unit, an audio capture module, an acoustic front end, a processing module and a data warehouse. Appropriate method steps can be implemented by a digital computer running a suitable program stored on a program storage device.
463 Citations
31 Claims
1. A method for collecting, in a data warehouse, data associated with a voice of a voice system user, said method comprising the steps of:
(a) conducting a conversation with the voice system user via at least one of a human operator and a voice-enabled machine system;
(b) capturing a speech waveform associated with utterances spoken by the voice system user during said conversation;
(c) digitizing said speech waveform to provide a digitized speech waveform;
(d) extracting, from said digitized speech waveform, at least one acoustic feature which is correlated with at least one user attribute, said at least one user attribute including at least one of:
(d-1) gender of the user;
(d-2) age of the user;
(d-3) accent of the user;
(d-4) native language of the user;
(d-5) dialect of the user;
(d-6) socioeconomic classification of the user;
(d-7) educational level of the user; and
(d-8) emotional state of the user;
(e) storing attribute data corresponding to said acoustic feature which is correlated with said at least one user attribute, together with at least one identifying indicia, in the data warehouse in a form to facilitate subsequent data mining thereon;
(f) repeating steps (a)-(e) for a plurality of additional conversations, with additional users, to provide a collection of stored data including the attribute data and identifying indicia; and
(g) mining the collection of stored data to provide information for modifying underlying business logic of the voice system.

View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
(h) modifying, in real time, behavior of the voice system based on said at least one user attribute.
10. The method of claim 9, wherein said modifying in step (h) comprises at least one of:
real-time changing of business logic of the voice system; and
real-time modifying of the voice system response, as compared to an expected response of the voice system without said modifying.
11. The method of claim 3, further comprising the additional steps of:
examining said at least one emotional state feature to determine if the user is in a jovial emotional state; and
offering the user at least one of a product and a service in response to said jovial emotional state.
12. The method of claim 11, further comprising the additional steps of:
determining at least one user attribute other than emotional state; and
tailoring said at least one of a product and a service in response to said at least one user attribute other than emotional state.
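Claims 11 and 12 condition an offer on a detected jovial state and then tailor it by a second attribute. A sketch of that selection logic, where the specific attribute names ("age") and offer strings are illustrative, not from the patent:

```python
def choose_offer(attributes):
    """Claims 11-12 as selection logic: offer a product or service only
    when the detected emotional state is jovial, then tailor the offer
    by at least one other user attribute (here, an assumed age field)."""
    if attributes.get("emotional_state") != "jovial":
        return None  # claim 11: the offer is conditioned on a jovial state
    if attributes.get("age", 0) >= 65:  # claim 12: tailor by another attribute
        return "senior-rate calling plan"
    return "standard service upgrade"

print(choose_offer({"emotional_state": "jovial", "age": 70}))  # senior-rate calling plan
print(choose_offer({"emotional_state": "angry", "age": 70}))   # None
```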
13. The method of claim 3, further comprising the additional steps of:
examining said at least one emotional state feature to determine if the user is in a jovial emotional state; and
performing a marketing study on the user in response to said jovial emotional state.
14. The method of claim 13, further comprising the additional steps of:
determining at least one user attribute other than emotional state; and
tailoring said marketing study in response to said at least one user attribute other than emotional state.
15. The method of claim 3, wherein the voice system is a substantially automatic interactive voice response (IVR) system, further comprising the additional steps of:
examining said at least one emotional state feature to determine if the user is in at least one of a disgusted, contemptuous, fearful and angry emotional state; and
switching said user from said IVR to a human operator in response to said at least one of a disgusted, contemptuous, fearful and angry emotional state.
16. The method of claim 3, wherein the voice system is a hybrid interactive voice response (IVR) system, further comprising the additional steps of:
examining said at least one emotional state feature to determine if the user is in at least one of a disgusted, contemptuous, fearful and angry emotional state; and
switching said user from a low-level human operator to a higher-level human supervisor in response to said at least one of a disgusted, contemptuous, fearful and angry emotional state.
17. The method of claim 3, wherein the voice system is a substantially automatic interactive voice response (IVR) system, further comprising the additional steps of:
examining said at least one emotional state feature to determine if the user is in a confused emotional state; and
switching said user from said IVR to a human operator in response to said confused emotional state.
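Claims 15 through 17 amount to an emotion-driven routing rule: negative or confused callers leave a fully automatic IVR for a human operator, and negative callers in a hybrid system are escalated past the low-level operator. A sketch of that rule, where the `system_kind` labels are illustrative shorthand for the two system types the claims name:

```python
# Claims 15-17: the emotional states that trigger escalation.
NEGATIVE_STATES = {"disgusted", "contemptuous", "fearful", "angry"}

def route_caller(emotional_state, system_kind):
    """Route a caller based on detected emotional state. system_kind is
    'automatic' for a substantially automatic IVR (claims 15 and 17) or
    'hybrid' for an IVR backed by human operators (claim 16)."""
    if system_kind == "automatic":
        # claims 15 and 17: negative or confused callers leave the IVR
        if emotional_state in NEGATIVE_STATES or emotional_state == "confused":
            return "human operator"
        return "IVR"
    if system_kind == "hybrid":
        # claim 16: negative callers skip the low-level operator
        if emotional_state in NEGATIVE_STATES:
            return "human supervisor"
        return "low-level operator"
    raise ValueError(f"unknown system kind: {system_kind}")

print(route_caller("angry", "automatic"))    # human operator
print(route_caller("confused", "automatic")) # human operator
print(route_caller("fearful", "hybrid"))     # human supervisor
```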
18. An apparatus for collecting data associated with a voice of a user, said apparatus comprising:
(a) a dialog management unit which conducts a conversation with the user;
(b) an audio capture module which is coupled to said dialog management unit and which captures a speech waveform associated with utterances spoken by the user during the conversation;
(c) an acoustic front end which is coupled to said audio capture module and which is configured to:
receive and digitize the speech waveform to provide a digitized speech waveform; and
extract, from the digitized speech waveform, at least one acoustic feature which is correlated with at least one user attribute, said at least one user attribute including at least one of:
(c-1) gender of the user;
(c-2) age of the user;
(c-3) accent of the user;
(c-4) native language of the user;
(c-5) dialect of the user;
(c-6) socioeconomic classification of the user;
(c-7) educational level of the user; and
(c-8) emotional state of the user;
(d) a processing module which is coupled to said acoustic front end and which analyzes said at least one acoustic feature to determine said at least one user attribute; and
(e) a data warehouse which is coupled to said processing module and which stores said at least one user attribute, together with at least one identifying indicia, in a form for subsequent data mining thereon;
wherein:
said dialog management unit is configured to conduct a plurality of additional conversations with additional users;
said audio capture module is configured to capture a plurality of additional speech waveforms associated with utterances spoken by said additional users during said plurality of additional conversations;
said acoustic front end is configured to receive and digitize said plurality of additional speech waveforms to provide a plurality of additional digitized speech waveforms, and is further configured to extract, from said plurality of additional digitized speech waveforms, a plurality of additional acoustic features, each correlated with at least one attribute of one of said additional users;
said processing module is configured to analyze said additional acoustic features to determine a plurality of additional user attributes;
said data warehouse is configured to store said plurality of additional user attributes, each together with at least one additional identifying indicia, in said form for said subsequent data mining; and
said processing module and said data warehouse are configured to mine the stored user attributes and identifying indicia to provide information for modifying underlying business logic of the apparatus.

View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
a speaker clusterer and classifier;
a speech recognizer; and
an accent identifier.
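The apparatus of claim 18 is a chain of coupled elements: front end feeds processing module feeds data warehouse. A minimal sketch of that wiring in which the dialog management unit and audio capture module are abstracted into the waveform passed to `handle_conversation`; the class names, the peak-amplitude feature, and the threshold rule are all illustrative stand-ins, not the patented components:

```python
from dataclasses import dataclass, field

class AcousticFrontEnd:
    """Element (c): digitizes the waveform and extracts a feature."""
    def process(self, waveform):
        digitized = [round(s, 3) for s in waveform]  # stand-in for A/D conversion
        return {"peak_amplitude": max(abs(s) for s in digitized)}

class ProcessingModule:
    """Element (d): analyzes features to determine a user attribute
    (a toy threshold where a real system would use trained models)."""
    def analyze(self, features):
        loud = features["peak_amplitude"] > 0.8
        return {"emotional_state": "angry" if loud else "calm"}

@dataclass
class DataWarehouse:
    """Element (e): stores attributes with an identifying indicia."""
    rows: list = field(default_factory=list)
    def store(self, indicia, attributes):
        self.rows.append({"indicia": indicia, **attributes})

class Apparatus:
    """Couples the claimed elements for one conversation at a time."""
    def __init__(self):
        self.front_end = AcousticFrontEnd()
        self.processing = ProcessingModule()
        self.warehouse = DataWarehouse()
    def handle_conversation(self, indicia, waveform):
        features = self.front_end.process(waveform)     # elements (b)-(c)
        attributes = self.processing.analyze(features)  # element (d)
        self.warehouse.store(indicia, attributes)       # element (e)

apparatus = Apparatus()
apparatus.handle_conversation("call-001", [0.1, -0.9, 0.4])
apparatus.handle_conversation("call-002", [0.05, 0.1, -0.2])
print(apparatus.warehouse.rows)
```

Claim 25's speaker clusterer/classifier, speech recognizer, and accent identifier would all live inside `ProcessingModule` in this decomposition.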
26. The apparatus of claim 25, further comprising a post processor which is coupled to said data warehouse and which is configured to transcribe user utterances and to perform keyword spotting thereon.
27. The apparatus of claim 18, wherein said processing module is configured to modify behavior of the apparatus, in real time, based on said at least one user attribute.
28. The apparatus of claim 27, wherein said processing module modifies behavior of the apparatus, at least in part, by prompting a human operator thereof.
29. The apparatus of claim 27, wherein said processing module comprises a processor portion of an interactive voice response (IVR) system and wherein said processing module modifies behavior of the apparatus, at least in part, by modifying business logic of the IVR.
30. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for collecting, in a data warehouse, data associated with a voice of a voice system user, said method steps comprising:
(a) reading digital data corresponding to a speech waveform associated with utterances spoken by the voice system user during a conversation between the voice system user and at least one of a human operator and a voice-enabled machine system;
(b) extracting, from said digital data, at least one acoustic feature which is correlated with at least one user attribute, said at least one user attribute including at least one of:
(b-1) gender of the user;
(b-2) age of the user;
(b-3) accent of the user;
(b-4) native language of the user;
(b-5) dialect of the user;
(b-6) socioeconomic classification of the user;
(b-7) educational level of the user; and
(b-8) emotional state of the user; and
(c) storing attribute data corresponding to said acoustic feature which is correlated with said at least one user attribute, together with at least one identifying indicia, in the data warehouse in a form to facilitate subsequent data mining thereon;
(d) repeating steps (a)-(c) for a plurality of additional conversations, with additional users, to provide a collection of stored data including suitable attribute data and identifying indicia for each conversation; and
(e) mining the collection of stored data to provide information for modifying underlying business logic of the voice system.

View Dependent Claims (31)
(f) modifying behavior of the voice system, in real time, based on said at least one user attribute.
Specification