System and method for machine-mediated human-human conversation

US 9,741,338 B2
Filed: 12/09/2015
Issued: 08/22/2017
Est. Priority Date: 12/06/2011
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing speech. A system configured to practice the method monitors user utterances to generate a conversation context. Then the system receives a current user utterance independent of non-natural language input intended to trigger speech processing. The system compares the current user utterance to the conversation context to generate a context similarity score, and if the context similarity score is above a threshold, incorporates the current user utterance into the conversation context. If the context similarity score is below the threshold, the system discards the current user utterance. The system can compare the current user utterance to the conversation context based on an n-gram distribution, a perplexity score, and a perplexity threshold. Alternatively, the system can use a task model to compare the current user utterance to the conversation context.
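
The n-gram comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: it assumes a bigram model with add-one smoothing, whitespace tokenization, and an arbitrary threshold value, and the `ContextModel` class and its method names are hypothetical. Because low perplexity indicates high similarity, the sketch accepts an utterance when its perplexity is at or below the perplexity threshold.

```python
import math
from collections import Counter


class ContextModel:
    """Toy conversation context: add-one-smoothed bigram counts (assumed design)."""

    def __init__(self, perplexity_threshold):
        self.perplexity_threshold = perplexity_threshold  # assumed tunable value
        self.bigrams = Counter()   # counts of (previous token, token) pairs
        self.history = Counter()   # counts of tokens seen as the previous token
        self.vocab = set()

    @staticmethod
    def _tokens(utterance):
        # Whitespace tokenization with sentence boundary markers (an assumption).
        return ["<s>"] + utterance.lower().split() + ["</s>"]

    def add(self, utterance):
        """Incorporate an utterance and update the n-gram distribution."""
        toks = self._tokens(utterance)
        self.vocab.update(toks)
        for prev, cur in zip(toks, toks[1:]):
            self.bigrams[(prev, cur)] += 1
            self.history[prev] += 1

    def perplexity(self, utterance):
        """Perplexity of an utterance under the current bigram distribution."""
        toks = self._tokens(utterance)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        log_prob = 0.0
        for prev, cur in zip(toks, toks[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.history[prev] + v)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(toks) - 1))

    def offer(self, utterance):
        """Incorporate the utterance if it continues the context, else discard it."""
        if self.perplexity(utterance) <= self.perplexity_threshold:
            self.add(utterance)
            return True
        return False


model = ContextModel(perplexity_threshold=7.0)
model.add("book a flight to boston")
model.add("book a flight to denver")
print(model.offer("book a flight to chicago"))  # on-topic utterance
print(model.offer("did you feed the dog"))      # off-topic utterance
```

In this toy setup the on-topic utterance shares most of its bigrams with the context and scores a lower perplexity than the off-topic one, so only the former is incorporated.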

20 Claims

1. A method comprising:
- generating a conversation context model based on user utterances and facial recognition data, wherein the conversation context model comprises a model of a speech dialog occurring between a speech dialog system and a speaker;
- continuously comparing the speech dialog to the conversation context model, to yield a context similarity score;
- modifying the context similarity score based on a head orientation of the speaker, to yield a modified context similarity score;
- when the modified context similarity score is above a threshold, incorporating a current user utterance into the conversation context model for use in the speech dialog; and
- when the modified context similarity score is one of equaling the threshold and below the threshold, suppressing the current user utterance such that the current user utterance is not incorporated into the conversation context model and the speech dialog produces speech as though the current user utterance is not in the conversation context model.
- Dependent claims (2-12):
  - 2. The method of claim 1, further comprising:
    - computing an n-gram distribution for the user utterances in the conversation context model;
    - computing a perplexity of the current user utterances based on the n-gram distribution;
    - when, based on a perplexity threshold, the current user utterance is a continuation of the conversation context model, incorporating the current user utterance into the conversation context model and updating the n-gram distribution based on the user utterance; and
    - when, based on the perplexity threshold, the current user utterance is not a continuation of the conversation context model, discarding the current user utterance.
  - 3. The method of claim 1, wherein comparing the current user utterance to the conversation context model is based on a task model associated with a specific task.
  - 4. The method of claim 3, wherein the task model indicates one of a conversation structure, a grammar, and a dictionary.
  - 5. The method of claim 1, wherein the conversation context model is generated using a combination of speech recognition and a non-speech context source.
  - 6. The method of claim 1, wherein user utterances are monitored without being triggered by a “push-to-talk” event.
  - 7. The method of claim 6, wherein the “push-to-talk” event comprises explicit signaling to control a microphone.
  - 8. The method of claim 1, wherein the conversation context model describes a human-human dialog.
  - 9. The method of claim 1, wherein the conversation context model describes a human-machine dialog.
  - 10. The method of claim 1, further comprising applying a noise suppression mechanism when monitoring the user utterances.
  - 11. The method of claim 1, further comprising determining that the current user utterance is not intended to trigger the speech processing because of a deviation from the conversation context model which exceeds the threshold.
  - 12. The method of claim 11, wherein the threshold is based on one of a user, a topic, the conversation context model, confidence scores, and background noise.
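
The gating logic of claim 1 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the claim does not say how head orientation modifies the score, so the cosine weighting of a head yaw angle is an assumption, as are the `Dialog` class, its method names, and the idea that the similarity score arrives as a number between 0 and 1. The one detail taken directly from the claim is the comparison: an utterance is incorporated only when the modified score is strictly above the threshold, and is suppressed when the score equals or falls below it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Dialog:
    """Hypothetical holder for the conversation context and the threshold."""
    threshold: float
    context: list = field(default_factory=list)

    def modified_score(self, similarity, head_yaw_deg):
        # Assumption: weight by cos(yaw). Facing the system (0 degrees) keeps
        # the score; turning away lowers it. Negative weights are clamped to 0.
        weight = max(0.0, math.cos(math.radians(head_yaw_deg)))
        return similarity * weight

    def handle(self, utterance, similarity, head_yaw_deg):
        """Return True if the utterance is incorporated, False if suppressed."""
        score = self.modified_score(similarity, head_yaw_deg)
        if score > self.threshold:
            self.context.append(utterance)
            return True
        # Equal to or below the threshold: the context is left untouched, so
        # the dialog continues as though the utterance had never occurred.
        return False


dialog = Dialog(threshold=0.5)
dialog.handle("show me the next slide", similarity=0.8, head_yaw_deg=0.0)
dialog.handle("what's for lunch", similarity=0.8, head_yaw_deg=90.0)
print(dialog.context)
```

Both utterances start with the same similarity score; only the one spoken while facing the system stays above the threshold after the head-orientation adjustment.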

13. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - generating a conversation context model based on user utterances and facial recognition data, wherein the conversation context model comprises a model of a speech dialog occurring between a speech dialog system and a speaker;
  - continuously comparing the speech dialog to the conversation context model, to yield a context similarity score;
  - modifying the context similarity score based on a head orientation of the speaker, to yield a modified context similarity score;
  - when the modified context similarity score is above a threshold, incorporating a current user utterance into the conversation context model for use in the speech dialog; and
  - when the modified context similarity score is one of equaling the threshold and below the threshold, suppressing the current user utterance such that the current user utterance is not incorporated into the conversation context model and the speech dialog produces speech as though the current user utterance is not in the conversation context model.
- Dependent claims (14-19):
  - 14. The system of claim 13, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
    - computing an n-gram distribution for the user utterances in the conversation context model;
    - computing a perplexity of the current user utterance based on the n-gram distribution;
    - when, based on a perplexity threshold, the current user utterance is a continuation of the conversation context model, incorporating the current user utterance into the conversation context model and updating the n-gram distribution based on the current user utterance; and
    - when, based on the perplexity threshold, the current user utterance is not a continuation of the conversation context model, discarding the current user utterance.
  - 15. The system of claim 14, wherein comparing the current user utterance to the conversation context model is based on a task model associated with a specific task.
  - 16. The system of claim 15, wherein the task model indicates one of a conversation structure, a grammar, and a dictionary.
  - 17. The system of claim 14, wherein the conversation context model is generated using a combination of speech recognition and a non-speech context source.
  - 18. The system of claim 14, wherein user utterances are monitored without being triggered by a “push-to-talk” event.
  - 19. The system of claim 14, wherein the conversation context model describes a human-human dialog.
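
The task-model comparison recited in claims 3-4 and 15-16 can be illustrated with a small Python sketch. It covers only the dictionary variant, and everything in it is an assumption made for illustration: the flight-booking vocabulary, the `task_similarity` function, and the choice to score an utterance by the fraction of its tokens found in the task dictionary. The claims do not specify a scoring formula.

```python
# Hypothetical dictionary for a flight-booking task (illustrative only).
TASK_DICTIONARY = {
    "book", "a", "flight", "ticket", "to", "from",
    "boston", "denver", "tomorrow", "morning",
}


def task_similarity(utterance, dictionary):
    """Fraction of the utterance's tokens that appear in the task dictionary."""
    tokens = utterance.lower().split()
    if not tokens:
        return 0.0  # an empty utterance carries no evidence of the task
    return sum(token in dictionary for token in tokens) / len(tokens)


print(task_similarity("book a flight to boston", TASK_DICTIONARY))
print(task_similarity("did you feed the dog", TASK_DICTIONARY))
```

A score of this kind could serve as the context similarity score that is then compared against the threshold; a grammar or conversation-structure task model would need a different scoring function.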

20. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- generating a conversation context model based on user utterances and facial recognition data, wherein the conversation context model comprises a model of a speech dialog occurring between a speech dialog system and a speaker;
  
  continuously comparing the speech dialog to the conversation context model, to yield a context similarity score;
  
  modifying the context similarity score based on a head orientation of the speaker, to yield a modified context similarity score;
  
  when the modified context similarity score is above a threshold, incorporating a current user utterance into the conversation context model for use in the speech dialog; and
  
  when the modified context similarity score is one of equaling the threshold and below the threshold, suppressing the current user utterance such that the current user utterance is not incorporated into the conversation context model and the speech dialog produces speech as though the current user utterance is not in the conversation context model.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Bangalore, Srinivas
Primary Examiner(s): Godbold, Douglas
Application Number: US14/963,479
Publication Number: US 20160093296A1
Time in Patent Office: 622 Days

CPC Class Codes

G10L 15/075   supervised, i.e. under mach...

G10L 15/1822   Parsing for meaning underst...

G10L 15/197   Probabilistic grammars, e.g...

G10L 15/20   Speech recognition techniqu...

G10L 15/25   using position of the lips,...

G10L 17/22   Interactive procedures; Man...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/228   of application context

G10L 25/84   for discriminating voice fr...
