Metric for automatic assessment of conversational responses

US 9,967,211 B2
Filed: 05/31/2015
Issued: 05/08/2018
Est. Priority Date: 05/31/2015
Status: Active Grant

Abstract

Examples are generally directed towards automatic assessment of machine-generated conversational responses. Context-message-response n-tuples are extracted from at least one source of conversational data to generate a set of multi-reference responses. A response in the set of multi-reference responses includes its context-message data pair and rating. The rating indicates a quality of the response relative to the context-message data pair. A response assessment engine generates a metric score for a machine-generated response based on an assessment metric and the set of multi-reference responses. The metric score indicates a quality of the machine-generated conversational response relative to a user-generated message and a context of the user-generated message. A response generation system of a computing device, such as a digital assistant, is optimized and adjusted based on the metric score to improve the accuracy, quality, and relevance of responses output to the user.
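
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `RatedTriple` class, the `form_reference_set` function, and the exact-match selection rule are all assumptions made for illustration, and a real system would likely use a looser notion of matching context and message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedTriple:
    """Hypothetical container for one context-message-response triple."""
    context: str         # conversation turns preceding the message
    message: str         # human-generated message
    response: str        # reply observed in the conversational data
    rating: float = 0.0  # quality of the response for this context-message pair


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variations still match.
    return " ".join(text.lower().split())


def form_reference_set(candidates, message, context):
    """Select candidates whose message and context match the query.

    Exact matching after normalization is an assumption of this sketch.
    """
    key = (_normalize(context), _normalize(message))
    return [
        t for t in candidates
        if (_normalize(t.context), _normalize(t.message)) == key
    ]
```

Under these assumptions, every response that shares a context and message with the query becomes one reference in the multi-reference set, and each reference carries its own rating.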

20 Claims

1. A computer-implemented method for automatic assessment of machine generated responses, said method comprising:
- extracting candidate context-message-response n-tuples, by an extraction component of a computing device, from at least one source of conversational data;
- forming a set of multi-reference responses selected from the candidate context-message-response n-tuples extracted by the extraction component;
- calculating an assessment metric for the machine generated response, by at least one processor, based on the set of multi-reference responses; and
- generating a metric score for the machine generated response based on the assessment metric, by the at least one processor, the metric score indicating a quality of the machine-generated response relative to the set of multi-reference responses.
- - 2. The computer-implemented method of claim 1, wherein extracting candidate context-message-response n-tuples from at least one source of conversational data and forming a set of multi-reference responses further comprises:
    - extracting candidate context-message-response n-tuples from the at least one source of conversational data, wherein individual candidate context-message-response n-tuples comprise a human-generated message, a conversational context, and a reference response corresponding to the human-generated message.
  - 3. The computer-implemented method of claim 2, further comprising:
    - selecting a response from the extracted candidate context-message-response n-tuples based on a context of a message associated with the response to form a reference response in the set of multi-reference responses, wherein a message associated with the reference response corresponds to the selected human-generated message.
  - 4. The computer-implemented method of claim 2, further comprising:
    - selecting a response from the extracted candidate context-message-response n-tuples based on conversational context of the response to form a reference response in the set of multi-reference responses, wherein the conversational context associated with the reference response corresponds to the conversational context of the machine-generated response.
  - 5. The computer-implemented method of claim 4, wherein a conversational context of a message comprises linguistic context data and non-linguistic context data, wherein the linguistic context data comprises message-response data pairs preceding the selected message and the selected machine-generated response in a conversation.
  - 6. The computer-implemented method of claim 2, further comprising:
    - extracting the candidate context-message-response n-tuples from the at least one source of conversational data via a network connection, wherein the at least one source of conversational data is at least one of a social media source, wherein the social media source provides conversational data in at least one format, wherein a format of conversational data comprises a text format, an audio format, or a visual format.
  - 7. The computer-implemented method of claim 1, wherein a rating of individual multi-reference responses in the set of multi-reference responses is a human-generated rating, and further comprising:
    - accessing the rating of the individual multi-reference responses in the set of multi-reference responses, wherein the rating indicates a quality of the individual multi-reference responses relative to a reference multi-reference response.
  - 8. The computer-implemented method of claim 1, further comprising:
    - on determining a rating for individual multi-reference responses in the set of multi-reference responses is a rating on a scale other than a negative one to positive one scale, normalizing the rating to form a normalized rating within a range from negative one to positive one.
  - 9. The computer-implemented method of claim 1, wherein the set of multi-reference responses is a test set of multi-reference responses, and further comprising:
    - training the response assessment engine based on a training set of multi-reference context-message-response n-tuples extracted from the at least one source of conversational data, wherein training the response assessment engine further comprises calculating the assessment metric based on the training set of multi-reference context-message-response n-tuples to train a set of weights associated with the response assessment engine.
  - 10. The computer-implemented method of claim 1, wherein the metric score is a score within a scale from zero to one, and wherein generating the metric score further comprises:
    - calculating an amount of word sequence overlap between the machine-generated response and a reference response in the set of multi-reference responses, wherein an overlap of zero indicates no words in common between the machine-generated response and the reference response, and wherein an overlap of one indicates the machine-generated response is identical to the reference response.
  - 11. The computer-implemented method of claim 10, further comprising:
    - on determining an overlap between the machine-generated response and the reference response, determining a rating of the reference response;
    - increasing the metric score on determining the rating of the reference response is a positive rating; and
    - decreasing the metric score on determining the rating of the reference response is a negative rating.
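
Claims 10 and 11 describe a word-sequence overlap between zero and one, and a score that moves up or down with the rating of the overlapping reference. A minimal sketch of that behavior follows. The formulas are assumptions chosen for illustration, not the metric disclosed in the specification: overlap is computed here as a normalized longest common subsequence of words, and the functions `overlap` and `metric_score` are hypothetical names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def overlap(candidate, reference):
    """Word-sequence overlap in [0, 1].

    0.0 means no words in common; 1.0 means identical word sequences.
    """
    a, b = candidate.lower().split(), reference.lower().split()
    if not a or not b:
        return 0.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def metric_score(candidate, rated_references):
    """Combine overlaps with ratings in [-1, 1] into a score in [0, 1].

    Assumed rule: the best positively rated overlap raises the score,
    the worst negatively rated overlap lowers it.
    """
    support, penalty = 0.0, 0.0
    for reference, rating in rated_references:
        contribution = rating * overlap(candidate, reference)
        support = max(support, contribution)
        penalty = min(penalty, contribution)
    return min(1.0, max(0.0, support + penalty))
```

With this rule, a candidate that matches only a positively rated reference scores high, while the same candidate scores lower once a negatively rated reference with the same wording is added to the set.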

12. A system for automatic assessment of machine generated responses, said system comprising:
- at least one processor; and
- a memory storage device associated with the at least one processor, the memory storage device comprising a memory area storing a response assessment engine, wherein the at least one processor executes the response assessment engine to:
  - calculate an assessment metric for at least one machine-generated response, based on a set of multi-reference responses, a set of ratings and contextual data being associated with the set of multi-reference responses;
  - generate at least one metric score indicating a quality of the at least one machine-generated response relative to at least one multi-reference response from the set of multi-reference responses; and
  - update a set of parameters associated with the response generation system based on the at least one metric score.
- - 13. The system of claim 12, wherein the metric score is a score within a scale from zero to one, and wherein the at least one processor further executes the response assessment engine to:
    - calculate an amount of word sequence overlap between the machine-generated response and a reference response in the set of multi-reference responses, wherein an overlap of zero indicates no words in common between the machine-generated response and the reference response, and wherein an overlap of one indicates the machine-generated response is identical to the reference response.
  - 14. The system of claim 12, wherein the at least one processor further executes the response assessment engine to:
    - identify an amount of overlap between the machine-generated response and a reference response;
    - increase a metric score of the machine-generated response on determining a rating of the reference response is a positive rating; and
    - decrease the metric score of the machine-generated response on determining the rating of the reference response is a negative rating.
  - 15. The system of claim 12, wherein the at least one processor further executes the response assessment engine to:
    - generate a first metric score associated with a first machine-generated response;
    - update the set of parameters in response to the first machine-generated response to form a modified set of parameters;
    - generate a second metric score associated with a second machine-generated response; and
    - update the modified set of parameters based on the second metric score, wherein the set of parameters are incrementally adjusted to increase metric scores.
  - 16. The system of claim 12, wherein the at least one processor further executes the response assessment engine to:
    - calculate the assessment metric based on a training set of multi-reference context-message-response n-tuples.
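
The incremental adjustment in claim 15 (score a response, update the parameters, score again) can be pictured as a simple search loop. The sketch below uses coordinate-wise hill climbing purely as an assumed stand-in; the claims do not name an optimization method, and `tune`, `generate`, and `score` are hypothetical callables supplied by the caller.

```python
def tune(params, generate, score, step=0.1, rounds=20):
    """Adjust parameters step by step while the metric score improves.

    params:   dict of parameter name -> float value
    generate: callable mapping params to a machine-generated response
    score:    callable mapping a response to a metric score
    """
    best = score(generate(params))
    for _ in range(rounds):
        improved = False
        for name in list(params):
            for delta in (step, -step):
                trial = dict(params)
                trial[name] = params[name] + delta
                trial_score = score(generate(trial))
                # Keep the modified parameters only if the score increases.
                if trial_score > best:
                    params, best, improved = trial, trial_score, True
        if not improved:
            break
    return params, best
```

Each accepted trial corresponds to forming a modified set of parameters from one metric score and then re-scoring the next response generated with them.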

17. One or more computer storage media embodying computer-executable components, said components comprising:
- an extraction component that when executed causes at least one processor to:
  - extract a plurality of candidate context-message-response n-tuples from at least one source of conversational data; and
  - select at least one candidate context-message-response n-tuple from the plurality of candidate context-message-response n-tuples associated with a machine-generated response to form a set of multi-reference responses; and
- a response assessment engine that when executed causes at least one processor to:
  - generate a metric score for the machine-generated response based on the set of multi-reference responses, a conversational context of the machine-generated response, and an assessment metric, the metric score indicating a quality of the machine-generated response relative to the set of multi-reference responses.
- - 18. The computer storage media of claim 17, wherein the at least one source of conversational data is at least one of a social media source, wherein the social media source provides conversational data in at least one format, wherein a format of conversational data comprises a text format, an audio format, or a visual format.
  - 19. The computer storage media of claim 17, wherein the response assessment engine, when executed, further causes at least one processor to:
    - select a response from the plurality of candidate context-message-response n-tuples based on the conversational context of the response to form a reference response in the set of multi-reference responses, wherein the conversational context associated with the reference response corresponds to the conversational context of the machine-generated response, wherein the conversational context comprises linguistic context data and non-linguistic context data, wherein the linguistic context data comprises message-response data pairs preceding the selected message and the machine-generated response in a conversation.
  - 20. The computer storage media of claim 17, wherein individual multi-reference responses in the set of multi-reference responses includes a rating, and wherein the response assessment engine, when executed, further causes at least one processor to:
    - normalize the rating to form a normalized rating within a range from negative one to positive one, wherein a negative value rating indicates that a multi-reference response in the set of multi-reference responses is sub-optimal.
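
The rating normalization recited in claims 8 and 20 can be sketched as a linear rescaling. The claims do not specify the mapping, so the min-max formula and the function name `normalize_rating` below are assumptions; the example scale of 1 to 5 is likewise only illustrative.

```python
def normalize_rating(rating, scale_min, scale_max):
    """Linearly map a rating from [scale_min, scale_max] to [-1, 1]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating lies outside the given scale")
    return 2.0 * (rating - scale_min) / (scale_max - scale_min) - 1.0
```

On a 1 to 5 scale this maps 1 to -1.0, 3 to 0.0, and 5 to 1.0, so ratings below the midpoint become negative and mark a reference as sub-optimal in the sense of claim 20.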

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Galley, Michel; Sordoni, Alessandro; Brockett, Christopher John; Gao, Jianfeng; Dolan, William Brennan; Ji, Yangfeng; Auli, Michael; Mitchell, Margaret Ann; Quirk, Christopher Brian
Primary Examiner(s): Sall, El Hadji
Application Number: US14/726,569
Publication Number: US 20160352657A1
Time in Patent Office: 1,073 Days
Field of Search: 709206, 709224, 709217, 709203
CPC Class Codes:
G06F 40/56   Natural language generation
H04L 51/02   using automatic reactions o...
H04L 51/10   Multimedia information
H04L 51/226   Delivery according to prior...
