Training a natural language processing model with information retrieval model annotations

US 9,536,522 B1
Filed: 12/30/2013
Issued: 01/03/2017
Est. Priority Date: 12/30/2013
Status: Active Grant


2 Assignments


0 Petitions


Abstract

Systems and techniques are provided for training a natural language processing model with information retrieval model annotations. A natural language processing model may be trained, through machine learning, using training examples that include part-of-speech tagging and annotations added by an information retrieval model. The natural language processing model may generate part-of-speech, parse-tree, beginning, inside, and outside (BIO) label, mention chunking, and named-entity recognition predictions with confidence scores for text in the training examples. The information retrieval model annotations and part-of-speech tagging in the training example may be used to determine the accuracy of the predictions, and the natural language processing model may be adjusted. After training, the natural language processing model may be used to make predictions for novel input, such as search queries and potential search results. The search queries and potential search results may have information retrieval model annotations.
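
The abstract's training loop (predict, compare against the tagged example, adjust the model) can be illustrated with a small error-driven learner. This is a sketch under stated assumptions, not the patented model: it swaps in a multiclass perceptron part-of-speech tagger whose features are the word, a bias term, and the information retrieval (IR) annotation covering the token. The names `token_features` and `PerceptronTagger`, the annotation label `PROPER_NAME`, and the toy training data are all hypothetical.

```python
from collections import defaultdict


def token_features(tokens, ir_labels, i):
    """Features for token i: word identity, a bias term, and the (hypothetical)
    IR annotation label covering the token, if there is one."""
    feats = [f"word={tokens[i].lower()}", "bias"]
    if ir_labels[i] is not None:
        feats.append(f"ir={ir_labels[i]}")
    return feats


class PerceptronTagger:
    """Multiclass perceptron POS tagger; an illustrative stand-in only."""

    def __init__(self, tags):
        self.tags = list(tags)             # max() breaks ties by this order
        self.weights = defaultdict(float)  # (feature, tag) -> weight

    def score(self, feats, tag):
        return sum(self.weights.get((f, tag), 0.0) for f in feats)

    def predict(self, feats):
        return max(self.tags, key=lambda t: self.score(feats, t))

    def train(self, examples, max_epochs=10):
        """examples: (tokens, gold_pos_tags, ir_labels) triples.
        Returns the number of mistakes made in each epoch."""
        history = []
        for _ in range(max_epochs):
            mistakes = 0
            for tokens, gold_tags, ir_labels in examples:
                for i, gold in enumerate(gold_tags):
                    feats = token_features(tokens, ir_labels, i)
                    guess = self.predict(feats)
                    if guess != gold:  # inaccurate prediction: adjust weights
                        mistakes += 1
                        for f in feats:
                            self.weights[(f, gold)] += 1.0
                            self.weights[(f, guess)] -= 1.0
            history.append(mistakes)
            if mistakes == 0:  # every training token is tagged correctly
                break
        return history


# Hypothetical toy training set: tokens, gold POS tags, IR annotation labels.
examples = [
    (["new", "york", "pizza"], ["NNP", "NNP", "NN"], ["PROPER_NAME", "PROPER_NAME", None]),
    (["new", "shoes"], ["JJ", "NNS"], [None, None]),
    (["cheap", "pizza"], ["JJ", "NN"], [None, None]),
]
tagger = PerceptronTagger(["NNP", "JJ", "NN", "NNS"])
history = tagger.train(examples)
```

On this toy data the tagger learns to tag "new" as `NNP` when an IR annotation marks it as part of a proper name and as `JJ` otherwise, which is the kind of signal the IR annotations are meant to contribute. A perceptron was chosen only because its update rule makes the "check accuracy, then adjust" step explicit; the abstract does not name a learning algorithm.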

77 Citations


15 Claims

1. A computer-implemented method performed by a data processing apparatus, the method comprising:
- obtaining a training data set comprising at least one training example, the at least one training example being annotated with at least one training example natural language processing tag;
  
  adding a training example information retrieval model annotation to the at least one training example in the training data set to obtain an annotated training data set;
  
  training a natural language processing model on the annotated training data set to obtain a trained natural language processing model, wherein training comprises training the natural language processing model based on both the training example natural language processing tag and the training example information retrieval model annotation of the at least one training example, and wherein training the natural language processing model further comprises:
  
  generating a part-of-speech tag for at least one word in the at least one training example, generating a confidence score for the part-of-speech tag, and filtering out the part-of-speech tag if the confidence score for the part-of-speech tag is below a threshold;
  
  receiving a search query or a potential search result;
  
  adding an information retrieval model annotation to the search query or the potential search result;
  
  applying the trained natural language processing model to the search query or the potential search result to obtain a prediction, wherein applying the trained natural language processing model to obtain the prediction comprises using the information retrieval model annotation added to the search query or the potential search result; and
  
  using the prediction to retrieve information relevant to the search query or to determine relevance of the potential search result.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The computer-implemented method of claim 1, wherein the at least one training example comprises text and wherein the at least one training example natural language processing tag comprises a part-of-speech tag.
  - 3. The computer-implemented method of claim 1, wherein the training example is a training example search query or a training example potential search result.
  - 4. The computer-implemented method of claim 3, wherein the at least one training example comprises the training example search query, and wherein the training example search query is received from a user and wherein the at least one training example natural language processing tag comprises a part-of-speech tag of the training example search query.
  - 5. The computer-implemented method of claim 1, wherein the training example information retrieval model annotation comprises an annotation identifying at least one word as at least one of a multi-word expression, a phrase, and a proper name.
  - 6. The computer-implemented method of claim 1, wherein the training example information retrieval model annotation comprises an annotation used by an information retrieval model for search query analysis.
  - 7. The computer-implemented method of claim 1, wherein adding the training example information retrieval model annotation to the at least one training example further comprises adding the training example information retrieval model annotation to the at least one training example with at least one component of the natural language processing model.
  - 8. The computer-implemented method of claim 1, wherein the prediction comprises at least one part-of-speech prediction.
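
The filtering step in claim 1 (generate a part-of-speech tag, score its confidence, drop it below a threshold) can be sketched as follows. The claim does not say how the confidence score is computed; as an assumption, the sketch turns hypothetical raw per-tag scores into probabilities with a softmax and uses the winning tag's probability as its confidence. The function names, the scores, and the 0.6 threshold are hypothetical.

```python
import math


def softmax(scores):
    """Map raw per-tag scores to probabilities (the assumed confidence measure)."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tag: math.exp(s - top) for tag, s in scores.items()}
    total = sum(exps.values())
    return {tag: e / total for tag, e in exps.items()}


def best_tag_with_confidence(scores):
    probs = softmax(scores)
    tag = max(probs, key=probs.get)
    return tag, probs[tag]


def filter_pos_tags(words, score_rows, threshold=0.6):
    """Return (index, word, tag, confidence) only for tags whose confidence
    meets the threshold; lower-confidence tags are filtered out."""
    kept = []
    for i, (word, scores) in enumerate(zip(words, score_rows)):
        tag, confidence = best_tag_with_confidence(scores)
        if confidence >= threshold:
            kept.append((i, word, tag, confidence))
    return kept


# Hypothetical raw tagger scores for a three-word query.
words = ["apple", "store", "hours"]
score_rows = [
    {"NNP": 2.0, "NN": 1.9},  # nearly tied, so low confidence
    {"NN": 3.0, "VB": 0.5},
    {"NNS": 2.0, "NN": 0.0},
]
kept = filter_pos_tags(words, score_rows)
```

Here the tag for "apple" is filtered out because its two candidate tags score almost equally (confidence about 0.52), while the tags for "store" (about 0.92) and "hours" (about 0.88) are kept.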

9. A computer-implemented method performed by a data processing apparatus, the method comprising:
- generating a trained natural language processing model based on receiving natural language training on a training data set, wherein the training data set comprises a plurality of training examples, each training example comprising text, natural language processing tags, and information retrieval model annotations, the information retrieval model annotations being generated by an information retrieval model;
  
  wherein receiving natural language training on the training data set comprises:
  
  extracting information retrieval features from one of the training examples in the training data set based on the information retrieval model annotations, predicting a part-of-speech tag for at least one word in the training example, generating a confidence score for the predicted part-of-speech tag, and filtering the predicted part-of-speech tag if the confidence score is below a threshold;
  
  receiving a target document comprising text and at least one information retrieval model annotation, the at least one information retrieval model annotation being generated by the information retrieval model;
  
  generating a prediction and an additional confidence score for the prediction for at least one word in the text of the target document, wherein generating the prediction and the additional confidence score for the prediction comprises applying the text of the target document and the information retrieval model annotation to the trained natural language processing model to generate the prediction; and
  
  using the prediction in performing one or more further actions relevant to the target document.
- Dependent claims (10, 11):
  - 10. The computer-implemented method of claim 9, wherein the target document is one of a search query and a potential search result.
  - 11. The computer-implemented method of claim 9, further comprising increasing the confidence score for a mention-chunking prediction that corresponds to a mention-chunk identified by an information retrieval model annotation in the target document.
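
Claim 9's step of extracting information retrieval features from the annotations could look like the sketch below. It assumes, hypothetically, that the IR model emits each annotation as a token span with a type, and it encodes every span into per-token features with beginning/inside markers (`B`/`I`), plus `O` for tokens outside any span. The span format, the `IRAnnotation` type, and the feature strings are illustrative assumptions, not the patent's representation.

```python
from typing import NamedTuple


class IRAnnotation(NamedTuple):
    start: int  # index of the first annotated token
    end: int    # index one past the last annotated token
    kind: str   # hypothetical type, e.g. "PROPER_NAME"


def extract_ir_features(tokens, annotations):
    """Return one list of IR features per token, using B/I/O positions."""
    feats = [[] for _ in tokens]
    for ann in annotations:
        if not 0 <= ann.start < ann.end <= len(tokens):
            raise ValueError(f"annotation span out of range: {ann}")
        for i in range(ann.start, ann.end):
            position = "B" if i == ann.start else "I"
            feats[i].append(f"ir:{ann.kind}={position}")
    for token_feats in feats:
        if not token_feats:  # token not covered by any IR annotation
            token_feats.append("ir:O")
    return feats


# Hypothetical query with two IR annotations.
tokens = ["hot", "dog", "stand", "near", "times", "square"]
annotations = [
    IRAnnotation(0, 2, "MULTI_WORD_EXPRESSION"),
    IRAnnotation(4, 6, "PROPER_NAME"),
]
features = extract_ir_features(tokens, annotations)
```

Each token gets a list of features rather than a single label because a token may fall inside more than one annotation, for example a phrase that contains a proper name.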

12. A system comprising:
- one or more computers and one or more storage devices storing instructions which are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  obtaining a training data set comprising at least one training example, the at least one training example being annotated with at least one training example natural language processing tag;
  
  adding a training example information retrieval model annotation to the at least one training example in the training data set to obtain an annotated training data set;
  
  training a natural language processing model on the annotated training data set to obtain a trained natural language processing model, wherein the operation of training comprises training the natural language processing model based on both the training example natural language processing tag and the training example information retrieval model annotation of the at least one training example;
  
  wherein training the natural language processing model on the annotated training data set comprises:
  
  extracting information retrieval features from the at least one training example in the annotated training data set based on the information retrieval model annotation, predicting part-of-speech tags for at least one word in the at least one training example, generating a confidence score for the predicted part-of-speech tags, and filtering the predicted part-of-speech tags if the confidence score is below a threshold;
  
  receiving a target document comprising text and at least one information retrieval model annotation; and
  
  generating, for at least one word in the text, a prediction and an additional confidence score for the prediction with the natural language processing model, wherein the operation of generating the prediction and the additional confidence score comprises using the information retrieval model annotation with the natural language processing model to generate the prediction.
- Dependent claims (13, 14):
  - 13. The system of claim 12, wherein the target document is one of a search query and a potential search result.
  - 14. The system of claim 12, wherein the at least one training example comprises text and wherein the at least one training example natural language processing tag comprises a part-of-speech tag.
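
The step of adding an information retrieval model annotation to each training example (claims 1, 5, and 12) is sketched below with a stand-in annotator. The patent text here does not say how the IR model produces its annotations; purely as an assumption, a greedy longest-match lookup against a hypothetical lexicon of multi-word expressions and proper names plays that role. `annotate`, `add_ir_annotations`, `LEXICON`, and the example data are hypothetical.

```python
# Hypothetical lexicon: lowercase token tuples -> annotation type.
LEXICON = {
    ("new", "york"): "PROPER_NAME",
    ("new", "york", "times"): "PROPER_NAME",
    ("hot", "dog"): "MULTI_WORD_EXPRESSION",
}


def annotate(tokens, lexicon, max_len=4):
    """Greedy longest-match annotation; returns (start, end, type) spans, end exclusive."""
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "new york times" beats "new york".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            kind = lexicon.get(tuple(lowered[i:i + n]))
            if kind is not None:
                match = (i, i + n, kind)
                break
        if match is None:
            i += 1
        else:
            spans.append(match)
            i = match[1]  # continue after the matched span
    return spans


def add_ir_annotations(training_set, lexicon):
    """Return copies of the examples with an added "ir" field; NLP tags are untouched."""
    return [dict(example, ir=annotate(example["tokens"], lexicon)) for example in training_set]


# Hypothetical training example already tagged with part-of-speech tags.
training_set = [
    {"tokens": ["New", "York", "Times", "hot", "dog", "review"],
     "pos": ["NNP", "NNP", "NNP", "JJ", "NN", "NN"]},
]
annotated = add_ir_annotations(training_set, LEXICON)
```

The annotated copy keeps the original part-of-speech tags alongside the new `ir` spans, so a model trained on it can draw on both, as claim 12 requires; the input training set is not modified.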

15. A computer-implemented method performed by a data processing apparatus, the method comprising:
- generating a trained natural language processing model based on receiving natural language training on a training data set, wherein the training data set comprises a plurality of training examples, each training example comprising text, natural language processing tags, and information retrieval model annotations, the information retrieval model annotations being generated by an information retrieval model;
  
  receiving a target document comprising text and at least one information retrieval model annotation, the at least one information retrieval model annotation being generated by the information retrieval model;
  
  generating a prediction and a confidence score for the prediction for at least one word in the text of the target document, wherein generating the prediction and the confidence score for the prediction comprises applying the text of the target document and the information retrieval model annotation to the trained natural language processing model to generate the prediction;
  
  increasing the confidence score for a mention-chunking prediction that corresponds to a mention-chunk identified by an information retrieval model annotation in the target document; and
  
  using the prediction in performing one or more further actions relevant to the target document.
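
Claims 11 and 15 increase the confidence score of a mention-chunking prediction when it corresponds to a mention chunk identified by an IR model annotation. The claims do not specify the size of the increase or what "corresponds" means; the sketch below assumes an exact span match and moves the confidence a fixed fraction of the way toward 1.0, so the result stays a valid confidence. `ChunkPrediction`, `boost_matching_chunks`, and the `boost` value are hypothetical.

```python
from typing import NamedTuple


class ChunkPrediction(NamedTuple):
    start: int         # first token of the predicted mention chunk
    end: int           # one past the last token
    confidence: float  # model confidence in [0, 1]


def boost_matching_chunks(predictions, ir_mention_spans, boost=0.5):
    """Raise the confidence of chunks whose (start, end) exactly matches an
    IR-annotated mention chunk; other predictions are returned unchanged."""
    if not 0.0 <= boost <= 1.0:
        raise ValueError("boost must be in [0, 1]")
    ir_spans = set(ir_mention_spans)
    result = []
    for pred in predictions:
        if (pred.start, pred.end) in ir_spans:
            # Move a fraction `boost` of the remaining distance toward 1.0.
            pred = pred._replace(confidence=pred.confidence + boost * (1.0 - pred.confidence))
        result.append(pred)
    return result


# Hypothetical predictions for a target document and one IR-annotated mention.
predictions = [ChunkPrediction(0, 2, 0.6), ChunkPrediction(3, 4, 0.7)]
boosted = boost_matching_chunks(predictions, ir_mention_spans=[(0, 2)])
```

With `boost=0.5`, the matching chunk's confidence rises from 0.6 to 0.8 and the other prediction keeps 0.7. Interpolating toward 1.0 rather than adding a constant keeps the score within [0, 1]; a multiplicative or learned adjustment would fit the claim language equally well.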


Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Koo, Terry Yang-Hoe; Hall, Keith; Das, Dipanjan; Pereira, Fernando
Primary Examiner(s)
He, Jialong

Application Number

US14/143,011
Time in Patent Office

1,100 Days
US Class Current

1/1
CPC Class Codes

G06F 16/3332   Query translation

G06F 16/3344   using natural language anal...

G06N 20/00   Machine learning

G06N 5/02   Knowledge representation; S...

G10L 15/063   Training

G10L 15/1822   Parsing for meaning underst...
