Training a natural language processing model with information retrieval model annotations
First Claim
1. A computer-implemented method performed by a data processing apparatus, the method comprising:
obtaining a training data set comprising at least one training example, the at least one training example being annotated with at least one training example natural language processing tag;
adding a training example information retrieval model annotation to the at least one training example in the training data set to obtain an annotated training data set;
training a natural language processing model on the annotated training data set to obtain a trained natural language processing model, wherein training comprises training the natural language processing model based on both the training example natural language processing tag and the training example information retrieval model annotation of the at least one training example, and wherein training the natural language processing model further comprises:
generating a part-of-speech tag for at least one word in the at least one training example, generating a confidence score for the part-of-speech tag, and filtering out the part-of-speech tag if the confidence score for the part-of-speech tag is below a threshold;
receiving a search query or a potential search result;
adding an information retrieval model annotation to the search query or the potential search result;
applying the trained natural language processing model to the search query or the potential search result to obtain a prediction, wherein applying the trained natural language processing model to obtain the prediction comprises using the information retrieval model annotation added to the search query or the potential search result; and
using the prediction to retrieve information relevant to the search query or to determine relevance of the potential search result.
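The confidence-based filtering step recited in the claim can be illustrated with a minimal Python sketch. The claim does not specify a data structure, a tag set, or a threshold value, so the `TagPrediction` class, the `filter_low_confidence` function, the example tags, and the threshold of 0.5 are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPrediction:
    """Hypothetical container for one predicted part-of-speech tag."""
    word: str
    tag: str
    confidence: float


def filter_low_confidence(predictions, threshold):
    """Drop every prediction whose confidence score is below `threshold`."""
    return [p for p in predictions if p.confidence >= threshold]


# Illustrative predictions; the scores are invented for this example.
predictions = [
    TagPrediction("book", "VERB", 0.91),
    TagPrediction("flights", "NOUN", 0.97),
    TagPrediction("to", "ADP", 0.42),
]

kept = filter_low_confidence(predictions, threshold=0.5)
print([(p.word, p.tag) for p in kept])
# → [('book', 'VERB'), ('flights', 'NOUN')]
```

Here a prediction exactly at the threshold is kept, since the claim only requires filtering tags whose score is below the threshold.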
Abstract
Systems and techniques are provided for training a natural language processing model with information retrieval model annotations. A natural language processing model may be trained, through machine learning, using training examples that include part-of-speech tagging and annotations added by an information retrieval model. The natural language processing model may generate part-of-speech, parse-tree, beginning, inside, and outside label, mention chunking, and named-entity recognition predictions with confidence scores for text in the training examples. The information retrieval model annotations and part-of-speech tagging in the training example may be used to determine the accuracy of the predictions, and the natural language processing model may be adjusted. After training, the natural language processing model may be used to make predictions for novel input, such as search queries and potential search results. The search queries and potential search results may have information retrieval model annotations.
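The beginning, inside, and outside labels mentioned in the abstract are commonly written as B, I, and O. As a rough sketch of how mention spans could be turned into such labels, the Python below assumes that each mention is given as a pair of token indices with an exclusive end; that span format and the `spans_to_bio` name are assumptions, not details taken from the abstract.

```python
def spans_to_bio(num_tokens, spans):
    """Convert mention spans into beginning/inside/outside labels.

    `spans` is a list of (start, end) token indices, end exclusive.
    This span format is an assumption made for the example.
    """
    labels = ["O"] * num_tokens
    for start, end in spans:
        labels[start] = "B"          # first token of the mention
        for i in range(start + 1, end):
            labels[i] = "I"          # remaining tokens of the mention
    return labels


tokens = ["flights", "to", "New", "York", "City"]
labels = spans_to_bio(len(tokens), [(2, 5)])
print(list(zip(tokens, labels)))
# → [('flights', 'O'), ('to', 'O'), ('New', 'B'), ('York', 'I'), ('City', 'I')]
```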
77 Citations
15 Claims
9. A computer-implemented method performed by a data processing apparatus, the method comprising:
generating a trained natural language processing model based on receiving natural language training on a training data set, wherein the training data set comprises a plurality of training examples, each training example comprising text, natural language processing tags, and information retrieval model annotations, the information retrieval model annotations being generated by an information retrieval model;
wherein receiving natural language training on the training data set comprises:
extracting information retrieval features from one of the training examples in the training data set based on the information retrieval model annotations, predicting a part-of-speech tag for at least one word in the training example, generating a confidence score for the predicted part-of-speech tag, and filtering the predicted part-of-speech tag if the confidence score is below a threshold;
receiving a target document comprising text and at least one information retrieval model annotation, the at least one information retrieval model annotation being generated by the information retrieval model;
generating a prediction and an additional confidence score for the prediction for at least one word in the text of the target document, wherein generating the prediction and the additional confidence score for the prediction comprises applying the text of the target document and the information retrieval model annotation to the trained natural language processing model to generate the prediction; and
using the prediction in performing one or more further actions relevant to the target document.
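One way to picture the step of extracting information retrieval features from the annotations is as a per-token feature dictionary. The sketch below is only an assumption about what such features might look like: the annotation format (a dict with a `span` and a `type`), the feature names, and the `extract_ir_features` function are hypothetical and are not defined in the claim.

```python
def extract_ir_features(tokens, annotations):
    """Build one feature dict per token from hypothetical IR annotations.

    Each annotation is assumed to be a dict with a token-index `span`
    (start, end), end exclusive, and a `type` string.
    """
    features = [
        {"word": token.lower(), "ir_in_mention": False, "ir_type": "NONE"}
        for token in tokens
    ]
    for annotation in annotations:
        start, end = annotation["span"]
        for i in range(start, end):
            features[i]["ir_in_mention"] = True
            features[i]["ir_type"] = annotation["type"]
    return features


tokens = ["flights", "to", "New", "York"]
annotations = [{"span": (2, 4), "type": "LOCATION"}]

for token, feature in zip(tokens, extract_ir_features(tokens, annotations)):
    print(token, feature)
```

Features of this kind could then be supplied to the model alongside the text, which is the role the claim gives to the information retrieval model annotation.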
12. A system comprising:
one or more computers and one or more storage devices storing instructions which are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining a training data set comprising at least one training example, the at least one training example being annotated with at least one training example natural language processing tag;
adding a training example information retrieval model annotation to the at least one training example in the training data set to obtain an annotated training data set;
training a natural language processing model on the annotated training data set to obtain a trained natural language processing model, wherein the operation of training comprises training the natural language processing model based on both the training example natural language processing tag and the training example information retrieval model annotation of the at least one training example;
wherein training the natural language processing model on the annotated training data set comprises:
extracting information retrieval features from the at least one training example in the annotated training data set based on the information retrieval model annotation, predicting part-of-speech tags for at least one word in the at least one training example, generating a confidence score for the predicted part-of-speech tags, and filtering the predicted part-of-speech tags if the confidence score is below a threshold;
receiving a target document comprising text and at least one information retrieval model annotation; and
generating, for at least one word in the text, a prediction and an additional confidence score for the prediction with the natural language processing model, wherein the operation of generating the prediction and the additional confidence score comprises using the information retrieval model annotation with the natural language processing model to generate the prediction.
15. A computer-implemented method performed by a data processing apparatus, the method comprising:
generating a trained natural language processing model based on receiving natural language training on a training data set, wherein the training data set comprises a plurality of training examples, each training example comprising text, natural language processing tags, and information retrieval model annotations, the information retrieval model annotations being generated by an information retrieval model;
receiving a target document comprising text and at least one information retrieval model annotation, the at least one information retrieval model annotation being generated by the information retrieval model;
generating a prediction and a confidence score for the prediction for at least one word in the text of the target document, wherein generating the prediction and the confidence score for the prediction comprises applying the text of the target document and the information retrieval model annotation to the trained natural language processing model to generate the prediction;
increasing the confidence score for a mention-chunking prediction that corresponds to a mention-chunk identified by an information retrieval model annotation in the target document; and
using the prediction in performing one or more further actions relevant to the target document.
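The confidence adjustment in claim 15 can be sketched as follows. The claim says only that the score is increased when a mention-chunking prediction corresponds to a mention-chunk identified by an information retrieval model annotation; the exact-span matching rule, the additive `boost` value, the cap at 1.0, and the `boost_matching_chunks` name are all assumptions made for this illustration.

```python
def boost_matching_chunks(predictions, ir_spans, boost=0.1):
    """Raise the score of predicted chunks that match an IR-annotated chunk.

    `predictions` is a list of ((start, end), score) pairs and `ir_spans`
    is a collection of (start, end) spans. Exact span equality is used as
    the matching rule, and scores are capped at 1.0; both are assumptions.
    """
    ir_spans = set(ir_spans)
    boosted = []
    for span, score in predictions:
        if span in ir_spans:
            score = min(1.0, score + boost)
        boosted.append((span, score))
    return boosted


# Illustrative mention-chunk predictions with invented scores.
predictions = [((2, 5), 0.5), ((0, 1), 0.5)]
boosted = boost_matching_chunks(predictions, ir_spans=[(2, 5)], boost=0.25)
print(boosted)
# → [((2, 5), 0.75), ((0, 1), 0.5)]
```

Only the chunk that also appears in the annotations has its score raised; the other prediction is passed through unchanged.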
Specification