METHODS FOR GENERATING NATURAL LANGUAGE PROCESSING SYSTEMS
First Claim
1. A method for generating a natural language model, the method comprising:
ingesting, by a natural language platform comprising at least one processor coupled to at least one memory, training data representative of documents to be analyzed by the natural language model;
generating, by the natural language platform and based on topical content within the training data, a hierarchical data structure, the hierarchical data structure comprising at least two topical nodes, wherein at least two topical nodes represent partitions organized by two or more topical themes among the topical content of the training data within which the training data is to be subdivided into;
selecting among the training data, by the natural language platform, a plurality of documents to be annotated;
generating, by the natural language platform, at least one annotation prompt for each document among the plurality of documents to be annotated, said annotation prompt configured to elicit an annotation about said document indicating which node among the at least two topical nodes of the hierarchical data structure said document is to be classified into;
causing display of, by the natural language platform, at least one annotation prompt for each document among the plurality of documents to be annotated;
receiving, by the natural language platform, for each document among the plurality of documents to be annotated, the annotation in response to the displayed annotation prompt; and
generating, by the natural language platform, the natural language model using an adaptive machine learning process configured to determine, among the received annotations, patterns for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
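The steps recited above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names TopicNode, leaf_names, make_prompt, train and classify are hypothetical, the two-topic hierarchy is hard-coded rather than derived from the training data, and a simple per-node word count stands in for the claimed adaptive machine learning process.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One topical node of the hierarchical data structure (hypothetical type)."""
    name: str
    children: list["TopicNode"] = field(default_factory=list)


def leaf_names(node: TopicNode) -> list[str]:
    """Collect the names of the leaf nodes, left to right."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


def make_prompt(document: str, root: TopicNode) -> str:
    """Build an annotation prompt asking which node the document belongs to."""
    options = ", ".join(leaf_names(root))
    return f"Document: {document!r}\nWhich topic fits best? Options: {options}"


def train(annotations: dict[str, str]) -> dict[str, Counter]:
    """Count words per annotated node (an assumed stand-in for the learning step)."""
    model: dict[str, Counter] = defaultdict(Counter)
    for document, node_name in annotations.items():
        model[node_name].update(document.lower().split())
    return dict(model)


def classify(model: dict[str, Counter], document: str) -> str:
    """Pick the node whose annotated documents share the most words with the input."""
    words = document.lower().split()
    return max(model, key=lambda name: sum(model[name][w] for w in words))


# Hard-coded hierarchy and annotations, for illustration only.
root = TopicNode("all", [TopicNode("billing"), TopicNode("shipping")])
annotations = {
    "refund my invoice charge": "billing",
    "package arrived late": "shipping",
}
model = train(annotations)
print(make_prompt("where is my package", root))
print(classify(model, "late package"))  # prints "shipping"
```

In this sketch the annotations dictionary plays the role of the responses received for the displayed prompts; a real system would collect them from annotators rather than hard-code them.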
Abstract
Methods are presented for generating a natural language model. The method may comprise: ingesting training data representative of documents to be analyzed by the natural language model; generating a hierarchical data structure comprising at least two topical nodes into which the training data is to be subdivided by the natural language model; selecting a plurality of documents among the training data to be annotated; generating an annotation prompt for each document configured to elicit an annotation about said document indicating which node among the at least two topical nodes said document is to be classified into; receiving the annotation based on the annotation prompt; and generating the natural language model using an adaptive machine learning process configured to determine patterns among the annotations for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
20 Claims
1. A method for generating a natural language model, the method comprising:
ingesting, by a natural language platform comprising at least one processor coupled to at least one memory, training data representative of documents to be analyzed by the natural language model;
generating, by the natural language platform and based on topical content within the training data, a hierarchical data structure, the hierarchical data structure comprising at least two topical nodes, wherein at least two topical nodes represent partitions organized by two or more topical themes among the topical content of the training data within which the training data is to be subdivided into;
selecting among the training data, by the natural language platform, a plurality of documents to be annotated;
generating, by the natural language platform, at least one annotation prompt for each document among the plurality of documents to be annotated, said annotation prompt configured to elicit an annotation about said document indicating which node among the at least two topical nodes of the hierarchical data structure said document is to be classified into;
causing display of, by the natural language platform, at least one annotation prompt for each document among the plurality of documents to be annotated;
receiving, by the natural language platform, for each document among the plurality of documents to be annotated, the annotation in response to the displayed annotation prompt; and
generating, by the natural language platform, the natural language model using an adaptive machine learning process configured to determine, among the received annotations, patterns for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
Dependent claims: 2-14
15. A method for updating a natural language model, the method comprising:
utilizing the natural language model to identify topical content of untested data and to classify said untested data into at least two topical nodes of a hierarchical data structure according to the identified topical content of the untrained data, the hierarchical data structure comprising the at least two topical nodes, wherein the at least two topical nodes represent partitions organized by two or more topical themes among the topical content of the untested data within which the untested data is to be subdivided into;
determining that the natural language model classifies at least a subset of the untested data into the at least two topical nodes with a low degree of certainty; and
modifying the natural language model with updated data, the updated data comprising a subset of the untested data that the natural language model has classified with a low degree of certainty.
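The low-certainty determination in claim 15 can be sketched as follows. The claim does not say how certainty is measured, so this sketch assumes the model emits a probability per topical node and treats a top probability below a threshold as low certainty. The function name select_low_certainty, the 0.6 threshold and the sample predictions are all hypothetical.

```python
def select_low_certainty(
    predictions: dict[str, dict[str, float]], threshold: float = 0.6
) -> list[str]:
    """Return documents whose highest node probability falls below the threshold.

    Both the probability format and the threshold are assumptions of this sketch.
    """
    return [
        document
        for document, probabilities in predictions.items()
        if max(probabilities.values()) < threshold
    ]


# Hypothetical model output: per-document probabilities over two topical nodes.
predictions = {
    "refund my invoice": {"billing": 0.95, "shipping": 0.05},
    "where is my refund package": {"billing": 0.55, "shipping": 0.45},
}
updated_data = select_low_certainty(predictions)
print(updated_data)  # prints ['where is my refund package']
```

Under these assumptions, the selected documents would form the updated data used to modify the model, for example after they have been annotated.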
16. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to perform operations comprising:
ingesting training data representative of documents to be analyzed by a natural language model;
generating a hierarchical data structure, the hierarchical data structure comprising at least two topical nodes, wherein the at least two topical nodes represent partitions organized by two or more topical themes among the topical content of the training data within which the training data is to be subdivided into;
selecting among the training data a plurality of documents to be annotated;
generating at least one annotation prompt for each document among the plurality of documents to be annotated, said annotation prompt configured to elicit an annotation about said document indicating which node among the at least two topical nodes of the hierarchical data structure said document is to be classified into;
causing display of the at least one annotation prompt for each document among the plurality of documents to be annotated;
receiving for each document among the plurality of documents to be annotated, the annotation in response to the displayed annotation prompt; and
generating the natural language model using an adaptive machine learning process configured to determine, among the received annotations, patterns for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
Dependent claims: 17-20
Specification