Methods for generating natural language processing systems

US 10,127,214 B2
Filed: 12/09/2015
Issued: 11/13/2018
Est. Priority Date: 12/09/2014
Status: Active Grant

Abstract

Methods are presented for generating a natural language model. The method may comprise: ingesting training data representative of documents to be analyzed by the natural language model, generating a hierarchical data structure comprising at least two topical nodes within which the training data is to be subdivided into by the natural language model, selecting a plurality of documents among the training data to be annotated, generating an annotation prompt for each document configured to elicit an annotation about said document indicating which node among the at least two topical nodes said document is to be classified into, receiving the annotation based on the annotation prompt; and generating the natural language model using an adaptive machine learning process configured to determine patterns among the annotations for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
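
The hierarchical data structure named in the abstract can be pictured as a tree of topical nodes, each holding the documents assigned to its partition. The following is a minimal sketch under stated assumptions, not the patented implementation: the class name `TopicalNode`, its fields, and the example themes ("billing", "shipping", and so on) are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class TopicalNode:
    """One partition of the training data, labeled by a topical theme (hypothetical type)."""
    theme: str
    children: list["TopicalNode"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add_child(self, theme: str) -> "TopicalNode":
        # Create a sub-partition under this node and return it.
        child = TopicalNode(theme)
        self.children.append(child)
        return child

    def leaves(self) -> list["TopicalNode"]:
        # Leaf nodes are the most specific partitions a document can land in.
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Hypothetical themes, chosen only for illustration.
root = TopicalNode("all documents")
billing = root.add_child("billing")
shipping = root.add_child("shipping")
billing.add_child("refunds")
billing.add_child("invoices")
shipping.documents.append("Where is my package?")

leaf_themes = [node.theme for node in root.leaves()]
```

Here the two children of `root` play the role of the "at least two topical nodes", and the two children of `billing` illustrate sub-partitions within one theme.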

19 Claims

1. A method for generating a natural language model, the method comprising:
- ingesting, by a natural language platform comprising at least one processor coupled to at least one memory, training data representative of documents to be analyzed by the natural language model, wherein the training data includes at least one of a first document and a portion of the first document;
  
  generating, by the natural language platform and based on topical content within the training data, a hierarchical data structure, the hierarchical data structure comprising at least two topical nodes, wherein the at least two topical nodes represent partitions organized by two or more topical themes among the topical content of the training data within which the training data is to be subdivided into;
  
  selecting among the training data, by the natural language platform, a plurality of documents to be annotated;
  
  determining, by the natural language platform, for each document among the plurality of documents, a level of ambiguity in interpreting said document that the natural language platform is trying to resolve, wherein the level of ambiguity is dependent upon information currently possessed by the natural language platform;
  
  generating, by the natural language platform, an annotation prompt for each document among the plurality of documents to be annotated, said annotation prompt being dynamically generated as either a first level prompt corresponding to a first level of specificity or a second level prompt corresponding to a second level of specificity, wherein both the first level prompt and the second level prompt comprise a human readable textual instruction generated by the natural language platform worded according to the first level of specificity or the second level of specificity, and the first level prompt and the second level prompt are presented alternatively, the first level of specificity and the second level of specificity corresponding to the level of ambiguity of said document, said annotation prompt configured to elicit an annotation about said document designed to resolve said level of ambiguity and indicating which node among the at least two topical nodes of the hierarchical data structure said document is to be classified into, wherein the first level of specificity comprises a first level of true-or-false question and the second level of specificity comprises a multiple-choice question comprising at least three options, wherein the first level of specificity corresponds to a lower level of ambiguity than the second level of specificity;
  
  causing display of, by the natural language platform, the annotation prompt for each document among the plurality of documents to be annotated;
  
  receiving, by the natural language platform, for each document among the plurality of documents to be annotated, the annotation in response to the displayed annotation prompt; and
  
  generating, by the natural language platform, the natural language model using an adaptive machine learning process configured to determine, among the received annotations, patterns for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. The method of claim 1, further comprising:
    - testing, by the natural language platform, performance of the natural language model using a subset of the documents among the training data that received annotations.
  - 3. The method of claim 2, further comprising:
    - computing, by the natural language platform, a performance metric of the natural language model, based on results of the testing; and
      
      determining whether the natural language model satisfies at least one performance criterion based on the computed performance metric.
  - 4. The method of claim 3, further comprising:
    - performing, by the natural language platform, one or more optimization techniques configured to improve performance of the natural language platform, in response to determining that the natural language platform fails to satisfy the at least one performance criterion based on the computed performance metric.
  - 5. The method of claim 4, wherein the one or more optimization techniques comprises at least one of:
    - a feature selection process, a padding and rebalancing process of the natural language model, a pruning process of the natural language model, a feature discovery process, a smoothing process of the natural language model, or a model interpolation process.
  - 6. The method of claim 3, further comprising:
    - determining that the natural language platform fails to satisfy the at least one performance criterion based on the computed performance metric;
      
      in response to said determining;
      
      identifying a topical node among the two or more topical nodes of the hierarchical data structure that the natural language model fails to accurately categorize documents into;
      
      selecting a second plurality of documents to be annotated that were not previously annotated, the second plurality comprising documents associated with said topical node that the natural language model failed to accurately categorize documents into;
      
      generating a second set of annotation prompts comprising an annotation prompt for each document among the second plurality of documents to be annotated, said annotation prompt among the second set comprising a human readable textual communication that is configured to elicit an annotation about said document to improve the natural language model in accurately categorizing documents into said topical node;
      
      causing display of the second set of annotation prompts for documents among the second plurality of documents to be annotated;
      
      receiving, by the natural language platform, for documents among the second plurality of documents to be annotated, a second set of annotations in response to the second set of displayed annotation prompts; and
      
      generating, by the natural language platform, a refined natural language model using the adaptive machine learning process and based on the hierarchical data structure, the training data and the second set of annotations.
  - 7. The method of claim 1, wherein generating the hierarchical data structure comprises:
    - performing a topic modeling process configured to identify two or more topics among the content of the training data that is configured to define the two or more topical nodes of the hierarchical data structure.
  - 8. The method of claim 1, further comprising accessing one or more rules configured to instruct the natural language model how to categorize one or more documents into the two or more nodes of the hierarchical data structure.
  - 9. The method of claim 8, wherein generating the hierarchical data structure comprises:
    - conducting a rules generation process configured to evaluate logical consistency among the one or more rules.
  - 10. The method of claim 1, wherein generating the hierarchical data structure comprises:
    - generating, by the natural language platform, at least one annotation prompt for each topical node among the two or more topical nodes in the hierarchical data structure, said annotation prompt configured to elicit an annotation about said topical node indicating a level of accuracy of placement of the node within the hierarchical data structure;
      
      causing display of, by the natural language platform, the at least one annotation prompt for each topical node; and
      
      receiving, by the natural language platform, for each topical node, the annotation in response to the displayed annotation prompt.
  - 11. The method of claim 10, further comprising evaluating performance of the hierarchical data structure based on the annotations.
  - 12. The method of claim 11, further comprising:
    - in response to the evaluating, determining that the hierarchical data structure fails to satisfy at least one performance criterion; and
      
      modifying a logical relationship among the two or more topical nodes based on the annotations and in response to determining that the data structure fails to satisfy the at least one performance criterion.
  - 13. The method of claim 9, further comprising generating a training guideline based on the annotations to the nodes, the training guideline configured to provide instructions to an annotator for answering one or more annotation prompts for each document among the plurality of documents to be annotated.
  - 14. The method of claim 1, wherein the hierarchical data structure comprises at least a third topical node and a fourth topical node, wherein the third and fourth topical nodes both represent sub-partitions within the topical theme of the first node and organized by a third and fourth topical theme, respectively, among the topical content of the training data within which the training data is to be subdivided into.
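
Claim 1 ties the form of each annotation prompt to a per-document level of ambiguity: a true-or-false question at the lower level, a multiple-choice question with at least three options at the higher level. The sketch below illustrates one way that selection could work. It rests on assumptions the claims do not state: ambiguity is measured as normalized Shannon entropy over the model's current node probabilities, the 0.5 threshold is arbitrary, and the names `ambiguity`, `make_prompt`, and the "None of these" option are hypothetical.

```python
import math


def ambiguity(probs: dict[str, float]) -> float:
    """Normalized Shannon entropy in [0, 1]; an assumed stand-in for 'level of ambiguity'."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return entropy / math.log(len(probs))


def make_prompt(doc: str, probs: dict[str, float],
                threshold: float = 0.5, max_nodes: int = 3) -> dict:
    """Build a human-readable prompt whose form depends on the ambiguity level."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    if ambiguity(probs) < threshold:
        # Lower ambiguity: confirm the single most likely node.
        return {
            "kind": "true_false",
            "text": f'True or false: this document is about "{ranked[0]}".\n\n{doc}',
            "options": ["True", "False"],
        }
    # Higher ambiguity: offer the top nodes plus an escape option, so that
    # at least three options exist whenever there are two or more nodes.
    options = ranked[:max_nodes] + ["None of these"]
    return {
        "kind": "multiple_choice",
        "text": f"Which topic best describes this document?\n\n{doc}",
        "options": options,
    }


# Hypothetical node probabilities from a partially trained model.
confident = make_prompt("Please refund my last invoice.",
                        {"billing": 0.96, "shipping": 0.02, "returns": 0.02})
unsure = make_prompt("It never showed up and I was charged twice.",
                     {"billing": 1 / 3, "shipping": 1 / 3, "returns": 1 / 3})
```

Because the probabilities come from whatever the model has learned so far, the same document can draw a multiple-choice prompt early in training and a true-or-false prompt later, which matches the claim's requirement that ambiguity depend on information currently possessed by the platform.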

15. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to perform operations comprising:
- ingesting training data representative of documents to be analyzed by a natural language model, wherein the training data includes at least one of a first document and a portion of the first document;
  
  generating a hierarchical data structure, the hierarchical data structure comprising at least two topical nodes, wherein the at least two topical nodes represent partitions organized by two or more topical themes among the topical content of the training data within which the training data is to be subdivided into;
  
  selecting among the training data a plurality of documents to be annotated;
  
  determining, for each document among the plurality of documents, a level of ambiguity in interpreting said document that the natural language platform is trying to resolve, wherein the level of ambiguity is dependent upon information currently possessed by the natural language platform;
  
  generating an annotation prompt for each document among the plurality of documents to be annotated, said annotation prompt being dynamically generated as either a first level prompt corresponding to a first level of specificity or a second level prompt corresponding to a second level of specificity, wherein both the first level prompt and the second level prompt comprise a human readable textual instruction worded according to the first level of specificity or the second level of specificity, and the first level prompt and the second level prompt are presented alternatively, the first level of specificity and the second level of specificity corresponding to the level of ambiguity of said document, said annotation prompt configured to elicit an annotation about said document designed to resolve said level of ambiguity and indicating which node among the at least two topical nodes of the hierarchical data structure said document is to be classified into, wherein the first level of specificity comprises a first level of true-or-false question and the second level of specificity comprises a multiple-choice question comprising at least three options, wherein the first level of specificity corresponds to a lower level of ambiguity than the second level of specificity;
  
  causing display of the at least one annotation prompt for each document among the plurality of documents to be annotated;
  
  receiving for each document among the plurality of documents to be annotated, the annotation in response to the displayed annotation prompt; and
  
  generating the natural language model using an adaptive machine learning process configured to determine, among the received annotations, patterns for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure.
- View Dependent Claims (16, 17, 18, 19)
- - 16. The computer readable medium of claim 15, wherein the operations further comprise:
    - testing performance of the natural language model using a subset of the documents among the training data that received annotations.
  - 17. The computer readable medium of claim 16, wherein the operations further comprise:
    - computing a performance metric of the natural language model, based on results of the testing; and
      
      determining whether the natural language model satisfies at least one performance criterion based on the computed performance metric.
  - 18. The computer readable medium of claim 17, wherein the operations further comprise:
    - determining that the natural language platform fails to satisfy the at least one performance criterion based on the computed performance metric;
      
      in response to said determining;
      
      identifying a topical node among the two or more topical nodes of the hierarchical data structure that the natural language model fails to accurately categorize documents into;
      
      selecting a second plurality of documents to be annotated that were not previously annotated, the second plurality comprising documents associated with said topical node that the natural language model failed to accurately categorize documents into;
      
      generating a second set of annotation prompts comprising an annotation prompt for each document among the second plurality of documents to be annotated, said annotation prompt among the second set comprising a human readable textual communication that is configured to elicit an annotation about said document to improve the natural language model in accurately categorizing documents into said topical node;
      
      causing display of the second set of annotation prompts for documents among the second plurality of documents to be annotated;
      
      receiving, by the natural language platform, for documents among the second plurality of documents to be annotated, a second set of annotations in response to the second set of displayed annotation prompts; and
      
      generating, by the natural language platform, a refined natural language model using the adaptive machine learning process and based on the hierarchical data structure, the training data and the second set of annotations.
  - 19. The computer readable medium of claim 15, wherein generating the hierarchical data structure comprises:
    - generating at least one annotation prompt for each topical node among the two or more topical nodes in the hierarchical data structure, said annotation prompt configured to elicit an annotation about said topical node indicating a level of accuracy of placement of the node within the hierarchical data structure;
      
      causing display of the at least one annotation prompt for each topical node; and
      
      receiving for each topical node, the annotation in response to the displayed annotation prompt.
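
Claims 16 through 18 describe a refinement loop: test the model on annotated documents, compute a performance metric, and, if a criterion is not met, find the topical node the model handles poorly and select fresh documents for that node. The sketch below shows one possible reading. The choice of per-node recall as the metric, the 0.8 criterion, and the function names are all assumptions for illustration; the claims do not specify a metric or how "associated with said topical node" is determined (here, by the model's own prediction).

```python
from collections import Counter


def per_node_recall(gold: dict[str, str], predicted: dict[str, str]) -> dict[str, float]:
    """Fraction of annotated documents in each node that the model placed correctly."""
    total = Counter(gold.values())
    correct = Counter(node for doc, node in gold.items() if predicted.get(doc) == node)
    return {node: correct[node] / total[node] for node in total}


def weakest_node(recall: dict[str, float], min_recall: float = 0.8):
    """Return the worst node failing the (assumed) criterion, or None if all pass."""
    failing = {node: r for node, r in recall.items() if r < min_recall}
    return min(failing, key=failing.get) if failing else None


def select_for_annotation(predicted: dict[str, str], annotated: set[str],
                          node: str, limit: int = 10) -> list[str]:
    """Pick not-yet-annotated documents that the model associates with the weak node."""
    picks = [doc for doc, n in predicted.items() if n == node and doc not in annotated]
    return picks[:limit]


# Hypothetical document ids and labels.
gold = {"d1": "billing", "d2": "billing", "d3": "shipping", "d4": "shipping"}
predicted = {"d1": "billing", "d2": "billing", "d3": "billing",
             "d4": "shipping", "d5": "shipping", "d6": "billing"}

recall = per_node_recall(gold, predicted)
weak = weakest_node(recall)
next_batch = select_for_annotation(predicted, set(gold), weak) if weak else []
```

In this example the model misplaces one of the two annotated "shipping" documents, so "shipping" is flagged and the only unannotated document predicted into it is queued for a second round of annotation prompts.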

Current Assignee: AI IP Investments Limited
Original Assignee: Sansa Corporation (Barbados) Inc.
Inventors: Munro, Robert J.; Erle, Schuyler D.; Walker, Christopher; Luger, Sarah K.; Brenier, Jason; King, Gary C.; Tepper, Paul A.; Mechanic, Ross; Gilchrist-Scott, Andrew; Long, Jessica D.; Robinson, James B.; Callahan, Brendan D.; Casbon, Michelle; Sarin, Ujjwal; Nair, Aneesh; Basavaraj, Veena; Saxena, Tripti; Nunez, Edgar; Hinrichs, Martha G.; Most, Haley; Schnoebelen, Tyler J.
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kim, Jonathan

Application Number: US14/964,517
Publication Number: US 20160162456A1
Time in Patent Office: 1,070 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 16/243   Natural language query form...

G06F 16/24532   of parallel queries

G06F 16/285   Clustering or classification

G06F 16/288   Entity relationship models

G06F 16/3329   Natural language query form...

G06F 16/35   Clustering; Classification

G06F 16/367   Ontology

G06F 16/93   Document management systems

G06F 16/951   Indexing; Web crawling tech...

G06F 3/0482   Interaction with lists of s...

G06F 40/137   Hierarchical processing, e....

G06F 40/169   Annotation, e.g. comment da...

G06F 40/221   Parsing markup language str...

G06F 40/30   Semantic analysis

G06F 40/40   Processing or translation o...

G06F 40/42   Data-driven translation

G06N 20/00   Machine learning

G06Q 50/01   Social networking
