TECHNIQUES FOR COMBINING HUMAN AND MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING

US 20190311024A1
Filed: 11/09/2018
Published: 10/10/2019
Est. Priority Date: 12/09/2014
Status: Abandoned Application

Abstract

Methods, apparatuses, and computer-readable media are presented for generating a natural language model. A method for generating a natural language model comprises: receiving more than one annotation of a document; calculating a level of agreement among the received annotations; determining that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determining an aggregated annotation representing an aggregation of information in the received annotations and training a natural language model using the aggregated annotation, when the first criterion is satisfied; generating at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and discarding the received annotations from use in training the natural language model, when the third criterion is satisfied.
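
By way of a non-limiting illustration, the three-way decision described in the abstract might be sketched in Python as follows. Everything named here is an assumption made for the example and is not taken from the application: the `Decision` record, the `triage` function, the use of the most common label's share as the level of agreement, the threshold of 0.75, and the cap of five annotations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Decision:
    """Hypothetical record of which criterion was satisfied."""
    action: str                   # "train", "prompt", or "discard"
    label: Optional[str] = None   # aggregated annotation, when training
    prompt: Optional[str] = None  # human readable prompt, when more input is needed


def level_of_agreement(labels: Sequence[str]) -> Tuple[str, float]:
    """Assumed metric: share of annotations matching the most common label."""
    top_label, top_count = Counter(labels).most_common(1)[0]
    return top_label, top_count / len(labels)


def triage(doc_id: str, labels: Sequence[str],
           threshold: float = 0.75, max_annotations: int = 5) -> Decision:
    if len(labels) < 2:
        raise ValueError("more than one annotation is expected")
    top_label, agreement = level_of_agreement(labels)
    if agreement >= threshold:
        # First criterion: aggregate and pass the result on for training.
        return Decision("train", label=top_label)
    if len(labels) < max_annotations:
        # Second criterion: ask for additional annotations.
        return Decision("prompt",
                        prompt=f"Please annotate document {doc_id}.")
    # Third criterion: keep these annotations out of the training data.
    return Decision("discard")
```

In this sketch, a caller would append `Decision.label` to a training set, show `Decision.prompt` to another annotator, or drop the document, depending on `Decision.action`. Other agreement measures and other criteria are equally consistent with the abstract.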

20 Claims

1. A method for generating a natural language model, the method comprising:
- receiving more than one annotation of a document;
  
  calculating a level of agreement among the received annotations;
  
  determining that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement;
  
  determining an aggregated annotation representing an aggregation of information in the received annotations and training a natural language model using the aggregated annotation, when the first criterion is satisfied;
  
  generating at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and
  
  discarding the received annotations from use in training the natural language model, when the third criterion is satisfied.
  - 2. The method of claim 1, wherein the second criterion is satisfied when the number of annotations received is less than a minimum number.
  - 3. The method of claim 1, wherein the annotations of the document comprise selection of one or more portions of the document relevant to one or more topics.
  - 4. The method of claim 1, wherein the annotations of the document comprise selection of one or more categories among a plurality of categories.
  - 5. The method of claim 4, wherein the level of agreement is determined for each category based on a percentage of annotations that select said category.
  - 6. The method of claim 5, wherein:
    - the first criterion is satisfied when the number of annotations received is at least a minimum number and the level of agreement for a category is at least a threshold level; and
      
      the aggregated annotation is determined as selecting or not selecting said category.
  - 7. The method of claim 5, wherein the second criterion is satisfied when the number of annotations received is less than a maximum number and the level of agreement is less than a threshold level.
  - 8. The method of claim 5, wherein the third criterion is satisfied when the number of annotations received is at least a maximum number and the level of agreement is less than a threshold level.
  - 9. The method of claim 4, wherein a numerical value is assigned to each of the plurality of categories.
  - 10. The method of claim 9, wherein:
    - the level of agreement comprises a difference between the highest numerical value and the lowest numerical value among the selected categories;
      
      the first criterion is satisfied when the difference is no more than a threshold value; and
      
      the third criterion is satisfied when the difference is more than the threshold value.
  - 11. The method of claim 10, wherein the aggregated annotation is determined as selection of a category with the numerical value closest to a mean of the numerical values of all received annotations.
  - 12. The method of claim 10, wherein the aggregated annotation is determined as selection of a category with the numerical value closest to a median of the numerical values of all received annotations.
  - 13. The method of claim 1, wherein determining that the criterion among the first criterion, the second criterion, and the third criterion is satisfied is further based on a result of an analysis of the document by one or more pre-existing natural language models.
  - 14. The method of claim 1, wherein determining that the criterion among the first criterion, the second criterion, and the third criterion is satisfied is further based on known performance levels of annotators.
  - 15. The method of claim 1, wherein at least one of the annotations received comprises prediction by a pre-existing natural language model.
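
As one possible reading of claims 2 and 5 to 8, the per-category decision could be sketched as below. The function names, the default values for `min_n`, `max_n`, and `threshold`, and the treatment of agreement as the larger of "share selecting" and "share not selecting" (so that annotators can agree that a category does not apply) are assumptions for illustration only.

```python
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple


def category_shares(annotations: Sequence[Set[str]],
                    categories: Iterable[str]) -> Dict[str, float]:
    """Fraction of annotations that select each category (claim 5)."""
    n = len(annotations)
    return {c: sum(c in a for a in annotations) / n for c in categories}


def decide_category(share: float, n: int, min_n: int = 3, max_n: int = 7,
                    threshold: float = 0.8) -> Tuple[str, Optional[bool]]:
    """Return (action, selected) for one category.

    Assumption: agreement counts consensus on selecting OR on not
    selecting, hence max(share, 1 - share).
    """
    agreement = max(share, 1.0 - share)
    if n < min_n:
        return ("prompt", None)            # too few annotations (claim 2)
    if agreement >= threshold:
        return ("train", share >= 0.5)     # select or not select (claim 6)
    if n < max_n:
        return ("prompt", None)            # low agreement, below cap (claim 7)
    return ("discard", None)               # low agreement at the cap (claim 8)


# Usage: four annotators, each selecting a set of categories.
annotations = [{"spam"}, {"spam"}, {"spam", "urgent"}, {"spam"}]
shares = category_shares(annotations, ["spam", "urgent"])
for category, share in shares.items():
    print(category, decide_category(share, len(annotations)))
```

Checking the minimum count first means that a document with very few annotations is always sent back for more, even when those few annotations happen to agree.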

16. An apparatus for generating a natural language model, the apparatus comprising one or more processors configured to:
- receive more than one annotation of a document;
  
  calculate a level of agreement among the received annotations;
  
  determine that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement;
  
  determine an aggregated annotation representing an aggregation of information in the received annotations and train a natural language model using the aggregated annotation, when the first criterion is satisfied;
  
  generate at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and
  
  discard the received annotations from use in training the natural language model, when the third criterion is satisfied.
  - 17. The apparatus of claim 16, wherein the annotations of the document comprise selection of one or more categories among a plurality of categories.
  - 18. The apparatus of claim 17, wherein the level of agreement is determined for each category based on a percentage of annotations that select said category.
  - 19. The apparatus of claim 17, wherein a numerical value is assigned to each of the plurality of categories;

20. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to:
- receive more than one annotation of a document;
  
  calculate a level of agreement among the received annotations;
  
  determine that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement;
  
  determine an aggregated annotation representing an aggregation of information in the received annotations and train a natural language model using the aggregated annotation, when the first criterion is satisfied;
  
  generate at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and
  
  discard the received annotations from use in training the natural language model, when the third criterion is satisfied.

Current Assignee: AI IP Investments Limited
Original Assignee: AIPARC holdings Pte. Ltd.
Inventors: Munro, Robert J.; Walker, Christopher; Luger, Sarah K.; Callahan, Brendan D.; King, Gary C.; Tepper, Paul A.; Thompson, Jana N.; Schnoebelen, Tyler J.; Brenier, Jason; Long, Jessica D.
Application Number: US16/185,843
Publication Number: US 20190311024A1

CPC Class Codes

G06F 16/243   Natural language query form...

G06F 16/24532   of parallel queries

G06F 16/285   Clustering or classification

G06F 16/288   Entity relationship models

G06F 16/3329   Natural language query form...

G06F 16/35   Clustering; Classification

G06F 16/367   Ontology

G06F 16/93   Document management systems

G06F 16/951   Indexing; Web crawling tech...

G06F 3/0482   Interaction with lists of s...

G06F 40/137   Hierarchical processing, e....

G06F 40/169   Annotation, e.g. comment da...

G06F 40/221   Parsing markup language str...

G06F 40/30   Semantic analysis

G06F 40/40   Processing or translation o...

G06F 40/42   Data-driven translation

G06N 20/00   Machine learning

G06Q 50/01   Social networking
