UTILIZING USER-VERIFIED DATA FOR TRAINING CONFIDENCE LEVEL MODELS

US 20180181559A1
Filed: 01/27/2017
Published: 06/28/2018
Est. Priority Date: 12/22/2016
Status: Abandoned Application

Abstract

Systems and methods for utilizing user-verified data for training confidence level models. An example method comprises: performing syntactico-semantic analysis of a natural language text to produce a plurality of semantic structures; interpreting, using a set of production rules, the plurality of semantic structures to extract a plurality of information objects representing entities referenced by the natural language text; determining an attribute value for an information object of the plurality of information objects; determining a confidence level associated with the attribute value, by evaluating a confidence function associated with the set of production rules; responsive to determining that the confidence level falls below a threshold confidence value, verifying the attribute value; appending, to a training data set, at least part of the natural language text referencing the information object and the attribute value; and determining, using the training data set, at least one parameter of the confidence function.
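The verification step described in the abstract can be sketched as a small loop. This is only an illustration under stated assumptions, not the applicants' implementation: the names `Extraction`, `TrainingSet`, `review_extractions`, and the `verify` callback are hypothetical, and the sketch assumes that only values that went through verification are appended to the training data set. Re-estimating the parameters of the confidence function from that data set is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Extraction:
    """Hypothetical record for one extracted attribute value."""
    text_fragment: str     # part of the text referencing the information object
    attribute_value: str   # value determined for the information object
    confidence: float      # output of the confidence function


@dataclass
class TrainingSet:
    """Hypothetical container of (text fragment, verified value) pairs."""
    examples: List[Tuple[str, str]] = field(default_factory=list)


def review_extractions(
    extractions: List[Extraction],
    threshold: float,
    verify: Callable[[Extraction], str],
    training_set: TrainingSet,
) -> List[str]:
    """Verify values whose confidence falls below the threshold.

    `verify` stands in for the user: it returns the same value (confirmation)
    or a corrected one (modification). Assumption: only verified values are
    appended to the training data set.
    """
    accepted = []
    for ex in extractions:
        value = ex.attribute_value
        if ex.confidence < threshold:
            value = verify(ex)
            training_set.examples.append((ex.text_fragment, value))
        accepted.append(value)
    return accepted
```

A caller would pass a `verify` function backed by whatever review interface is available, then use `training_set.examples` to re-fit the confidence function.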

20 Claims

1. A method, comprising:
- performing, by a processing device, syntactico-semantic analysis of a natural language text to produce a plurality of semantic structures;
  
  interpreting, using a set of production rules, the plurality of semantic structures to extract a plurality of information objects representing entities referenced by the natural language text;
  
  determining an attribute value for an information object of the plurality of information objects;
  
  determining a confidence level associated with the attribute value, by evaluating a confidence function associated with the set of production rules;
  
  responsive to determining that the confidence level falls below a threshold confidence value, verifying the attribute value;
  
  appending, to a training data set, at least part of the natural language text referencing the information object and the attribute value; and
  
  determining, using the training data set, at least one parameter of the confidence function.
  - 2. The method of claim 1, wherein the confidence function is represented by a linear classifier producing a distance from the information object to a hyper-plane in a hyperspace of features associated with the set of production rules.
  - 3. The method of claim 1, wherein a semantic structure of the plurality of semantic structures is represented by a graph comprising a plurality of nodes corresponding to a plurality of semantic classes and a plurality of edges corresponding to a plurality of semantic relationships.
  - 4. The method of claim 1, wherein a production rule of the set of production rules comprises one or more logical expressions defined on one or more semantic structure templates.
  - 5. The method of claim 1, wherein verifying the attribute value further comprises receiving, via a graphical user interface, an input confirming the attribute value.
  - 6. The method of claim 1, wherein verifying the attribute value further comprises receiving, via a graphical user interface, an input modifying the attribute value.
  - 7. The method of claim 1, further comprising:
    - receiving the threshold confidence value via a graphical user interface.
  - 8. The method of claim 1, further comprising:
    - responsive to receiving, via a graphical user interface, an input confirming the attribute value, increasing the confidence level by a first pre-defined value.
  - 9. The method of claim 1, wherein updating the confidence level further comprises:
    - responsive to receiving, via a graphical user interface, an input confirming the attribute value, setting the confidence level to a second pre-defined value.
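Claims 2, 8, and 9 can be illustrated with a few lines of arithmetic. The sketch below is an assumption-laden example rather than the claimed implementation: it uses the standard signed distance from a point to a hyperplane, (w·x + b) / ‖w‖, and treats the feature vector as hypothetical indicators tied to the production rules. The function names and the `ceiling` cap on the incremented confidence are illustrative choices not stated in the claims.

```python
import math
from typing import Sequence


def signed_distance(weights: Sequence[float], bias: float,
                    features: Sequence[float]) -> float:
    """Signed distance from a feature vector to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    dot = sum(w * x for w, x in zip(weights, features))
    return (dot + bias) / norm


def confirm_by_increment(confidence: float, step: float,
                         ceiling: float = 1.0) -> float:
    """Claim 8 style: raise the confidence by a pre-defined value.

    The ceiling is an assumption added here to keep the result bounded.
    """
    return min(confidence + step, ceiling)


def confirm_by_override(confirmed_level: float) -> float:
    """Claim 9 style: set the confidence to a pre-defined value."""
    return confirmed_level
```

For example, with weights `[3.0, 4.0]`, bias `-5.0`, and features `[3.0, 4.0]`, the dot product plus bias is 20 and the norm is 5, so the distance is 4.0.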

10. A system, comprising:
- a memory;
  
  a processor, coupled to the memory, the processor configured to:
  
  perform syntactico-semantic analysis of the natural language text to produce a plurality of semantic structures;
  
  interpret, using a set of production rules, the plurality of semantic structures to extract a plurality of information objects representing entities referenced by the natural language text;
  
  determine an attribute value for an information object of the plurality of information objects;
  
  determine a confidence level associated with the attribute value, by evaluating a confidence function associated with the set of production rules;
  
  responsive to determining that the confidence level falls below a threshold confidence value, verify the attribute value;
  
  append, to a training data set, at least part of the natural language text referencing the information object and the attribute value; and
  
  determine, using the training data set, at least one parameter of the confidence function.
  - 11. The system of claim 10, wherein the confidence function is represented by a linear classifier producing a distance from the information object to a hyper-plane in a hyperspace of features associated with the set of production rules.
  - 12. The system of claim 10, wherein a semantic structure of the plurality of semantic structures is represented by a graph comprising a plurality of nodes corresponding to a plurality of semantic classes and a plurality of edges corresponding to a plurality of semantic relationships.
  - 13. The system of claim 10, wherein a production rule of the set of production rules comprises one or more logical expressions defined on one or more semantic structure templates.
  - 14. The system of claim 10, wherein verifying the attribute value further comprises receiving, via a graphical user interface, an input confirming the attribute value.
  - 15. The system of claim 10, wherein verifying the attribute value further comprises receiving, via a graphical user interface, an input modifying the attribute value.
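The graph representation and production rules named in claims 3 and 4 (and their system counterparts, claims 12 and 13) might look roughly like the following. Everything here is hypothetical: the `SemanticGraph` type, the semantic class names (`TO_EMPLOY`, `ORGANIZATION`, `PERSON`), the relation labels, and the `employer_rule` function are invented for illustration and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SemanticGraph:
    """Nodes map to semantic classes; edges carry semantic relationships."""
    nodes: Dict[str, str]               # node id -> semantic class
    edges: List[Tuple[str, str, str]]   # (head id, relation, dependent id)


def match_template(graph: SemanticGraph, head_class: str, relation: str,
                   dep_class: str) -> List[Tuple[str, str]]:
    """Return (head, dependent) pairs matching one structure template."""
    return [
        (head, dep)
        for head, rel, dep in graph.edges
        if rel == relation
        and graph.nodes[head] == head_class
        and graph.nodes[dep] == dep_class
    ]


def employer_rule(graph: SemanticGraph) -> List[dict]:
    """Hypothetical production rule: a logical AND of two template matches.

    If the same TO_EMPLOY node has an ORGANIZATION agent and a PERSON object,
    emit a Person information object with an 'employer' attribute.
    """
    employers = match_template(graph, "TO_EMPLOY", "Agent", "ORGANIZATION")
    employees = match_template(graph, "TO_EMPLOY", "Object", "PERSON")
    objects = []
    for verb_a, org in employers:
        for verb_b, person in employees:
            if verb_a == verb_b:  # both templates must hold on the same node
                objects.append(
                    {"class": "Person", "node": person, "employer": org}
                )
    return objects
```

A rule that fires, such as `employer_rule` here, is also a natural source of features for a confidence function, though that link is an assumption of this sketch.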

16. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computer system, cause the computer system to:
- perform syntactico-semantic analysis of the natural language text to produce a plurality of semantic structures;
  
  interpret, using a set of production rules, the plurality of semantic structures to extract a plurality of information objects representing entities referenced by the natural language text;
  
  determine an attribute value for an information object of the plurality of information objects;
  
  determine a confidence level associated with the attribute value, by evaluating a confidence function associated with the set of production rules;
  
  responsive to determining that the confidence level falls below a threshold confidence value, verify the attribute value;
  
  append, to a training data set, at least part of the natural language text referencing the information object and the attribute value; and
  
  determine, using the training data set, at least one parameter of the confidence function.
  - 17. The computer-readable non-transitory storage medium of claim 16, wherein the confidence function is represented by a linear classifier producing a distance from the information object to a hyper-plane in a hyperspace of features associated with the set of production rules.
  - 18. The computer-readable non-transitory storage medium of claim 16, wherein a semantic structure of the plurality of semantic structures is represented by a graph comprising a plurality of nodes corresponding to a plurality of semantic classes and a plurality of edges corresponding to a plurality of semantic relationships.
  - 19. The computer-readable non-transitory storage medium of claim 16, wherein a production rule of the set of production rules comprises one or more logical expressions defined on one or more semantic structure templates.
  - 20. The computer-readable non-transitory storage medium of claim 16, wherein verifying the attribute value further comprises receiving, via a graphical user interface, an input confirming the attribute value.

Current Assignee: ABBYY Production LLC (ABBYY Software)
Original Assignee: ABBYY InfoPoisk LLC
Inventors: Matskevich, Stepan Evgenjevich; Belov, Andrey Alexandrovich
Application Number: US15/417,747
Publication Number: US 20180181559A1

CPC Class Codes

G06F 3/04847   Interaction techniques to c...
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/216   using statistical methods
G06F 40/268   Morphological analysis
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/30   Semantic analysis
G06F 40/35   Discourse or dialogue repre...
