Creation of structured data from plain text
Abstract
A method and system for converting plain text into structured data. Parse trees for the plain text are generated based on the grammar of a natural language, and the parse trees are mapped onto instance trees generated based on an application-specific model. The best map is chosen, and the instance tree is passed to an application for execution. The method and system can be used both for populating a database and for retrieving data from a database based on a query.
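The mapping and selection steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the domain model, the vocabulary, the attachment rule in `map_node`, and the two hand-written parse trees are all hypothetical. The selection rule used here is the word-coverage criterion recited in claim 8.

```python
# Hypothetical domain model: object -> objects it may directly contain.
DOMAIN_MODEL = {"Car": {"Door"}, "Door": {"Handle"}, "Handle": set()}
# Hypothetical vocabulary: word -> domain object it identifies.
VOCABULARY = {"car": "Car", "door": "Door", "handle": "Handle"}

def map_node(tree):
    """Map a parse (sub)tree to a list of instance-tree fragments."""
    label, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):  # leaf: (category, word)
        obj = VOCABULARY.get(rest[0])
        return [{"object": obj, "word": rest[0], "children": []}] if obj else []
    parts = [frag for child in rest for frag in map_node(child)]
    # Assumed attachment rule: a fragment becomes the head of the constituent
    # if the domain model lets it contain every sibling fragment.
    for head in parts:
        others = [p for p in parts if p is not head]
        if others and all(o["object"] in DOMAIN_MODEL[head["object"]] for o in others):
            head["children"].extend(others)
            return [head]
    return parts  # no head found: the fragments stay separate

def coverage(node):
    """Number of words of the plain text accounted for by an instance tree."""
    return 1 + sum(coverage(c) for c in node["children"])

def best_instance_tree(parse_trees):
    """Map every parse tree, then keep the instance tree covering the most words."""
    candidates = [frag for t in parse_trees for frag in map_node(t)]
    return max(candidates, key=coverage)

# Two bracketings of "car door handle", written by hand for this example.
LEFT = ("N", ("N", ("N", "car"), ("N", "door")), ("N", "handle"))
RIGHT = ("N", ("N", "car"), ("N", ("N", "door"), ("N", "handle")))

best = best_instance_tree([LEFT, RIGHT])
print(best["object"], coverage(best))
```

With this toy model, only the right-branching parse lets every word attach under a single `Car` root, so its instance tree is the one selected.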
23 Claims
1. A method for creating structured data representation from a plain text description for an application domain, given a domain model that defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary, the method comprising:
parsing the plain text description using a grammar and the vocabulary to generate a plurality of parse trees, the grammar defined independently of the domain model;
mapping at least some of the parse trees onto the domain model to create a plurality of instance trees; and
selecting at least one of the instance trees to create the structured data.
representing the plurality of parse trees generated in a single directed acyclic graph.
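One way to picture several parse trees held in a single directed acyclic graph is to store each distinct subtree once and let the trees share it. The sketch below is an assumption-laden illustration, not the patent's data structure: the tuple encoding of trees, the `add_tree` helper, and the two example trees are hypothetical. It shares identical subtrees only; a full packed parse forest would typically also merge alternative analyses of the same span.

```python
def add_tree(tree, start, nodes):
    """Intern `tree` into `nodes`; return (node_id, end_position).

    `nodes` maps a key (label, start, end, children-or-word) to a node id,
    so a subtree that occurs in several parse trees is stored only once.
    """
    label, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):  # leaf: (category, word)
        end = start + 1
        key = (label, start, end, rest[0])
    else:
        child_ids, end = [], start
        for child in rest:
            child_id, end = add_tree(child, end, nodes)
            child_ids.append(child_id)
        key = (label, start, end, tuple(child_ids))
    return nodes.setdefault(key, len(nodes)), end

# Two parse trees for the same three words (hand-written for this example).
T1 = ("N", ("N", "car"), ("N", ("N", "door"), ("N", "handle")))
T2 = ("N", ("N", ("N", "car"), ("N", "door")), ("N", "handle"))

nodes = {}
roots = [add_tree(t, 0, nodes)[0] for t in (T1, T2)]
print(len(nodes), roots)
```

Stored separately, the two trees would need ten nodes; in the shared graph the three leaf nodes are reused, so fewer nodes are needed.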
4. The method of claim 1, wherein the grammar used to generate the plurality of parse trees is context-free.
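To illustrate why a context-free grammar can produce a plurality of parse trees for one input, the following sketch enumerates every parse of a short phrase. It is a toy example under stated assumptions: the grammar (written in Chomsky normal form only to keep the code short), the lexicon, and the function `parses` are hypothetical. Note that the grammar mentions only syntactic categories, not any object of an application domain.

```python
# Hypothetical toy grammar in Chomsky normal form.
BINARY = {("N", "N"): ["N"]}          # rule N -> N N (noun compounds)
LEXICON = {"car": ["N"], "door": ["N"], "handle": ["N"]}

def parses(words, lo, hi):
    """Yield (label, tree) for every constituent spanning words[lo:hi]."""
    if hi - lo == 1:
        for label in LEXICON.get(words[lo], []):
            yield label, (label, words[lo])
        return
    for mid in range(lo + 1, hi):                 # try every split point
        for l_label, l_tree in parses(words, lo, mid):
            for r_label, r_tree in parses(words, mid, hi):
                for parent in BINARY.get((l_label, r_label), []):
                    yield parent, (parent, l_tree, r_tree)

words = ["car", "door", "handle"]
trees = [tree for label, tree in parses(words, 0, len(words)) if label == "N"]
for tree in trees:
    print(tree)
```

The phrase has two bracketings, so two parse trees are produced. This naive enumeration is exponential in the worst case and is meant only to show the ambiguity, not to serve as a practical parser.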
5. The method of claim 1, wherein the vocabulary used is a general vocabulary.
6. The method of claim 1, wherein the vocabulary used is specific to the application domain.
7. The method of claim 1, wherein mapping the plurality of parse trees onto instance trees comprises:
generating a plurality of instance trees based on the domain model;
pruning the plurality of generated instance trees by discarding incomplete instance trees, to create a second plurality of instance trees; and
choosing one instance tree from the second plurality of instance trees.
8. The method of claim 7, wherein choosing one instance tree from the second plurality of instance trees comprises:
choosing the instance tree which covers the maximum number of words of plain text.
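The pruning and choosing steps of claims 7 and 8 might look like the sketch below. The claims do not define what makes an instance tree incomplete, so this example assumes one possible reading: a tree is incomplete when an object lacks a child that a hypothetical domain model marks as required. The `Instance` class, the `REQUIRED` table, and the sample candidates are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    obj: str            # name of the domain object
    words: int          # words of the plain text consumed by this node itself
    children: list = field(default_factory=list)

# Hypothetical domain model: object -> child objects that must be present.
REQUIRED = {
    "Flight": {"Origin", "Destination"},
    "Origin": set(),
    "Destination": set(),
    "Date": set(),
}

def is_complete(node):
    """Assumed completeness test: every required child is present, recursively."""
    present = {child.obj for child in node.children}
    return REQUIRED[node.obj] <= present and all(is_complete(c) for c in node.children)

def words_covered(node):
    return node.words + sum(words_covered(c) for c in node.children)

def choose(instance_trees):
    """Discard incomplete trees, then pick the one covering the most words."""
    survivors = [t for t in instance_trees if is_complete(t)]  # second plurality
    return max(survivors, key=words_covered) if survivors else None

a = Instance("Flight", 1, [Instance("Origin", 2), Instance("Destination", 2)])
b = Instance("Flight", 1, [Instance("Origin", 2), Instance("Date", 3)])  # no Destination
c = Instance("Flight", 1, [Instance("Origin", 1), Instance("Destination", 1)])

chosen = choose([a, b, c])
print(chosen.obj, words_covered(chosen))
```

Candidate `b` covers the most words but is pruned first because it is incomplete under the assumed rule, so the choice falls to `a`.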
9. A method for creating structured data representation from a plain text description for an application domain, given a domain model that defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary, the method comprising:
parsing the plain text using a grammar and the vocabulary to generate a parse tree, the grammar defined independently of the domain model;
mapping the parse tree onto the domain model to create an instance tree of objects of the application domain;
creating the structured data using the instance tree.
10. A method for creating structured data representation from a plain text description for an application domain, the method comprising:
constructing a domain model which defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary;
parsing the plain text using a grammar and the vocabulary to generate a plurality of parse trees, the grammar defined independently of the domain model;
generating, based on the domain model, a plurality of instance trees, each instance tree generated corresponding to a parse tree;
choosing one instance tree from the plurality of instance trees; and
generating structured data based on the chosen instance tree.
11. A method for creating structured data representation from a plain text description for an application domain, the method comprising:
constructing a domain model which defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary;
parsing the plain text using a grammar and the vocabulary to generate a plurality of parse trees, the grammar defined independently of the domain model;
generating, based on the domain model, a plurality of instance trees, each instance tree generated corresponding to a parse tree;
pruning the plurality of instance trees to create a second plurality of instance trees;
choosing one instance tree from the second plurality of instance trees; and
generating structured data based on the chosen instance tree.
12. A computer-implemented system for creating structured data representation from a plain text description for an application domain, the system comprising:
a parser for parsing the plain text using a grammar and a vocabulary to generate a plurality of parse trees, the grammar defined independently of the application domain;
a mapper communicatively coupled to the parser, for mapping the plurality of parse trees onto a domain model to create a plurality of instance trees, the domain model defining objects of the application domain and relationships between the objects and identifying the objects with the vocabulary; and
an output simplifier communicatively coupled to the mapper, wherein instructions to the output simplifier are included in the domain model.
13. The system of claim 12, further comprising:
a model storage communicatively coupled to the mapper for providing it with the domain model.
14. The system of claim 12, further comprising:
a grammar storage communicatively coupled to the parser for providing the parser with the grammar.
15. The system of claim 12, further comprising:
a vocabulary storage for storing vocabulary specific to the application.
16. The system of claim 12, further comprising:
a vocabulary storage for storing general vocabulary.
17. The system of claim 12, wherein updating the vocabulary updates the domain model.
18. A computer-implemented system for creating structured data representation from a plain text description for an application domain, the system comprising:
a parser for parsing the plain text using a grammar and a vocabulary to generate a parse tree;
a mapper communicatively coupled to the parser, for mapping the parse tree onto a domain model to create an instance tree, the domain model defining objects of the application domain and relationships between the objects and identifying the objects with the vocabulary, the grammar defined independently of the domain model; and
an output simplifier communicatively coupled to the mapper, wherein instructions to the output simplifier are included in the domain model.
19. A computer program product for storing a program for permitting a computer to perform a method for creating structured data representation from a plain text description for an application domain, given a domain model that defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary, the method comprising:
parsing the plain text description using a grammar and the vocabulary to generate a plurality of parse trees, the grammar defined independently of the domain model;
mapping at least some of the parse trees onto the domain model to create a plurality of instance trees; and
selecting at least one of the instance trees to create the structured data.
20. A computer program product for storing a program for permitting a computer to perform a method for creating structured data representation from a plain text description for an application domain, given a domain model that defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary, the method comprising:
parsing the plain text using a grammar and the vocabulary to generate a parse tree, the grammar defined independently of the domain model;
mapping the parse tree onto the domain model to create an instance tree of objects of the application domain; and
creating the structured data using the instance tree.
21. A computer program product for storing a program for permitting a computer to perform a method for creating structured data representation from a plain text description for an application domain, the method comprising:
constructing a domain model which defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary;
parsing the plain text using a grammar and the vocabulary to generate a plurality of parse trees, the grammar defined independently of the domain model;
generating, based on the domain model, a plurality of instance trees, each instance tree generated corresponding to a parse tree;
choosing one instance tree from the plurality of instance trees; and
generating structured data based on the chosen instance tree.
22. A computer program product for storing a program for permitting a computer to perform a method for creating structured data representation from a plain text description for an application domain, the method comprising:
constructing a domain model which defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary;
parsing the plain text using a grammar and the vocabulary to generate a plurality of parse trees, the grammar defined independently of the domain model;
generating, based on the domain model, a plurality of instance trees, each instance tree generated corresponding to a parse tree;
pruning the plurality of instance trees to create a second plurality of instance trees;
choosing one instance tree from the second plurality of instance trees; and
generating structured data based on the chosen instance tree.
23. A computer-implemented system for creating structured data representation from a plain text description for an application domain, the system comprising:
a content engine comprising;
a parser for parsing the plain text using a grammar and a vocabulary to generate a plurality of parse trees;
a mapper communicatively coupled to the parser, for mapping the plurality of parse trees onto a domain model to create a plurality of instance trees, the grammar defined independently of the domain model; and
an output simplifier communicatively coupled to the mapper, wherein instructions to the output simplifier are included in the domain model;
a grammar storage communicatively coupled to the content engine for providing the content engine with the grammar;
a model storage communicatively coupled to the content engine for providing the content engine with the domain model which defines objects of the application domain and relationships between the objects, and identifies the objects with a vocabulary; and
a vocabulary storage for providing the content engine with the vocabulary.
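The component arrangement recited in claim 23 can be pictured with the sketch below, which wires a parser, a mapper, and an output simplifier into a content engine fed by three storages. Everything concrete here is an assumption made for illustration: the class and method names, the dictionary-based storages, the trivial one-tree "parser", the flat list used as an instance tree, and the idea that the simplifier's instructions are field renames. The only point taken from the claim is that the simplifier reads its instructions from the domain model.

```python
class Parser:
    def __init__(self, grammar, vocabulary):
        self.grammar, self.vocabulary = grammar, vocabulary

    def parse(self, text):
        # Toy stand-in for a real parser: one flat tree over the known words.
        words = [w for w in text.lower().split() if w in self.vocabulary]
        return [(self.grammar["root"], *[("W", w) for w in words])]

class Mapper:
    def __init__(self, domain_model, vocabulary):
        self.domain_model, self.vocabulary = domain_model, vocabulary

    def map(self, parse_trees):
        # Each "instance tree" is simplified to a flat list of (object, word).
        return [
            [(self.vocabulary[w], w) for _, w in leaves
             if self.vocabulary[w] in self.domain_model["objects"]]
            for _, *leaves in parse_trees
        ]

class OutputSimplifier:
    def __init__(self, domain_model):
        # The instructions live inside the domain model itself.
        self.instructions = domain_model["simplify"]

    def simplify(self, instance_tree):
        return {self.instructions.get(obj, obj): word for obj, word in instance_tree}

class ContentEngine:
    def __init__(self, grammar_storage, model_storage, vocabulary_storage):
        self.parser = Parser(grammar_storage, vocabulary_storage)
        self.mapper = Mapper(model_storage, vocabulary_storage)
        self.simplifier = OutputSimplifier(model_storage)

    def run(self, text):
        instance_trees = self.mapper.map(self.parser.parse(text))
        return self.simplifier.simplify(instance_trees[0])

# Hypothetical storage contents.
grammar_storage = {"root": "S"}
model_storage = {"objects": {"Color", "Make"},
                 "simplify": {"Color": "color", "Make": "make"}}
vocabulary_storage = {"red": "Color", "honda": "Make"}

engine = ContentEngine(grammar_storage, model_storage, vocabulary_storage)
result = engine.run("A red Honda")
print(result)
```

Keeping the grammar, the domain model, and the vocabulary in separate storages means any one of them can be swapped without touching the engine's code, which is the design property this sketch is meant to show.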
Specification