Information extraction using a trainable grammar
Abstract
A computer-implemented method for information extraction includes defining a stochastic context free grammar (SCFG) including symbols and rules applicable to the symbols, the symbols including at least one output concept. The SCFG is trained on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols. A document is parsed using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.
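The abstract describes three steps: define an SCFG whose symbols include output concepts, train it on a tagged corpus to obtain probabilities, and parse a document with those probabilities to extract concept occurrences. The Python sketch below shows one conventional way to realize these steps. It is not the patented implementation, and its assumptions are ours rather than the patent's:

- the tagged corpus is given as fully bracketed binary parse trees (Chomsky normal form);
- rule probabilities are relative-frequency (maximum-likelihood) estimates;
- the "symbol probabilities" are approximated by lexical rule probabilities P(preterminal → word);
- parsing uses Viterbi CKY.

The symbol names (`S`, `Person`, `Org`, and so on), the training sentences, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# A tree is (label, child, child) for binary rules or (label, word) for preterminals.

def is_preterminal(tree):
    return len(tree) == 2 and isinstance(tree[1], str)

def count_rules(tree, rule_counts, lhs_counts):
    """Count every rule A -> rhs used in one tagged training tree."""
    label, *children = tree
    if is_preterminal(tree):
        rhs = (children[0],)  # lexical rule: preterminal -> word
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:
            count_rules(child, rule_counts, lhs_counts)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

def train(trees):
    """Relative-frequency estimate P(A -> rhs) = count(A -> rhs) / count(A)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in trees:
        count_rules(tree, rule_counts, lhs_counts)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def viterbi_parse(words, probs, start="S"):
    """Viterbi CKY: most probable parse of `words`, or None if no parse exists."""
    lexical, binary = defaultdict(list), []
    for (lhs, rhs), p in probs.items():
        if len(rhs) == 1:
            lexical[rhs[0]].append((lhs, math.log(p)))
        else:
            binary.append((lhs, rhs[0], rhs[1], math.log(p)))
    best = defaultdict(dict)  # best[(i, j)][A] = (log prob, backpointer)
    for i, w in enumerate(words):
        for lhs, lp in lexical[w]:
            best[(i, i + 1)][lhs] = (lp, w)
    n = len(words)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = best[(i, j)]
            for k in range(i + 1, j):
                left, right = best[(i, k)], best[(k, j)]
                for lhs, b, c, lp in binary:
                    if b in left and c in right:
                        score = lp + left[b][0] + right[c][0]
                        if lhs not in cell or score > cell[lhs][0]:
                            cell[lhs] = (score, (k, b, c))
    if start not in best[(0, n)]:
        return None
    return build(best, 0, n, start)

def build(best, i, j, label):
    """Follow backpointers to rebuild the best tree for `label` over span [i, j)."""
    _, back = best[(i, j)][label]
    if isinstance(back, str):
        return (label, back)
    k, b, c = back
    return (label, build(best, i, k, b), build(best, k, j, c))

def leaves(tree):
    if is_preterminal(tree):
        return [tree[1]]
    return [w for child in tree[1:] for w in leaves(child)]

def extract(tree, concepts):
    """Return (concept, text) for every subtree labeled with an output concept."""
    found = [(tree[0], " ".join(leaves(tree)))] if tree[0] in concepts else []
    if not is_preterminal(tree):
        for child in tree[1:]:
            found += extract(child, concepts)
    return found

# Hypothetical tagged training corpus.
TRAINING = [
    ("S", ("Person", ("Title", "Mr"), ("Name", "Smith")),
          ("VP", ("Verb", "joined"), ("Org", "Acme"))),
    ("S", ("Person", ("Title", "Ms"), ("Name", "Jones")),
          ("VP", ("Verb", "left"), ("Org", "Initech"))),
]

if __name__ == "__main__":
    probs = train(TRAINING)
    tree = viterbi_parse("Mr Jones left Acme".split(), probs)
    print(extract(tree, {"Person", "Org"}))
    # prints [('Person', 'Mr Jones'), ('Org', 'Acme')]
```

In this sketch a word never seen in training makes the parse fail (`viterbi_parse` returns `None`). A practical extractor would need smoothing or an unknown-word model, which is omitted here for brevity.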
Claims (24 total)
1. A computer-implemented method for information extraction, comprising:
defining a stochastic context free grammar (SCFG) comprising symbols and rules applicable to the symbols, the symbols comprising at least one output concept;
training the SCFG on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols; and
parsing a document using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.

11. A computer software product, comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive a definition of a stochastic context free grammar (SCFG) comprising symbols and rules applicable to the symbols, the symbols comprising at least one output concept, to train the SCFG on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols, and to parse a document using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.

21. Apparatus for information extraction (IE), comprising:
an input interface, which is coupled to receive a definition of a stochastic context free grammar (SCFG) comprising symbols and rules applicable to the symbols, the symbols comprising at least one output concept; and
an IE processor, which is adapted to train the SCFG on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols, and to parse a document using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.

Dependent claims: 22, 23, 24.
Specification