Information extraction using a trainable grammar
Abstract
A computer-implemented method for information extraction includes defining a stochastic context free grammar (SCFG) including symbols and rules applicable to the symbols, the symbols including at least one output concept. The SCFG is trained on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols. A document is parsed using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.
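The three steps in the abstract (define the grammar, train it, parse to extract) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy corpus, the symbol names (`S`, `Person`, `Event`, `Verb`, `Org`), the helper names (`train`, `parse`, `extract`), the restriction to binary and lexical rules (Chomsky normal form), relative-frequency counting as the training procedure, and Viterbi CKY as the parsing procedure. The sketch estimates rule probabilities only, with no smoothing for unseen words.

```python
import math
from collections import defaultdict

def count_rules(tree, counts):
    """Count rules in a tagged tree shaped (label, word) or (label, left, right)."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[label][(children[0],)] += 1                   # lexical rule A -> word
    else:
        counts[label][tuple(c[0] for c in children)] += 1    # binary rule A -> B C
        for child in children:
            count_rules(child, counts)

def train(corpus):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in corpus:
        count_rules(tree, counts)
    return {lhs: {rhs: n / sum(rhss.values()) for rhs, n in rhss.items()}
            for lhs, rhss in counts.items()}

def parse(tokens, probs, start="S"):
    """Viterbi CKY: return the most probable tree rooted at `start`, or None."""
    n = len(tokens)
    best = defaultdict(dict)          # best[(i, j)][A] = (log_prob, tree)
    for i, word in enumerate(tokens):
        for lhs, rhss in probs.items():
            if (word,) in rhss:
                best[(i, i + 1)][lhs] = (math.log(rhss[(word,)]), (lhs, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lhs, rhss in probs.items():
                    for rhs, p in rhss.items():
                        if len(rhs) != 2:
                            continue  # lexical rules were handled above
                        left = best[(i, k)].get(rhs[0])
                        right = best[(k, j)].get(rhs[1])
                        if left is None or right is None:
                            continue
                        score = math.log(p) + left[0] + right[0]
                        if lhs not in best[(i, j)] or score > best[(i, j)][lhs][0]:
                            best[(i, j)][lhs] = (score, (lhs, left[1], right[1]))
    entry = best[(0, n)].get(start)
    return entry[1] if entry else None

def leaves(tree):
    """Return the words covered by a tree, left to right."""
    if isinstance(tree[1], str):
        return [tree[1]]
    return [w for child in tree[1:] for w in leaves(child)]

def extract(tree, concepts):
    """Collect (concept, text) for every subtree labelled with an output concept."""
    found = []
    if tree[0] in concepts:
        found.append((tree[0], " ".join(leaves(tree))))
    if not isinstance(tree[1], str):
        for child in tree[1:]:
            found.extend(extract(child, concepts))
    return found

# Hypothetical tagged training corpus; Person and Org play the role of output concepts.
CORPUS = [
    ("S", ("Person", "alice"), ("Event", ("Verb", "joined"), ("Org", "acme"))),
    ("S", ("Person", "bob"), ("Event", ("Verb", "joined"), ("Org", "initech"))),
    ("S", ("Person", "alice"), ("Event", ("Verb", "left"), ("Org", "initech"))),
]
PROBS = train(CORPUS)
TREE = parse("bob left acme".split(), PROBS)
FOUND = extract(TREE, {"Person", "Org"})
print(FOUND)  # [('Person', 'bob'), ('Org', 'acme')]
```

The sentence "bob left acme" never occurs in the toy corpus, yet it parses because each rule it needs was seen during training. A sentence containing an unseen word yields no parse in this sketch, which is why a practical system would need some treatment of unknown tokens.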
24 Claims
1. A computer-implemented method for information extraction, comprising:
defining a stochastic context free grammar (SCFG) comprising symbols and rules applicable to the symbols, the symbols comprising at least one output concept;
training the SCFG on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols; and
parsing a document using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer software product, comprising a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive a definition of a stochastic context free grammar (SCFG) comprising symbols and rules applicable to the symbols, the symbols comprising at least one output concept, to train the SCFG on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols, and to parse a document using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.
21. Apparatus for information extraction (IE), comprising:
an input interface, which is coupled to receive a definition of a stochastic context free grammar (SCFG) comprising symbols and rules applicable to the symbols, the symbols comprising at least one output concept; and
an IE processor, which is adapted to train the SCFG on a tagged training corpus so as to determine probabilities of the rules and of one or more of the symbols, and to parse a document using the rules and symbols responsively to the probabilities so as to extract occurrences of the at least one output concept from the document.
Dependent claims: 22, 23, 24
Specification