Training a statistical parser on noisy data by filtering
Abstract
A filtering or identifying approach is disclosed and applied to the task of unsupervised adaptation of a parsing model to a selected domain. In particular, unannotated text from the selected domain is parsed using a first parser. A subset of the parsed text is then selected and passed to a training module to train an improved model. The training module may output a parsing model usable by the first parser, or one usable by a different type of parser.
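The abstract leaves the filtering criterion and the training module unspecified. The following is a minimal Python sketch of the parse, filter, train pipeline under stated assumptions: the names `ParsedSentence`, `adapt_parser`, `toy_parser`, and `toy_trainer` are hypothetical, and the length-based filter is an assumed stand-in for whatever criterion identifies the "more appropriate" subset. It is an illustration, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ParsedSentence:
    tokens: List[str]
    tree: str        # bracketed parse produced by the first parser
    log_prob: float  # parser score for the tree (assumed to be available)


def adapt_parser(
    unannotated: Sequence[str],
    first_parser: Callable[[str], ParsedSentence],
    is_appropriate: Callable[[ParsedSentence], bool],
    train: Callable[[List[ParsedSentence]], dict],
) -> dict:
    # Step 1: parse unannotated in-domain text with the first parser.
    parsed = [first_parser(sentence) for sentence in unannotated]
    # Step 2: keep only the subset judged more appropriate for training.
    subset = [p for p in parsed if is_appropriate(p)]
    # Step 3: pass the filtered subset to the training module.
    return train(subset)


# Hypothetical stand-ins so the sketch runs end to end.
def toy_parser(sentence: str) -> ParsedSentence:
    tokens = sentence.split()
    # Flat tree: every token becomes a child of S under a dummy label X.
    tree = "(S " + " ".join(f"(X {t})" for t in tokens) + ")"
    return ParsedSentence(tokens, tree, log_prob=-2.0 * len(tokens))


def toy_trainer(examples: List[ParsedSentence]) -> dict:
    # A real training module would estimate model parameters here;
    # this one only records which trees it received.
    return {"num_examples": len(examples), "trees": [e.tree for e in examples]}


corpus = ["the drug inhibits the enzyme", "binding was observed", "x"]
# Assumed filter: keep sentences of 2 to 10 tokens.
model = adapt_parser(corpus, toy_parser, lambda p: 2 <= len(p.tokens) <= 10, toy_trainer)
print(model["num_examples"])  # 2
```

Because `first_parser`, `is_appropriate`, and `train` are passed in as functions, the same skeleton covers both variants the abstract mentions: a trainer whose output feeds back into the first parser, or one that produces a model for a different parser.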
20 Claims
1. A computer-implemented method of creating training data to train a parser in a selected domain, comprising:
parsing unannotated text of the selected domain using a first parser to obtain parsed text;
identifying in the parsed text a subset thereof that is more appropriate than other portions for obtaining an improved parsing model in the selected domain; and
creating the improved parsing model using the subset of parsed text and a training module.
16. A computer readable medium having instructions which when performed by a computer create training data for training a parser, the instructions comprising:
parsing unannotated text of the selected domain using a first parser to obtain parsed text;
ranking portions of the parsed text to identify a subset thereof that is more appropriate than other portions for obtaining an improved parsing model in the selected domain; and
creating the improved parsing model using the subset of parsed text and a training module.
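Claim 16 recites ranking portions of the parsed text, but the text shown here does not specify the ranking score. The sketch below is one possible reading, assuming each parse carries a log-probability from the first parser and using length-normalized log-probability as the score. The names `Parse`, `per_token_score`, `rank_and_select`, and the `keep_fraction` parameter are hypothetical choices for illustration.

```python
import math
from typing import List, NamedTuple, Sequence


class Parse(NamedTuple):
    tree: str         # bracketed parse from the first parser
    log_prob: float   # parser log-probability of the tree (assumed available)
    num_tokens: int   # sentence length in tokens


def per_token_score(p: Parse) -> float:
    # Assumed ranking criterion: length-normalized log-probability,
    # so long sentences are not penalized merely for being long.
    return p.log_prob / max(p.num_tokens, 1)


def rank_and_select(parses: Sequence[Parse], keep_fraction: float = 0.5) -> List[Parse]:
    """Rank parses best-first and keep the top `keep_fraction` of them."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not parses:
        return []
    # sorted() is stable, so ties keep their original corpus order.
    ranked = sorted(parses, key=per_token_score, reverse=True)
    k = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:k]


parses = [
    Parse("(S a)", -12.0, 4),  # -3.0 per token
    Parse("(S b)", -5.0, 5),   # -1.0 per token
    Parse("(S c)", -9.0, 3),   # -3.0 per token
    Parse("(S d)", -4.0, 2),   # -2.0 per token
]
selected = rank_and_select(parses, keep_fraction=0.5)
print([p.tree for p in selected])  # ['(S b)', '(S d)']
```

One possible reason to rank and keep a fraction, rather than apply a fixed score threshold, is that a fraction adapts to the score distribution of the new domain, whereas a threshold would need to be recalibrated for each domain.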
Specification