Natural-language processing system using a large corpus
Abstract
A computer-parsing system based upon using vectors (lists) to represent natural-language elements, providing a robust, distributed way to score grammaticality of an input string by using as a source material a large corpus of natural-language text. The system uses recombining of asymmetric associations of syntactically similar strings to form the vectors. The system uses equivalence lists for the subparts of the string to build equivalence lists for longer strings in an order controlled by the potential parse to be scored. The power of recombination of vector elements in building longer strings provides a means of representing collocational complexity. Grammaticality scoring is based upon the number and similarity of the vector elements.
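As an illustration of the kind of vector the abstract describes, the following minimal sketch builds a weighted equivalence list for each word of a toy corpus. It assumes, purely for illustration, that two words count as syntactically similar when they share at least one (left neighbour, right neighbour) context; the function name `equivalence_vectors`, the boundary markers, and the weighting are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def equivalence_vectors(sentences):
    """Return {word: {similar_word: weight}} for a whitespace-tokenised corpus.

    Assumption (not from the patent): similarity means sharing at least one
    (left neighbour, right neighbour) context. The weight is the fraction of
    the word's own contexts that the other word also occupies, so the
    association is asymmetric: vectors[a][b] need not equal vectors[b][a].
    """
    contexts = defaultdict(set)  # word -> set of (left, right) neighbours
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))

    vectors = {}
    for word, ctx in contexts.items():
        vectors[word] = {
            other: len(ctx & other_ctx) / len(ctx)
            for other, other_ctx in contexts.items()
            if ctx & other_ctx
        }
    return vectors


toy_corpus = [
    "the cat sleeps",
    "the dog sleeps",
    "the dog barks",
    "a dog barks",
]
vectors = equivalence_vectors(toy_corpus)
```

In this toy corpus "dog" fills every context that "cat" fills, but "cat" fills only one of the three contexts of "dog", so the two directions of the association receive different weights.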
26 Claims
1. A computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting natural-language processing, comprising, in combination:
a) for a first adjoining pair, comprising a first pair element and a second pair element, of such natural-language elements of such input string, finding, from such string data from such corpus, a first listing of each such element syntactically equivalent to such first pair element and a second listing of each such element syntactically equivalent to such second pair element;
b) from matching each such first-listing element with each such second-listing element, making a matched-pairs third listing by finding which matched pairs of said matching are found in such string data from such corpus; and
c) for such matched pairs of such matched-pairs third listing, finding, from such string data from such corpus, a fourth listing of each fourth such natural-language element syntactically equivalent to any such matched pair of said third listing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 23, 24.

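Steps a) to c) of claim 1 can be sketched on a toy corpus as follows. This is a hedged illustration only: "syntactically equivalent" is approximated here by sharing a (left neighbour, right neighbour) context, which is an assumption and not the patent's definition, and the helper names `build_table`, `equivalents` and `claim_one_listings` are hypothetical. The table only stores units of up to two words, so longer concatenations are simply never matched.

```python
def build_table(sentences, max_len=2):
    """Map every n-gram (tuple of up to max_len words) to its set of contexts."""
    table = {}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, max_len + 1):
            for i in range(1, len(tokens) - n):
                unit = tuple(tokens[i:i + n])
                table.setdefault(unit, set()).add((tokens[i - 1], tokens[i + n]))
    return table


def equivalents(unit, table):
    """All corpus units sharing at least one context with `unit` (assumed notion)."""
    ctx = table.get(unit, set())
    return {other for other, other_ctx in table.items() if ctx & other_ctx}


def claim_one_listings(first, second, table):
    # Step a: one listing per element of the adjoining input pair.
    first_listing = equivalents((first,), table)
    second_listing = equivalents((second,), table)
    # Step b: keep only the matched pairs that are attested in the corpus.
    third_listing = {
        a + b for a in first_listing for b in second_listing if a + b in table
    }
    # Step c: elements equivalent to any matched pair of the third listing.
    fourth_listing = set()
    for pair in third_listing:
        fourth_listing |= equivalents(pair, table)
    return first_listing, second_listing, third_listing, fourth_listing


toy_corpus = ["the cat sleeps", "the dog sleeps", "a dog barks", "rex sleeps"]
table = build_table(toy_corpus)
first_l, second_l, third_l, fourth_l = claim_one_listings("the", "cat", table)
```

For the input pair "the cat", the fourth listing in this toy corpus contains the single word "rex" alongside two-word units such as "the dog", which shows how a listing for a pair can bring in elements of a different length.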
19. A computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string, to be parsed, of natural-language elements in the subject language, for assisting natural-language parsing, comprising, in combination:
a) for each of at least two natural-language input subcombinations which are potential subparses of such input string, building an equivalence list of all corpus strings syntactically equivalent to such each input string subcombination;
b) from such equivalence lists, in different orders for each potential parse of said input string, building to a final equivalence list for each such potential parse of such input string; and
c) from the number and quality of entries in each respective such final equivalence list, scoring the grammaticality of such respective potential parse.
Dependent claims: 20.

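The parse-scoring procedure of claim 19 can be illustrated with a small sketch. Everything here is an assumption made for the example: equivalence is approximated by shared (left, right) contexts, entry quality is a context-overlap fraction, combined weights are multiplied, and the score is the sum of the weights in the final list (so it grows with both the number and the quality of entries). The names `build_table`, `equivalence_list`, `combine` and `score_parse` are hypothetical.

```python
def build_table(sentences, max_len=3):
    """Map every n-gram (tuple of up to max_len words) to its set of contexts."""
    table = {}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, max_len + 1):
            for i in range(1, len(tokens) - n):
                unit = tuple(tokens[i:i + n])
                table.setdefault(unit, set()).add((tokens[i - 1], tokens[i + n]))
    return table


def equivalence_list(unit, table):
    """Weighted list of corpus units sharing a context with `unit` (assumed notion)."""
    ctx = table.get(unit, set())
    if not ctx:
        return {}
    return {
        other: len(ctx & other_ctx) / len(ctx)
        for other, other_ctx in table.items()
        if ctx & other_ctx
    }


def combine(left, right, table):
    """Build the list for a longer string from the lists of its two subparts."""
    result = {}
    for a, weight_a in left.items():
        for b, weight_b in right.items():
            # Unattested concatenations yield an empty list and contribute nothing.
            for unit, w in equivalence_list(a + b, table).items():
                result[unit] = max(result.get(unit, 0.0), weight_a * weight_b * w)
    return result


def score_parse(parse, table):
    """Score a binary parse given as nested tuples, e.g. (("the", "cat"), "sleeps")."""
    def build(node):
        if isinstance(node, str):
            return equivalence_list((node,), table)
        left, right = node
        return combine(build(left), build(right), table)

    return sum(build(parse).values())


toy_corpus = ["the cat sleeps", "the dog sleeps", "a dog barks", "rex sleeps"]
table = build_table(toy_corpus)
left_branching = score_parse((("the", "cat"), "sleeps"), table)
right_branching = score_parse(("the", ("cat", "sleeps")), table)
scrambled = score_parse((("sleeps", "cat"), "the"), table)
```

The bracketing of each candidate parse fixes the order in which the lists are combined. In this toy corpus the scrambled string finds no attested matched pairs and scores zero, while both bracketings of the attested sentence receive a positive score; a larger corpus would be needed for the two bracketings to be told apart.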
21. A computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting natural-language processing, comprising, in combination:
a) for a first adjoining pair, comprising a first pair element and a second pair element, of such natural-language elements of such input string, finding, from such string data from such corpus, a first listing of each such element syntactically equivalent to such first pair element and a second listing of each such element syntactically equivalent to such second pair element; and
b) from matching each such first-listing element with each such second-listing element, making a matched-pairs third listing by finding which matched pairs of said matching are found in such string data from such corpus;
c) wherein at least one of said first adjoining pair comprises at least a pair of natural-language elements.
Dependent claims: 22.

25. A computer-readable medium (for a computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting natural-language processing) whose contents cause a computer system to determine a grammatical parse by:
a) for each of at least two natural-language input subcombinations which are potential subparses of such input string, building an equivalence list of all corpus strings syntactically equivalent to such each input string subcombination;
b) from such equivalence lists, in different orders for each potential parse of said input string, building to a final equivalence list for each such potential parse of such input string; and
c) from the number and quality of entries in each respective such final equivalence list, scoring the grammaticality of such respective potential parse.

26. A computer-implemented natural-language system (for a computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting natural-language processing) comprising:
a) for each of at least two natural-language input subcombinations which are potential subparses of such input string, means for building an equivalence list of all corpus strings syntactically equivalent to such each input string subcombination;
b) means for building, from such equivalence lists, in different orders for each potential parse of said input string, to a final equivalence list for each such potential parse of such input string; and
c) means for scoring, from the number and quality of entries in each respective such final equivalence list, the grammaticality of such respective potential parse.

Specification