Natural-language processing system using a large corpus
Abstract
A computer parsing system that uses vectors (lists) to represent natural-language elements, providing a robust, distributed way to score the grammaticality of an input string, using a large corpus of natural-language text as source material. The system forms the vectors by recombining asymmetric associations of syntactically similar strings. It builds equivalence lists for longer strings from the equivalence lists of their subparts, in an order controlled by the potential parse being scored. Recombining vector elements to build longer strings provides a means of representing collocational complexity. Grammaticality scoring is based upon the number and similarity of the vector elements.
24 Claims
1. A computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting natural-language processing, comprising, in combination:
a) for a first adjoining pair, comprising a first pair element and a second pair element, of such natural-language elements of such input string, finding, from such string data from such corpus, a first listing of each such element syntactically related to such first pair element and a second listing of each such element syntactically related to such second pair element;
b) from matching each such first-listing element with each such second-listing element, making a matched-pairs third listing by finding which matched pairs of said matching are found in such string data from such corpus;
c) for such matched pairs of such matched-pairs third listing, finding, from such string data from such corpus, a recombined fourth listing of each fourth such natural-language element syntactically related to any such matched pair of said third listing; and
d) scoring each such natural-language element of such fourth listing, such scoring comprising scoring each element, as it is related to combinations between other elements.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19
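Steps a) through d) of claim 1 can be illustrated with a small sketch. It is not the patented implementation. It assumes that two elements are "syntactically related" when they share at least one immediate left/right context in the corpus, and that an element's score is the number of matched pairs whose contexts it shares. The toy corpus and all names (`build_indexes`, `related_words`, `score_adjoining_pair`) are hypothetical.

```python
from collections import defaultdict

# Toy corpus, invented for illustration only.
CORPUS = [
    "the cat sleeps", "the dog sleeps", "a cat runs", "a dog runs",
    "it sleeps", "it runs", "the cat runs", "she runs",
]

def build_indexes(corpus):
    """Record the (left, right) neighbours of every word and adjacent pair."""
    word_ctx = defaultdict(set)   # word     -> {(left, right), ...}
    pair_ctx = defaultdict(set)   # (w1, w2) -> {(left, right), ...}
    for line in corpus:
        toks = ["<s>"] + line.lower().split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            word_ctx[toks[i]].add((toks[i - 1], toks[i + 1]))
        for i in range(1, len(toks) - 2):
            pair_ctx[(toks[i], toks[i + 1])].add((toks[i - 1], toks[i + 2]))
    return dict(word_ctx), dict(pair_ctx)

def related_words(word, word_ctx):
    """ASSUMPTION: 'syntactically related' = shares at least one context."""
    target = word_ctx.get(word, set())
    return {w for w, ctx in word_ctx.items() if ctx & target}

def score_adjoining_pair(first, second, word_ctx, pair_ctx):
    # Step a: first and second listings.
    list1 = related_words(first, word_ctx)
    list2 = related_words(second, word_ctx)
    # Step b: third listing, keeping only pairs attested in the corpus.
    matched = {(x, y) for x in list1 for y in list2 if (x, y) in pair_ctx}
    # Steps c and d: fourth listing of words sharing a context with any
    # matched pair. ASSUMPTION: score = number of supporting matched pairs.
    scores = defaultdict(int)
    for pair in matched:
        for w, ctx in word_ctx.items():
            if ctx & pair_ctx[pair]:
                scores[w] += 1
    return matched, dict(scores)

WORD_CTX, PAIR_CTX = build_indexes(CORPUS)
matched, scores = score_adjoining_pair("the", "dog", WORD_CTX, PAIR_CTX)
print(sorted(matched))
print(sorted(scores.items()))  # [('it', 4), ('she', 3)]
```

On this corpus the pair "the dog" recombines into four attested pairs, and the fourth listing contains "it" and "she", which occur in the same sentence positions as those pairs.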
14. A computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string, to be parsed, of natural-language elements in the subject language, for assisting natural-language parsing, comprising, in combination:
a) for each of at least two natural-language input subcombinations which are potential subparses of such input string, building a recombined list of all corpus strings syntactically related to such each input string subcombination, where the recombination of such recombined list is a paradigmatic recombination of words related by co-occurrence in sequence, representing longer strings using different paradigmatic recombinations of shorter strings;
b) from such recombined lists, in different orders for each potential parse of said input string, building to a final recombined list for each such potential parse of such input string; and
c) from the number and quality of entries in each respective such final recombined list, scoring the grammaticality of such respective potential parse.
Dependent claims: 15
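The parse-ordered building of recombined lists in claim 14 can be sketched in the same spirit. This is a minimal illustration under stated assumptions, not the claimed system: strings are "syntactically related" when they share a left/right context in the corpus, a parse is a nested tuple of words, and the "number and quality" of final entries is approximated by summing how many attested combinations support each entry. The corpus and all names (`build_index`, `equivalents`, `combine`, `build`, `score`) are hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
CORPUS = [
    "the cat sleeps", "the dog sleeps", "a cat runs", "a dog runs",
    "it sleeps", "it runs", "the cat runs", "she runs",
]

def build_index(corpus, max_n=3):
    """Map every n-gram (n <= max_n) to its set of (left, right) contexts."""
    index = defaultdict(set)
    for line in corpus:
        toks = ["<s>"] + line.lower().split() + ["</s>"]
        for n in range(1, max_n + 1):
            for i in range(1, len(toks) - n):
                index[tuple(toks[i:i + n])].add((toks[i - 1], toks[i + n]))
    return dict(index)

def equivalents(seq, index):
    """ASSUMPTION: related strings are those sharing at least one context."""
    target = index.get(seq, set())
    return {s for s, ctx in index.items() if ctx & target}

def combine(left, right, index):
    """Recombine two lists: keep concatenations attested in the corpus,
    then collect everything equivalent to them, counting the support."""
    candidates = {a + b for a in left for b in right if a + b in index}
    support = Counter()
    for cand in candidates:
        for entry in equivalents(cand, index):
            support[entry] += 1
    return support

def build(parse, index):
    """Build the recombined list bottom-up, in the order the parse dictates."""
    if isinstance(parse, str):
        return Counter(equivalents((parse,), index))
    left, right = parse
    return combine(set(build(left, index)), set(build(right, index)), index)

def score(parse, index):
    # ASSUMPTION: grammaticality score = total support in the final list.
    return sum(build(parse, index).values())

INDEX = build_index(CORPUS)
left_branching = (("a", "dog"), "sleeps")    # [[a dog] sleeps]
right_branching = ("a", ("dog", "sleeps"))   # [a [dog sleeps]]
left_score = score(left_branching, INDEX)
right_score = score(right_branching, INDEX)
print(left_score, right_score)
```

The input "a dog sleeps" does not occur in the toy corpus, yet both parses receive a non-empty final list. The left-branching parse scores higher here because "a dog" recombines with pronouns such as "it", which opens up more attested combinations at the next step.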
16. A computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting natural-language processing, comprising, in combination:
a) for a first adjoining pair, comprising a first pair element and a second pair element, of such natural-language elements of such input string, finding, from such string data from such corpus, a first listing of each such element syntactically related to such first pair element and a second listing of each such element syntactically related to such second pair element; and
b) from matching each such first-listing element with each such second-listing element, making a matched-pairs third listing by finding which matched pairs of said matching are found in such string data from such corpus;
c) wherein at least one of said first adjoining pair comprises at least a pair of natural-language elements and the corresponding listing includes at least some more primitive elements, representing longer strings using different paradigmatic recombinations of shorter strings, where such recombination is a paradigmatic recombination of words related by co-occurrence in sequence.
Dependent claims: 17
20. A computer-readable medium for a computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting in natural-language processing whose contents cause a computer system to determine a grammatical parse by:
a) for each of at least two natural-language input subcombinations which are potential subparses of such input string, building a recombined list of all corpus strings syntactically related to such each input string subcombination, where the recombination of such recombined list is a paradigmatic recombination of words related by co-occurrence in sequence, representing longer strings using different paradigmatic recombinations of shorter strings;
b) from such recombined lists, in different orders for each potential parse of said input string, building to a final recombined list for each such potential parse of such input string; and
c) from the number and quality of entries in each respective such final recombined list, scoring the grammaticality of such respective potential parse.
21. A computer-implemented natural-language system for a computer system, using a provided corpus of linear natural-language elements of natural language text string data in a subject language and an input string of natural-language elements in the subject language, for assisting in natural-language processing comprising:
a) for each of at least two natural-language input subcombinations which are potential subparses of such input string, means for building a recombined list of all corpus strings syntactically related to such each input string subcombination, where the recombination of such recombined list is a paradigmatic recombination of words related by co-occurrence in sequence, representing longer strings using different paradigmatic recombinations of shorter strings;
b) means for building, from such recombined lists, in different orders for each potential parse of said input string, to a final recombined list for each such potential parse of such input string; and
c) means for scoring, from the number and quality of entries in each respective such final recombined list, the grammaticality of such respective potential parse.
22. A method for finding structure in pattern recognition systems, by recombining scores of elements of vectors, lists, network associations, or other method of relating elements, where by “recombining” is meant changing scores of individual elements as they are related to combinations of other elements; and
representing combinations of elements using scores of many individual elements;
scoring each element as it is related to combinations between other elements.
Dependent claims: 23, 24
Specification