Apparatus and method for generating processor usable data from natural language input data
Abstract
Apparatus generates data in processor usable form from natural language input data units in different unit categories. Input data categorized into categories generates data units having unit identification data and corresponding unit category data for input to a cascaded plurality of matching processing stages. Each matching processing stage of this cascade except the first uses any unmatched unit category data and group category data from previous matching processing stages in place of matched category data to match the unit and/or group category data with at least one predetermined pattern of unit and/or group category data. New group category data is output for any unit and/or group category matching each predetermined pattern of unit and/or group category data. At least one of the matching processing stages outputs unit data corresponding to matched unit category data as a plurality of variables. At least one of these variables is indexed by another of the variables.
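The cascade described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The tag names, the two rule tables, the choice of head position and the "mod" variable name are all assumptions made for illustration, and fixed tag sequences stand in for general finite state patterns. Each stage replaces a matched run of categories with a group category, passes unmatched units through unchanged, and records the matched words as variables, where the modifier variable is indexed by the head word.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    category: str   # unit category (e.g. "NOUN") or group category (e.g. "NP")
    text: str       # unit data; for a group, the text of its head word
    variables: dict = field(default_factory=dict)


# Hypothetical rule tables: (category pattern, group category, head position).
STAGE_1 = [(("DET", "ADJ", "NOUN"), "NP", 2), (("DET", "NOUN"), "NP", 1)]
STAGE_2 = [(("NP", "PREP", "NP"), "NP", 0)]

# Assumption: function words are matched but not stored as modifiers.
FUNCTION_CATEGORIES = {"DET", "PREP"}


def build_group(window, group, head_idx):
    """Merge matched units into one group unit carrying indexed variables."""
    head = window[head_idx]
    variables = {}
    # Carry over modifier variables already built by earlier stages.
    for unit in window:
        for key, value in unit.variables.items():
            if key != "head":
                variables.setdefault(key, []).extend(value)
    variables["head"] = head.text
    # The modifier variable is indexed by the head variable: ("mod", head).
    for pos, unit in enumerate(window):
        if pos != head_idx and unit.category not in FUNCTION_CATEGORIES:
            variables.setdefault(("mod", head.text), []).append(unit.text)
    return Unit(group, head.text, variables)


def run_stage(units, rules):
    """One matching stage: scan left to right, first matching rule wins."""
    out, i = [], 0
    while i < len(units):
        for pattern, group, head_idx in rules:
            window = units[i:i + len(pattern)]
            if tuple(u.category for u in window) == pattern:
                out.append(build_group(window, group, head_idx))
                i += len(pattern)
                break
        else:
            out.append(units[i])  # unmatched unit goes to the next stage as-is
            i += 1
    return out


def run_cascade(units, stages):
    """Feed each stage's output (groups plus unmatched units) to the next."""
    for rules in stages:
        units = run_stage(units, rules)
    return units


tagged = [("the", "DET"), ("big", "ADJ"), ("dog", "NOUN"), ("in", "PREP"),
          ("the", "DET"), ("park", "NOUN"), ("barked", "VERB")]
result = run_cascade([Unit(cat, word) for word, cat in tagged],
                     [STAGE_1, STAGE_2])
```

In this example the first stage groups "the big dog" and "the park" into two NP units, and the second stage combines them across the preposition, so the final sequence holds one NP headed by "dog" followed by the unmatched verb.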
64 Citations
33 Claims
1. Processing apparatus for generating data in a processor usable form from input data in the form of units in a natural language in which the units are of a plurality of different categories, the processing apparatus comprising:
data unit generating means for categorizing units of input data into respective categories to generate processor usable data units comprising unit data and corresponding unit category data, said data units comprising one of a group consisting of words, lexical units and semantic units and said unit category data comprising one of a group consisting of parts of speech, words and lexical features; and
a cascaded plurality of finite state matching means, each of said finite state matching means being configured in accordance with grammar rules for the natural language, a first of said cascaded plurality of finite state matching means being operable to match said unit category data with at least one predetermined pattern of unit category data and to output group category data for any said unit category data found to match said at least one predetermined pattern of unit category data, the or each other said finite state matching means of the cascade being operable to use any unmatched unit category data and said group category data from at least one previous said finite state matching means of the cascade in place of matched category data to match said unit and/or group category data with at least one predetermined pattern of unit and/or group category data and to output new group category data for any unit and/or group category data found to match said at least one predetermined pattern of unit and/or category data;
wherein at least one of said finite state matching means is operable to output said unit data corresponding to matched unit category data as a plurality of variables, at least one said variable being indexed by another said variable.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
the processing apparatus according to claim 1;
comparing means for comparing said variables generated from said input data with variables generated from reference data by comparing variables starting from a variable indicated to be the head of said input data or reference data which does not modify any others of said input data or reference data in accordance with relationships defining equivalence between said variables; and
control means for controlling the operation of a system in accordance with the result of the comparison.
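The comparison and control elements recited above can be sketched as follows. This is an illustrative reading only. The dictionary layout (a "head" entry plus modifier lists keyed by ("mod", word)), the synonym table used as the equivalence relationship, and the document-selection step used as the controlled operation are all assumptions, not details taken from the claims. The comparison starts at the head variable of each side and then descends through the modifiers indexed by each matched word. The sketch also assumes that modifier links contain no cycles.

```python
# Hypothetical equivalence classes standing in for the claimed
# "relationships defining equivalence between said variables".
SYNONYMS = [{"dog", "hound"}, {"big", "large"}]


def equivalent(a, b):
    """Two words are equivalent if identical or in the same synonym class."""
    return a == b or any(a in group and b in group for group in SYNONYMS)


def match_from(input_vars, input_word, ref_vars, ref_word):
    """Compare two variable sets, descending from the given pair of words.

    Assumption: every modifier on the input side must find an equivalent
    modifier on the reference side; the reference may carry extra modifiers.
    """
    if not equivalent(input_word, ref_word):
        return False
    ref_mods = ref_vars.get(("mod", ref_word), [])
    for input_mod in input_vars.get(("mod", input_word), []):
        if not any(match_from(input_vars, input_mod, ref_vars, ref_mod)
                   for ref_mod in ref_mods):
            return False
    return True


def compare(input_vars, ref_vars):
    """Start the comparison at the head variable of each side."""
    return match_from(input_vars, input_vars["head"],
                      ref_vars, ref_vars["head"])


def select_documents(input_vars, references):
    """Illustrative control step: keep the references that match the input."""
    return [name for name, ref_vars in references.items()
            if compare(input_vars, ref_vars)]


query = {"head": "hound", ("mod", "hound"): ["large"]}
references = {
    "A": {"head": "dog", ("mod", "dog"): ["big", "park"]},
    "B": {"head": "cat", ("mod", "cat"): ["big"]},
}
selected = select_documents(query, references)
```

Here the query "large hound" selects reference A, because the heads are equivalent and the query's only modifier has an equivalent among A's modifiers. Reference B is rejected at the head, so its modifiers are never examined.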
16. A processor implemented method of generating data in processor usable form from input data in the form of units in a natural language in which the units are of a plurality of different categories, the method comprising:
categorizing units of input data into respective categories to generate processor usable data units comprising unit data and corresponding unit category data, said data units comprising one of a group consisting of words, lexical units and semantic units and said unit category data consisting of parts of speech, words and lexical features;
a first matching step of using a cascaded plurality of finite state matching means to match said unit category data with at least one predetermined pattern of unit category data and to output group category data for any said unit category data found to match said at least one predetermined pattern of unit category data, each of said finite state matching means being configured in accordance with grammar rules for the natural language;
at least one further matching step using a finite state matching means of using any unmatched said unit category data and said group category data from at least one previous matching step in place of matched category data to match said unit and/or group category data with at least one predetermined pattern of unit and/or group category data, and outputting new group category data for any unit and/or group category data found to match said at least one predetermined pattern of unit and/or category data;
wherein at least one of said matching step outputs said unit data corresponding to matched unit category data as a plurality of variables, and at least one said variable is indexed by another said variable.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
comparing said variables generated from said input data with variables generated from reference data by comparing variables starting from a variable indicated to be the head of said input data or reference data which does not modify any others of said input data or reference data respectively, in accordance with relationships defining equivalence between said variables; and
controlling the operation of a system in accordance with the result of the comparison.
32. A carrier medium carrying processor implementing instructions for controlling a processor to carry out the method according to claim 16.
33. A signal carrying processor implementing instructions for controlling a processor to carry out the method according to claim 16.
Specification