Data processing system implemented process and compiling technique for performing context-free parsing algorithm based on register vector grammar
First Claim
Patent Images
1. A method, utilizing a data processing system, for compiling context-free phase structure grammar productions into context-free register vector grammar productions and parsing a word string, said method comprising the steps of:
- a) translating the phase structure grammar productions for a given nonterminal category into a finite-state automaton such that each phase structure grammar production is represented by an independent sequence of transitions beginning at an initial state;
b) reducing the finite-state automaton by removing nondeterminism, removing inaccessible states, and merging indistinguishable states;
c) translating each of the finite-state automaton states into a register vector grammar production; and
d) repeating steps a) -c) for each nonterminal category to translate the entire phrase structure grammar into register vector grammar; and
e) parsing a word string to determine whether the word string is a member of the language defined by the register vector grammar.
1 Assignment
0 Petitions
Accused Products
Abstract
A context-free parsing algorithm employing register vector grammars provides fast parsing of natural languages. A compiler for register vector grammars accepts input grammars as standard phase structure rules and generates strongly equivalent grammars in register vector grammar form. By applying the context-free register vector grammar parsing algorithm to the resulting grammars, strings may be parsed and trees may be constructed in the same manner performed with phase structure grammar.
-
Citations
4 Claims
-
1. A method, utilizing a data processing system, for compiling context-free phase structure grammar productions into context-free register vector grammar productions and parsing a word string, said method comprising the steps of:
-
a) translating the phase structure grammar productions for a given nonterminal category into a finite-state automaton such that each phase structure grammar production is represented by an independent sequence of transitions beginning at an initial state; b) reducing the finite-state automaton by removing nondeterminism, removing inaccessible states, and merging indistinguishable states; c) translating each of the finite-state automaton states into a register vector grammar production; and d) repeating steps a) -c) for each nonterminal category to translate the entire phrase structure grammar into register vector grammar; and e) parsing a word string to determine whether the word string is a member of the language defined by the register vector grammar. - View Dependent Claims (2, 3, 4)
-
4. The method of claim 1, wherein the word string has n tokens X0, X1 X2 ... Xi ... Xn-1, followed by a dummy terminator symbol Xn, and wherein step e) is performed by the steps of:
-
(1) constructing a set of states Si, where i initially equals 0, containing the single state <
c, v, f>
, where c is the root category of the grammar, v is the initialization vector associated with c, and f equals 0;(2) processing the state in set Si by scanning the register vector grammar productions associated with c to find those productions whose condition vector cv matches v, using a rvg-match operation; 93) for those register vector grammar productions whose category is nonterminal and which in step (2) match, applying a predict operation to the state and to the production by adding to set S2 a new state consisting of the production'"'"'s nonterminal category, the initialization vector of the category, and index i; (4) for those register vector grammar productions whose category is terminal and which in step (2) match, with the terminal being the same as input token Xi, applying a shift operation to the production and to the state by adding to set Si+1 a copy of the state being processed, with the vector thereof updated by applying the result vector rv of the production to v, using an rvg-apply operation; (5) matching the close vector associated with c to v to determine whether the complete operation is applicable to the state; (6) recognizing each state <
d, w, g,>
in set Sf in which d is nonterminal, d has a production whose category is c, and the cv of the production matches w, using the rvg-match operation; and(2) adding to set Si a copy of each state <
d, w, g>
with the vector thereof updated by applying the rv of the production to w using the rvg-apply operation;(7) repeating steps (2)-(6) for each additional state <
c, v, f>
) in set Si added by steps (g) and (j);(8) repeating steps (20-(7) for sets S1 ... Sn ; (9) matching the close vector of the root category c with the vector v of each state in set Sn whose category is c, using the rvg-match operation; and (10) recognizing a successful parse of the word string when a match of step (9) is successful, and otherwise recognizing that the word string is not a member of the language defined by the register vector grammar.
-
-
Specification