Method and apparatus for mapping multiword expressions to identifiers using finite-state networks
First Claim
1. A method for mapping multiword expressions to identifiers using finite-state networks, comprising:
- encoding each of a plurality of multiword expressions into a regular expression;
each regular expression encoding a base form common to a plurality of derivative forms defined by ones of the multiword expressions;
compiling with factorization each of the plurality of regular expressions into a set of finite-state networks;
performing a union of the finite-state networks in the set of finite-state networks to define a multiword finite-state network and a set of subnets, each subset comprising a distinct standard network having an associated distinct start state and an associated final state comprising an entry and exit point, respectively, comprising arcs and states distinct from arcs and states of the finite-state network;
traversing the multiword finite-state network and the set of subnets to identify a path corresponding to one of the plurality of multiword expressions;
wherein said traversing accounts for only transitions originating from the multiword finite-state network to ascertain a path number identifying a base form of the one of the plurality of multiword expressions; and
wherein said factorization comprises inserting an arc with a label in the multiword finite-state network at each appearance of a repeating subnet in the set of subnets, where each label is a reference to a subnet in the set of subnets.
3 Assignments
0 Petitions
Accused Products
Abstract
Multiword expressions are mapped to identifiers using finite-state networks. Each of a plurality of multiword expressions is encoded into a regular expression. Each regular expression encodes a base form common to a plurality of derivative forms defined by ones of the multiword expressions. Each of the plurality of regular expressions is compiled with factorization into a set of finite-state networks. A union of the finite-state networks in the set of finite-state networks is performed to define a multiword finite-state network and a set of subnets. The multiword finite-state network and the set of subnets are traversed to identify a path corresponding to one of the plurality of multiword expressions, wherein only transitions originating from the multiword finite-state network are accounted for to ascertain a path number identifying a base form of the one of the plurality of multiword expressions.
-
Citations
20 Claims
-
1. A method for mapping multiword expressions to identifiers using finite-state networks, comprising:
-
encoding each of a plurality of multiword expressions into a regular expression;
each regular expression encoding a base form common to a plurality of derivative forms defined by ones of the multiword expressions;compiling with factorization each of the plurality of regular expressions into a set of finite-state networks; performing a union of the finite-state networks in the set of finite-state networks to define a multiword finite-state network and a set of subnets, each subset comprising a distinct standard network having an associated distinct start state and an associated final state comprising an entry and exit point, respectively, comprising arcs and states distinct from arcs and states of the finite-state network; traversing the multiword finite-state network and the set of subnets to identify a path corresponding to one of the plurality of multiword expressions; wherein said traversing accounts for only transitions originating from the multiword finite-state network to ascertain a path number identifying a base form of the one of the plurality of multiword expressions; and wherein said factorization comprises inserting an arc with a label in the multiword finite-state network at each appearance of a repeating subnet in the set of subnets, where each label is a reference to a subnet in the set of subnets. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. An apparatus for mapping multiword expressions to identifiers using finite-state networks, comprising:
-
means for encoding each of a plurality of multiword expressions into a regular expression;
each regular expression encoding a base form common to a plurality of derivative forms defined by ones of the multiword expressions;means for compiling with factorization each of the plurality of regular expressions into a set of finite-state networks; means for performing a union of the finite-state networks in the set of finite-state networks to define a multiword finite-state network and a set of subnets, each subset comprising a distinct standard network having an associated distinct start state and an associated final state comprising an entry and exit point, respectively, comprising arcs and states distinct from arcs and states of the finite-state network; means for traversing the multiword finite-state network and the set of subnets to identify a path corresponding to one of the plurality of multiword expressions; wherein said traversing means accounts for only transitions originating from the multiword finite-state network to ascertain a path number identifying a base form of the one of the plurality of multiword expressions; and wherein said factorization comprises inserting an arc with a label in the multiword finite-state network at each appearance of a repeating subnet in the set of subnet, where each label is a reference to a subnet in the set of subnets. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. An article of manufacture for use in a machine, comprising:
-
a) a memory; b) instructions stored in the memory for mapping multiword expressions to identifiers using finite-state networks, the instructions adapted to perform a method comprising; encoding each of a plurality of multiword expressions into a regular expression;
each regular expression encoding a base form common to a plurality of derivative forms defined by ones of the multiword expressions;compiling with factorization each of the plurality of regular expressions into a set of finite-state networks; performing a union of the finite-state networks in the set of finite-state networks to define a multiword finite-state network and a set of subnets, each subset comprising a distinct standard network having an associated distinct start state and an associated final state comprising an entry and exit point, respectively, comprising arcs and states distinct from arcs and states of the finite-state network; traversing the multiword finite-state network and the set of subnets to identify a path corresponding to one of the plurality of multiword expressions; wherein said traversing accounts for only transitions originating from the multiword finite-state network to ascertain a path number identifying a base form of the one of the plurality of multiword expressions; and wherein said factorization comprises inserting an arc with a label in the multiword finite-state network at each appearance of a repeating subnet in the set of subnets, where each label is a reference to a subnet in the set of subnets. - View Dependent Claims (20)
-
Specification