Region-Matching Transducers for Natural Language Processing
Abstract
Computer methods, apparatus and articles of manufacture therefor, are disclosed for developing a region-matching transducer for marking language data having delimited strings. The region-matching transducer defines one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks. The plurality of class-matching networks defines a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes. The region-matching transducer has, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and shares states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap.
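The two structural properties named in the abstract (shared states for overlapping patterns, and a class-labelled arc leaving each penultimate state) can be illustrated with a small sketch. This is a simplification under stated assumptions, not the patented construction: patterns are literal token sequences rather than arrangements of class-matching networks, and the names `State`, `build_network`, and `count_states` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    # Hypothetical state type: token arcs plus class-label arcs.
    arcs: dict = field(default_factory=dict)    # token -> next State
    labels: dict = field(default_factory=dict)  # entity class -> final State
    final: bool = False


def build_network(patterns):
    """Build a prefix-sharing network from (token_sequence, entity_class) pairs."""
    start = State()
    for tokens, entity_class in patterns:
        state = start
        for tok in tokens:
            # Reuse an existing state when patterns overlap on a leading segment.
            state = state.arcs.setdefault(tok, State())
        # `state` is now the penultimate state; the arc leaving it carries
        # the label that identifies the pattern's entity class.
        state.labels.setdefault(entity_class, State(final=True))
    return start


def count_states(start):
    """Count distinct states reachable from the start state."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if id(s) in seen:
            continue
        seen.add(id(s))
        stack.extend(s.arcs.values())
        stack.extend(s.labels.values())
    return len(seen)


network = build_network([
    (("New", "York"), "City"),
    (("New", "York", "Times"), "Newspaper"),
])
state_total = count_states(network)
```

Because both example patterns begin with "New York", they share the states for that segment, so the network has fewer states than two separate chains would need.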
20 Claims
1. A computer implemented method, comprising:
recording in a memory input data having delimited strings;
recording in the memory a region-matching transducer defining one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks;
the plurality of class-matching networks defining a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes;
the region-matching transducer (i) having, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and (ii) sharing states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap;
applying the region-matching transducer recorded in the memory to the input data with an apply-stage replacement method, which apply-stage replacement method follows a longest match principle for identifying one or more patterns in the region-matching transducer that match one or more sequences of delimited strings in the input data;
at least one of the matching sequences of delimited strings satisfying at least one pattern in the region-matching transducer defined by an arrangement of a plurality of class-matching networks; and
recording in the memory, in response to said applying, the one or more sequences of delimited strings in the input data matching the one or more patterns in the region-matching transducer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
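The applying step of claim 1 can be sketched as a left-to-right scan that, at each position, keeps the longest pattern that matches. This is an illustrative sketch only: the claim does not spell out the apply-stage replacement method, so the scan strategy, the nested-dict representation, and the names `CLASS`, `compile_patterns`, and `apply_longest_match` are all assumptions. Patterns are again literal token sequences, and each penultimate state carries a single class label.

```python
# Hypothetical sentinel key for the class-label arc out of a penultimate state.
CLASS = object()


def compile_patterns(patterns):
    """Compile (token_sequence, entity_class) pairs into a nested-dict network."""
    root = {}
    for tokens, entity_class in patterns:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})  # overlapping prefixes share nodes
        node[CLASS] = entity_class
    return root


def apply_longest_match(root, tokens):
    """Return (matched_tokens, entity_class) pairs, preferring the longest match."""
    matches = []
    i = 0
    while i < len(tokens):
        node, best, j = root, None, i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if CLASS in node:
                # Remember the longest complete pattern seen so far.
                best = (j, node[CLASS])
        if best is None:
            i += 1  # no pattern starts here; move on one token
        else:
            end, entity_class = best
            matches.append((tokens[i:end], entity_class))
            i = end  # resume after the matched region
    return matches


net = compile_patterns([
    (("New", "York"), "City"),
    (("New", "York", "Times"), "Newspaper"),
])
found = apply_longest_match(net, "The New York Times covers New York".split())
```

In the example input, the first occurrence of "New York" is followed by "Times", so the longer Newspaper pattern is chosen over City; the second occurrence matches City alone.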
15. A computer apparatus, comprising:
a memory for storing processing instructions of the apparatus; and
a processor coupled to the memory for executing the processing instructions of the apparatus;
the processor in executing the processing instructions;
recording in the memory input data having delimited strings;
recording in the memory a region-matching transducer defining one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks;
the plurality of class-matching networks defining a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes;
the region-matching transducer (i) having, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and (ii) sharing states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap;
applying the region-matching transducer recorded in the memory to the input data with an apply-stage replacement method, which apply-stage replacement method follows a longest match principle for identifying one or more patterns in the region-matching transducer that match one or more sequences of delimited strings in the input data;
at least one of the matching sequences of delimited strings satisfying at least one pattern in the region-matching transducer defined by an arrangement of a plurality of class-matching networks; and
recording in the memory, in response to said applying, the one or more sequences of delimited strings in the input data matching the one or more patterns in the region-matching transducer.
Dependent claims: 16
17. An article of manufacture comprising computer usable media including computer readable instructions embedded therein that causes a computer to perform a method, wherein the method comprises:
recording in a memory input data having delimited strings;
recording in the memory a region-matching transducer defining one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks;
the plurality of class-matching networks defining a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes;
the region-matching transducer (i) having, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and (ii) sharing states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap;
applying the region-matching transducer recorded in the memory to the input data with an apply-stage replacement method, which apply-stage replacement method follows a longest match principle for identifying one or more patterns in the region-matching transducer that match one or more sequences of delimited strings in the input data;
at least one of the matching sequences of delimited strings satisfying at least one pattern in the region-matching transducer defined by an arrangement of a plurality of class-matching networks; and
recording in the memory, in response to said applying, the one or more sequences of delimited strings in the input data matching the one or more patterns in the region-matching transducer.
Dependent claims: 18
19. A computer apparatus, comprising:
a memory for recording input data having delimited strings;
a region-matching transducer defining one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks;
the plurality of class-matching networks defining a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes;
the region-matching transducer (i) having, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and (ii) sharing states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap;
an FST engine for applying the region-matching transducer recorded in the memory to the input data with an apply-stage replacement method, which apply-stage replacement method follows a longest match principle for identifying one or more patterns in the region-matching transducer that match one or more sequences of delimited strings in the input data;
at least one of the matching sequences of delimited strings satisfying at least one pattern in the region-matching transducer defined by an arrangement of a plurality of class-matching networks; and
wherein the memory records the one or more sequences of delimited strings in the input data matching the one or more patterns in the region-matching transducer.
Dependent claims: 20
Specification