METHOD AND SYSTEM FOR AUTOMATICALLY GENERATING REGULAR EXPRESSIONS FOR RELAXED MATCHING OF TEXT PATTERNS
First Claim
1. A computer-implemented method of automatically generating regular expressions for relaxed matching of text patterns, comprising:
receiving, by a computing system, an input phrase expressed in a natural language;
determining, by said computing system, that said input phrase is a plain text pattern;
automatically tokenizing, by said computing system, said plain text pattern, wherein said automatically tokenizing includes automatically generating a first token list;
automatically applying, by said computing system, one or more rules to said first token list, wherein said automatically applying includes automatically modifying said first token list and automatically generating a modified token list in response to said automatically modifying said first token list; and
automatically converting, by said computing system, said modified token list into a regular expression, wherein said regular expression matches said plain text pattern and one or more variations of said plain text pattern.
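The tokenize, apply-rules, and convert steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. The token representation (a text/optional pair), the single rule `optional_punctuation`, and the whitespace-tolerant join are all assumptions made for illustration. The step of determining that the input is a plain text pattern is left out, because the claim does not say how that determination is made.

```python
import re

def tokenize(pattern):
    """Build the first token list: (text, optional) pairs for words and punctuation."""
    return [(text, False) for text in re.findall(r"\w+|[^\w\s]", pattern)]

def optional_punctuation(tokens):
    """Hypothetical rule: flag every punctuation token as optional.

    Returns a new (modified) token list; the input list is left unchanged.
    """
    return [
        (text, optional or re.fullmatch(r"\w+", text) is None)
        for text, optional in tokens
    ]

def to_regex(tokens):
    """Convert a token list into a compiled, case-insensitive regular expression."""
    parts = [
        f"(?:{re.escape(text)})?" if optional else re.escape(text)
        for text, optional in tokens
    ]
    # Assumption: any amount of whitespace (including none) may separate tokens.
    return re.compile(r"\s*".join(parts), re.IGNORECASE)

first_token_list = tokenize("e-mail address")
modified_token_list = optional_punctuation(first_token_list)
relaxed = to_regex(modified_token_list)

for candidate in ["e-mail address", "Email  Address", "e mail address", "mail address"]:
    print(candidate, "->", bool(relaxed.fullmatch(candidate)))
```

In this sketch the resulting expression accepts the original phrase plus variations in case, hyphenation, and spacing, and rejects a phrase that is missing a required word. Because the join uses `\s*`, it would also accept the run-together form "emailaddress", and a stricter separator would trade some recall for precision.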
Abstract
A method and system for automatically generating regular expressions for relaxed matching of text patterns. A received input phrase expressed in a natural language is determined to be a plain text pattern. The plain text pattern is automatically tokenized, thereby generating a first token list. Rules loaded from a predefined rule set are automatically applied to the first token list to automatically generate a modified token list. The order in which the rules are applied to the first token list is specified by the rule set. The modified token list is automatically converted into a regular expression that matches the plain text pattern and one or more variations of the plain text pattern. Using the regular expression for information extraction facilitates both the recall and the precision of the extraction.
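The abstract notes that the rule set fixes the order in which rules are applied. The Python sketch below shows why that order can matter. Everything in it is hypothetical: the `Token` structure, the two rules, and the example phrase are assumptions chosen to illustrate the point, not details taken from the patent.

```python
import re
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Token:
    alternatives: tuple      # literal spellings accepted for this token
    optional: bool = False   # True if the token may be absent

def tokenize(phrase):
    """First token list: one Token per word or punctuation mark."""
    return [Token((t,)) for t in re.findall(r"\w+|[^\w\s]", phrase)]

def ampersand_or_and(tokens):
    """Hypothetical rule: treat "&" and "and" as interchangeable."""
    return [
        replace(t, alternatives=("&", "and"))
        if len(t.alternatives) == 1 and t.alternatives[0].lower() in ("&", "and")
        else t
        for t in tokens
    ]

def optional_punctuation(tokens):
    """Hypothetical rule: a token spelled only with punctuation may be absent."""
    return [
        replace(t, optional=True)
        if all(re.fullmatch(r"\w+", a) is None for a in t.alternatives)
        else t
        for t in tokens
    ]

def apply_rules(tokens, rule_set):
    """Apply the rules in the order given by the rule set."""
    for rule in rule_set:
        tokens = rule(tokens)
    return tokens

def to_regex(tokens):
    """Convert the modified token list into a case-insensitive regex."""
    parts = []
    for t in tokens:
        frag = "(?:" + "|".join(re.escape(a) for a in t.alternatives) + ")"
        parts.append(frag + "?" if t.optional else frag)
    return re.compile(r"\s*".join(parts), re.IGNORECASE)

phrase = "Johnson & Johnson, Inc."
intended = to_regex(apply_rules(tokenize(phrase), [ampersand_or_and, optional_punctuation]))
reversed_order = to_regex(apply_rules(tokenize(phrase), [optional_punctuation, ampersand_or_and]))

print(bool(intended.search("shares of johnson and johnson inc rose")))
print(bool(intended.fullmatch("Johnson Johnson Inc")))
print(bool(reversed_order.fullmatch("Johnson Johnson Inc")))
```

With the first ordering, "&" is widened to "& or and" before the punctuation rule runs, so the conjunction stays required. With the reversed ordering, "&" is first marked optional and then widened, so the expression also accepts "Johnson Johnson Inc", a looser match that would cost precision in an extraction task.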
20 Claims
12. A computer program product, comprising a computer usable medium having a computer readable program code embodied therein, said computer readable program code containing instructions that when executed by a processor of a computing system implement a method for automatically generating regular expressions for relaxed matching of text patterns, said method comprising:
receiving an input phrase expressed in a natural language;
determining that said input phrase is a plain text pattern;
automatically tokenizing said plain text pattern, wherein said automatically tokenizing includes automatically generating a first token list;
automatically applying one or more rules to said first token list, wherein said automatically applying includes automatically modifying said first token list and automatically generating a modified token list in response to said automatically modifying said first token list; and
automatically converting said modified token list into a regular expression, wherein said regular expression matches said plain text pattern and one or more variations of said plain text pattern.
Specification