METHOD FOR AUTOMATICALLY GENERATING REGULAR EXPRESSIONS FOR RELAXED MATCHING OF TEXT PATTERNS

  • US 20090070327A1
  • Filed: 09/06/2007
  • Published: 03/12/2009
  • Est. Priority Date: 09/06/2007
  • Status: Abandoned Application
First Claim

1. A computer-implemented method of automatically generating regular expressions for relaxed matching of text patterns, comprising:

    loading, by a computing system, a predefined set of rules from a rule file in a repository coupled to said computing system, wherein each rule of said predefined set of rules is expressed in an Extensible Markup Language (XML) format;

    receiving, by a computing system, an input phrase expressed in a natural language;

    determining, by said computing system, that said input phrase is a plain text pattern, wherein said determining that said input phrase is said plain text pattern includes determining that said input phrase is not a regular expression;

    automatically tokenizing, by said computing system, said plain text pattern, wherein said automatically tokenizing includes automatically generating a first token list;

    automatically applying, by said computing system, one or more rules to said first token list, wherein said automatically applying includes applying said one or more rules in an order specified by said predefined set of rules, automatically modifying said first token list and automatically generating a modified token list in response to said automatically modifying said first token list, wherein said one or more rules are included in said predefined set of rules, wherein said automatically modifying said first token list includes applying a predefined modification operator to said first token list, wherein said predefined modification operator is included in a rule of said one or more rules, wherein said predefined modification operator is an operator selected from the group consisting of a replace word operator, a split-at-character operator, and a whitespace operator, wherein said automatically modifying said first token list further includes:

    replacing a sequence of one or more tokens in said first token list with a replacement regular expression specified by said rule if said predefined modification operator is said replace word operator, detecting a character specified by said rule and splitting a token of said first token list into two tokens in response to said detecting said character if said predefined modification operator is said split-at-character operator, and replacing whitespace in said first token list with a replacement regular expression specified by said rule if said predefined modification operator is said whitespace operator; and

    automatically converting, by said computing system, said modified token list into a regular expression, wherein said regular expression matches said plain text pattern and one or more variations of said plain text pattern.
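
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The XML element and attribute names (`rule`, `op`, `char`, `match`, `regex`), the function names, the metacharacter test used to decide that a phrase is plain text, and the handling of the split character (kept as an optional match) are all assumptions made for illustration; the claim does not specify any of them. The replace word operator is also simplified here to single-token matches.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule file content; the schema is an assumption, not from the claim.
RULES_XML = r"""
<rules>
  <rule op="split-at-character" char="-"/>
  <rule op="replace-word" match="color" regex="colou?r"/>
  <rule op="whitespace" regex="\s+"/>
</rules>
"""

def load_rules(xml_text):
    """Load rules in document order, which stands in for the rule-specified order."""
    return [dict(rule.attrib) for rule in ET.fromstring(xml_text.strip()).iter("rule")]

def is_plain_text(phrase):
    """Assumed heuristic: a phrase without regex metacharacters is plain text."""
    return re.search(r"[\\^$*+?{}\[\]|()]", phrase) is None

def tokenize(phrase):
    """Build the first token list as (kind, text) pairs: 'lit' words and 'ws' gaps."""
    return [("ws" if part.isspace() else "lit", part)
            for part in re.findall(r"\s+|\S+", phrase)]

def apply_rule(tokens, rule):
    """Apply one modification operator and return the modified token list."""
    out = []
    for kind, text in tokens:
        if rule["op"] == "split-at-character" and kind == "lit" and rule["char"] in text:
            # Split into two tokens; the split character stays optional (assumption).
            left, char, right = text.partition(rule["char"])
            out.append(("regex", re.escape(left) + re.escape(char) + "?"))
            out.append(("lit", right))
        elif rule["op"] == "replace-word" and kind == "lit" and text == rule["match"]:
            out.append(("regex", rule["regex"]))
        elif rule["op"] == "whitespace" and kind == "ws":
            out.append(("regex", rule["regex"]))
        else:
            out.append((kind, text))
    return out

def to_regex(tokens):
    """Convert the modified token list into one regular expression string."""
    return "".join(text if kind == "regex" else re.escape(text)
                   for kind, text in tokens)

def generate_regex(phrase, rules):
    """Run the pipeline; input that already looks like a regex is returned unchanged."""
    if not is_plain_text(phrase):
        return phrase
    tokens = tokenize(phrase)
    for rule in rules:
        tokens = apply_rule(tokens, rule)
    return to_regex(tokens)

pattern = generate_regex("e-mail color", load_rules(RULES_XML))
print(bool(re.fullmatch(pattern, "e-mail color")))   # the original phrase
print(bool(re.fullmatch(pattern, "email  colour")))  # a relaxed variation
```

In this sketch the generated expression matches both the input phrase and variations of it (a dropped hyphen, extra whitespace, an alternate spelling), which is the behavior the final claim element describes.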
