Record boundary identification and extraction through pattern mining

  • US 7,606,816 B2
  • Filed: 07/28/2005
  • Issued: 10/20/2009
  • Est. Priority Date: 06/03/2005
  • Status: Expired due to Fees
First Claim

1. A method of encoding data, the method comprising:

  • locating, by a computer in a set of data, one or more primary data items that match one or more patterns in a set of specified patterns, wherein the patterns in the set of specified patterns correspond to symbols;

    for each primary data item of the one or more primary data items, performing steps comprising:

    determining the symbol that is associated with the pattern that the primary data item matches, and replacing the primary data item in the set of data with the symbol;

    after performing the replacing, locating, in the set of data, one or more secondary data items that existed in the set of data prior to the replacing;

    for each secondary data item of the one or more secondary data items, performing particular steps comprising:

    generating a hash value based on the secondary data item; and

    replacing the secondary data item in the set of data with a symbol that is associated with the hash value;

    wherein the set of data corresponds to a first document that contains multiple records; and

    determining boundaries for each record in the first document based on symbols that have replaced the secondary data items in the set of data.
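
As an illustration only, the steps recited in claim 1 might be sketched in Python as follows. This is not the patented implementation: the pattern table, the treatment of each token as a data item, the `H` prefix on hash-derived symbols, and the rule that a record starts at each occurrence of the most frequent secondary symbol are all assumptions made for this sketch, since the claim text does not specify them.

```python
import hashlib
import re
from collections import Counter

# Hypothetical pattern-to-symbol table (assumed; not taken from the claim).
PATTERNS = [
    (re.compile(r"\$\d+(?:\.\d{2})?"), "P"),  # price-like primary items
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "D"),  # date-like primary items
]


def hash_symbol(item):
    """Derive a short symbol from a hash of the item (assumed scheme)."""
    return "H" + hashlib.sha1(item.encode("utf-8")).hexdigest()[:6]


def encode(tokens):
    """Replace primary items with pattern symbols, then hash what remains."""
    # Pass 1: primary items that match a specified pattern get its symbol.
    first_pass = []
    for tok in tokens:
        symbol = next((s for p, s in PATTERNS if p.fullmatch(tok)), None)
        first_pass.append(("primary", symbol) if symbol else ("raw", tok))
    # Pass 2: items left over from the original data are secondary items;
    # each is replaced with a symbol associated with its hash value.
    return [
        ("secondary", hash_symbol(value)) if kind == "raw" else (kind, value)
        for kind, value in first_pass
    ]


def record_boundaries(encoded):
    """Return start indices of records, using an assumed heuristic:
    a record begins at each occurrence of the most frequent secondary symbol."""
    counts = Counter(v for k, v in encoded if k == "secondary")
    if not counts:
        return []
    anchor, _ = counts.most_common(1)[0]
    return [
        i for i, (k, v) in enumerate(encoded)
        if k == "secondary" and v == anchor
    ]


# Toy document with two records (made-up data for illustration).
tokens = ["<li>", "Widget", "$9.99", "2005-06-03",
          "<li>", "Gadget", "$19.50", "2005-07-28"]
encoded = encode(tokens)
boundaries = record_boundaries(encoded)  # [0, 4]
```

In this toy input, the repeated `<li>` token hashes to the same symbol each time, so its positions mark where each record begins. A real document would likely need a more careful rule for choosing boundary symbols than simple frequency.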
