Method and system for matching data sets of non-standard formats

  • US 8,090,725 B1
  • Filed: 04/16/2010
  • Issued: 01/03/2012
  • Est. Priority Date: 01/13/2006
  • Status: Active Grant
First Claim

1. A method for comparing a plurality of resumes, including the following steps:

    receiving a first resume from a database stored on a computer;

    parsing the first resume into bands based on a predefined setting;

    generating a word array by parsing the text in each band into separate parsed words and storing each of the parsed words in the word array, wherein the word array includes separate rows for each of the parsed words and a column populated with information that is indicative of the band associated with each of the parsed words;

    standardizing each of the words contained in the word array, by iteratively correcting punctuation, replacing well-known abbreviations, or removing common words;

    generating an attribute array by iteratively comparing each of the parsed words contained in the word array to attributes in an attribute dictionary and adding each of the parsed words that match one of the attributes in the attribute dictionary to the attribute array, wherein the attribute array includes information regarding the number of times each of the attributes occurs within the first resume and information indicative of the band in which each of the attributes was first found;

    identifying root attributes based on the number of times in which attributes or multi-word attributes occur within the first resume and counting the number of occurrences of each root attribute in the first resume;

    identifying leaf attributes that are related to root attributes and counting the number of occurrences of each leaf attribute in the first resume;

    generating a first metric indicative of the significance of each of the attributes in the attribute array to the first resume;

    generating a second metric indicative of the significance of each of the root attributes to the first resume;

    generating a third metric indicative of the significance of each of the leaf attributes to the associated root attribute;

    weighting the first, second and third metrics, wherein for each attribute, the first, second and third metrics are one of the bands indicating a relative position of the attribute within the first resume, a number of occurrences of each of the associated root attributes in the first resume, and a support value indicating a relationship to each of the associated root attributes or a combination of the three metrics;

    ranking the attributes based on a weighted value of the first metric for each of the attributes;

    ranking the root attributes based on a weighted value of the second metric for each of the root attributes;

    ranking the leaf attributes based on a weighted value of the third metric for each of the leaf attributes;

    generating a profile for the first resume based on the rank of the attributes, the root attributes and the leaf attributes;

    selecting one or more additional resumes for comparison and generating profiles for the additional resumes by using the same steps that were employed to generate the profile for the first resume;

    comparing the profile for the first resume with the profiles for the additional resumes; and

    ranking the profiles for the additional resumes based on how closely the profiles for the additional resumes match the profile for the first resume.
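
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The band size, abbreviation table, common-word list, attribute dictionary, scoring formula, and every function name below are assumptions chosen for illustration. The root-attribute and leaf-attribute steps (and their second and third metrics) are omitted for brevity, and a plain Jaccard overlap stands in for the profile comparison, which the claim does not specify.

```python
import re

# All constants are illustrative assumptions, not values taken from the patent.
BAND_SIZE = 2  # non-empty lines per band (the "predefined setting")
ABBREVIATIONS = {"mgr": "manager", "sr": "senior"}
COMMON_WORDS = {"a", "an", "and", "in", "of", "the", "with"}
ATTRIBUTE_DICTIONARY = {"java", "python", "sql", "manager"}


def build_word_array(resume_text):
    """Return one (word, band) row per parsed word."""
    lines = [ln for ln in resume_text.splitlines() if ln.strip()]
    rows = []
    for i, line in enumerate(lines):
        band = i // BAND_SIZE  # band index records position in the resume
        for word in line.split():
            rows.append((word, band))
    return rows


def standardize(rows):
    """Strip punctuation, expand abbreviations, and drop common words."""
    out = []
    for word, band in rows:
        word = re.sub(r"[^\w+#]", "", word).lower()
        word = ABBREVIATIONS.get(word, word)
        if word and word not in COMMON_WORDS:
            out.append((word, band))
    return out


def build_attribute_array(rows):
    """Map each dictionary attribute to its count and first band."""
    attrs = {}
    for word, band in rows:
        if word in ATTRIBUTE_DICTIONARY:
            entry = attrs.setdefault(word, {"count": 0, "first_band": band})
            entry["count"] += 1
    return attrs


def build_profile(resume_text):
    """Rank attributes by an assumed weighted score; return names in rank order."""
    attrs = build_attribute_array(standardize(build_word_array(resume_text)))

    def score(item):
        _, entry = item
        # Assumed weighting: frequency plus a bonus for appearing in an early band.
        return entry["count"] + 1.0 / (1 + entry["first_band"])

    ranked = sorted(attrs.items(), key=lambda it: (-score(it), it[0]))
    return [name for name, _ in ranked]


def similarity(profile_a, profile_b):
    """Jaccard overlap of two profiles (ignores rank order; a stand-in only)."""
    a, b = set(profile_a), set(profile_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def rank_candidates(first_resume, other_resumes):
    """Order the other resumes by how closely their profiles match the first."""
    reference = build_profile(first_resume)
    scored = [(similarity(reference, build_profile(text)), name)
              for name, text in other_resumes.items()]
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

As a usage example, `build_profile("Sr. Java developer\nJava and SQL\nPython scripting")` ranks `java` first because it occurs twice in the first band, and `rank_candidates` then orders other resumes by the overlap of their profiles with that one. A fuller sketch would add the root and leaf attribute metrics and a rank-aware comparison.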
