
Address extraction from a communication

  • US 10,013,672 B2
  • Filed: 11/02/2012
  • Issued: 07/03/2018
  • Est. Priority Date: 11/02/2012
  • Status: Active Grant
First Claim

1. A method, comprising:

    receiving a communication from a sender comprising a plurality of words, wherein at least one of the words is a zip code comprising five numerical digits;

    assigning, via a computing apparatus, a score to each of the words, wherein the score assigned to each of the words is based on a ratio of a first frequency of usage of the respective word in a language relative to a second frequency of usage of the respective word in the language, and wherein a first set of words comprises a first total number of words used as an address, a second set of words comprises a second total number of words including words used other than as an address, the first frequency is determined by counting occurrence of the respective word in the first set of words relative to the first total, the second frequency is determined by counting occurrence of the respective word in the second set relative to the second total, and the first total is less than the second total, and wherein the assigning the score further comprises determining a score for a numerical digit sequence based on treating any numerical digit sequence of a given digit length as being the same word;

    determining, via the computing apparatus, a respective total sum for each of a plurality of word sequences in the communication, the respective total sum determined as a sum of the scores for each word in the respective word sequence;

    identifying a first word sequence of the word sequences having a total sum that is greater than a threshold value;

    applying at least one filter to the first word sequence, the at least one filter comprising determining a ratio of number tokens to character tokens in the first word sequence, and comparing the ratio to a predetermined value to determine whether the first word sequence passes the at least one filter, and the at least one filter further comprising determining whether the first word sequence includes a token that scores below a predetermined threshold, wherein determining that the first word sequence includes a token that scores below the predetermined threshold disqualifies the first word sequence from being identified as an address;

    in response to determining that the first word sequence passes the at least one filter, extracting the first word sequence from the plurality of words of the received communication as a first address of the sender, wherein the first word sequence contains the zip code; and

    storing, in a data repository, the first address in a first person profile of the sender, wherein the data repository stores a plurality of person profiles including the first person profile.
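The scoring step of claim 1 compares how often a word appears in addresses with how often it appears in general text. It also treats every digit sequence of the same length as one word. The following is a minimal sketch of that step, not the patentee's implementation. The names `normalize` and `build_scores`, the lower-casing, the add-one smoothing, and taking the logarithm of the ratio are all assumptions. The claim only says the score is "based on a ratio" of the two frequencies.

```python
import math
import re
from collections import Counter


def normalize(word: str) -> str:
    """Map a token to the 'word' that gets scored.

    Any all-digit token of length n becomes "<DIGITSn>", so every five-digit
    zip code shares one score. Lower-casing is an assumption of this sketch.
    """
    if re.fullmatch(r"[0-9]+", word):
        return f"<DIGITS{len(word)}>"
    return word.lower()


def build_scores(address_words, all_words, smoothing=1.0):
    """Score each word as log(first frequency / second frequency).

    address_words: tokens used as addresses (the first, smaller set).
    all_words: tokens including non-address usage (the second, larger set).
    Smoothing keeps unseen words from producing a zero or infinite ratio;
    both the smoothing and the log are hypothetical choices.
    """
    addr = Counter(normalize(w) for w in address_words)
    general = Counter(normalize(w) for w in all_words)
    addr_total = sum(addr.values())
    general_total = sum(general.values())
    vocab = set(addr) | set(general)
    scores = {}
    for w in vocab:
        # Relative frequency of w within each set, with additive smoothing.
        f1 = (addr[w] + smoothing) / (addr_total + smoothing * len(vocab))
        f2 = (general[w] + smoothing) / (general_total + smoothing * len(vocab))
        scores[w] = math.log(f1 / f2)
    return scores
```

With the log, a word that is relatively more common in addresses gets a positive score and a typical prose word gets a negative one. Per-word scores can then be added over a sequence.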
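Claim 1 sums the word scores over each of "a plurality of word sequences" and keeps a sequence whose total exceeds a threshold. The claim does not say which sequences are considered. This sketch assumes contiguous windows bounded by hypothetical `min_len` and `max_len` parameters, and it takes per-token scores that have already been computed.

```python
def window_sums(token_scores, threshold, min_len=3, max_len=10):
    """Return (start, end, total) for every contiguous window whose score sum
    exceeds threshold. `end` is exclusive, so tokens[start:end] is the sequence.

    Window bounds are assumptions; the claim only requires a total sum per
    word sequence and a comparison against a threshold value.
    """
    results = []
    n = len(token_scores)
    for start in range(n):
        total = 0.0
        for end in range(start, min(n, start + max_len)):
            total += token_scores[end]  # running sum extends the window by one token
            if end - start + 1 >= min_len and total > threshold:
                results.append((start, end + 1, total))
    return results
```

For example, `window_sums([-1.0, 2.0, 2.0, 1.5, -3.0], threshold=4.0, min_len=2, max_len=4)` yields two windows, both ending before the final negative token. A caller could keep the highest-scoring one as the "first word sequence."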
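Claim 1 recites two filters on a candidate sequence:

- the ratio of number tokens to character tokens is compared to a predetermined value;
- any single token scoring below a predetermined threshold disqualifies the sequence.

The sketch below is one hypothetical reading of both filters. It assumes that "character tokens" means tokens that are not purely numeric and that the sequence fails when the ratio exceeds `max_number_ratio`. The default values are illustrative only.

```python
import re


def passes_filters(tokens, token_scores, max_number_ratio=1.0, min_token_score=-2.0):
    """Return True if the candidate sequence survives both filters.

    tokens: the words of the candidate sequence.
    token_scores: the per-word scores for the same tokens, in order.
    """
    number_tokens = sum(1 for t in tokens if re.fullmatch(r"[0-9]+", t))
    character_tokens = len(tokens) - number_tokens
    if character_tokens == 0:
        return False  # all digits: assumed not to be an address
    # Filter 1: too many numbers relative to character tokens.
    if number_tokens / character_tokens > max_number_ratio:
        return False
    # Filter 2: one low-scoring token disqualifies the whole sequence.
    return all(s >= min_token_score for s in token_scores)
```

A typical street address, such as a house number, street words, city, and zip code, has more character tokens than number tokens. A table of figures does not, and neither does a sentence fragment containing a strongly non-address word.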
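The last two steps of claim 1 extract the surviving sequence, which must contain the five-digit zip code, and store it in the sender's person profile. The sketch below illustrates them. The `PersonProfile` class, the `extract_and_store` helper, and the use of a plain dict as the "data repository" are hypothetical stand-ins for whatever storage the patent contemplates.

```python
import re
from dataclasses import dataclass, field

ZIP_RE = re.compile(r"[0-9]{5}")  # a zip code of five numerical digits, per the claim


@dataclass
class PersonProfile:
    sender: str
    addresses: list = field(default_factory=list)


def extract_and_store(sender, tokens, start, end, repository):
    """Extract tokens[start:end] as the sender's address and record it.

    Returns the address string, or None if the sequence has no zip code.
    `repository` maps sender identifiers to PersonProfile objects.
    """
    sequence = tokens[start:end]
    if not any(ZIP_RE.fullmatch(t) for t in sequence):
        return None
    address = " ".join(sequence)
    # Create the profile on first sight of this sender, then append.
    profile = repository.setdefault(sender, PersonProfile(sender))
    profile.addresses.append(address)
    return address
```

Usage: with `tokens = "Please ship to 123 Main St Springfield IL 62704 thanks".split()`, calling `extract_and_store("alice@example.com", tokens, 3, 9, repo)` stores `"123 Main St Springfield IL 62704"` under that sender.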

  • 7 Assignments