Method and apparatus for matching misspellings caused by phonetic variations

  • US 9,594,742 B2
  • Filed: 07/17/2014
  • Issued: 03/14/2017
  • Est. Priority Date: 09/05/2013
  • Status: Active Grant
First Claim

1. A computer-implemented method for identifying phonetic equivalents between words spoken in a natural source language by native speakers of the source language and words spoken by non-native speakers of the source language who natively speak a common natural second language, comprising the steps of:

    a. receiving at a processor in communication with a computer-readable medium a first term and a second term, wherein each of the first term and second term comprises a character string stored on the computer-readable medium and at least one of the first and second terms is derived from a non-native speaker of the source language;

    b. tokenizing at the processor the first term and the second term to create a first tokenized set comprising a plurality of first tokens from the first term and a second tokenized set comprising a plurality of second tokens from the second term, wherein each of the first tokens and second tokens comprises at least one consonant or consonant placeholder, and at least one vowel or vowel placeholder;

    c. after the tokenizing step, comparing at the processor each first token from the first tokenized set with a corresponding second token from the second tokenized set to determine if the first tokenized set comprises an equal number of tokens as the second tokenized set;

    d. if the first tokenized set comprises an equal number of tokens as the second tokenized set, comparing the characters in each of the first tokens in the first tokenized set to the characters in the corresponding second token from the second tokenized set to determine if a match exists between the first term and the second term, wherein said comparison step is performed using a first compiled language library (CLL) comprising a set of equivalent consonant pairs and a set of equivalent vowel pairs, wherein an equivalence exists if the characters in each of the first tokens in the first tokenized set are identical to the characters in the corresponding second token from the second tokenized set or if the first tokens in the first tokenized set are equivalent to the characters in the corresponding second token from the second tokenized set, wherein consonant equivalencies and vowel equivalencies are found based on a phonetically identical pronunciation of such consonants and vowels by the non-native speakers of the source language who natively speak the common second language; and

    e. outputting from the processor an indicator of whether the first and second terms are phonetic equivalents.
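
Steps a through e can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes lowercase single-word input, treats a, e, i, o, u as the only vowels, forms each token from a consonant run followed by a vowel run, and uses `_` as the placeholder for a missing part. The names `tokenize`, `parts_equivalent`, and `phonetic_equivalent`, and the equivalence pairs standing in for a compiled language library (CLL), are hypothetical and chosen only for illustration.

```python
import re

PLACEHOLDER = "_"  # assumed marker for a missing consonant or vowel part

# Hypothetical stand-in for a compiled language library (CLL): each entry is an
# unordered pair that one group of non-native speakers might pronounce alike.
EQUIVALENT_CONSONANTS = {frozenset(("l", "r")), frozenset(("b", "v"))}
EQUIVALENT_VOWELS = {frozenset(("i", "ee"))}

# One token = a run of non-vowels followed by a run of vowels (either may be empty).
_TOKEN_RE = re.compile(r"([^aeiou]*)([aeiou]*)")


def tokenize(term):
    """Step b: split a term into (consonant part, vowel part) tokens."""
    tokens = []
    for consonants, vowels in _TOKEN_RE.findall(term.lower()):
        if not consonants and not vowels:
            continue  # skip the empty match the regex yields at the end
        tokens.append((consonants or PLACEHOLDER, vowels or PLACEHOLDER))
    return tokens


def parts_equivalent(a, b, pairs):
    """Two parts match if identical or listed as an equivalent pair."""
    return a == b or frozenset((a, b)) in pairs


def phonetic_equivalent(first_term, second_term):
    """Steps c to e: compare token counts, then tokens, and return an indicator."""
    first, second = tokenize(first_term), tokenize(second_term)
    if len(first) != len(second):  # step c: token counts must be equal
        return False
    return all(  # step d: every token pair must be identical or equivalent
        parts_equivalent(c1, c2, EQUIVALENT_CONSONANTS)
        and parts_equivalent(v1, v2, EQUIVALENT_VOWELS)
        for (c1, v1), (c2, v2) in zip(first, second)
    )


if __name__ == "__main__":
    for pair in [("light", "right"), ("ship", "sheep"), ("light", "lighter")]:
        print(pair, phonetic_equivalent(*pair))
```

In this sketch, "light" and "right" tokenize to the same number of tokens and differ only in the l/r consonant pair, so they are reported as equivalent, while "light" and "lighter" fail at step c because their token counts differ.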
