Transliteration pair matching
Abstract
Feature sequences are extracted, as individual letters separated by spaces, from a digital representation of a proper name in a first language to obtain a first orthographic feature sequence set; and from a digital representation of a proper name in a second language to obtain a second orthographic feature sequence set. The first and second orthographic feature sequence sets (a transliteration pair) are compared to determine a similarity score, based on a similarity model including a plurality of conditional probabilities of known orthographic feature sequences in the first language given known orthographic feature sequences in the second language and a plurality of conditional probabilities of known orthographic feature sequences in the second language given known orthographic feature sequences in the first language. Based on at least one threshold value, it is determined whether the transliteration pair belong to an identical actual proper name.
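The steps summarized in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the source does not say how the conditional probabilities are combined into a score, so the uniform-mixture, average-log-probability scoring used here is an assumption. The function names, the toy probability tables `P_LAT_GIVEN_CYR` and `P_CYR_GIVEN_LAT`, the floor value, and the threshold of -3.0 are likewise hypothetical.

```python
import math


def extract_features(name):
    """Return the name as individual lowercase letters separated by spaces."""
    return " ".join(ch for ch in name.lower() if not ch.isspace())


def directional_score(src, tgt, cond_prob, floor=1e-6):
    """Average log-probability of the target letters given the source letters.

    cond_prob[(t, s)] holds a hypothetical p(t | s). Each target letter is
    scored against a uniform mixture over all source letters (an assumed
    scoring rule, chosen only to keep the sketch short).
    """
    src_letters = src.split()
    tgt_letters = tgt.split()
    if not src_letters or not tgt_letters:
        return math.log(floor)
    total = 0.0
    for t in tgt_letters:
        mix = sum(cond_prob.get((t, s), 0.0) for s in src_letters) / len(src_letters)
        total += math.log(max(mix, floor))
    return total / len(tgt_letters)


def similarity(seq_a, seq_b, p_b_given_a, p_a_given_b):
    """Combine both conditional directions into a single similarity score."""
    forward = directional_score(seq_a, seq_b, p_b_given_a)
    backward = directional_score(seq_b, seq_a, p_a_given_b)
    return 0.5 * (forward + backward)


def is_match(name_a, name_b, p_b_given_a, p_a_given_b, threshold=-3.0):
    """Decide, against one threshold, whether the pair names the same entity."""
    seq_a = extract_features(name_a)
    seq_b = extract_features(name_b)
    return similarity(seq_a, seq_b, p_b_given_a, p_a_given_b) >= threshold


# Hand-made toy tables (hypothetical values, not trained on any data).
# Keys are (predicted letter, conditioning letter).
P_LAT_GIVEN_CYR = {("i", "и"): 0.9, ("v", "в"): 0.9, ("a", "а"): 0.9, ("n", "н"): 0.9}
P_CYR_GIVEN_LAT = {("и", "i"): 0.9, ("в", "v"): 0.9, ("а", "a"): 0.9, ("н", "n"): 0.9}

if __name__ == "__main__":
    print(extract_features("Ivan"))
    print(is_match("Иван", "Ivan", P_LAT_GIVEN_CYR, P_CYR_GIVEN_LAT))
    print(is_match("Иван", "Olga", P_LAT_GIVEN_CYR, P_CYR_GIVEN_LAT))
```

In a real system the two probability tables would presumably be estimated from known transliteration pairs and could cover multi-letter sequences rather than single letters; the sketch uses single letters only to stay compact.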
35 Citations
25 Claims
1. An orthographic method for transliteration pair matching, said method comprising:
extracting feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a first language to obtain a first orthographic feature sequence set;
extracting feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a second language to obtain a second orthographic feature sequence set, said digital representation of said proper name in said first language and said digital representation of said proper name in said second language comprising a transliteration pair;
comparing said first and second orthographic feature sequence sets to determine a similarity score, based on a similarity model comprising a plurality of conditional probabilities of known orthographic feature sequences in said first language given known orthographic feature sequences in said second language and a plurality of conditional probabilities of known orthographic feature sequences in said second language given known orthographic feature sequences in said first language; and
based on at least one threshold value, determining whether said transliteration pair belong to an identical actual proper name.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A non-transitory computer readable medium comprising computer executable instructions which when executed by a computer cause the computer to perform a method for transliteration pair matching, the method comprising the steps of:
extracting feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a first language to obtain a first orthographic feature sequence set;
extracting feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a second language to obtain a second orthographic feature sequence set, said digital representation of said proper name in said first language and said digital representation of said proper name in said second language comprising a transliteration pair;
comparing said first and second orthographic feature sequence sets to determine a similarity score, based on a similarity model comprising a plurality of conditional probabilities of known orthographic feature sequences in said first language given known orthographic feature sequences in said second language and a plurality of conditional probabilities of known orthographic feature sequences in said second language given known orthographic feature sequences in said first language; and
based on at least one threshold value, determining whether said transliteration pair belong to an identical actual proper name.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. An apparatus for transliteration pair matching comprising:
a memory; and
at least one processor, coupled to said memory, and operative to:
extract feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a first language to obtain a first orthographic feature sequence set;
extract feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a second language to obtain a second orthographic feature sequence set, said digital representation of said proper name in said first language and said digital representation of said proper name in said second language comprising a transliteration pair;
compare said first and second orthographic feature sequence sets to determine a similarity score, based on a similarity model comprising a plurality of conditional probabilities of known orthographic feature sequences in said first language given known orthographic feature sequences in said second language and a plurality of conditional probabilities of known orthographic feature sequences in said second language given known orthographic feature sequences in said first language; and
based on at least one threshold value, determine whether said transliteration pair belong to an identical actual proper name.
Dependent claims: 21, 22, 23, 24
25. An apparatus for transliteration pair matching comprising:
means for extracting feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a first language to obtain a first orthographic feature sequence set;
means for extracting feature sequences, as individual letters separated by spaces, from a digital representation of a proper name in a second language to obtain a second orthographic feature sequence set, said digital representation of said proper name in said first language and said digital representation of said proper name in said second language comprising a transliteration pair;
means for comparing said first and second orthographic feature sequence sets to determine a similarity score, based on a similarity model comprising a plurality of conditional probabilities of known orthographic feature sequences in said first language given known orthographic feature sequences in said second language and a plurality of conditional probabilities of known orthographic feature sequences in said second language given known orthographic feature sequences in said first language; and
means for, based on at least one threshold value, determining whether said transliteration pair belong to an identical actual proper name.
Specification