Systems and methods for searching using queries written in a different character-set and/or language from the target pages
Abstract
Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. Queries can be expressed using character sets and/or languages that are different from the character set and/or language of at least some of the data that is to be searched. A translation between these character sets and/or languages can be performed by examining the use of terms in aligned text. Probabilities can be associated with each possible translation. Refinements can be made to these probabilities by examining user interactions with the search results.
45 Claims
1. A method comprising:
- identifying a first set of anchor text written in a first format and containing a given term;
- identifying a set of documents to which the first set of anchor text points;
- identifying a second set of anchor text written in a second format and pointing to the identified set of documents;
- analyzing the second set of anchor text to determine that a representation of the given term in the first format corresponds to a representation of the given term in the second format.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
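The four steps of claim 1 can be illustrated with a minimal Python sketch. The `Anchor` record, the format labels `"ascii"` and `"native"`, the sample links, and the use of plain term counting as the analysis step are all assumptions made for illustration; the claim does not prescribe any of them.

```python
from collections import Counter
from typing import NamedTuple


class Anchor(NamedTuple):
    text: str    # anchor text of a hyperlink
    fmt: str     # hypothetical format label, e.g. "ascii" or "native"
    target: str  # URL of the document the link points to


def corresponding_terms(anchors, term, first_fmt, second_fmt):
    """Count second-format anchor terms that point to the same documents
    as first-format anchors containing `term`."""
    # Steps 1-2: first-format anchors containing the term, and their targets.
    targets = {a.target for a in anchors
               if a.fmt == first_fmt and term in a.text.split()}
    # Step 3: second-format anchors pointing to those same documents.
    second = [a for a in anchors
              if a.fmt == second_fmt and a.target in targets]
    # Step 4 (assumed analysis): count the terms in the second set.
    counts = Counter()
    for a in second:
        counts.update(a.text.split())
    return counts


# Invented sample data: romanized versus Japanese-script anchor text.
ANCHORS = [
    Anchor("toyota home", "ascii", "http://example.com/a"),
    Anchor("トヨタ ホーム", "native", "http://example.com/a"),
    Anchor("toyota cars", "ascii", "http://example.com/b"),
    Anchor("トヨタ 車", "native", "http://example.com/b"),
    Anchor("ホンダ 車", "native", "http://example.com/c"),
]

counts = corresponding_terms(ANCHORS, "toyota", "ascii", "native")
best_term, best_count = counts.most_common(1)[0]
print(best_term, best_count)
```

In this toy data the term that recurs across both shared documents is the natural candidate for the second-format representation of `toyota`; a real system would need far more anchor text before such a count is meaningful.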
14. A search method comprising:
- obtaining a query written in a first format from a user;
- translating the query into a second format using a probabilistic dictionary, the probabilistic dictionary mapping terms from the first format to the second format;
- searching a database for information responsive to the translated query; and
- returning search results written in the second format to the user.
Dependent claims: 15, 16, 17, 18, 19, 20, 21
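The flow of claim 14 might be sketched as follows. The dictionary contents, the in-memory `DOCUMENTS` "database", and the choice to take only the single most probable translation per term are hypothetical simplifications, not details taken from the patent.

```python
def translate_query(query, dictionary):
    """Replace each query term with its most probable second-format term."""
    translated = []
    for term in query.split():
        candidates = dictionary.get(term)
        if candidates:
            # Pick the highest-probability translation for this term.
            translated.append(max(candidates, key=candidates.get))
        else:
            translated.append(term)  # no dictionary entry: keep the term as-is
    return translated


def search(terms, documents):
    """Return ids of documents that contain every translated term."""
    return [doc_id for doc_id, text in documents.items()
            if all(t in text.split() for t in terms)]


# Hypothetical probabilistic dictionary: ASCII spelling -> accented spelling.
DICTIONARY = {
    "munchen": {"münchen": 0.9, "munch": 0.1},
    "strasse": {"straße": 0.8, "strass": 0.2},
}

# Hypothetical database of second-format documents.
DOCUMENTS = {
    1: "hotel in münchen",
    2: "münchen leopold straße karte",
    3: "edvard munch museum",
}

translated = translate_query("munchen strasse", DICTIONARY)
results = search(translated, DOCUMENTS)
print(translated, results)
```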
22. A method for creating a probabilistic dictionary, the probabilistic dictionary mapping terms in a first format to terms in a second format, the method comprising:
- for a given term, identifying a first set of data in the first format that contains the term;
- identifying a second set of data in the second format that is aligned with the first set of data; and
- analyzing the second set of data to determine one or more probabilities with which the given term maps onto one or more terms in the second set of data.
Dependent claims: 23, 24, 25, 26, 27, 28, 29
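One simple way to realize the analysis step of claim 22 is a relative-frequency estimate over aligned data. The sketch below assumes that the aligned data arrives as `(first_text, second_text)` pairs and that whitespace splitting is an adequate tokenizer; both are illustrative assumptions, and the patent may use a different estimator.

```python
from collections import Counter


def term_probabilities(term, aligned_pairs):
    """Estimate how often each second-format term appears in data aligned
    with first-format data containing `term` (relative frequency)."""
    counts = Counter()
    for first_text, second_text in aligned_pairs:
        if term in first_text.split():           # first set: contains the term
            counts.update(second_text.split())   # second set: aligned data
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items()}


# Invented aligned English/Spanish pairs.
ALIGNED = [
    ("bank account", "banco cuenta"),
    ("river bank", "orilla río"),
    ("bank loan", "banco préstamo"),
    ("river boat", "barco río"),
]

probs = term_probabilities("bank", ALIGNED)
for target, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{target}: {p:.2f}")
```

This naive estimate also assigns probability to terms that merely co-occur with the translation (here `cuenta` or `río`); separating true translations from co-occurring terms needs more data or an additional filtering step.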
30. A computer program product embodied on a computer-readable medium, the computer program product including instructions, which when executed by a computer system, are operable to cause the computer system to perform acts comprising:
- identifying a first set of anchor text written in a first format and containing a given term;
- identifying a set of web pages to which the first set of anchor text points;
- identifying a second set of anchor text written in a second format and pointing to the identified set of web pages;
- determining a probability that a representation of the given term in the first format corresponds to a representation of the given term in the second format.
Dependent claims: 31, 32, 33
34. A translation method comprising:
- identifying a first body of text written in a first format;
- identifying a second body of text written in a second format, the second body of text being aligned with the first body of text;
- creating a dictionary of translations between terms in the first body of text and terms in the second body of text by comparing the occurrence of terms in the first body of text with the occurrence of terms in the second body of text.
Dependent claims: 35, 36, 37, 38
39. A method comprising:
- receiving a query containing at least one query term written in a first format;
- translating the query term into a plurality of variants written in a second format; and
- using one or more of the variants to search for information written in the second format that is responsive to the query.
Dependent claims: 40, 41, 42, 43
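Claim 39 differs from claim 14 in that a single query term expands into several second-format variants. A minimal sketch, assuming a hypothetical `VARIANTS` table with invented probabilities and a `limit` parameter that caps how many variants are used:

```python
def expand_variants(term, variants, limit=2):
    """Return up to `limit` second-format variants, most probable first."""
    ranked = sorted(variants.get(term, {}).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [variant for variant, _ in ranked[:limit]]


def search_any(variant_terms, documents):
    """Return ids of documents containing at least one of the variants."""
    wanted = set(variant_terms)
    return [doc_id for doc_id, text in documents.items()
            if wanted & set(text.split())]


# Hypothetical variant table: romanized term -> Japanese-script variants.
VARIANTS = {
    "tokyo": {"東京": 0.7, "とうきょう": 0.2, "トウキョウ": 0.1},
}

# Hypothetical second-format documents.
DOCUMENTS = {
    1: "東京 タワー",
    2: "とうきょう えき",
    3: "トウキョウ ホテル",
    4: "大阪 城",
}

chosen = expand_variants("tokyo", VARIANTS, limit=2)
results = search_any(chosen, DOCUMENTS)
print(chosen, results)
```

Searching with the union of variants trades precision for recall; lowering `limit` narrows the search to the most probable spellings.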
44. A method comprising:
- receiving a numeric query entered from a telephone keypad;
- translating the numeric query into a group of potential alphanumeric translations in a first format;
- discarding potential translations that are determined to include predefined low-probability character combinations;
- translating the remaining alphanumeric translations from the first format to a second format using a probabilistic dictionary; and
- performing a search using the alphanumeric translations in the second format.
Dependent claims: 45
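The keypad pipeline of claim 44 can be sketched end to end. The letter layout below is the common telephone keypad assignment; the list of low-probability letter pairs, the dictionary entries, and the sample documents are invented for illustration, and the sketch handles only the digits 2 through 9.

```python
from itertools import product

# Common telephone keypad letter layout (digits 2-9 only).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

# Hypothetical low-probability letter pairs; a real list would be data-driven.
LOW_PROBABILITY = {"aa", "bb", "cc", "bc", "cb"}

# Hypothetical probabilistic dictionary: first format -> {second format: p}.
DICTIONARY = {
    "cat": {"猫": 0.9, "ネコ": 0.1},
    "bat": {"バット": 0.6, "コウモリ": 0.4},
}

# Hypothetical second-format documents.
DOCUMENTS = {1: "猫 の 写真", 2: "野球 バット", 3: "犬 の 写真"}


def keypad_candidates(digits):
    """Expand a digit string into every letter string it could spell."""
    return ["".join(chars) for chars in product(*(KEYPAD[d] for d in digits))]


def discard_unlikely(candidates, bad_combinations):
    """Drop candidates that contain any low-probability combination."""
    return [c for c in candidates
            if not any(bad in c for bad in bad_combinations)]


def translate_candidates(candidates, dictionary):
    """Map surviving candidates into the second format, most probable first."""
    scored = [(target, p)
              for c in candidates
              for target, p in dictionary.get(c, {}).items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def search_any(terms, documents):
    """Return ids of documents containing at least one translated term."""
    wanted = set(terms)
    return [doc_id for doc_id, text in documents.items()
            if wanted & set(text.split())]


candidates = keypad_candidates("228")
remaining = discard_unlikely(candidates, LOW_PROBABILITY)
translations = translate_candidates(remaining, DICTIONARY)
results = search_any([t for t, _ in translations], DOCUMENTS)
print(len(candidates), len(remaining), translations[0], results)
```

Pruning before translation matters because the candidate set grows exponentially with query length: each digit contributes three or four letters, so even a seven-digit query yields thousands of strings.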
Specification