SHORT TEXT LANGUAGE DETECTION USING GEOGRAPHIC INFORMATION
First Claim
1. A computer-implemented method comprising:
- determining a particular language based at least in part on both content of text submitted by a user and at least one of:
(a) whether the text is encoded in ASCII, (b) a top-level domain of a particular URL, (c) a source IP address that is associated with the user, and (d) whether characters of the text are in a specified subset of Unicode; and
- presenting, to the user, one or more content items that are associated with the particular language,
wherein the content of the text does not expressly state the particular language.
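As an illustration only, signals (a), (b), and (d) of the claim could be extracted roughly as follows. This is a minimal sketch, not the claimed implementation: the function names, the choice of the basic Cyrillic block as the "specified subset of Unicode", and the example inputs are all assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical "specified subset of Unicode": the basic Cyrillic block.
CYRILLIC = range(0x0400, 0x0500)


def is_ascii(text: str) -> bool:
    """Signal (a): True when every character is in the ASCII range."""
    return all(ord(ch) < 128 for ch in text)


def top_level_domain(url: str) -> str:
    """Signal (b): last dot-separated label of the URL's host name."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def in_unicode_subset(text: str, subset=CYRILLIC) -> bool:
    """Signal (d): True when every non-space character falls in `subset`."""
    chars = [ch for ch in text if not ch.isspace()]
    return bool(chars) and all(ord(ch) in subset for ch in chars)
```

Signal (c), the source IP address, is left out because mapping an address to a region requires an external geolocation database that this sketch does not assume.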
Abstract
A content-providing entity receives a relatively short text from a user and attempts to determine, automatically, based on that short text (and on other available clues), a language that the user can read and understand. The content-providing entity may then provide, to the user, documents that are written in the determined language. The content-providing entity may determine a language of the input text based on several factors in combination: (a) the service provider's “market,” which is determined based on at least a portion of the URL of the Internet site to which the user directed his browser; (b) the user's “region,” which is determined based on the source Internet Protocol (IP) address of the IP packets that the user sends to the Internet site; (c) the “script” in which the short user-entered text is written; and (d) a statistical analysis of the frequency of the characters present in the short user-entered text.
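One way to picture how the factors might combine is the following minimal sketch, which scores each candidate language by a character-frequency log-likelihood (factor d) and adds a bonus when the market (factor a) or region (factor b) suggests that language. Everything here is an assumption for illustration: the tiny hand-made frequency profiles, the lookup tables, the smoothing floor, the additive bonus, and the function names do not come from the abstract, and the country code is assumed to have been resolved from the IP address elsewhere. The script check (factor c) is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical, hand-made character-frequency profiles.
# Real profiles would be estimated from large corpora for each language.
PROFILES = {
    "en": {"e": 0.13, "t": 0.09, "a": 0.08, "o": 0.075, "h": 0.06, "w": 0.024},
    "fr": {"e": 0.15, "a": 0.08, "s": 0.08, "u": 0.06, "é": 0.02, "ç": 0.001},
    "de": {"e": 0.16, "n": 0.10, "i": 0.08, "s": 0.07, "ü": 0.01, "ß": 0.003},
}

# Hypothetical priors: languages suggested by market (TLD) and region (country).
MARKET_LANGS = {"fr": {"fr"}, "de": {"de"}, "com": {"en"}}
REGION_LANGS = {"FR": {"fr"}, "DE": {"de"}, "US": {"en"}, "CH": {"de", "fr"}}

FLOOR = 1e-4  # smoothing probability for characters absent from a profile


def char_log_likelihood(text, profile):
    """Sum of log-probabilities of the letters in `text` under `profile`."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    return sum(n * math.log(profile.get(ch, FLOOR)) for ch, n in counts.items())


def detect_language(text, tld, country, prior_bonus=1.0):
    """Return the language with the best combined score."""
    scores = {}
    for lang, profile in PROFILES.items():
        score = char_log_likelihood(text, profile)
        if lang in MARKET_LANGS.get(tld, ()):
            score += prior_bonus  # market agrees with this language
        if lang in REGION_LANGS.get(country, ()):
            score += prior_bonus  # region agrees with this language
        scores[lang] = score
    return max(scores, key=scores.get)
```

With a very short input such as `"a"`, the character statistics alone cannot separate English from French in this toy setup, so the market and region decide: `detect_language("a", "fr", "FR")` favors French, while `detect_language("a", "com", "US")` favors English.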
20 Claims
1. A computer-implemented method comprising:
- determining a particular language based at least in part on both content of text submitted by a user and at least one of:
(a) whether the text is encoded in ASCII, (b) a top-level domain of a particular URL, (c) a source IP address that is associated with the user, and (d) whether characters of the text are in a specified subset of Unicode; and
- presenting, to the user, one or more content items that are associated with the particular language,
wherein the content of the text does not expressly state the particular language.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A volatile or non-volatile computer-readable storage medium storing one or more instructions which, when executed by one or more processors, cause the one or more processors to perform steps comprising:
- determining a particular language based at least in part on both content of text submitted by a user and at least one of:
(a) whether the text is encoded in ASCII, (b) a top-level domain of a particular URL, (c) a source IP address that is associated with the user, and (d) whether characters of the text are in a specified subset of Unicode; and
- presenting, to the user, one or more content items that are associated with the particular language,
wherein the content of the text does not expressly state the particular language.
(Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20)
Specification