Identifying language of origin for words using estimates of normalized appearance frequency
First Claim
1. A method of identifying a language of origin of an input word, comprising:
- generating a wide area network query based on the input word to obtain search results in a plurality of different languages;
- estimating a frequency of occurrence of the input word in each of the different languages based on the search results;
- identifying the language of origin of the input word based on the estimated frequencies of occurrence; and
- outputting an indication of the language of origin.
Abstract
The language of origin of a word or named entity is predicted using estimates of frequency of occurrence of the word or named entity in different languages. In one embodiment, the normalized frequency of occurrence of the word or named entity in a variety of different languages is estimated and the values are used as features in a feature vector which is scored and used to identify language of origin.
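The scoring step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the language set, the one-weight-vector-per-language linear scorer, and the weight values are all hypothetical, since the abstract says only that the feature vector "is scored" and does not name a model.

```python
from typing import Dict, List

# Hypothetical set of candidate languages; the abstract does not list any.
LANGUAGES: List[str] = ["en", "fr", "de"]


def feature_vector(norm_freq: Dict[str, float]) -> List[float]:
    """Arrange per-language normalized frequencies in a fixed order."""
    return [norm_freq.get(lang, 0.0) for lang in LANGUAGES]


def score(features: List[float], weights: List[float]) -> float:
    """Linear score (an assumed model, not one named in the source)."""
    return sum(w * x for w, x in zip(weights, features))


def identify_language(norm_freq: Dict[str, float],
                      model: Dict[str, List[float]]) -> str:
    """Score the feature vector once per candidate language; return the best."""
    features = feature_vector(norm_freq)
    scores = {lang: score(features, weights) for lang, weights in model.items()}
    return max(scores, key=scores.get)


# Hypothetical weights: each language rewards its own frequency feature
# and penalizes the others. A real system would learn these from data.
MODEL: Dict[str, List[float]] = {
    "en": [1.0, -0.5, -0.5],
    "fr": [-0.5, 1.0, -0.5],
    "de": [-0.5, -0.5, 1.0],
}

predicted = identify_language({"en": 0.1, "fr": 0.7, "de": 0.2}, MODEL)
```

With these made-up inputs the French score is the largest, so `predicted` is `"fr"`.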
19 Claims
1. A method of identifying a language of origin of an input word, comprising:
- generating a wide area network query based on the input word to obtain search results in a plurality of different languages;
- estimating a frequency of occurrence of the input word in each of the different languages based on the search results;
- identifying the language of origin of the input word based on the estimated frequencies of occurrence; and
- outputting an indication of the language of origin.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
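The four steps of claim 1 might be sketched roughly as below. The search backend is passed in as a function because the claim names no particular search service; `SearchFn`, `build_query`, `fake_search`, and the canned hit counts are all hypothetical. Treating each language's share of the total hits as the frequency estimate is likewise an assumption, not something the claim specifies.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical search interface: (query, language code) -> number of hits.
SearchFn = Callable[[str, str], int]


def build_query(word: str) -> str:
    """Step 1: form a query from the input word (quoted as an exact phrase)."""
    return f'"{word}"'


def estimate_frequencies(word: str, languages: Iterable[str],
                         search: SearchFn) -> Dict[str, float]:
    """Step 2: estimate per-language frequency as a share of total hits."""
    query = build_query(word)
    hits = {lang: search(query, lang) for lang in languages}
    total = sum(hits.values())
    if total == 0:
        return {lang: 0.0 for lang in hits}
    return {lang: count / total for lang, count in hits.items()}


def identify_language_of_origin(word: str, languages: Iterable[str],
                                search: SearchFn) -> Optional[str]:
    """Steps 3 and 4: pick the highest-frequency language and return it."""
    freqs = estimate_frequencies(word, languages, search)
    if not freqs or max(freqs.values()) == 0.0:
        return None  # no evidence in any language
    return max(freqs, key=freqs.get)


def fake_search(query: str, lang: str) -> int:
    """Stand-in for a real search service, with made-up hit counts."""
    canned = {
        ('"schadenfreude"', "de"): 900,
        ('"schadenfreude"', "en"): 300,
        ('"schadenfreude"', "fr"): 50,
    }
    return canned.get((query, lang), 0)


origin = identify_language_of_origin("schadenfreude", ["en", "fr", "de"], fake_search)
```

Injecting the search function keeps the sketch runnable offline and makes it easy to swap in a real backend later.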
11. A system for identifying a language of origin of an input word, comprising:
- a frequency of occurrence estimation system configured to estimate a frequency of occurrence of the input word in each of a plurality of different languages; and
- a language identifier configured to identify the language of origin of the input word based on the frequency of occurrence estimated.
Dependent claims: 12, 13, 14, 15, 16
17. A computer readable medium storing computer readable instructions which, when executed by a computer, cause the computer to implement a feature extraction system comprising:
- a normalized frequency of occurrence estimation system configured to receive an input word, access contents over a network and extract frequency of occurrence features indicative of a normalized frequency of occurrence of the input word in a plurality of different languages, the frequency of occurrence features being used to identify a language of origin of the input word.
Dependent claims: 18, 19
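One way to read "normalized frequency of occurrence" in claim 17 is a raw hit count divided by some measure of how much content exists in each language. The claim does not say how the normalization is done, so the divisor used here (a per-language corpus size), the function name, and all the numbers are assumptions made for illustration.

```python
from typing import Dict


def normalized_frequency_features(word_hits: Dict[str, int],
                                  corpus_sizes: Dict[str, int]) -> Dict[str, float]:
    """Divide each language's hit count by that language's corpus size.

    `corpus_sizes` is a hypothetical estimate of how many documents are
    available per language; the source does not define the normalizer.
    """
    features: Dict[str, float] = {}
    for lang, size in corpus_sizes.items():
        if size <= 0:
            raise ValueError(f"corpus size for {lang!r} must be positive")
        features[lang] = word_hits.get(lang, 0) / size
    return features


# Made-up counts: the word has more raw hits in English, but English also
# has far more content overall than Italian in this toy setup.
raw_hits = {"en": 2000, "it": 1500}
sizes = {"en": 1_000_000, "it": 100_000}
features = normalized_frequency_features(raw_hits, sizes)
```

In this toy case the raw counts favor English, while the normalized features favor Italian, which is the kind of bias toward heavily represented languages that normalization is meant to reduce.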
Specification