Identifying language of origin for words using estimates of normalized appearance frequency

  • US 7,689,408 B2
  • Filed: 09/01/2006
  • Issued: 03/30/2010
  • Est. Priority Date: 09/01/2006
  • Status: Active Grant
First Claim

1. A method of identifying a language of origin of an input word, using a computer with a processor, comprising:

  • generating a wide area network query based on the input word to obtain, with the processor, search results, comprising web pages, in a plurality of different languages;

    estimating, with the processor, a normalized frequency of occurrence of the input word in each of the different languages based on the search results;

    identifying, with the processor, the language of origin of the input word based on the estimated frequencies of occurrence, and outputting an indication of the language of origin;

    wherein the search results comprise web pages and wherein estimating a normalized frequency of occurrence in a selected language comprises:

    obtaining a count of a number of web pages in the selected language in the search results that contain the input word; and

    estimating a total number of web pages in the selected language by generating a wide area network query based on one or more function words in the selected language to obtain function word search results, and estimating the total number of web pages based on the function word search result.
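
The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. The names `HitCounter`, `estimate_total_pages`, and `identify_language_of_origin` are hypothetical, and the search engine is abstracted as a caller-supplied function that returns a page count for a query restricted to one language. The claim does not say how the function word results are combined, so taking the largest function word count as the estimate of a language's total page count is an assumption made here for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical interface: (query, language) -> number of web pages in that
# language whose text contains the query. A real system would back this
# with a wide area network search query.
HitCounter = Callable[[str, str], int]


def estimate_total_pages(count_hits: HitCounter, language: str,
                         function_words: Iterable[str]) -> int:
    """Estimate the total number of pages in a language.

    Assumption: very common function words appear on nearly every page,
    so the largest function-word page count approximates the total.
    """
    counts = [count_hits(word, language) for word in function_words]
    return max(counts) if counts else 0


def identify_language_of_origin(
    word: str,
    count_hits: HitCounter,
    function_words_by_language: Dict[str, Iterable[str]],
) -> Tuple[str, Dict[str, float]]:
    """Return the language with the highest normalized frequency."""
    if not function_words_by_language:
        raise ValueError("at least one candidate language is required")

    frequencies: Dict[str, float] = {}
    for language, function_words in function_words_by_language.items():
        total = estimate_total_pages(count_hits, language, function_words)
        if total == 0:
            # No basis for normalization; treat the frequency as zero.
            frequencies[language] = 0.0
            continue
        # Pages containing the word, normalized by the estimated total.
        frequencies[language] = count_hits(word, language) / total

    best = max(frequencies, key=lambda lang: frequencies[lang])
    return best, frequencies
```

Normalizing by the estimated total matters because raw page counts favor languages with more web content: a word may appear on more pages in a widely used language while still being relatively far more common in its language of origin.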
