Word detection

  • US 8,463,598 B2
  • Filed: 01/28/2011
  • Issued: 06/11/2013
  • Est. Priority Date: 08/23/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

    determining, by one or more computers, first word frequencies for existing words and a candidate word in a training corpus, each of the existing words and the candidate word being one or more characters, the candidate word defined by a sequence of characters, wherein the sequence of characters define constituent words that are each an existing word in a dictionary;

    determining, by the one or more computers, second word frequencies for the constituent words and the candidate word in a development corpus;

    determining, by the one or more computers, a candidate word entropy-related measure based on the second word frequency of the candidate word and the first word frequencies of the constituent words and the candidate word;

    determining, by the one or more computers, an existing word entropy-related measure based on the second word frequencies of the constituent words and the first word frequencies of the constituent words and the candidate word; and

    determining, by the one or more computers, that the candidate word is a new word when the candidate word entropy-related measure exceeds the existing word entropy-related measure.
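The comparison recited in the claim can be sketched in code. The claim does not fix the formulas for the two entropy-related measures, so the ones below are assumptions chosen only to satisfy the stated dependencies: the candidate measure weights the candidate's development-corpus frequency by the log ratio of its training probability to the product of its constituents' training probabilities, and the existing-word measure weights each constituent's development-corpus frequency by the log ratio of its training probability to that probability reduced by the candidate's. The function name `detect_new_word` and the sample counts are hypothetical.

```python
import math
from collections import Counter


def detect_new_word(candidate, constituents, train_counts, dev_counts):
    """Return (candidate_measure, existing_measure, is_new_word).

    train_counts / dev_counts are Counters of word occurrences in the
    training and development corpora. The formulas are an assumed
    instantiation, not taken from the claim text.
    """
    train_total = sum(train_counts.values())
    dev_total = sum(dev_counts.values())
    if train_total == 0 or dev_total == 0:
        return 0.0, 0.0, False

    # "First word frequencies": relative frequencies in the training corpus.
    p_cand = train_counts[candidate] / train_total
    p_parts = [train_counts[w] / train_total for w in constituents]

    # Design choice for this sketch: reject when the logarithms below would
    # be undefined (candidate unseen in training, or a constituent that is
    # no more frequent than the candidate).
    if p_cand <= 0 or any(p <= p_cand for p in p_parts):
        return 0.0, 0.0, False

    # Candidate measure: second (development) frequency of the candidate,
    # combined with first (training) frequencies of candidate and constituents.
    candidate_measure = (dev_counts[candidate] / dev_total) * math.log(
        p_cand / math.prod(p_parts)
    )

    # Existing-word measure: second frequencies of the constituents,
    # combined with the same first frequencies.
    existing_measure = sum(
        (dev_counts[w] / dev_total) * math.log(p / (p - p_cand))
        for w, p in zip(constituents, p_parts)
    )

    return candidate_measure, existing_measure, candidate_measure > existing_measure


# Hypothetical counts: "AB" is a candidate built from dictionary words "A" and "B".
train = Counter({"AB": 50, "A": 100, "B": 100, "C": 750})
dev = Counter({"AB": 40, "A": 10, "B": 10, "C": 140})
cand_measure, exist_measure, is_new = detect_new_word("AB", ["A", "B"], train, dev)
print(cand_measure, exist_measure, is_new)
```

Under these assumed formulas, a candidate that appears often in the development corpus while its constituents rarely appear on their own tends to be accepted as a new word, and the reverse pattern tends to be rejected.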
