Methods and systems for selecting a language for text segmentation

  • US 8,306,808 B2
  • Filed: 08/08/2011
  • Issued: 11/06/2012
  • Est. Priority Date: 09/30/2004
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method comprising:

  • accessing, by a computer system, a string of characters that are associated with a computing device;

    identifying, by the computer system, a plurality of candidate languages for segmenting the string of characters, wherein the plurality of candidate languages are identified based on one or more language indicators associated with the string of characters or the computing device;

    determining weights for the plurality of candidate languages based on the one or more language indicators, wherein each of the weights indicates a probability that a corresponding candidate language from the plurality of candidate languages is an appropriate language to use for interpreting the string of characters based on the string of characters or the computing device;

    determining one or more segmented results from the string of characters for each of the plurality of candidate languages, wherein a segmented result comprises a plurality of tokens that are created by inserting one or more breaks into the string of characters;

    identifying, from the plurality of candidate languages, an operable language for the string of characters based, at least in part, on a comparison of weighted frequencies associated with the candidate languages, wherein each of the weighted frequencies comprises a frequency with which the segmented results occur in a corpus associated with a corresponding candidate language, the frequency being weighted according to a corresponding weight from the determined weights that is associated with the corresponding candidate language; and

    providing information that identifies the operable language.
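
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the toy corpora, the function names (`segmentations`, `segmentation_frequency`, `select_operable_language`), the dictionary-based splitting, and the choice to score a segmented result as the product of its token frequencies are all assumptions made for illustration. The per-language weights, which the claim derives from language indicators, are simply passed in as inputs here.

```python
from typing import Dict, List, Optional

# Hypothetical toy corpora: token -> relative frequency in that language's corpus.
CORPORA: Dict[str, Dict[str, float]] = {
    "en": {"angel": 0.002, "tag": 0.001, "an": 0.01, "gel": 0.0005},
    "de": {"angel": 0.0002, "tag": 0.004, "an": 0.008},
}


def segmentations(text: str, vocab: Dict[str, float]) -> List[List[str]]:
    """Return every split of `text` whose tokens all appear in `vocab`."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for i in range(1, len(text) + 1):
        head = text[:i]
        if head in vocab:
            for tail in segmentations(text[i:], vocab):
                results.append([head] + tail)
    return results


def segmentation_frequency(tokens: List[str], freqs: Dict[str, float]) -> float:
    """Assumed scoring: product of the corpus frequencies of the tokens."""
    score = 1.0
    for token in tokens:
        score *= freqs[token]
    return score


def select_operable_language(
    text: str,
    weights: Dict[str, float],
    corpora: Dict[str, Dict[str, float]],
) -> Optional[str]:
    """Pick the candidate language with the highest weighted frequency."""
    best_lang: Optional[str] = None
    best_score = 0.0
    for lang, weight in weights.items():
        freqs = corpora[lang]
        # Keep only results with at least one inserted break (two or more tokens).
        candidates = [s for s in segmentations(text, freqs) if len(s) >= 2]
        if not candidates:
            continue
        top = max(segmentation_frequency(s, freqs) for s in candidates)
        weighted = weight * top  # frequency weighted by the language's weight
        if weighted > best_score:
            best_lang, best_score = lang, weighted
    return best_lang


if __name__ == "__main__":
    # Hypothetical weights, e.g. derived from a browser language setting.
    print(select_operable_language("angeltag", {"en": 0.2, "de": 0.8}, CORPORA))
```

In this sketch the same string can resolve to different operable languages depending on the weights: the segmentation frequency alone favors one corpus, but a sufficiently strong language indicator for the other candidate can outweigh it.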
