
Methods and systems for improving text segmentation

  • US 7,680,648 B2
  • Filed: 09/30/2004
  • Issued: 03/16/2010
  • Est. Priority Date: 09/30/2004
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

  • receiving a string of characters that comprises a plurality of characters with no token-delineating breaks;

    segmenting the string of characters into a first segmented result that comprises a first plurality of tokens and at least one break, wherein the first plurality of tokens includes all of the plurality of characters;

    segmenting the string of characters into a second segmented result that comprises a second plurality of tokens and at least one break, wherein the second plurality of tokens includes all the plurality of characters, and wherein the second plurality of tokens is different than the first plurality of tokens;

    determining a first frequency of occurrence for the first segmented result in a corpus and a second frequency of occurrence for the second segmented result in the corpus by providing the first segmented result and second segmented result to a search engine and receiving in response from the search engine the first frequency of occurrence for the first segmented result and the second frequency of occurrence for the second segmented result;

    comparing the first frequency of occurrence for the first result to the second frequency of occurrence for the second segmented result;

    selecting the first segmented result as an operable segmented result for the received string of characters when the first frequency of occurrence for the first request is determined to exceed a determined value relative to the second frequency of occurrence for the second result; and

    providing the operable segmented result for further processing.
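
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the `min_ratio` parameter, and the lookup table standing in for a search engine are all hypothetical, and the reading of "a determined value relative to the second frequency" as a simple ratio is an assumption made only for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

Segmentation = Tuple[str, ...]


def covers_all_characters(text: str, tokens: Sequence[str]) -> bool:
    """True when there is at least one break (two or more tokens) and the
    tokens, joined without breaks, reproduce the input string."""
    return len(tokens) >= 2 and "".join(tokens) == text


def select_operable_segmentation(
    text: str,
    first: Segmentation,
    second: Segmentation,
    frequency_of: Callable[[Segmentation], int],
    min_ratio: float = 1.0,  # assumed form of the "determined value"
) -> Optional[Segmentation]:
    """Return `first` when its frequency exceeds min_ratio times the
    frequency of `second`; otherwise return None (the claim does not say
    what happens in that case)."""
    if first == second:
        raise ValueError("the two segmented results must differ")
    for candidate in (first, second):
        if not covers_all_characters(text, candidate):
            raise ValueError(f"{candidate!r} is not a segmentation of {text!r}")

    # Both segmented results are submitted to the frequency source, which
    # stands in for the search engine named in the claim.
    first_frequency = frequency_of(first)
    second_frequency = frequency_of(second)

    if first_frequency > min_ratio * second_frequency:
        return first
    return None


# Hypothetical stand-in for a search engine: made-up counts, not real data.
FAKE_COUNTS = {("used", "cars"): 1200, ("use", "dcars"): 3}


def fake_search_engine(segmentation: Segmentation) -> int:
    return FAKE_COUNTS.get(tuple(segmentation), 0)


result = select_operable_segmentation(
    "usedcars", ("used", "cars"), ("use", "dcars"),
    fake_search_engine, min_ratio=10.0,
)
print(result)
```

With the made-up counts above, the first segmented result is selected because 1200 exceeds 10 times 3. Swapping the two candidates makes the function return None, since the claim only specifies the case in which the first result wins.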

  • 2 Assignments