Automatic segmentation of texts comprising chunks without separators

  • US 7,536,296 B2
  • Filed: 05/28/2003
  • Issued: 05/19/2009
  • Est. Priority Date: 05/28/2003
  • Status: Active Grant
First Claim

1. A method of segmenting into chunks syntagms of a text including individual elements written without separators, said chunks comprising strings including at least one of said individual elements, including the steps of:

  • defining a lexicon including a set of strings, each string comprising at least one of said individual elements, wherein the strings in said lexicon are at least partly representative of said chunks, the lexicon also including a dynamic lexicon and a static lexicon;

  • orderly searching the syntagm being segmented on an element-by-element basis by searching, within said lexicon, strings corresponding to any of said chunks, wherein, in the case of a positive search result, the corresponding chunk located is stored with an associated cost;

  • checking whether the chunk located was already present in at least the dynamic lexicon and, in the case where the chunk located was already present, reducing the cost associated therewith;

  • storing in a computer memory, as a result of said orderly searching, a plurality of candidate segmentation sequences, each corresponding to a respective segmentation pattern and having an associated corresponding accrued cost;

  • selecting as the final result of segmentation the candidate sequence having the lowest associated accrued cost, and

  • increasing said associated cost by a constant value at each new step in said searching on an element-by-element basis.
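
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, only one possible reading of the claim language. The function names, the numeric values (`base_cost`, `dynamic_discount`, `step_cost`), and the choice to add the constant once per chunk appended to a candidate are all assumptions made for illustration; the claim does not specify them. Latin letters stand in for the individual elements of a script written without separators.

```python
def candidate_segmentations(syntagm, static_lexicon, dynamic_lexicon,
                            base_cost=10, dynamic_discount=3, step_cost=1):
    """Enumerate every segmentation of `syntagm` into lexicon strings.

    Returns a list of (accrued_cost, chunks) tuples. All cost values are
    hypothetical; the claim only says that a cost is associated with each
    chunk, reduced for dynamic-lexicon hits, and increased by a constant
    at each search step.
    """
    lexicon = set(static_lexicon) | set(dynamic_lexicon)
    candidates = []

    def search(start, chunks, accrued):
        # A candidate is complete once every element has been consumed.
        if start == len(syntagm):
            candidates.append((accrued, chunks))
            return
        # Element-by-element search: try every string starting at `start`.
        for end in range(start + 1, len(syntagm) + 1):
            chunk = syntagm[start:end]
            if chunk not in lexicon:
                continue  # negative search result, nothing is stored
            cost = base_cost
            if chunk in dynamic_lexicon:
                cost -= dynamic_discount  # chunk already seen: reduce its cost
            # Assumption: the constant increment is applied once per chunk.
            search(end, chunks + [chunk], accrued + cost + step_cost)

    search(0, [], 0)
    return candidates


def segment(syntagm, static_lexicon, dynamic_lexicon, **costs):
    """Return the (accrued_cost, chunks) candidate with the lowest cost,
    or None if the lexicon cannot cover the whole syntagm."""
    candidates = candidate_segmentations(
        syntagm, static_lexicon, dynamic_lexicon, **costs)
    if not candidates:
        return None
    return min(candidates, key=lambda candidate: candidate[0])


static = {"a", "b", "c", "d", "ab", "cd"}
dynamic = {"ab"}
print(segment("abcd", static, dynamic))  # (19, ['ab', 'cd'])
```

Under these assumed values, the per-chunk constant favors segmentations with fewer, longer chunks, and the dynamic-lexicon discount favors chunks that have already been encountered. Exhaustive enumeration is used here only to make the plurality of stored candidates explicit; it grows quickly with syntagm length.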

  • 8 Assignments