Automatic segmentation of a text

  • US 6,374,210 B1
  • Filed: 11/24/1999
  • Issued: 04/16/2002
  • Est. Priority Date: 11/30/1998
  • Status: Expired due to Fees
First Claim

1. A method of segmenting a connected text into words, including the steps of:

    reading an input string representing the connected text;

    identifying at least one sequence of isolated words in the input string by comparing the input string to words in a dictionary; and

    outputting at least one of the identified word sequences;

    characterized in that the step of identifying at least one word sequence includes building a tree structure representing word sequence(s) in the input string in an iterative manner by:

    taking the input string as a working string;

    for each word of a dictionary:

    comparing the word with a beginning of the working string; and

    if the word matches the beginning of the working string:

    forming a node in the tree representing the word;

    associating with the node a part of the input string which starts at a position immediately adjacent an end position of the word; and

    forming a sub-tree, linked to the node, representing word sequence(s) in the part of the input string associated with the node by using the associated part as the working string;

    wherein in dependence on a predetermined criterion deciding whether new words are to be added to the tree structure;

    if new words are to be added:

    selecting at least one node in the tree whose associated word is to be followed by new words;

    forming a plurality of new words;

    each of the new words matching a beginning of the input string part associated with the selected node and consisting of a different number of characters;

    for each formed new word forming a respective sub-tree linked to the selected node;

    each sub-tree representing word sequence(s) starting with the respective new word in the input string part associated with the selected node.
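
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made for illustration: the names `Node`, `build_tree`, `sequences`, and `segment` are hypothetical; the "predetermined criterion" is taken to be "no sequence of dictionary words covers the whole input"; the selected nodes are the dead ends, where no dictionary word matches the remaining string; and new words are the prefixes of that remaining string with lengths 1 to `max_new_len`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Node:
    """One word in the tree plus the input string part that follows it."""
    word: str
    rest: str                                  # part of the input associated with the node
    children: List["Node"] = field(default_factory=list)
    is_new: bool = False                       # True if the word is not from the dictionary


def build_tree(working: str, dictionary: Set[str],
               allow_new: bool = False, max_new_len: int = 2) -> List[Node]:
    """Build the sub-trees for all word sequences starting at `working`."""
    nodes: List[Node] = []
    # Compare each dictionary word with the beginning of the working string.
    for word in sorted(dictionary):
        if word and working.startswith(word):
            rest = working[len(word):]
            # The sub-tree uses the associated part as the new working string.
            nodes.append(Node(word, rest,
                              build_tree(rest, dictionary, allow_new, max_new_len)))
    # Assumed selection rule: add new words only at dead ends.
    if allow_new and working and not nodes:
        # Each new word matches the beginning of the remaining string
        # and has a different number of characters.
        for n in range(1, min(max_new_len, len(working)) + 1):
            rest = working[n:]
            nodes.append(Node(working[:n], rest,
                              build_tree(rest, dictionary, allow_new, max_new_len),
                              is_new=True))
    return nodes


def sequences(nodes: List[Node], prefix: Tuple[str, ...] = ()) -> Iterator[List[str]]:
    """Yield every word sequence that covers the whole input string."""
    for node in nodes:
        path = prefix + (node.word,)
        if not node.rest:
            yield list(path)
        else:
            yield from sequences(node.children, path)


def segment(text: str, dictionary: Set[str], max_new_len: int = 2) -> List[List[str]]:
    """Segment `text`, falling back to new words if the dictionary is not enough."""
    found = list(sequences(build_tree(text, dictionary)))
    if not found:  # assumed criterion for adding new words
        tree = build_tree(text, dictionary, allow_new=True, max_new_len=max_new_len)
        found = list(sequences(tree))
    return found


words = {"the", "table", "tab", "he"}
print(segment("thetable", words))  # [['the', 'table']]
```

With an input such as `"thexytable"`, no sequence of dictionary words covers the string, so the sketch rebuilds the tree with new words allowed; one of the resulting sequences is `['the', 'xy', 'table']`. The recursion enumerates every candidate sequence and can grow exponentially on long inputs, so a practical system would need pruning or scoring that this sketch leaves out.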
