×

Statistical stemming

  • US 8,352,247 B2
  • Filed: 04/23/2012
  • Issued: 01/08/2013
  • Est. Priority Date: 11/05/2009
  • Status: Active Grant
First Claim
Patent Images

1. A computer-implemented method, comprising:

  • obtaining a plurality of canonical suffix-rewriting rules, each rule associated with one or more words to which the rule applies, wherein each canonical suffix-rewriting rule specifies a suffix replacement;

    generating a suffix tree from the words associated with the canonical suffix-rewriting rules, the suffix tree comprising a plurality of nodes and a plurality of leaves arranged in a hierarchical structure, wherein each node has one or more children below the node in the tree and each child is either a node or a leaf, and wherein each leaf corresponds to a distinct one of the words associated with the canonical suffix-rewriting rules and each node corresponds to a suffix of its children in the tree;

    selecting, using one or more processors a minimum colored subset of the nodes and leaves in the suffix tree, wherein each node and leaf in the minimum colored subset is associated with a respective optimal canonical suffix-rewriting rule, wherein the minimum colored subset and optimal canonical suffix-rewriting rules are selected such that a derived canonical suffix-rewriting rule for each leaf matches the canonical suffix-rewriting rule associated with the word corresponding to the leaf, wherein the derived canonical suffix-rewriting rule for each leaf is the optimal canonical suffix-rewriting rule for a first colored ancestor of the leaf in the tree, and wherein the first colored ancestor for a leaf is the leaf, if the leaf is included in the minimum colored subset, and otherwise is a first node above the leaf in the tree that is in the minimum colored subset; and

    generating a plurality of final suffix-rewriting rules, each rule mapping a suffix of a node in the minimum colored subset to the suffix of the node with the suffix replacement specified by the canonical rule for the node.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×