×

Statistical stemming

  • US 8,554,543 B2
  • Filed: 12/10/2012
  • Issued: 10/08/2013
  • Est. Priority Date: 11/05/2009
  • Status: Active Grant
First Claim
Patent Images

1. A computer-implemented method, comprising:

  • receiving, for a canonical suffix-rewriting rule, a plurality of applicable words and a plurality of non-applicable words, wherein each applicable word is a word to which the rule should be applied, and each non-applicable word is a word to which the rule should not be applied, and wherein the canonical suffix-rewriting rule specifies a suffix replacement;

    generating a suffix tree from the applicable words and the non-applicable words, the suffix tree comprising a plurality of nodes and a plurality of leaves arranged in a hierarchical structure wherein each node has one or more nodes or one or more leaves below the node in the tree, wherein each leaf corresponds to either a unique applicable word or a unique non-applicable word, and each node corresponds to a suffix of its children in the tree;

    selecting, using one or more processors, a minimum colored subset of the nodes and leaves in the suffix tree, wherein each node and leaf in the minimum colored subset has an associated optimal status, wherein the minimum colored subset and the optimal status of each node and leaf in the minimum colored subset are selected such that a derived status for each leaf is valid when the leaf corresponds to an applicable word, and the derived status for each leaf is not valid when the leaf corresponds to a non-applicable word, wherein the derived status for each leaf is the optimal status for the leaf if the leaf is included in the minimum colored subset, and otherwise is the optimal status of a first node above the leaf in the tree that is in the minimum colored subset; and

    generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status, and maps the suffix of the node to the suffix of the node with the replacement specified by the suffix-rewriting rule.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×