
Method and apparatus for efficient segmentation of compound words using probabilistic breakpoint traversal

  • US 20030097252A1
  • Filed: 10/18/2001
  • Published: 05/22/2003
  • Est. Priority Date: 10/18/2001
  • Status: Active Grant
First Claim

1. A method for segmenting compound words in an unrestricted natural-language input, the method comprising:

  • receiving a natural-language input consisting of a plurality of characters;

  • constructing a set of probabilistic breakpoints in the natural-language input based on probabilistic breakpoint analysis;

  • identifying a plurality of linkable components by traversal of substrings of the natural-language input delimited by the set of probabilistic breakpoints; and

  • returning a segmented string consisting of a plurality of linkable components spanning the natural-language input, wherein the segmented string is interpretable as a compound word.
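
The four steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how breakpoint probabilities are computed or how linkable components are recognized, so the sketch assumes a toy character-pair probability table (`BREAK_PROB`), a fixed threshold, and a small lexicon (`LEXICON`) of known components. All names and values below are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data, invented for illustration only.
# BREAK_PROB maps an adjacent character pair to the assumed probability
# that a component boundary falls between the two characters.
BREAK_PROB: Dict[Tuple[str, str], float] = {
    ("k", "s"): 0.8,
    ("s", "h"): 0.3,
    ("h", "e"): 0.05,
}
# LEXICON stands in for whatever test decides that a substring is "linkable".
LEXICON: Set[str] = {"book", "books", "shelf", "elf"}


def find_breakpoints(text: str, break_prob: Dict[Tuple[str, str], float],
                     threshold: float) -> List[int]:
    """Return candidate cut positions, always including both ends of the input."""
    inner = [
        i for i in range(1, len(text))
        if break_prob.get((text[i - 1], text[i]), 0.0) >= threshold
    ]
    return [0] + inner + [len(text)]


def segment(text: str, lexicon: Set[str],
            break_prob: Dict[Tuple[str, str], float],
            threshold: float = 0.2) -> Optional[List[str]]:
    """Depth-first traversal over breakpoint-delimited substrings.

    Returns the first sequence of lexicon components that spans the whole
    input, or None when no such sequence exists.
    """
    if not text:
        return None
    points = find_breakpoints(text, break_prob, threshold)

    def walk(start_idx: int) -> Optional[List[str]]:
        if start_idx == len(points) - 1:
            return []  # reached the end of the input: the span is complete
        for end_idx in range(start_idx + 1, len(points)):
            piece = text[points[start_idx]:points[end_idx]]
            if piece in lexicon:
                rest = walk(end_idx)
                if rest is not None:
                    return [piece] + rest
        return None  # dead end: the caller backtracks to a later breakpoint

    return walk(0)


print(segment("bookshelf", LEXICON, BREAK_PROB))
```

With the toy data, only the boundaries after "book" and after "books" clear the threshold, so the traversal examines a handful of substrings rather than every possible split of the input. This pruning is presumably the efficiency gain the title refers to, although the claim itself does not spell it out.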

  • 2 Assignments