
Search-based word segmentation method and device for language without word boundary tag

  • US 8,131,539 B2
  • Filed: 03/07/2008
  • Issued: 03/06/2012
  • Est. Priority Date: 03/07/2007
  • Status: Active Grant
First Claim

1. A search-based word segmentation method for a language without a word boundary tag, comprising the steps of:

    a. providing at least one search engine with a segment of a text comprising at least one segment;

    b. searching for the segment through the at least one search engine, and returning search results each including candidate word segmentation units; and

    c. determining a word segmentation approach for the segment in accordance with at least part of the returned search results by performing steps of:

    extracting, from the at least part of the returned search results, all candidate word segmentation units appearing in the segment;

    scoring the extracted candidate word segmentation units;

    ranking subsets of extracted candidate word segmentation units in accordance with the scoring, wherein the candidate word segmentation units in each subset sequentially form the segment; and

    selecting a highest-ranked subset as the word segmentation approach for the segment.
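The claim does not fix the format of the search results, the scoring function, or how a subset's rank is computed. The following is a minimal Python sketch of steps a through c under loudly stated assumptions: each search result is (hypothetically) reduced to the list of terms the engine highlighted in it; a unit's score is assumed to be its frequency across results times its length; and a subset's score is assumed to be the sum of its unit scores. `fake_engine` and its canned results are made up for illustration, and no real search API is called.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical result format: each result is the list of terms the engine
# highlighted in it, i.e. its candidate word segmentation units.
SearchResult = List[str]
SearchEngine = Callable[[str], List[SearchResult]]


def extract_candidates(segment: str, results: List[SearchResult]) -> Counter:
    """Keep only units that occur in the segment, counting how often each was returned."""
    counts: Counter = Counter()
    for result in results:
        for unit in result:
            if unit and unit in segment:  # skip empty strings and units absent from the segment
                counts[unit] += 1
    return counts


def score_units(counts: Counter) -> Dict[str, int]:
    """Assumed scoring: frequency x length, so each character is credited
    with the frequency of the unit that covers it."""
    return {unit: freq * len(unit) for unit, freq in counts.items()}


def enumerate_segmentations(segment: str, units: Dict[str, int]) -> List[List[str]]:
    """All ordered subsets of units whose concatenation reproduces the segment."""
    if not segment:
        return [[]]
    out: List[List[str]] = []
    for unit in units:
        if segment.startswith(unit):
            for rest in enumerate_segmentations(segment[len(unit):], units):
                out.append([unit] + rest)
    return out


def segment_with_search(segment: str, search: SearchEngine) -> Tuple[List[str], int]:
    results = search(segment)                                   # steps a and b
    scores = score_units(extract_candidates(segment, results))  # extract and score
    ranked = sorted(                                            # rank subsets
        ((seg, sum(scores[u] for u in seg))
         for seg in enumerate_segmentations(segment, scores)),
        key=lambda pair: (-pair[1], len(pair[0])),  # higher score first, then fewer units
    )
    if not ranked:
        raise ValueError("no subset of candidate units forms the segment")
    return ranked[0]                                            # select the highest-ranked


def fake_engine(query: str) -> List[SearchResult]:
    # Hypothetical canned results standing in for a real engine's highlighted terms.
    return [
        ["北京", "大学生"],
        ["北京", "大学生"],
        ["北京大学", "生"],
        ["大学", "学生"],
    ]


best, score = segment_with_search("北京大学生", fake_engine)
print(best, score)  # ['北京', '大学生'] 10
```

Here the three subsets that form the segment score 10 (北京/大学生), 7 (北京/大学/生) and 5 (北京大学/生), so the first is selected. Enumerating every subset is exponential in the segment length; for long segments, a dynamic-programming pass over character positions would find the same best score without listing all subsets.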

  • 1 Assignment