System for Chinese tokenization and named entity recognition

  • US 6,311,152 B1
  • Filed: 02/17/2000
  • Issued: 10/30/2001
  • Est. Priority Date: 04/08/1999
  • Status: Expired due to Fees
First Claim

1. A method of tokenization and named entity recognition of ideographic language, said method including the steps of:

    generating a word lattice for a string of ideographic characters using finite state grammars and a system lexicon, said finite state grammars are a dynamic and complementary extension of said lexicon for creating named entity hypotheses and said lexicon includes single ideographic characters, words, and predetermined features of said characters and words;

    generating segmented text by determining word boundaries in said string of ideographic characters using said word lattice dependent upon a contextual language model and one or more entity language models; and

    recognizing one or more named entities in said string of ideographic characters using said word lattice dependent upon said contextual language model and said one or more entity language models.
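
The three steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the lexicon entries, the probabilities, the single "surname plus one or two characters" rule standing in for the finite state grammars, and the names `LEXICON`, `PERSON_LOGP`, `build_lattice`, and `segment` are all hypothetical. It also simplifies the contextual language model to context-free unigram scores and the entity language models to one fixed score.

```python
import math

# Hypothetical toy lexicon: entry -> (made-up probability, feature tag).
LEXICON = {
    "王": (0.02, "SURNAME"), "小": (0.03, "WORD"), "明": (0.02, "WORD"),
    "小明": (0.001, "WORD"), "在": (0.05, "WORD"),
    "北": (0.01, "WORD"), "京": (0.01, "WORD"), "北京": (0.02, "WORD"),
    "工": (0.01, "WORD"), "作": (0.01, "WORD"), "工作": (0.02, "WORD"),
}
MAX_WORD_LEN = max(len(w) for w in LEXICON)
# Assumed stand-in for an entity language model: one fixed score per name.
PERSON_LOGP = math.log(0.005)


def build_lattice(text):
    """Return lattice edges as (start, end, token, tag, log_prob) tuples."""
    edges = []
    for i in range(len(text)):
        # Lexicon edges: every dictionary entry that starts at position i.
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            entry = LEXICON.get(text[i:j])
            if entry is not None:
                prob, tag = entry
                edges.append((i, j, text[i:j], tag, math.log(prob)))
        # Grammar edges: a surname followed by one or two characters is
        # proposed as a PERSON hypothesis, even though it is not in the lexicon.
        if LEXICON.get(text[i], (0.0, ""))[1] == "SURNAME":
            for given_len in (1, 2):
                j = i + 1 + given_len
                if j <= len(text):
                    edges.append((i, j, text[i:j], "PERSON", PERSON_LOGP))
    return edges


def segment(text):
    """Pick the highest-scoring path through the lattice (Viterbi search)."""
    n = len(text)
    best = [-math.inf] * (n + 1)   # best log score of any path ending at i
    back = [None] * (n + 1)        # edge used to reach position i
    best[0] = 0.0
    # Processing edges by start position guarantees best[start] is final.
    for edge in sorted(build_lattice(text), key=lambda e: e[0]):
        start, end, _, _, logp = edge
        if best[start] + logp > best[end]:
            best[end] = best[start] + logp
            back[end] = edge
    # Follow back-pointers from the end of the string to recover the tokens.
    tokens, pos = [], n
    while pos > 0:
        start, _, token, tag, _ = back[pos]
        tokens.append((token, tag))
        pos = start
    return tokens[::-1]
```

With these made-up scores, `segment("王小明在北京工作")` selects the path 王小明 / 在 / 北京 / 工作 and tags 王小明 as `PERSON`. Word boundaries and the named entity come out of the same search over one lattice, which mirrors the claim's use of a single word lattice for both segmentation and entity recognition.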
