System and iterative method for lexicon, segmentation and language model joint optimization

  • US 6,904,402 B1
  • Filed: 06/30/2000
  • Issued: 06/07/2005
  • Est. Priority Date: 11/05/1999
  • Status: Active Grant
First Claim

1. A method comprising:

    developing an initial language model from a lexicon and segmentation derived from a received corpus; and

    iteratively refining the initial language model by dynamically updating the lexicon and re-segmenting the corpus according to statistical principles until a threshold of predictive capability is achieved;

    wherein:

    iteratively refining the language model comprises:

    re-segmenting the corpus by determining, for each segment, a probability of occurrence for that segment; and

    updating the lexicon from the re-segmented corpus;

    updating the lexicon comprises:

    identifying a frequency of occurrence for each word of a lexicon in the received corpus; and

    deleting the word with the smallest identified frequency from the lexicon; and

    the method further comprises re-segmenting the deleted word into two or more smaller words and updating the lexicon with the re-segmented words.
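
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything in it beyond the claim wording is an assumption made for illustration: the function names (`segment`, `to_probs`, `refine`), the unigram model, the Viterbi-style segmenter, the fixed penalty `unk_logp` for unknown single characters, the use of per-character perplexity as the "threshold of predictive capability", and the choice to delete only multi-character words so that a deleted word can always be split further.

```python
import math
from collections import Counter


def segment(text, probs, max_len=4, unk_logp=-20.0):
    """Split text into the word sequence with the highest unigram log-probability."""
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in probs:
                logp = math.log(probs[piece])
            elif i - j == 1:
                logp = unk_logp      # assumption: any single character is allowed
            else:
                continue
            if best[j] + logp > best[i]:
                best[i], back[i] = best[j] + logp, j
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1], best[n]


def to_probs(counts):
    """Maximum-likelihood unigram probabilities from word counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def refine(corpus, lexicon, max_iters=20, min_gain=1e-3):
    """Alternate re-segmentation and lexicon updates until perplexity stops improving."""
    counts = Counter({w: 1 for w in lexicon})    # initial model: uniform over the lexicon
    n_chars = max(sum(len(s) for s in corpus), 1)
    prev_ppl = math.inf
    ppl = math.inf
    for _ in range(max_iters):
        # Re-segment the corpus using each segment's probability of occurrence.
        probs = to_probs(counts)
        results = [segment(s, probs) for s in corpus]
        ppl = math.exp(-sum(lp for _, lp in results) / n_chars)
        # Update the lexicon from the re-segmented corpus (word frequencies).
        counts = Counter(w for words, _ in results for w in words)
        if prev_ppl - ppl < min_gain:            # assumed stopping threshold
            break
        prev_ppl = ppl
        # Delete the least frequent (multi-character) word ...
        multi = [w for w in counts if len(w) > 1]
        if not multi:
            break
        victim = min(multi, key=lambda w: (counts[w], w))
        freq = counts.pop(victim)
        # ... and re-segment it into smaller words that go back into the lexicon.
        parts, _ = segment(victim, to_probs(counts))
        for part in parts:
            counts[part] += freq
    return counts, ppl


if __name__ == "__main__":
    corpus = ["thecat", "thecat", "thehat"]
    lexicon, ppl = refine(corpus, {"the", "cat", "hat"})
    print(sorted(lexicon), round(ppl, 3))
```

In this sketch the lexicon is carried as a table of word counts, so deleting a word and crediting its frequency to the smaller pieces is a single dictionary update. A real system would likely use a higher-order n-gram model and a more careful stopping criterion.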
