Methods and systems relating to information extraction

  • US 8,280,719 B2
  • Filed: 04/24/2006
  • Issued: 10/02/2012
  • Est. Priority Date: 05/05/2005
  • Status: Expired due to Fees
First Claim

1. A method of training an information extraction system comprising:

    employing a first corpus of annotated text;

    automatically, using a first computer, extracting information from a second corpus of unannotated text, automatically extracting information from the second corpus comprising parsing the second corpus of text based on relative positions of words in the second corpus and generating a hierarchical cluster tree indicative thereof, the cluster tree containing word groups as hierarchical groups, which have decreasingly similar usage statistics as the groups increase in size, wherein the information automatically extracted from the second corpus comprises information indicative of relative word positions in sentences in the second corpus;

    automatically, using a second computer, populating a discriminative information extraction model based on the information extracted from the second corpus of unannotated text and information extracted from the first corpus of annotated text;

    automatically, using a third computer, identifying from at least one of the first corpus, the second corpus, and a third corpus one or more word strings including words having an ambiguous relationship and providing the one or more word strings to a trainer for annotation or to an information extraction system previously trained with sufficient information to accurately annotate the one or more word strings; and

    automatically, using a fourth computer, updating the discriminative information extraction model based on annotations to the one or more word strings provided by the trainer or by the information system previously trained.
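Two steps of the claim lend themselves to a small illustration: building a hierarchical cluster tree from relative word positions in unannotated text, and picking ambiguous word strings to hand to a trainer. The Python sketch below is only one possible reading. The claim does not name a clustering algorithm or an ambiguity measure, so the greedy cosine-similarity merging and the "probability closest to 0.5" rule are assumptions chosen for brevity, and every function name (`context_vectors`, `cluster_tree`, `most_ambiguous`) is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def context_vectors(sentences, window=1):
    """Count, for every word, which words occur at each relative offset."""
    vectors = {}
    for sentence in sentences:
        for i, word in enumerate(sentence):
            vec = vectors.setdefault(word, Counter())
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(sentence):
                    vec[(offset, sentence[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_tree(vectors):
    """Greedy bottom-up merging (assumed stand-in for the claimed clustering).

    The most similar pair is merged first, so small groups share very
    similar usage statistics and larger groups are progressively looser.
    Returns a nested tuple, or None for an empty vocabulary.
    """
    clusters = {word: Counter(vec) for word, vec in vectors.items()}
    if not clusters:
        return None
    while len(clusters) > 1:
        left, right = max(
            combinations(sorted(clusters, key=repr), 2),
            key=lambda pair: cosine(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[(left, right)] = clusters.pop(left) + clusters.pop(right)
    return next(iter(clusters))


def most_ambiguous(word_strings, probability, k=1):
    """Return the k strings whose model probability is closest to 0.5 (assumed measure)."""
    return sorted(word_strings, key=lambda s: abs(probability(s) - 0.5))[:k]


if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sleeps"],
        ["the", "dog", "sleeps"],
        ["a", "cat", "runs"],
        ["a", "dog", "runs"],
    ]
    print(cluster_tree(context_vectors(corpus)))
```

In this toy corpus "cat" and "dog" appear in identical positions relative to their neighbours, so they end up in the same small group near the leaves of the tree. In a fuller system the selected ambiguous strings would be annotated by the trainer and used to update the discriminative model, which this sketch does not attempt to show.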

  • 5 Assignments