Extracting semantic classes and instances from text

  • US 8,510,308 B1
  • Filed: 06/16/2010
  • Issued: 08/13/2013
  • Est. Priority Date: 06/16/2009
  • Status: Active Grant
First Claim

1. A method performed by data processing apparatus, the method comprising:

    receiving a collection of text;

    identifying a first collection of instance-class pairs for the collection of text, wherein the first collection of instance-class pairs are identified by applying one or more template patterns to a collection of documents;

    clustering a collection of semantically similar phrases using the collection of text;

    determining, for each class in the first collection of instance-class pairs, whether a threshold number of instances within a cluster in the semantically similar phrase clusters are labeled by the class, and whether a threshold number of clusters in the semantically similar phrase clusters include at least one instance that is labeled by the class;

    in response to determining that a threshold number of instances within a cluster are labeled by a class and a threshold number of clusters in the semantically similar phrase clusters include at least one instance that is labeled by the class, selecting each instance in the first collection of instance-class pairs that are labeled by the class to be included in a second collection of instance-class pairs; and

    storing the second collection of instance-class pairs for use in information retrieval.
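
The selection step recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `select_pairs`, the parameter names, and the sample data are hypothetical, and the sketch assumes the first threshold is met when at least one cluster contains that many instances labeled by the class. Pattern-based extraction and phrase clustering are taken as already done, so the inputs are a list of (instance, class) pairs and a list of clusters.

```python
from collections import defaultdict


def select_pairs(pairs, clusters, min_in_cluster, min_clusters):
    """Keep every pair whose class passes both threshold tests.

    pairs          -- iterable of (instance, class) tuples (the first collection)
    clusters       -- list of sets of semantically similar phrases
    min_in_cluster -- assumed threshold: instances of the class in one cluster
    min_clusters   -- assumed threshold: clusters holding >= 1 instance of the class
    """
    # Group the instances by the class that labels them.
    instances_by_class = defaultdict(set)
    for instance, cls in pairs:
        instances_by_class[cls].add(instance)

    second_collection = []
    for cls, instances in instances_by_class.items():
        # Count how many instances of this class fall in each cluster.
        overlaps = [len(instances & cluster) for cluster in clusters]
        # Test 1: some cluster holds enough instances labeled by the class.
        dense = any(n >= min_in_cluster for n in overlaps)
        # Test 2: enough clusters hold at least one labeled instance.
        spread = sum(1 for n in overlaps if n >= 1) >= min_clusters
        if dense and spread:
            # Select every instance labeled by the class, clustered or not.
            second_collection.extend((inst, cls) for inst in sorted(instances))
    return second_collection


# Hypothetical sample data, for illustration only.
pairs = [
    ("paris", "city"), ("london", "city"), ("tokyo", "city"), ("madrid", "city"),
    ("paris", "thing"), ("apple", "thing"),
]
clusters = [
    {"paris", "london", "tokyo"},
    {"madrid", "lisbon"},
    {"apple", "pear"},
]

second = select_pairs(pairs, clusters, min_in_cluster=2, min_clusters=2)
print(second)
```

With these sample thresholds, the class "city" passes both tests (three of its instances share one cluster, and two clusters contain at least one of its instances), so all four of its pairs are kept. The class "thing" is dropped because no single cluster contains two of its instances.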
