
Device and method for term set expansion based on semantic similarity

  • US 9,268,821 B2
  • Filed: 02/22/2012
  • Issued: 02/23/2016
  • Est. Priority Date: 03/04/2011
  • Status: Active Grant
First Claim

1. A set expansion processing device comprising:

    a receiver for receiving a seed string from a user;

    a searcher for ordering a search engine to search, with the seed string, a first set of documents containing the seed string and generate snippets from the first set of documents received from the search engine;

    a segment acquirer for generating segments composed of strings by partitioning the generated snippets, including the seed string, using one or more predetermined segment partition strings, wherein the strings composing the segments are arranged in order of appearance;

    a segment component acquirer for generating segment components by partitioning each of the generated segments using one or more predetermined segment component partition strings;

    a segment score computer for computing a segment score for each of the generated segments based on the variance or the standard deviation from the mean value of the lengths of the segment components appearing in their corresponding segments;

    a segment component score computer for computing a segment component score for each of the segment components contained in each of the generated segments, based on a distance between the position of the seed string and the position of each corresponding segment component in the segment in which the corresponding segment component appears, and further based on the segment score computed for the segment in which the corresponding segment component appears;

    a selector for selecting, from the segment components, instance candidates as part of an expanded set of terms contained in the same semantic category as the seed string based on the computed segment component score for each of the generated segment components, wherein the instance candidates include the seed string; and

    an extractor for:

    ordering the search engine to search, using the instance candidates, a second set of documents containing the instance candidates and generate additional snippets from the second set of documents received from the search engine;

    generating a connection graph indicating n-grams connected to each of the instance candidates from the additional snippets by searching using the instance candidates;

    computing a semantic similarity between the seed string and the instance candidates based on a left-side context similarity between n-grams followed by the seed string and n-grams followed by each of the instance candidates in the connection graph, and based on a right-side context similarity between n-grams following the seed string and n-grams following each of the instance candidates in the connection graph; and

    extracting an instance that should be contained in the expanded set of terms from the instance candidates based on the semantic similarity, wherein, when the searcher orders the search engine to search, with the same semantic category as the seed string, the search engine outputs a third set of documents containing the expanded set of terms, including the extracted instance.
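The scoring steps recited for the segment acquirer, segment component acquirer, segment score computer, segment component score computer, and selector can be illustrated with a minimal Python sketch. The claim does not fix the partition strings or the exact formulas, so everything below is an assumption: the delimiters in SEGMENT_DELIMS and COMPONENT_DELIMS are hypothetical, the segment score is taken as 1 / (1 + standard deviation of component lengths) so that list-like segments with evenly sized items score higher, distance is counted in component positions rather than characters, the seed must match a component exactly, and contributions are summed across snippets. Function names are illustrative, not taken from the patent.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical partition strings; the claim only calls them "predetermined".
SEGMENT_DELIMS = ["\n", "。", ". "]
COMPONENT_DELIMS = [",", "、", "/", "|"]


def split_on(text: str, delims: list[str]) -> list[str]:
    """Split text on any delimiter string, trimming and dropping empty pieces."""
    pattern = "|".join(re.escape(d) for d in delims)
    return [p.strip() for p in re.split(pattern, text) if p.strip()]


def segment_score(components: list[str]) -> float:
    """Assumed form: lower spread of component lengths -> higher score."""
    lengths = [len(c) for c in components]
    spread = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
    return 1.0 / (1.0 + spread)


def expand_candidates(seed: str, snippets: list[str], top_k: int = 5) -> list[str]:
    """Rank segment components as instance candidates; the seed is always included."""
    scores: dict[str, float] = defaultdict(float)
    for snippet in snippets:
        for segment in split_on(snippet, SEGMENT_DELIMS):
            components = split_on(segment, COMPONENT_DELIMS)
            if seed not in components:
                continue  # assumption: only segments holding the seed as a component
            s_score = segment_score(components)
            seed_pos = components.index(seed)
            for pos, comp in enumerate(components):
                if comp == seed:
                    continue
                # Assumed decay: nearer the seed, in a tidier segment -> more credit.
                scores[comp] += s_score / (1 + abs(pos - seed_pos))
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return [seed] + ranked[:top_k]


if __name__ == "__main__":
    demo = ["apple, banana, cherry", "banana / apple / kiwi", "Open daily"]
    print(expand_candidates("apple", demo))  # → ['apple', 'banana', 'kiwi', 'cherry']
```

Under these assumptions, a component that sits next to the seed in several evenly sized, list-like segments (banana in the demo) outranks one seen only once or farther from the seed.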

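The extractor's final steps (connection graph, left-side and right-side context similarity, extraction) can likewise be sketched under stated assumptions. Here the connection graph is approximated by counting, for each term, the whitespace-delimited n-grams that immediately precede it (the left side, i.e. n-grams followed by the term) and immediately follow it (the right side). Each side's similarity is the cosine similarity of those counts, and the two are averaged. Whitespace tokenization, the cosine measure, the equal weighting, the single shared snippet list, and the threshold of 0.3 are all hypothetical choices; the claim only requires that the similarity be based on both sides of the context.

```python
import math
from collections import Counter


def context_ngrams(term: str, snippets: list[str], n: int = 1) -> tuple[Counter, Counter]:
    """Approximate connection-graph edges: n-grams directly left and right of each occurrence."""
    left, right = Counter(), Counter()
    term_tokens = term.split()
    k = len(term_tokens)
    for snippet in snippets:
        tokens = snippet.split()  # assumption: whitespace tokenization
        for i in range(len(tokens) - k + 1):
            if tokens[i:i + k] != term_tokens:
                continue
            if i - n >= 0:
                left[" ".join(tokens[i - n:i])] += 1
            if i + k + n <= len(tokens):
                right[" ".join(tokens[i + k:i + k + n])] += 1
    return left, right


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    dot = sum(a[x] * b[x] for x in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_similarity(seed: str, candidate: str, snippets: list[str], n: int = 1) -> float:
    seed_left, seed_right = context_ngrams(seed, snippets, n)
    cand_left, cand_right = context_ngrams(candidate, snippets, n)
    # Assumed combination: plain average of left-side and right-side scores.
    return 0.5 * (cosine(seed_left, cand_left) + cosine(seed_right, cand_right))


def extract_instances(seed: str, candidates: list[str], snippets: list[str],
                      threshold: float = 0.3) -> list[str]:
    """Keep candidates whose similarity to the seed reaches a hypothetical threshold.

    In the claim the candidate contexts come from a second search; here one
    snippet list stands in for both searches.
    """
    return [c for c in candidates
            if c != seed and semantic_similarity(seed, c, snippets) >= threshold]


if __name__ == "__main__":
    demo = [
        "she ate a ripe apple for lunch",
        "he ate a ripe banana for lunch",
        "they drove a red truck to town",
    ]
    print(extract_instances("apple", ["apple", "banana", "truck"], demo))  # → ['banana']
```

In this toy corpus, banana shares both its left neighbor (ripe) and its right neighbor (for) with apple, so it is kept, while truck shares neither and is dropped.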
  • 3 Assignments