
Multi-concept latent semantic analysis queries

  • US 9,015,160 B2
  • Filed: 12/14/2011
  • Issued: 04/21/2015
  • Est. Priority Date: 12/14/2011
  • Status: Active Grant
First Claim

1. A system, comprising:

  • one or more memory units; and

    one or more processing units operable to:

    access text;

    identify a plurality of terms from the text;

    determine a plurality of term vectors associated with the identified plurality of terms;

    calculate a weight of each of the determined plurality of term vectors;

    cluster the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first cluster related to a first concept of the text and a second cluster related to a second concept of the text, the first concept being distinct from the second concept, the first and second clusters each comprising two or more of the determined term vectors, the clustering comprising grouping two or more of the determined term vectors together based on the determined weights of the two or more term vectors and a distance between the two or more term vectors;

    create a first pseudo-document according to the first cluster;

    create a second pseudo-document according to the second cluster;

    identify, using latent semantic analysis (LSA) of the first pseudo-document, a first set of terms associated with the first cluster;

    identify, using LSA of the second pseudo-document, a second set of terms associated with the second cluster;

    determine a first weight associated with the first cluster and a second weight associated with the second cluster, wherein the first weight is based at least on the determined weights of the term vectors of the first cluster, and wherein the second weight is based at least on the determined weights of the term vectors of the second cluster;

    determine a first percentage of a list of output terms that should come from the first cluster and a second percentage of the list of output terms that should come from the second cluster, the first percentage based on a ratio of the first weight to a sum of the first and second weights, the second percentage based on a ratio of the second weight to the sum of the first and second weights;

    select one or more terms from the first set of terms according to the determined first percentage;

    select one or more terms from the second set of terms according to the determined second percentage;

    combine the selected terms from the first and second sets of terms into the list of output terms, the list of output terms having the first and second concepts of the text; and

    store the list of output terms in the one or more memory units.
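
The percentage and selection limitations of the claim can be illustrated with a short sketch. This is not the patented implementation, only a minimal reading of the weight-ratio step under stated assumptions: the function name `allocate_output_terms`, the `list_size` parameter, the sample terms, the sample weights, and the use of rounding to turn a percentage into a term count are all hypothetical. The per-cluster term lists are assumed to have already been produced and ranked by the LSA step.

```python
from typing import Dict, List


def allocate_output_terms(
    cluster_terms: Dict[str, List[str]],
    cluster_weights: Dict[str, float],
    list_size: int,
) -> List[str]:
    """Pick terms from each cluster in proportion to its weight.

    cluster_terms: ranked candidate terms per cluster (assumed to come
        from an LSA query on that cluster's pseudo-document).
    cluster_weights: one weight per cluster (assumed to be derived from
        the weights of the term vectors in that cluster).
    list_size: desired length of the output list (hypothetical parameter).
    """
    total = sum(cluster_weights.values())
    if total <= 0:
        raise ValueError("cluster weights must sum to a positive value")

    output: List[str] = []
    for name, terms in cluster_terms.items():
        # Share of the output list = cluster weight / sum of all weights.
        share = cluster_weights[name] / total
        # Rounding is an assumption; quotas may not always sum to list_size.
        quota = round(share * list_size)
        output.extend(terms[:quota])
    return output


# Illustrative data only: two clusters standing in for two distinct concepts.
first_terms = ["engine", "turbine", "rotor", "thrust", "nacelle"]
second_terms = ["invoice", "ledger", "audit", "payroll", "budget"]

result = allocate_output_terms(
    {"first": first_terms, "second": second_terms},
    {"first": 3.0, "second": 1.0},
    list_size=4,
)
print(result)
```

With weights of 3.0 and 1.0, the first cluster receives 3 / (3 + 1) = 75% of the list and the second receives 25%, so a four-term output holds three terms from the first concept and one from the second.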
