Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors

  • US 7,406,456 B2
  • Filed: 04/14/2004
  • Issued: 07/29/2008
  • Est. Priority Date: 01/27/2000
  • Status: Expired due to Term
First Claim

1. A method for a data processing system to efficiently identify at least one dataset from a collection of datasets according to a query containing information indicative of desired datasets, wherein each dataset is a document and includes one or more data points and each data point corresponds to at least one of a word, a phrase, and a sentence, the method comprising the machine-executed steps:

  • for each dataset, constructing a semantic vector representing each dataset;

    receiving the query containing information indicative of desired datasets;

    for the query, constructing a semantic vector representing the query;

    selecting datasets based on a distance between the semantic vector for the query and the semantic vector of each dataset; and

    displaying information of the selected datasets to be corresponding to the desired datasets identified in the query;

    wherein:

    the query or each of the datasets includes at least one data point; and

    the semantic vector for the query or each of the datasets is constructed by the steps of:

    for each data point, identifying a relationship between each data point and multiple predetermined categories corresponding to dimensions in the semantic space;

    determining the significance of each data point with respect to the multiple predetermined categories according to a predetermined formula;

    for each data point, constructing a semantic vector representing each data point, wherein each semantic vector has dimensions equal to the number of multiple predetermined categories and represents the significance of its corresponding data point with respect to each of the multiple predetermined categories; and

    based on the semantic vector for each of the at least one data point, form the semantic vector representing the query or each of the datasets; and

    wherein the significance of each data point is determined by calculating a probability distribution of each data point occurring in each predetermined category and a probability distribution of the data point's occurrence across all predetermined categories.
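
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim only refers to "a predetermined formula", so the sketch assumes, purely for illustration, that a word's significance in a category is the product of P(word | category) and P(category | word). It also assumes that a document or query vector is the normalized sum of its word vectors and that the distance is Euclidean. All function names (`train_word_vectors`, `text_vector`, `distance`, `search`) and the toy data are hypothetical.

```python
from collections import Counter
import math


def train_word_vectors(labeled_docs, categories):
    """Build one semantic vector per word, one dimension per category.

    labeled_docs: list of (tokens, category) pairs used as training data.
    """
    # counts[c][w] = number of times word w occurs in category c
    counts = {c: Counter() for c in categories}
    for tokens, category in labeled_docs:
        counts[category].update(tokens)
    category_totals = {c: sum(counts[c].values()) for c in categories}
    vocabulary = set().union(*(counts[c].keys() for c in categories))

    vectors = {}
    for word in vocabulary:
        word_total = sum(counts[c][word] for c in categories)
        vector = []
        for c in categories:
            # Distribution of the word within this category.
            p_word_in_cat = counts[c][word] / category_totals[c] if category_totals[c] else 0.0
            # Distribution of the word's occurrences across all categories.
            p_cat_given_word = counts[c][word] / word_total
            # ASSUMPTION: the product stands in for the "predetermined formula".
            vector.append(p_word_in_cat * p_cat_given_word)
        vectors[word] = vector
    return vectors


def text_vector(tokens, word_vectors, n_dims):
    """Form a query/document vector as the normalized sum of its word vectors."""
    total = [0.0] * n_dims
    for token in tokens:
        # Words never seen in training contribute nothing.
        for i, value in enumerate(word_vectors.get(token, [0.0] * n_dims)):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def distance(a, b):
    """Euclidean distance between two semantic vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query_tokens, docs, word_vectors, n_dims, top_k=1):
    """Return the names of the top_k documents closest to the query."""
    q = text_vector(query_tokens, word_vectors, n_dims)
    scored = sorted(
        (distance(q, text_vector(tokens, word_vectors, n_dims)), name)
        for name, tokens in docs.items()
    )
    return [name for _, name in scored[:top_k]]


# Toy usage with two hypothetical categories.
categories = ["sports", "finance"]
training = [
    ("goal striker match".split(), "sports"),
    ("stock market bond".split(), "finance"),
]
word_vectors = train_word_vectors(training, categories)
docs = {"d1": ["goal", "match"], "d2": ["stock", "bond"]}
print(search(["striker"], docs, word_vectors, len(categories)))
```

In this toy example the query and document `d1` share no words, yet `d1` is selected because both fall along the "sports" dimension of the semantic space; that is the behaviour the claim's category-based vectors are meant to provide.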
