Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors
Abstract
An apparatus and method are disclosed for producing a semantic representation of information in a semantic space. The information is first represented in a table that stores values which indicate a relationship with predetermined categories. The categories correspond to dimensions in the semantic space. The significance of the information with respect to the predetermined categories is then determined. A trainable semantic vector (TSV) is constructed to provide a semantic representation of the information. The TSV has dimensions equal to the number of predetermined categories and represents the significance of the information relative to each of the predetermined categories. Various types of manipulation and analysis, such as searching, classification, and clustering, can subsequently be performed on a semantic level.
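The abstract does not say how the significance of a term with respect to each category is computed. The following is a minimal Python sketch that assumes the table holds raw counts of how often each term occurs in training material labeled with each predetermined category, and that uses one simple, illustrative significance measure: each count is divided by its category's total, and the row is then rescaled so its entries sum to 1. The function name `build_tsv_table` and the sample data are hypothetical, not taken from the patent.

```python
from typing import Dict, List


def build_tsv_table(
    counts: Dict[str, List[float]], num_categories: int
) -> Dict[str, List[float]]:
    """Turn a term-by-category count table into trainable semantic vectors.

    ASSUMPTION: significance = count / category total, then the row is
    rescaled to sum to 1. This is an illustrative measure only.
    """
    # Total count per category, used to correct for unequal category sizes.
    totals = [0.0] * num_categories
    for row in counts.values():
        for j, value in enumerate(row):
            totals[j] += value

    tsvs: Dict[str, List[float]] = {}
    for term, row in counts.items():
        # Relative frequency of the term within each category.
        rel = [row[j] / totals[j] if totals[j] else 0.0 for j in range(num_categories)]
        norm = sum(rel)
        # One dimension per predetermined category; each entry is the
        # term's relative strength with respect to that category.
        tsvs[term] = [r / norm if norm else 0.0 for r in rel]
    return tsvs


if __name__ == "__main__":
    categories = ["finance", "sports"]  # hypothetical predetermined categories
    counts = {"stock": [6, 0], "goal": [2, 8]}
    print(build_tsv_table(counts, len(categories)))
    # {'stock': [1.0, 0.0], 'goal': [0.2, 0.8]}
```

Each resulting vector has exactly as many dimensions as there are predetermined categories, matching the description of the TSV above; dividing by the category total first keeps a large category from dominating every vector.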
54 Citations
14 Claims
1. A method for a data processing system to efficiently cluster data points from a dataset, the method comprising the machine-executed steps of:
constructing a trainable semantic vector for each data point from the dataset in a multi-dimensional semantic space;
applying a clustering process to the constructed trainable semantic vectors to identify similarities between groups of data points within the dataset; and
providing access to a result of the clustering process;
wherein the trainable semantic vector for each data point from the dataset is constructed by the machine-executed steps of:
for each data point, identifying a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
determining the significance of each data point with respect to the predetermined categories; and
constructing a semantic vector for each data point, wherein each semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

13. A system for clustering data points from a dataset comprising:
a computer configured to:
construct a trainable semantic vector for each data point from the dataset in a multi-dimensional semantic space;
apply a clustering process to the constructed trainable semantic vectors to identify similarities between groups of data points within the dataset; and
provide access to a result of the clustering process;
wherein the trainable semantic vector for each data point from the dataset is constructed by the machine-executed steps of:
for each data point, identifying a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
determining the significance of each data point with respect to the predetermined categories; and
constructing a semantic vector for each data point, wherein each semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories.

14. A computer-readable medium carrying one or more sequences of instructions for clustering data points from a dataset, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the machine-executed steps of:
constructing a trainable semantic vector for each data point from the dataset in a multi-dimensional semantic space;
applying a clustering process to the constructed trainable semantic vectors to identify similarities between groups of data points within the dataset; and
providing a result of the clustering process;
wherein the trainable semantic vector for each data point from the dataset is constructed by the machine-executed steps of:
for each data point, identifying a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
determining the significance of each data point with respect to the predetermined categories; and
constructing a semantic vector for each data point, wherein each semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories.
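Claims 1, 13 and 14 leave "a clustering process" unspecified. As one hedged illustration, the Python sketch below applies plain k-means, a common algorithm swapped in here and not necessarily the process the patent contemplates, to trainable semantic vectors. Centroids are seeded with the first k vectors so the result is deterministic; the function names `kmeans` and `_dist`, the category labels, and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors in the semantic space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    vectors: List[Vector], k: int, iterations: int = 20
) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means over TSVs (ASSUMPTION: stands in for 'a clustering process')."""
    if k < 1 or k > len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Deterministic seeding: the first k vectors become the initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _dist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    categories = ["finance", "sports"]  # hypothetical predetermined categories
    docs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
    labels, centroids = kmeans(docs, k=2)
    print(labels)  # [0, 1, 0, 1]
    for c, centroid in enumerate(centroids):
        # The largest component names the category that dominates the cluster.
        dominant = max(range(len(centroid)), key=centroid.__getitem__)
        print(c, categories[dominant])
```

Because each centroid is itself a vector over the same predetermined categories, its largest component offers one simple, assumed way to describe what the data points in a cluster have in common; any other clustering algorithm that accepts fixed-length numeric vectors could be substituted.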
Specification