Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors

US 7,299,247 B2
Filed: 04/14/2004
Issued: 11/20/2007
Est. Priority Date: 01/27/2000
Status: Expired due to Term

Abstract

An apparatus and method are disclosed for producing a semantic representation of information in a semantic space. The information is first represented in a table that stores values which indicate a relationship with predetermined categories. The categories correspond to dimensions in the semantic space. The significance of the information with respect to the predetermined categories is then determined. A trainable semantic vector (TSV) is constructed to provide a semantic representation of the information. The TSV has dimensions equal to the number of predetermined categories and represents the significance of the information relative to each of the predetermined categories. Various types of manipulation and analysis, such as searching, classification, and clustering, can subsequently be performed on a semantic level.
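The construction described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented method: the category names, the helper names (`build_table`, `term_vector`, `document_tsv`), the whitespace tokenizer, and the share-of-occurrences weighting are all assumptions made for the example. The only properties taken from the text are that a table relates information to predetermined categories and that the resulting vector has one dimension per category.

```python
import math

# Hypothetical predetermined categories; each one is a dimension of the semantic space.
CATEGORIES = ["finance", "sports", "health"]

def build_table(labeled_docs):
    """Count how often each term occurs in documents of each category."""
    table = {}
    for text, category in labeled_docs:
        column = CATEGORIES.index(category)
        for term in text.lower().split():
            table.setdefault(term, [0] * len(CATEGORIES))[column] += 1
    return table

def term_vector(table, term):
    """Significance of a term per category, taken here as its share of occurrences."""
    counts = table.get(term)
    if counts is None:
        return [0.0] * len(CATEGORIES)  # unseen terms carry no weight
    total = sum(counts)
    return [count / total for count in counts]

def document_tsv(table, text):
    """Sum the term vectors of a document and scale the result to unit length."""
    vector = [0.0] * len(CATEGORIES)
    for term in text.lower().split():
        for i, weight in enumerate(term_vector(table, term)):
            vector[i] += weight
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector

# Tiny made-up training set, for illustration only.
training = [
    ("stocks bonds market", "finance"),
    ("market rally stocks", "finance"),
    ("match goal team", "sports"),
    ("team doctor injury", "health"),
]
table = build_table(training)
tsv = document_tsv(table, "team market stocks")
```

The resulting `tsv` has one component per category, and its largest component is the finance dimension because two of the three terms occur only in finance documents.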

54 Citations

14 Claims

1. A method for a data processing system to efficiently cluster data points from a dataset, the method comprising the machine-executed steps of:
- constructing a trainable semantic vector for each data point from the dataset in a multi-dimensional semantic space;
  
  applying a clustering process to the constructed trainable semantic vectors to identify similarities between groups of data points within the dataset; and
  
  providing access to a result of the clustering process;
  
  wherein the trainable semantic vector for each data point from the dataset is constructed by the machine-executed steps of;
  
  for each data point, identifying a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
  
  determining the significance of each data point with respect to the predetermined categories; and
  
  constructing a semantic vector for each data point, wherein each semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1, wherein the data points correspond to documents.
  - 3. The method of claim 1, wherein the step of applying a clustering process comprises the steps:
    - randomly distributing the data points among a predetermined number of clusters;
      
      determining a cluster center for each cluster;
      
      re-distributing the data points based on the determined cluster centers;
      
      measuring an amount of change in each cluster; and
      
      repeating the steps of determining, re-distributing, and measuring until a predetermined convergence factor has been reached.
  - 4. The method of claim 3, wherein:
    - the step of randomly distributing comprises a step of randomly assigning a fuzzy membership function to each data point; and
      
      the step of re-distributing comprises the step of recalculating the fuzzy membership function for each data point.
  - 5. The method of claim 4, further comprising the step of making final cluster assignments based on the fuzzy membership functions.
  - 6. The method of claim 5, wherein each data point is assigned to zero or more clusters.
  - 7. The method of claim 3, wherein the step of randomly distributing comprises a step of randomly distributing an equal number of data points to each of the predetermined number of clusters.
  - 8. The method of claim 3, wherein the predetermined convergence factor is equal to about 0.0001.
  - 9. The method of claim 3, wherein the predetermined number of clusters is automatically determined based on the size of the dataset.
  - 10. The method of claim 3, wherein the predetermined number of clusters is input by a user.
  - 11. The method of claim 3, wherein the step of determining a cluster center comprises a step of constructing an average trainable semantic vector representative of an average value of all datasets within the cluster across all dimensions of the semantic space.
  - 12. The method of claim 11, wherein the step of re-distributing comprises a step of assigning the data points to clusters based on the distance from a data point to the nearest cluster center.
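The loop recited in claims 3, 7, 8, 11, and 12 resembles a k-means iteration, and a rough sketch may help in reading it. The function names, the use of Euclidean distance, the choice to measure change as the largest movement of any cluster center, and the handling of emptied clusters are assumptions of this example; the claims do not specify them.

```python
import math
import random

def average_vector(vectors):
    """Cluster center: the mean of the member vectors in every dimension (claim 11)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest(vector, centers):
    """Index of the closest cluster center (claim 12), using Euclidean distance."""
    return min(range(len(centers)), key=lambda c: math.dist(vector, centers[c]))

def cluster(vectors, k, convergence=0.0001, max_iter=100, seed=0):
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    rng = random.Random(seed)
    # Random initial distribution with cluster sizes as equal as possible (claims 3 and 7).
    order = list(range(len(vectors)))
    rng.shuffle(order)
    labels = [0] * len(vectors)
    for position, index in enumerate(order):
        labels[index] = position % k

    def centers_for(current_labels, previous):
        centers = []
        for c in range(k):
            members = [v for v, a in zip(vectors, current_labels) if a == c]
            # An emptied cluster keeps its previous center (a choice made for this sketch).
            centers.append(average_vector(members) if members else previous[c])
        return centers

    centers = centers_for(labels, None)  # every cluster is non-empty at this point
    for _ in range(max_iter):
        labels = [nearest(v, centers) for v in vectors]  # re-distribute
        updated = centers_for(labels, centers)           # determine centers
        change = max(math.dist(old, new) for old, new in zip(centers, updated))
        centers = updated
        if change <= convergence:                        # claim 8 default of about 0.0001
            break
    return labels, centers

# Six made-up two-dimensional vectors forming two obvious groups.
tsvs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
labels, centers = cluster(tsvs, k=2)
```

With this input the first three vectors end up in one cluster and the last three in the other; which cluster receives which index depends on the random seed.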

13. A system for clustering data points from a dataset comprising:
- a computer configured to;
  
  construct a trainable semantic vector for each data point from the dataset in a multi-dimensional semantic space;
  
  apply a clustering process to the constructed trainable semantic vectors to identify similarities between groups of data points within the dataset; and
  
  provide access to a result of the clustering process;
  
  wherein the trainable semantic vector for each data point from the dataset is constructed by the machine-executed steps of;
  
  for each data point, identifying a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
  
  determining the significance of each data point with respect to the predetermined categories; and
  
  constructing a semantic vector for each data point, wherein each semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories.

14. A computer-readable medium carrying one or more sequences of instructions for clustering data points from a dataset, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the machine-executed steps of:
- constructing a trainable semantic vector for each data point from the dataset in a multi-dimensional semantic space;
  
  applying a clustering process to the constructed trainable semantic vectors to identify similarities between groups of data points within the dataset; and
  
  providing a result of the clustering process;
  
  wherein the trainable semantic vector for each data point from the dataset is constructed by the machine-executed steps of;
  
  for each data point, identifying a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
  
  determining the significance of each data point with respect to the predetermined categories; and
  
  constructing a semantic vector for each data point, wherein each semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories.
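Claims 4 to 6 describe a fuzzy variant in which each data point carries a membership function rather than a single cluster label, and final assignments may place a point in zero or more clusters. The sketch below substitutes the standard fuzzy c-means update for the unspecified membership calculation; the fuzzifier `m`, the 0.6 threshold, the function names, and the sample vectors are assumptions of this example and do not come from the patent.

```python
import math
import random

def fuzzy_cluster(vectors, k, m=2.0, convergence=0.0001, max_iter=200, seed=0):
    """Return one membership row per vector; each row sums to 1."""
    rng = random.Random(seed)
    dims = len(vectors[0])
    # Randomly assign an initial membership function to each data point (claim 4).
    memberships = []
    for _ in vectors:
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        memberships.append([value / total for value in row])
    for _ in range(max_iter):
        # Cluster centers are membership-weighted averages of the vectors.
        centers = []
        for c in range(k):
            weights = [row[c] ** m for row in memberships]
            total = sum(weights)
            centers.append([
                sum(w * v[d] for w, v in zip(weights, vectors)) / total
                for d in range(dims)
            ])
        # Recalculate the membership function of every data point (claim 4).
        updated = []
        for v in vectors:
            dists = [math.dist(v, center) for center in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                updated.append([h / sum(hits) for h in hits])
            else:
                inverse = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inverse)
                updated.append([x / total for x in inverse])
        change = max(
            abs(a - b)
            for old_row, new_row in zip(memberships, updated)
            for a, b in zip(old_row, new_row)
        )
        memberships = updated
        if change <= convergence:
            break
    return memberships

def final_assignments(memberships, threshold=0.6):
    """Assign each point to every cluster whose membership reaches the threshold (claims 5 and 6)."""
    return [[c for c, value in enumerate(row) if value >= threshold] for row in memberships]

# Two made-up groups plus one point halfway between them.
points = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
          [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],
          [0.5, 0.5]]
memberships = fuzzy_cluster(points, k=2)
assigned = final_assignments(memberships)
```

In this example the halfway point has roughly equal membership in both clusters, so with a 0.6 threshold it is assigned to no cluster, while each of the other points is assigned to exactly one. A lower threshold would instead let an ambiguous point belong to several clusters.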

Current Assignee: Manning & Napier Information Services LLC
Original Assignee: Manning & Napier Information Services LLC
Inventors: Yuan, Bo; Snyder, David L.; Calistri-Yeh, Randall J.; Osborne, George B.
Primary Examiner(s): PANNALA, SATHYANARAYA R
Application Number: US10/823,561
Publication Number: US 20040193414A1
Time in Patent Office: 1,315 Days
Field of Search: 707/2-6, 707/9, 707/104.1
US Class Current: 1/1
CPC Class Codes:
- G06F 16/36   Creation of semantic tools,...
- Y10S 707/99931   Database or file accessing
- Y10S 707/99932   Access augmentation or opti...
- Y10S 707/99933   Query processing, i.e. sear...
- Y10S 707/99934   Query formulation, input pr...
- Y10S 707/99935   Query augmenting and refini...
- Y10S 707/99936   Pattern matching access
- Y10S 707/99939   Privileged access
- Y10S 707/99945   Object-oriented database st...
- Y10S 707/99948   Application of database or ...
