Construction of trainable semantic vectors and clustering, classification, and searching using trainable semantic vectors

US 20040199546A1
Filed: 04/14/2004
Published: 10/07/2004
Est. Priority Date: 01/27/2000
Status: Active Grant

Abstract

An apparatus and method are disclosed for producing a semantic representation of information in a semantic space. The information is first represented in a table that stores values which indicate a relationship with predetermined categories. The categories correspond to dimensions in the semantic space. The significance of the information with respect to the predetermined categories is then determined. A trainable semantic vector (TSV) is constructed to provide a semantic representation of the information. The TSV has dimensions equal to the number of predetermined categories and represents the significance of the information relative to each of the predetermined categories. Various types of manipulation and analysis, such as searching, classification, and clustering, can subsequently be performed on a semantic level.
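The abstract leaves both the significance measure and the rule for merging per-item vectors unspecified. A minimal sketch, assuming words are the data points, per-category relative frequency is the significance measure, and word TSVs are averaged into a dataset TSV; `build_table`, `word_tsv`, and `dataset_tsv` are hypothetical names, not the patent's:

```python
from collections import defaultdict


def build_table(samples, n_categories):
    """Hypothetical helper: tally each word against the predetermined categories.

    samples: iterable of (words, category_index) pairs taken from labeled data.
    Returns (table, totals): table[word][c] counts occurrences of `word` in
    category c, and totals[c] counts all words seen in category c.
    """
    table = defaultdict(lambda: [0] * n_categories)
    totals = [0] * n_categories
    for words, cat in samples:
        for w in words:
            table[w][cat] += 1
            totals[cat] += 1
    return table, totals


def word_tsv(word, table, totals):
    """TSV for one word (assumed significance measure): the word's relative
    frequency within each category, rescaled so the components sum to 1."""
    counts = table.get(word)  # .get() does not create entries in a defaultdict
    if counts is None:
        return None  # unseen word: carries no category information
    rel = [c / t if t else 0.0 for c, t in zip(counts, totals)]
    total = sum(rel)
    return [r / total for r in rel]


def dataset_tsv(words, table, totals):
    """Combine word TSVs into a dataset TSV by averaging (assumed rule)."""
    acc = [0.0] * len(totals)
    used = 0
    for w in words:
        vec = word_tsv(w, table, totals)
        if vec is not None:
            acc = [a + x for a, x in zip(acc, vec)]
            used += 1
    return [a / used for a in acc] if used else acc


table, totals = build_table([(["stock", "bond", "market"], 0),
                             (["goal", "match", "market"], 1)], 2)
print(dataset_tsv(["stock", "market", "unknown"], table, totals))  # [0.75, 0.25]
```

Each resulting TSV has one dimension per category, as the abstract requires; normalizing by category totals keeps a large category from dominating merely because it contributed more text.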

78 Claims

1-40. (Cancelled)

41. A method of classifying new datasets within a predetermined number of categories based on assignment of a plurality of sample datasets to each category, the method comprising the steps:
- constructing a trainable semantic vector for each sample dataset relative to the predetermined categories in a multi-dimensional semantic space;
  
  constructing a trainable semantic vector for each category based on the trainable semantic vectors for the sample datasets;
  
  receiving a new dataset;
  
  constructing a trainable semantic vector for the new dataset;
  
  determining a distance between the trainable semantic vector for the new dataset and the trainable semantic vector of each category; and
  
  classifying the new dataset within the category whose trainable semantic vector has the shortest distance to the trainable semantic vector of the new dataset.
- Dependent Claims (42, 43, 44, 45, 72)
- - 42. The method of claim 41 wherein the datasets correspond to documents.
  - 43. The method of claim 41 wherein the datasets correspond to email messages and the categories correspond to frequently asked questions with substantially static responses.
  - 44. The method of claim 41, further comprising the steps:
    - detecting when a prescribed number of new datasets has been classified; and
      
      updating the trainable semantic vectors for each of the categories.
  - 45. The method of claim 44, wherein the step of updating comprises the step of re-constructing trainable semantic vectors for each category based on the trainable semantic vectors for the sample datasets and the trainable semantic vectors for the new datasets added to each category.
  - 72. The method of claim 41, wherein:
    - the new data set or each of the sample data sets includes at least one data point; and
      
      the trainable semantic vector for each sample data set or the new dataset is constructed by performing the steps of:
      
      for each data point, constructing a table for storing information indicative of a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
      
      determining the significance of each data point with respect to the predetermined categories;
      
      constructing a trainable semantic vector for each data point, wherein each trainable semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories; and
      
      combining the trainable semantic vector for each of the at least one data point to form the semantic vector of the sample dataset or the new dataset.
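Claims 41, 44 and 45 together describe a nearest-centroid classifier whose category vectors are periodically rebuilt. The sketch below assumes that a category TSV is the mean of its members' TSVs and that distance is Euclidean; the claims fix neither choice, and `CentroidClassifier` and `update_every` are hypothetical names.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class CentroidClassifier:
    """Sketch of claims 41, 44 and 45. Assumptions: a category TSV is the mean
    of its members' TSVs, distance is Euclidean, and `update_every` (the
    "prescribed number") is a hypothetical parameter."""

    def __init__(self, sample_tsvs, sample_labels, update_every=100):
        self.members = {}
        for vec, cat in zip(sample_tsvs, sample_labels):
            self.members.setdefault(cat, []).append(vec)
        self.update_every = update_every
        self.pending = 0
        self._rebuild()

    def _rebuild(self):
        # Claim 45: re-construct each category TSV from its sample datasets
        # plus the new datasets that have since been assigned to it.
        self.centroids = {c: mean_vector(vs) for c, vs in self.members.items()}

    def classify(self, tsv):
        # Claim 41: pick the category whose TSV is closest to the new TSV.
        best = min(self.centroids, key=lambda c: math.dist(tsv, self.centroids[c]))
        self.members[best].append(tsv)
        self.pending += 1
        # Claim 44: once the prescribed number of datasets has been
        # classified, update the category TSVs.
        if self.pending >= self.update_every:
            self._rebuild()
            self.pending = 0
        return best


clf = CentroidClassifier([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]],
                         ["finance", "finance", "sports"], update_every=2)
print(clf.classify([0.7, 0.3]))  # finance
print(clf.classify([0.1, 0.9]))  # sports; this second call triggers the update
```

Batching the rebuild, rather than recomputing after every new dataset, keeps classification cheap while still letting the category vectors drift toward the data actually being received.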

46. A method of classifying new datasets within a predetermined number of categories based on assignment of a plurality of sample datasets to each category, the method comprising the steps:
- constructing a trainable semantic vector for each sample dataset relative to the predetermined categories in a multi-dimensional semantic space;
  
  receiving a new dataset;
  
  constructing a trainable semantic vector for the new dataset;
  
  identifying a select number of sample datasets whose trainable semantic vectors are closest in distance to the trainable semantic vector for the new dataset; and
  
  classifying the new dataset in the category containing the greatest number of the select sample datasets.
- Dependent Claims (47, 48, 49, 73)
- - 47. The method of claim 46 wherein the datasets correspond to documents.
  - 48. The method of claim 46 wherein the datasets correspond to email messages and the categories correspond to frequently asked questions with substantially static responses.
  - 49. The method of claim 46, further comprising the steps:
    - detecting when a prescribed number of new datasets has been classified; and
      
      adding the new datasets to the set of sample datasets.
  - 73. The method of claim 46, wherein:
    - the new data set or each of the sample data sets includes at least one data point; and
      
      the trainable semantic vector for each sample data set or the new dataset is constructed by performing the steps of:
      
      for each data point, constructing a table for storing information indicative of a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
      
      determining the significance of each data point with respect to the predetermined categories;
      
      constructing a trainable semantic vector for each data point, wherein each trainable semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories; and
      
      combining the trainable semantic vector for each of the at least one data point to form the semantic vector of the sample dataset or the new dataset.
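Claim 46 describes a k-nearest-neighbour vote over the sample datasets, and claim 49 folds classified datasets back into the sample set. A sketch assuming Euclidean distance and a default of k = 3; `knn_classify`, `KnnClassifier`, and `batch_size` are hypothetical names:

```python
import math
from collections import Counter


def knn_classify(new_tsv, sample_tsvs, sample_labels, k=3):
    """Claim 46 sketch: majority vote among the k sample TSVs nearest to
    new_tsv (Euclidean distance and k=3 are assumptions)."""
    ranked = sorted(range(len(sample_tsvs)),
                    key=lambda i: math.dist(new_tsv, sample_tsvs[i]))
    votes = Counter(sample_labels[i] for i in ranked[:k])
    # Ties keep first-encountered order, so among tied labels the one
    # belonging to the nearest sample wins.
    return votes.most_common(1)[0][0]


class KnnClassifier:
    """Claim 49 sketch: buffer newly classified datasets and add them to the
    sample set once `batch_size` (hypothetical name) of them accumulate."""

    def __init__(self, sample_tsvs, sample_labels, k=3, batch_size=50):
        self.tsvs = list(sample_tsvs)
        self.labels = list(sample_labels)
        self.k = k
        self.batch_size = batch_size
        self.buffer = []

    def classify(self, tsv):
        label = knn_classify(tsv, self.tsvs, self.labels, self.k)
        self.buffer.append((tsv, label))
        if len(self.buffer) >= self.batch_size:
            for vec, lab in self.buffer:
                self.tsvs.append(vec)
                self.labels.append(lab)
            self.buffer.clear()
        return label


knn = KnnClassifier([[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.0, 1.0]],
                    ["a", "a", "b", "b"], k=3, batch_size=2)
print(knn.classify([0.7, 0.3]))  # a (two of the three nearest samples are "a")
print(knn.classify([0.1, 0.9]))  # b; the sample set now holds six datasets
```

Unlike the centroid approach of claim 41, this keeps every sample TSV, so it can follow categories whose members form several separate clusters, at the cost of comparing against all samples for each new dataset.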

50. A method of classifying new datasets within a predetermined number of categories, the method comprising the steps:
- receiving a new dataset;
  
  constructing a trainable semantic vector for the new dataset, where the dimensions of the trainable semantic vector correspond to the predetermined number of categories;
  
  classifying the dataset in the category whose corresponding dimension in the trainable semantic vector has the largest value.
- Dependent Claims (51, 52, 74)
- - 51. The method of claim 50 wherein the datasets correspond to documents.
  - 52. The method of claim 50 wherein the datasets correspond to email messages and the categories correspond to frequently asked questions with substantially static responses.
  - 74. The method of claim 50, wherein the trainable semantic vector for the new dataset is constructed by performing the steps of:
    - for each data point within the new dataset, constructing a table for storing information indicative of a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
      
      determining the significance of each data point with respect to the predetermined categories;
      
      constructing a trainable semantic vector for each data point, wherein each trainable semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories; and
      
      combining the trainable semantic vector for each data point to form the semantic vector of the new dataset.
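Claim 50 needs no comparison with sample datasets: the largest component of the new dataset's TSV names the category. A minimal sketch, assuming the TSV components are ordered like the category list; the FAQ mapping at the end is a hypothetical illustration of claim 52 with placeholder replies.

```python
def classify_by_dimension(tsv, categories):
    """Claim 50 sketch: return the category whose TSV component is largest.

    Assumes tsv[i] holds the dataset's strength for categories[i]. On a tie,
    max() keeps the first maximal index, i.e. the earliest category.
    """
    if len(tsv) != len(categories):
        raise ValueError("the TSV needs exactly one dimension per category")
    best = max(range(len(tsv)), key=lambda i: tsv[i])
    return categories[best]


# Hypothetical FAQ routing in the spirit of claim 52: each category maps to a
# substantially static response (placeholder text only).
faq_replies = {
    "billing": "canned billing reply",
    "shipping": "canned shipping reply",
    "returns": "canned returns reply",
}
category = classify_by_dimension([0.2, 0.5, 0.3], list(faq_replies))
print(category, "->", faq_replies[category])  # shipping -> canned shipping reply
```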

53-62. (Cancelled)

63. A system for classifying new datasets within a predetermined number of categories based on assignment of a plurality of sample datasets to each category, the system comprising:
- a computer configured to:
  
  construct a trainable semantic vector for each sample dataset relative to the predetermined categories in a multi-dimensional semantic space;
  
  construct a trainable semantic vector for each category based on the trainable semantic vectors for the sample datasets;
  
  receive a new dataset;
  
  construct a trainable semantic vector for the new dataset;
  
  determine a distance between the trainable semantic vector for the new dataset and the trainable semantic vector of each category; and
  
  classify the new dataset within the category whose trainable semantic vector has the shortest distance to the trainable semantic vector of the new dataset.
- Dependent Claims (75)
- - 75. The system of claim 63, wherein:
    - the new data set or each of the sample data sets includes at least one data point; and
      
      the trainable semantic vector for each sample data set or the new dataset is constructed by performing the steps of:
      
      for each data point, constructing a table for storing information indicative of a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
      
      determining the significance of each data point with respect to the predetermined categories;
      
      constructing a trainable semantic vector for each data point, wherein each trainable semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories; and
      
      combining the trainable semantic vector for each of the at least one data point to form the semantic vector of the sample dataset or the new dataset.
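Claims 41, 63 and 69 call for "determining a distance" without naming a metric. The sketch below, on made-up vectors, shows that the choice can change the outcome when TSVs are not length-normalized: Euclidean distance is sensitive to vector magnitude, while cosine distance compares only direction. Which metric the specification prefers is not stated in this section; `cosine_distance` and `nearest_category` are hypothetical helpers.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def nearest_category(tsv, category_tsvs, distance):
    """Return the category whose TSV minimizes `distance` to `tsv`."""
    return min(category_tsvs, key=lambda c: distance(tsv, category_tsvs[c]))


# Made-up TSVs that are not length-normalized.
category_tsvs = {"A": [1.0, 0.0], "B": [0.2, 0.2]}
new_tsv = [0.3, 0.05]
print(nearest_category(new_tsv, category_tsvs, math.dist))        # B
print(nearest_category(new_tsv, category_tsvs, cosine_distance))  # A
```

The new TSV points almost along category A's direction but has a small magnitude close to B's, so the two metrics disagree; normalizing every TSV to unit length would make them rank categories identically.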

64. A system for classifying new datasets within a predetermined number of categories based on assignment of a plurality of sample datasets to each category, the system comprising:
- a computer configured to:
  
  construct a trainable semantic vector for each sample dataset relative to the predetermined categories in a multi-dimensional semantic space;
  
  receive a new dataset;
  
  construct a trainable semantic vector for the new dataset;
  
  identify a select number of sample datasets whose trainable semantic vectors are closest in distance to the trainable semantic vector for the new dataset; and
  
  classify the new dataset in the category containing the greatest number of the select sample datasets.
- Dependent Claims (76)
- - 76. The system of claim 64, wherein:
    - the new data set or each of the sample data sets includes at least one data point; and
      
      the trainable semantic vector for each sample data set or the new dataset is constructed by performing the steps of:
      
      for each data point, constructing a table for storing information indicative of a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
      
      determining the significance of each data point with respect to the predetermined categories;
      
      constructing a trainable semantic vector for each data point, wherein each trainable semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories; and
      
      combining the trainable semantic vector for each of the at least one data point to form the semantic vector of the sample dataset or the new dataset.

65-68. (Cancelled)

69. A computer-readable medium carrying one or more sequences of instructions for classifying new datasets within a predetermined number of categories based on assignment of a plurality of sample datasets to each category, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- constructing a trainable semantic vector for each sample dataset relative to the predetermined categories in a multi-dimensional semantic space;
  
  constructing a trainable semantic vector for each category based on the trainable semantic vectors for the sample datasets;
  
  receiving a new dataset;
  
  constructing a trainable semantic vector for the new dataset;
  
  determining a distance between the trainable semantic vector for the new dataset and the trainable semantic vector of each category; and
  
  classifying the new dataset within the category whose trainable semantic vector has the shortest distance to the trainable semantic vector of the new dataset.
- Dependent Claims (77)
- - 77. The medium of claim 69, wherein:
    - the new data set or each of the sample data sets includes at least one data point; and
      
      the trainable semantic vector for each sample data set or the new dataset is constructed by performing the steps of:
      
      for each data point, constructing a table for storing information indicative of a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
      
      determining the significance of each data point with respect to the predetermined categories;
      
      constructing a trainable semantic vector for each data point, wherein each trainable semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories; and
      
      combining the trainable semantic vector for each of the at least one data point to form the semantic vector of the sample dataset or the new dataset.

70. A computer-readable medium carrying one or more sequences of instructions for classifying new datasets within a predetermined number of categories based on assignment of a plurality of sample datasets to each category, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- constructing a trainable semantic vector for each sample dataset relative to the predetermined categories in a multi-dimensional semantic space;
  
  receiving a new dataset;
  
  constructing a trainable semantic vector for the new dataset;
  
  identifying a select number of sample datasets whose trainable semantic vectors are closest in distance to the trainable semantic vector for the new dataset; and
  
  classifying the new dataset in the category containing the greatest number of the select sample datasets.
- Dependent Claims (78)
- - 78. The medium of claim 70, wherein:
    - the new data set or each of the sample data sets includes at least one data point; and
      
      the trainable semantic vector for each sample data set or the new dataset is constructed by performing the steps of:
      
      for each data point, constructing a table for storing information indicative of a relationship between each data point and predetermined categories corresponding to dimensions in the semantic space;
      
      determining the significance of each data point with respect to the predetermined categories;
      
      constructing a trainable semantic vector for each data point, wherein each trainable semantic vector has dimensions equal to the number of predetermined categories and represents the relative strength of its corresponding data point with respect to each of the predetermined categories; and
      
      combining the trainable semantic vector for each of the at least one data point to form the semantic vector of the sample dataset or the new dataset.

71. (Cancelled)

Current Assignee
Manning & Napier Information Services LLC
Original Assignee
Manning & Napier Information Services LLC
Inventors
Yuan, Bo; Snyder, David L.; Calistri-Yeh, Randall J.; Osborne, George B.

Granted Patent

US 7,444,356 B2
CPC Class Codes

G06F 16/36   Creation of semantic tools,...

Y10S 707/99931   Database or file accessing

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99934   Query formulation, input pr...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99936   Pattern matching access

Y10S 707/99939   Privileged access

Y10S 707/99945   Object-oriented database st...

Y10S 707/99948   Application of database or ...
