System and method for organizing data
Abstract
A system and method for organizing raw data from one or more sources. The content of the raw data is converted into an appropriate number system and stored in a format that facilitates the use of efficient mathematical operations. The number system is selected to handle each of the various elements, characters, or other representative indicia found in the raw data. Furthermore, the number system is selected so that the numerical data retains semantic significance with respect to the raw data. Once converted into the numeric format, the data is processed using various techniques to extract the best information from the raw data into a distilled database.
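The conversion described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical 27-symbol alphabet (a space plus the letters A to Z), so the radix is 27 and each character of a text field becomes one digit. The names `ALPHABET`, `to_digits`, and `to_number` are illustrative assumptions and do not come from the patent; characters outside the assumed alphabet raise a `ValueError`.

```python
# Hypothetical alphabet: a space plus the 26 uppercase letters.
# Its size (27) is the radix, so every possible data element maps to one digit.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"
RADIX = len(ALPHABET)


def to_digits(field: str, width: int) -> list[int]:
    """Map each character of a text field to one digit in base RADIX.

    The field is upper-cased and padded with spaces (digit 0) or truncated
    to a fixed width, so every record yields a vector of the same length.
    """
    padded = field.upper().ljust(width)[:width]
    return [ALPHABET.index(ch) for ch in padded]


def to_number(digits: list[int]) -> int:
    """Combine the digits into one base-RADIX integer, most significant first."""
    value = 0
    for d in digits:
        value = value * RADIX + d
    return value


smith = to_digits("Smith", 8)
smyth = to_digits("Smyth", 8)
print(smith)  # → [19, 13, 9, 20, 8, 0, 0, 0]
print(to_number(smith) < to_number(smyth))  # → True
```

With fixed-width padding, numeric order of the encoded values matches alphabetical order of the original strings, which is one plausible reading of the numeric data retaining semantic significance.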
38 Claims
1. A method for converting information from at least one raw database into a distilled database, the raw database including a plurality of records, each of the plurality of records including a data field, each data field including a data element, the method comprising the steps of:
converting a non-numeric data field in the raw database to a numeric vector;
comparing said vector with a distilled matrix to determine whether said vector is included in said distilled matrix;
including said vector in said distilled matrix if said vector is not included in said distilled matrix; and
forming the distilled database using said distilled matrix;
wherein said step of converting the data field comprises the steps of:
selecting an appropriate number system with a radix at least equal to a number of possible values of a data element in said data field;
representing said data element as a digit in the number system; and
storing said digit in said vector.
2. The method of claim 1, further comprising the step of maintaining information with said vector indicative of its origin in the raw database.
3. The method of claim 1, further comprising the steps of:
including said vector in a reference database; and
identifying an appropriate position for said vector in said reference database.
4. The method of claim 3, wherein said step of identifying an appropriate position for said vector comprises the step of locating another vector similar to said vector.
5. The method of claim 4, wherein said step of locating another vector similar to said vector comprises the step of numerically comparing said vector with said another vector.
6. The method of claim 3, further comprising the step of locating a first vector in said reference database that is similar to a second vector in said reference database.
7. The method of claim 6, wherein said step of locating a first vector comprises the step of locating said first vector in said reference database that is identifiable as said second vector in said reference database.
8. The method of claim 7, wherein said step of locating said first vector comprises the step of locating said first vector in said reference database that is a duplicate of said second vector in said reference database.
9. The method of claim 6, further comprising the step of forming a distilled vector from said first vector and said second vector that includes the best information from said first vector and said second vector.
10. The method of claim 9, wherein said step of comparing said vector with a distilled matrix comprises the step of comparing said distilled vector with said distilled matrix to determine whether said distilled vector is included in said distilled matrix.
11. The method of claim 3, further comprising the step of locating a first vector in said reference database that is dissimilar to every other vector in said reference database.
12. The method of claim 11, further comprising the step of forming a distilled vector from said first vector.
13. The method of claim 12, wherein said step of comparing said vector with a distilled matrix comprises the step of comparing said distilled vector with said distilled matrix to determine whether said distilled vector is included in said distilled matrix.
14. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a dot product between said vector and a vector in said distilled matrix.
15. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises performing an eigenvector analysis.
16. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises performing a pattern recognition analysis.
17. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a dot product between said vector and a vector in said distilled matrix.
18. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a cross product between said vector and a vector in said distilled matrix.
19. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a difference between said vector and a vector in said distilled matrix.
20. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a sum of said vector and a vector in said distilled matrix.
21. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a determinant of said distilled matrix.
22. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a magnitude of said vector.
23. The method of claim 1, wherein said step of comparing said vector with a distilled matrix comprises the step of determining a direction of said vector.
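Several of the comparison operations named in claims 14 to 23 (dot product, difference, magnitude, direction) can be sketched on plain lists of digits. This is only an illustration under stated assumptions: the function names are hypothetical, vectors are assumed to have equal length, and the inclusion rule (a row whose difference from the vector is all zeros) is one possible reading of "included in said distilled matrix", not the patent's stated test.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def magnitude(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def difference(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Dot product normalised by both magnitudes, used here as a direction comparison."""
    denom = magnitude(u) * magnitude(v)
    return dot(u, v) / denom if denom else 0.0


def is_included(vector, distilled_matrix):
    """Assumed rule: included if some row differs from the vector by all zeros."""
    return any(not any(difference(vector, row)) for row in distilled_matrix)


distilled = [[19, 13, 9, 20, 8], [10, 15, 14, 5, 19]]
print(is_included([19, 13, 9, 20, 8], distilled))   # → True
print(is_included([19, 13, 25, 20, 8], distilled))  # → False
```

A looser match could threshold `cosine` or the magnitude of the difference instead of requiring an exact zero difference; the choice of threshold would be application specific.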
24. A method for converting information from a raw database into a distilled database, the raw database including a plurality of records, each of the plurality of records including a data field, the data field including a plurality of data elements, the method comprising:
converting the plurality of data elements in at least one non-numeric data field of one of the plurality of records in the raw database to a numeric value;
forming a vector including said numeric value, said vector representative of said one of the plurality of records in the raw database;
comparing said vector with a distilled matrix to determine whether said vector is included in said distilled matrix, said comparing using said numeric value;
including said vector in said distilled matrix if said vector is not included in said distilled matrix; and
forming the distilled database using said distilled matrix, wherein said converting the plurality of data elements in at least one non-numeric data field of one of the plurality of records comprises:
representing each of the plurality of data elements as a digit in a number system, said number system having a radix at least equal to a number of possible values of a data element in said non-numeric data field, said digits collectively forming said numeric value in said number system.
25. The method of claim 24, further comprising maintaining information with said vector indicative of its origin in the raw database.
26. The method of claim 24, further comprising:
including said vector in a reference database; and
identifying an appropriate position for said vector in said reference database.
27. The method of claim 26, wherein said identifying an appropriate position for said vector comprises locating another vector similar to said vector.
28. The method of claim 27, wherein said locating another vector similar to said vector comprises numerically comparing said vector with said another vector.
29. The method of claim 26, further comprising locating a first vector in said reference database that is similar to a second vector in said reference database.
30. The method of claim 29, wherein said locating a first vector comprises locating said first vector in said reference database that is identifiable as said second vector in said reference database.
31. The method of claim 30, wherein said locating said first vector comprises locating said first vector in said reference database that is a duplicate of said second vector in said reference database.
32. The method of claim 30, further comprising forming a distilled vector from said first vector and said second vector that includes the best information from said first vector and said second vector.
33. The method of claim 32, wherein said comparing said vector with a distilled matrix comprises comparing said distilled vector with said distilled matrix to determine whether said distilled vector is included in said distilled matrix.
34. The method of claim 26, further comprising locating a first vector in said reference database that is dissimilar to every other vector in said reference database.
35. The method of claim 34, further comprising forming a distilled vector from said first vector.
36. The method of claim 35, wherein said comparing said vector with a distilled matrix comprises comparing said distilled vector with said distilled matrix to determine whether said distilled vector is included in said distilled matrix.
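The reference-database steps in claims 26 to 36 (placing a vector at an appropriate position, locating similar vectors, and forming a distilled vector) might be sketched as follows. Everything here is an assumption for illustration: the reference database is modelled as a sorted Python list, "similar" is taken to mean that at most one position differs, digit 0 is treated as a blank element, and "best information" is taken to mean preferring a non-blank element. The claims themselves do not define these rules.

```python
import bisect

MISSING = 0  # assumption: digit 0 marks a blank data element


def insert_sorted(reference, vector):
    """Insert the vector at the position that keeps the reference list ordered."""
    pos = bisect.bisect_left(reference, vector)
    reference.insert(pos, vector)
    return pos


def similar(u, v, max_mismatches=1):
    """Assumed similarity rule: at most max_mismatches positions differ."""
    return sum(a != b for a, b in zip(u, v)) <= max_mismatches


def distill(u, v):
    """Assumed 'best information' rule: take u's element unless it is blank."""
    return [a if a != MISSING else b for a, b in zip(u, v)]


reference = []
for vec in ([19, 13, 9, 20, 8], [10, 15, 14, 5, 19], [19, 13, 9, 20, 0]):
    insert_sorted(reference, vec)

# Compare each vector with its neighbour in sorted order and merge similar pairs.
merged = [distill(a, b) for a, b in zip(reference, reference[1:]) if similar(a, b)]
print(merged)  # → [[19, 13, 9, 20, 8]]
```

Sorted placement puts vectors that share a leading run of digits next to each other, so only neighbours are compared here. Vectors that differ in an early position would not be adjacent and would need a different search.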
37. A method for converting information from a raw database into a distilled database, the raw database including a plurality of records, each of the plurality of records including a non-numeric data field having a plurality of data elements, the method comprising:
converting a value of the non-numeric data field of one of the plurality of records in the raw database to a numeric value represented in a first number system, said first number system having a radix at least equal to a number of possible values of each of the plurality of data elements, said numeric value retaining semantic significance with respect to said value of the non-numeric data field;
forming a vector including said numeric value, said vector representative of said one of the plurality of records in the raw database;
comparing said vector with the distilled database to determine whether said vector is included in the distilled database, said comparing using said numeric value; and
including said vector in the distilled database if said vector is not included in the distilled database.
38. A computer program product that includes a computer readable medium having stored therein a computer program for carrying out a method for converting information from a raw database into a distilled database, the raw database including a plurality of records, each of the plurality of records including a non-numeric data field, the non-numeric data field including a plurality of data elements, the computer program comprising:
a first code segment for converting the plurality of data elements in the non-numeric data field of one of the plurality of records in the raw database to a numeric value by representing each of the plurality of data elements as a digit in a number system, the number system having a radix at least equal to a number of possible values of a data element in the non-numeric data field, the digits collectively forming said numeric value in the number system;
a second code segment for forming a vector including said numeric value, said vector representative of the one of the plurality of records in the raw database;
a third code segment for comparing said vector with the distilled database to determine whether said vector is included in the distilled database, said comparing using said numeric value; and
a fourth code segment for including said vector in the distilled database if said vector is not included in the distilled database.
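The four code segments recited in claim 38 can be pictured as a small pipeline. The sketch below is a hedged illustration, not the patented implementation: the 27-symbol alphabet, the function names, the use of one numeric value per field, and the exact-match inclusion test are all assumptions introduced here.

```python
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hypothetical 27-symbol alphabet
RADIX = len(ALPHABET)


def convert_field(text):
    """First segment: one digit per data element, combined into one base-RADIX value."""
    value = 0
    for ch in text.upper():
        value = value * RADIX + ALPHABET.index(ch)
    return value


def form_vector(record):
    """Second segment: a vector holding one numeric value per field of the record."""
    return tuple(convert_field(field) for field in record)


def distill_records(raw_records):
    """Third and fourth segments: add a vector only if it is not already included."""
    distilled = []
    for record in raw_records:
        vector = form_vector(record)
        if vector not in distilled:  # comparison uses the numeric values
            distilled.append(vector)
    return distilled


raw = [("John", "Smith"), ("JOHN", "SMITH"), ("Jane", "Doe")]
print(len(distill_records(raw)))  # → 2
```

Because the fields are not padded in this sketch, a leading space encodes as a leading zero digit and does not change the value; a fixed-width encoding would avoid that collision.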
Specification