Creation Of A Category Tree With Respect To The Contents Of A Data Stock
Abstract
Methods for the automatic creation of a category tree with respect to the contents of a data stock, wherein a taxonomy of the data stock is created on the basis of co-occurrences. A further object of the invention is a data processing system comprising data that represent information in at least one data stock accessible via at least one data source, the system being designed and/or adapted to at least partially carry out a method according to the invention. A further object of the invention is a data processing device for the electronic processing of data, comprising a control and/or computer unit, an input unit and an output unit, the device being designed and/or adapted to at least partially carry out a method according to the invention, preferably using at least part of a data processing system according to the invention.
31 Claims
1. A method for the automatic creation of a category tree with respect to the contents of a data stock comprising information objects, wherein the information objects of the data stock are indexed in an index, characterized by the following process steps:
1. Filtering out stop words for each information object in the index by means of a list;
2. Creating a list of words in which the stop words which have been filtered out are not contained;
3. Calculating a significance value for each word in the list of words;
4. Sorting the list of words according to their significance values;
5. Reducing the sorted list of words to a maximum number;
6. Storing the reduced list of words in a table;
7. Detecting co-occurrences in the stored list of words;
8. Storing the co-occurrences in a table in a database;
9. Retrieving words which have the highest significance value but no co-occurrences;
10. Selecting a first level of the category tree from the retrieved words;
11. Retrieving words for each selected word of the first level by means of the co-occurrence table, which words are in co-occurrence with the respectively selected word of the first level;
12. Creating a list of words with the retrieved words;
13. Retrieving the frequency of each word on the list of words;
14. Sorting the list of words according to frequency;
15. Reducing the sorted list of words to a predetermined maximum number, wherein the words which have a frequency above average remain on the list of words;
16. Selecting another level of the category tree on the basis of the determined words;
17. Iteratively repeating process steps 11 through 16 for at least one other level of the category tree, wherein in process step 11, during the retrieval of words by means of the co-occurrence table, for each selected word of the first and at least one other level, the words are retrieved which are in co-occurrence with the respectively selected word of the first and at least one other level, until the quantity of retrieved/selected words is equal to zero.
Dependent claims: 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28
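The word-list and co-occurrence steps of claim 1 can be illustrated with a small Python sketch. It is not the patented implementation: the claim does not define the significance value, so a TF-IDF-style score is assumed here, "frequency" is assumed to mean the number of reduced lists containing a word, and "above average" is read as "at or above the mean" so that ties do not empty a level. The names STOP_WORDS, reduced_word_lists, cooccurrences and children are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}  # assumed stop list

def reduced_word_lists(docs, max_words=3):
    """Steps 1-6: drop stop words, score, sort and truncate per document."""
    tokenized = [[w for w in doc.lower().split() if w not in STOP_WORDS]
                 for doc in docs]
    doc_freq = Counter(w for words in tokenized for w in set(words))
    n = len(docs)
    lists = []
    for words in tokenized:
        tf = Counter(words)
        # Assumed significance: term frequency weighted by rarity across documents.
        score = {w: tf[w] * math.log(1 + n / doc_freq[w]) for w in tf}
        ranked = sorted(score, key=lambda w: (-score[w], w))
        lists.append(ranked[:max_words])
    return lists

def cooccurrences(word_lists):
    """Steps 7-8: count word pairs that share a reduced list."""
    table = Counter()
    for words in word_lists:
        for a, b in combinations(sorted(set(words)), 2):
            table[(a, b)] += 1
    return table

def children(word, table, frequency, max_children=5):
    """Steps 11-15: co-occurring words, sorted by frequency and truncated."""
    partners = {b if a == word else a for (a, b) in table if word in (a, b)}
    if not partners:
        return []  # nothing retrieved: the iteration of step 17 stops here
    mean = sum(frequency[w] for w in partners) / len(partners)
    ranked = sorted(partners, key=lambda w: (-frequency[w], w))
    # Assumption: keep words at or above the mean frequency.
    return [w for w in ranked if frequency[w] >= mean][:max_children]
```

Calling children again on each returned word yields the next level, and the recursion ends when no words are retrieved, which mirrors the stop condition of step 17.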
8. A method for the automatic creation/calculation of a category tree with respect to the contents of all texts of a data stock, characterized by the following process steps:
1. Creating sets of words having a pre-determinable number of significant words for each text of the data stock;
2. Storing the respective set of words in a relational database in the form of a list of words, wherein the words are respectively assigned to an identifier of the respective set of words;
3. Creating a list of words on the basis of the stored sets of words;
4. Selecting a first level of the category tree on the basis of the words of the created list of words;
5. Retrieving co-occurrences for each word in the list of words within the sets of words stored in the database;
6. Storing the co-occurrences in a database in the form of a list of words;
7. Retrieving another level of the category tree on the basis of the words of the created list of words;
8. Retrieving co-occurrences for each word combination of the first and the at least one other level of the category tree with other words of the created list of words within the sets of words stored in the database;
9. Storing the co-occurrences in a database in the form of a list of words;
10. Iteratively repeating process steps 7 through 9 for at least one other level of the category tree until the number of the words retrieved in process step 8 for each combination of words of the first and the at least one other level of the category tree with other words of the list of words within the sets of words stored in the database is equal to zero.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 29, 31
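A rough Python sketch of the database-backed variant in claim 8 follows, using the standard-library sqlite3 module as a stand-in for the relational database. The table name word_sets, its columns, and the functions store_word_sets, cooccurring and build_tree are assumptions made for illustration. A "co-occurrence" of a word combination is interpreted here as any other word found in a stored set that contains every word of the combination, and the first-level word is simply passed in rather than selected automatically.

```python
import sqlite3

def store_word_sets(word_sets):
    """Step 2: store each word together with the identifier of its set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word_sets (set_id INTEGER, word TEXT)")
    conn.executemany(
        "INSERT INTO word_sets VALUES (?, ?)",
        [(i, w) for i, words in enumerate(word_sets) for w in set(words)],
    )
    return conn

def cooccurring(conn, path):
    """Steps 5 and 8: words sharing a stored set with every word in path."""
    marks = ",".join("?" * len(path))
    rows = conn.execute(
        f"""SELECT DISTINCT word FROM word_sets
            WHERE word NOT IN ({marks})
              AND set_id IN (
                  SELECT set_id FROM word_sets
                  WHERE word IN ({marks})
                  GROUP BY set_id
                  HAVING COUNT(DISTINCT word) = ?)
            ORDER BY word""",
        (*path, *path, len(path)),
    ).fetchall()
    return [row[0] for row in rows]

def build_tree(conn, path):
    """Steps 7-10: descend until a word combination has no co-occurrences."""
    return {w: build_tree(conn, path + [w]) for w in cooccurring(conn, path)}
```

Each recursion step lengthens the word combination by one word, so the set of matching word sets can only shrink and the recursion ends once a combination retrieves zero words, which corresponds to the stop condition in step 10.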
19. A method for the automatic creation of a category tree with respect to the contents of all texts of a data stock, characterized by the following process steps:
1. Creating sets of words having a pre-determinable number of significant words for each text of the data stock;
2. Storing the respective set of words in a relational database in the form of a list of words, wherein the words are respectively assigned to an identifier of the respective set of words;
3. Creating a list of words on the basis of the sets of words;
4. Selecting a first level of the category tree on the basis of the words of the created list of words;
5. Comparing each word on the list of words to each word within the sets of words stored in the database, wherein it is checked whether two words match and/or achieve a certain minimum similarity with respect to each other, and wherein in case of a match and/or given minimum similarity between the one word and all other words of the sets of words a weighted link having the weight 0.1 is created, wherein the weight of the link is increased by 0.1 if the link already exists, and wherein, if a weight of 1.0 is exceeded, the weight is reset to 0.9 and all other links are reduced to 90% of their value;
6. Retrieving the links of each word on the created list of words;
7. Storing the links in a list of words;
8. Selecting another level of the category tree on the basis of the retrieved links and/or stored list of words;
9. Retrieving the links of each word on the created list of words and at least one stored list of words;
10. Storing the links in a list of words;
11. Iteratively repeating process steps 8 through 10 for at least one other level of the category tree until the number of the links retrieved in process step 9 is equal to zero.
Dependent claims: 20, 21, 22, 23, 30
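The link-weighting rule in step 5 of claim 19 can be sketched in Python as shown below. The arithmetic (start at 0.1, add 0.1 per repeat, reset to 0.9 and scale all other links to 90% once 1.0 is exceeded) is taken from the claim. Everything else is an assumption for illustration: the claim does not name a similarity measure, so difflib.SequenceMatcher with a 0.8 threshold stands in for it, links are keyed by word pairs, and exact fractions are used so that the "exceeds 1.0" test is not blurred by floating-point rounding. The names similar and reinforce are hypothetical.

```python
from difflib import SequenceMatcher
from fractions import Fraction

STEP = Fraction(1, 10)      # initial weight and increment (0.1)
LIMIT = Fraction(1)         # threshold that must be exceeded (1.0)
RESET = Fraction(9, 10)     # weight after a reset, and decay factor (0.9 / 90%)

def similar(a, b, threshold=0.8):
    """Assumed match test: identical words or a high character-level ratio."""
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold

def reinforce(links, pair):
    """Create or strengthen the link for pair, following step 5 of claim 19."""
    links[pair] = links.get(pair, Fraction(0)) + STEP
    if links[pair] > LIMIT:
        # Scale every other link to 90% of its value, then reset this one.
        for other in links:
            if other != pair:
                links[other] *= RESET
        links[pair] = RESET
    return links
```

With this rule, a link reaches exactly 1.0 after ten matches and is reset on the eleventh, so frequently confirmed links stay near the top of the range while links that are not confirmed gradually decay.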
Specification