Creation of a category tree with respect to the contents of a data stock

US 8,745,069 B2
Filed: 11/08/2010
Issued: 06/03/2014
Est. Priority Date: 05/08/2008
Status: Active Grant

Abstract

Methods for automatically creating a category tree from the contents of a data stock, wherein a taxonomy of the data stock is created on the basis of co-occurrences. A further object of the invention is a data processing system comprising data that represent information in at least one data stock accessible via at least one data source, the system being designed and/or adapted to carry out, at least in part, a method according to the invention. A further object is a data processing device for the electronic processing of data, comprising a control and/or computer unit, an input unit and an output unit, the device being designed and/or adapted to carry out, at least in part, a method according to the invention, preferably using at least part of a data processing system according to the invention.

29 Claims

1. A system for analyzing data to establish a category tree comprising:
- a data source;
  
  an inventory representation of data in communication with the data source;
  
  a computer unit having a processor in communication with said data source and said inventory representation of data;
  
  software executing on said processor to;
  
  1. create a list of words of each element within the inventory representation of data;
  
  2. filter out stop words in each of said list of words;
  
  3. calculate a significance value for each word remaining in each said list of words;
  
  4. sort said list of words in descending order according to the significance values to create a sorted list of words;
  
  5. reduce said sorted list of words to a maximum number of top elements to create a reduced list of words;
  
  6. store said reduced list of words in a persistent memory;
  
  7. detect co-occurrences within the stored reduced list of words;
  
  8. store said co-occurrences as a table in the persistent memory;
  
  9. retrieve words from the stored reduced list of words which have the highest significance values but which have no co-occurrences with each other;
  
  10. establish a first level of the category tree using said retrieved words;
  
  11. retrieve a list of co-occurrences for each word of said first level from said stored reduced list of words;
  
  12. create a corresponding list of words for each said list of co-occurrences having no co-occurrences with each other;
  
  13. calculate a frequency of co-occurrences for each of said corresponding list of words;
  
  14. sort said corresponding list of words in descending order according to the frequency to create a sorted corresponding list of words;
  
  15. reduce said sorted corresponding list of words to a predetermined maximum number of top elements to create a reduced corresponding list of words;
  
  16. establish a subordinate level of the category tree using said reduced corresponding list of words; and
  
  17. iteratively repeat steps 11 through 16 while no further co-occurrences can be retrieved from said persistent memory for a set of superior categories, wherein in step 11 the retrieved co-occurrences exists for all superior categories in said category tree;
  
  wherein the category tree is consolidated for display on a display device.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The system of claim 1, wherein in step 3 the significance value for each word in each said list of words is calculated as the quotient of the relative word frequency within the related data in said inventory representation of data and the relative word frequency within the entire word index.
  - 3. The system according to claim 1, wherein in step 6 during the storing of the reduced list of words as a table, the words in the table will be assigned a significance value, and if the significance value of a given word is higher than the significance value of another instance of the word, the higher significance value will be used, else the significance value will not be modified.
  - 4. The system of claim 1, wherein in step 8 during the storing of the co-occurrences as a table in said persistent memory, said persistent memory will contain a table of co-occurrences having a frequency value in a table line, and wherein the frequency value will be increased by 1, if a co-occurrence already exists in the table, else the initial frequency value will be set to 1.
  - 5. The system of claim 1, wherein the data source is accessible over a network.
  - 6. The system of claim 1, wherein the interface comprises a graphical user interface.
  - 7. The system of claim 1, wherein the inventory representation of data comprises a plurality of elements, each representing either data accessible via the data source or interrelations among the elements.
  - 8. The system of claim 7, wherein said interrelations comprise syntactic interrelations.
  - 9. The system of claim 7, wherein said interrelations comprise semantic interrelations.
  - 10. The system of claim 1, wherein the category tree is consolidated for display on a display device using a similarity check.
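
Steps 1 through 10 of claim 1, together with the refinements in claims 2, 3 and 4, can be sketched roughly as follows. This is only an illustrative reading of the claim language, not the patentee's implementation: the stop-word list, the whitespace tokenizer, the function names and the `max_words` and `max_categories` limits are all assumptions introduced for the example, and the "entire word index" of claim 2 is approximated by the stop-word-filtered tokens of all texts.

```python
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}  # hypothetical stop list


def significant_words(text, index_counts, index_total, max_words=3):
    """Steps 1-5: list words, drop stop words, score, sort, truncate."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    local = Counter(words)
    scored = {
        # Claim 2: relative frequency within the element divided by
        # relative frequency within the entire word index.
        w: (n / len(words)) / (index_counts[w] / index_total)
        for w, n in local.items()
    }
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:max_words]


def build_tables(texts, max_words=3):
    """Steps 6-8: significance table and co-occurrence frequency table."""
    tokens = [w for t in texts for w in t.lower().split() if w not in STOP_WORDS]
    index_counts, index_total = Counter(tokens), len(tokens)
    significance, cooc = {}, Counter()
    for text in texts:
        reduced = significant_words(text, index_counts, index_total, max_words)
        for word, score in reduced:
            # Claim 3: keep the higher significance value for a repeated word.
            significance[word] = max(score, significance.get(word, 0.0))
        for pair in combinations(sorted(w for w, _ in reduced), 2):
            cooc[pair] += 1  # Claim 4: first sighting sets 1, later ones add 1.
    return significance, cooc


def first_level(significance, cooc, max_categories=5):
    """Steps 9-10: most significant words with no mutual co-occurrences."""
    chosen = []
    for word in sorted(significance, key=lambda w: (-significance[w], w)):
        if all(tuple(sorted((word, c))) not in cooc for c in chosen):
            chosen.append(word)
        if len(chosen) == max_categories:
            break
    return chosen
```

Calling `build_tables(texts)` and passing its two results to `first_level` yields the top-level categories; the subordinate levels of steps 11 through 17 would reuse the same co-occurrence table, restricted to the words that co-occur with every superior category.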

11. A system for analyzing data to establish a category tree comprising:
- a data source;
  
  an inventory representation of data in communication with the data source;
  
  a computer unit having a processor in communication with said data source and said inventory representation of data;
  
  software executing on said processor to;
  
  1. create sets of words having a pre-determinable number of significant words for each text of the inventory representation of data;
  
  2. store each set of words in a persistent memory as a list of words with an identifier of the related set of words for each word;
  
  3. retrieve a list of words from each set of words;
  
  4. establish a first level of the category tree with said retrieved list of words;
  
  5. retrieve co-occurrences within each set of words stored in said persistent memory for each word in said list of words of the first level of the category tree;
  
  6. store the co-occurrences in said persistent memory as a list of words;
  
  7. establish a subordinate level of the category tree based on the list of co-occurrences;
  
  8. determine co-occurrences for each word combination of the first and each subordinate level of the category tree within the stored sets of words in said persistent memory;
  
  9. store said co-occurrences of each word combinations in said persistent memory;
  
  10. iteratively repeat steps 7 through 9 for subordinated levels of the category tree until no further co-occurrences can be determined in step 8 for each combination of words;
  
  wherein the category tree is consolidated for display on a display device.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
- - 12. The system of claim 11, wherein said list of words retrieved in step 3 is at least partially displayed using a display device of a computer.
  - 13. The system of claim 12, wherein said list of words retrieved in step 3 is at least partially displayed in graphic form.
  - 14. The system of claim 11, wherein said list of words retrieved in step 3 is sorted in a descending manner according to the frequency of the respective words.
  - 15. The system of claim 11, wherein in step 5 during the retrieval of co-occurrences in said stored list of words, each word on the list of words will be compared one after the other to the words of each set of words.
  - 16. The system of claim 11, wherein said co-occurrences stored in step 6 is at least partially displayed by a computer display device.
  - 17. The system of claim 16, wherein said list of co-occurrences stored in step 6 is at least partially displayed in graphic form.
  - 18. The system of claim 11, wherein the category tree is consolidated for display on a display device using a similarity check.
  - 19. The system of claim 18, wherein within the scope of said similarity check, words having different word endings but the same word stem will be summarized in the shortest variant.
  - 20. The system of claim 18, wherein within the scope of said similarity check, two words having different lengths will be respectively compared to each other, in that the longer word will be shortened by two letters, the shorter word will then be brought to the length of the other word and both words will then be checked on a concordance.
  - 21. The system of claim 11, wherein determining co-occurrences in step 5 or step 8 a similarity check is used to summarize words having different word endings but the same word stem in the shortest variant.
  - 22. The system of claim 21, wherein within the scope of said similarity check two words having different lengths will be respectively compared to each other, in that the longer word will be shortened by two letters, the shorter word will then be brought to the length of the other word and both words will then be checked on a concordance.
  - 23. The system of claim 11, wherein said pre-determinable number in step 1 is limited to up to 32.
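
The similarity check of claims 19 through 22 is described only in words, so the following is a minimal sketch under one possible reading: the longer word loses its last two letters, the shorter word is cut to that same length if it is longer than it, and the two results must then be identical. The function names `similar` and `consolidate` are hypothetical, and other readings of "brought to the length of the other word" are conceivable.

```python
def similar(a, b):
    """Return True if a and b agree under the two-letter shortening rule."""
    if a == b:
        return True
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    if len(longer) == len(shorter):
        return False  # the claim only addresses words of different lengths
    n = len(longer) - 2  # longer word shortened by two letters
    if n < 1:
        return False
    # Shorter word is cut to n letters if needed; a shorter word with
    # fewer than n letters cannot equal the n-letter prefix.
    return longer[:n] == shorter[:n]


def consolidate(words):
    """Claim 19 (assumed reading): keep the shortest variant of each group."""
    canonical = []
    for word in sorted(set(words), key=lambda w: (len(w), w)):
        if not any(similar(word, kept) for kept in canonical):
            canonical.append(word)
    return canonical
```

Under this reading `similar("houses", "house")` holds, because both words reduce to `hous`, whereas `similar("walking", "walk")` does not, because `walk` is shorter than the five-letter prefix `walki`.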

24. A system for analyzing data to establish a category tree comprising:
- a data source;
  
  an inventory representation of data in communication with the data source;
  
  a computer unit having a processor in communication with said data source and said inventory representation of data;
  
  software executing on said processor to;
  
  1. create sets of words having a pre-determinable number of significant words for each text of the inventory representation of data;
  
  2. store each set of words in a persistent memory as a list of words, with an identifier of the related set of words for each word;
  
  3. retrieve a list of words from all words in said persistent memory;
  
  4. establish a first level of the category tree with said retrieved list of words;
  
  5. compare each word in said list of words to each word within the sets of words stored in the persistent memory, to determine whether two words match or achieve a predefined minimum similarity with respect to each other, wherein in case of no match of a word in said list of words this word will be skipped, and wherein in case of a match or given minimum similarity between the one word and all other words of said sets of words a weighted link having the weight 0.1 will be created if no link already exists, else the weight of the link will be increased by 0.1 and wherein if a weight of 1.0 is exceeded, the weight will be reset to 0.9 and all other links will be reduced to a value of 90%, else the increased weight will be used;
  
  6. retrieve the links of each word on the retrieved list of words;
  
  7. store the links in a list of words;
  
  8. retrieve a subordinated level of the category tree based on its stored list of words;
  
  9. retrieve the links of each word on the created list of words and at least one stored list of words;
  
  10. store the links in a list of words;
  
  11. iteratively repeat the steps 8 through 10 for subordinated levels of the category tree until the number of the links retrieved in step 9 is equal to zero;
  
  wherein the category tree is consolidated for display on a display device.
- Dependent Claims (25, 26, 27, 28, 29)
- - 25. The system of claim 24, wherein said list of words retrieved in step 3 is at least partially displayed using a computer display device.
  - 26. The system of claim 25, wherein said list of co-occurrences stored in step 3 is at least partially displayed in graphic form.
  - 27. The system of claim 24, wherein the consolidation comprises a similarity check.
  - 28. The system of claim 27, wherein within the scope of said similarity check, words having different word endings but the same word stem will be summarized in the shortest variant.
  - 29. The system of claim 27, wherein within the scope of said similarity check, two words having different lengths will be respectively compared to each other, the longer word will be shortened by two letters, the shorter word will then be brought to the length of the longer word, and both words will then be checked on a concordance.
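
The weighted-link rule in step 5 of claim 24 is essentially arithmetic and can be illustrated with a short sketch. The dictionary representation, the function name `reinforce` and the rounding used to suppress floating-point drift are assumptions made for the example; "reduced to a value of 90%" is read here as 90% of each other link's current weight.

```python
def reinforce(links, key, step=0.1, cap=1.0):
    """Create or strengthen the link `key` in the dict `links`, in place."""
    if key not in links:
        links[key] = step  # no link yet: create one with weight 0.1
        return links
    # Rounding keeps repeated additions of 0.1 from drifting (assumption).
    increased = round(links[key] + step, 10)
    if increased > cap:
        # Weight of 1.0 exceeded: damp all other links, reset this one to 0.9.
        for other in links:
            if other != key:
                links[other] *= 0.9
        links[key] = 0.9
    else:
        links[key] = increased
    return links


links = {}
for _ in range(3):
    reinforce(links, ("tree", "category"))   # weight grows to 0.3
for _ in range(11):
    reinforce(links, ("tree", "taxonomy"))   # 11th match exceeds 1.0
```

After the loop, the link `("tree", "taxonomy")` has been reset to 0.9 and the link `("tree", "category")` has been damped from 0.3 to about 0.27, so a frequently reinforced link stays bounded while the others lose weight relative to it.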

Current Assignee: Knecon Iqser Holding GmbH
Original Assignee: IQser IP AG
Inventors: Wurzer, Joerg; Magnus, Christian
Primary Examiner(s): Hwa, Shyue Jiunn
Application Number: US12/941,818
Publication Number: US 20110113043A1
Time in Patent Office: 1,303 Days
Field of Search: 707/750
US Class Current: 707/750
CPC Class Codes:
G06F 16/355 Class or cluster creation o...
G06F 16/374 Thesaurus
