Dynamic taxonomy process for browsing and retrieving information in large heterogeneous data bases
First Claim
1. A process for retrieving information on large heterogeneous databases, wherein information retrieval is performed through visual queries on dynamic taxonomies, said dynamic taxonomies being an organization of concepts that ranges from a most general concept to a most specific concept, said concepts and their generalization or specialization relationships being an intension, documents in said databases being able to be classified under different concepts, said documents and their classification being called an extension, said process comprising:
initially displaying a complete taxonomy for said retrieval;
selecting subsets of interest of said complete taxonomy in order to refine said retrieval, said subsets of interest being specified by selecting taxonomy concepts and combining taxonomy concepts through boolean operations or being specified through querying methods, which retrieve classified documents according to different selection criteria, including words contained in a document;
displaying a reduced taxonomy for said selected set, said reduced taxonomy being derived from the original taxonomy by pruning the concepts under which no document in the selected subset of interest is classified; and
iteratively repeating said steps of selecting subsets and of displaying a reduced taxonomy to further refine said retrieval, wherein:
said process is performed on documents of any type and format;
said intension is organized as a hierarchy of concepts or as a directed acyclic graph of concepts, thereby allowing a concept to have multiple fathers;
said process dynamically reconstructs all relationships among concepts based on the classification without requiring, in the intension, concept relationships in addition to generalization or specialization, a relationship between any two concepts existing if and only if at least one document is classified (1) under a first concept or any descendants of the first concept, and (2) under a second concept, or any descendants of the second concept;
documents in said classification are classified under a concept at any level in said intension, including concepts with no sons;
said taxonomy supports operations for concept insertion, deletion, and modification;
said taxonomy supports operations for document insertion and classification, deletion, and reclassification;
documents in said classification are classified manually, programmatically, or automatically;
said process allows retrieval through different languages on a same database, while maintaining the same classification for all said languages; and
said step of displaying a reduced taxonomy either reports only the concepts belonging to the reduced taxonomy or, for each such concept, also reports how many documents in the interest set are classified under the concept.
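The pruning step of claim 1 can be illustrated with a small sketch. It assumes the intension is stored as a mapping from each concept to its fathers (so a directed acyclic graph is allowed) and the extension as a mapping from each document to the concepts it is classified under. All names and sample data below are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def ancestors_or_self(concept, parents):
    """Return the concept together with all of its ancestors in the DAG."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def deep_extension(parents, classification):
    """Map each concept to the documents classified under it or under any descendant."""
    ext = defaultdict(set)
    for doc, concepts in classification.items():
        for c in concepts:
            for a in ancestors_or_self(c, parents):
                ext[a].add(doc)
    return ext

def reduced_taxonomy(parents, classification, interest_set):
    """Keep only concepts with at least one document in the interest set, with counts."""
    ext = deep_extension(parents, classification)
    return {c: len(docs & interest_set) for c, docs in ext.items() if docs & interest_set}

# Hypothetical intension (child -> fathers) and extension (document -> concepts).
parents = {
    "Root": [], "Art": ["Root"], "Painting": ["Art"], "Sculpture": ["Art"],
    "Period": ["Root"], "Renaissance": ["Period"], "Baroque": ["Period"],
}
classification = {
    "d1": {"Painting", "Renaissance"},
    "d2": {"Sculpture", "Renaissance"},
    "d3": {"Painting", "Baroque"},
}

focus = deep_extension(parents, classification)["Renaissance"]  # documents d1 and d2
reduced = reduced_taxonomy(parents, classification, focus)
```

In this sample, selecting "Renaissance" prunes "Baroque", because no document in the interest set is classified under it, while every surviving concept carries its document count.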
Abstract
A process is disclosed for retrieving information in large heterogeneous data bases, wherein information retrieval through visual querying/browsing is supported by dynamic taxonomies; the process comprises the steps of: initially showing a complete taxonomy for the retrieval; refining the retrieval through a selection of subsets of interest, where the refining is performed by selecting concepts in the taxonomy and combining them through Boolean operations; showing a reduced taxonomy for the selected set; and further refining the retrieval through an iterative execution of the refining and showing steps.
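The refine-and-show loop of the abstract can be sketched as set operations. The sketch assumes each concept already maps to the set of documents classified under it or its descendants, and that a Boolean query is written as a nested tuple; both the tuple syntax and the choice to evaluate "not" relative to the current interest set are assumptions made for illustration only.

```python
def select(extension, universe, expr):
    """Evaluate a Boolean combination of concepts against the current interest set."""
    if isinstance(expr, str):                      # a single concept name
        return extension.get(expr, set()) & universe
    op, *args = expr
    sets = [select(extension, universe, a) for a in args]
    if op == "and":
        return set.intersection(*sets)
    if op == "or":
        return set.union(*sets)
    if op == "not":                                # complement within the interest set
        return universe - sets[0]
    raise ValueError(f"unknown operator: {op}")

def show(extension, focus):
    """Reduced view: concepts with at least one document in the focus, with counts."""
    return {c: len(docs & focus) for c, docs in extension.items() if docs & focus}

# Hypothetical data: concept -> documents classified under it or its descendants.
extension = {
    "Painting": {"d1", "d3", "d4"}, "Sculpture": {"d2"},
    "Renaissance": {"d1", "d2"}, "Baroque": {"d3", "d4"},
    "Italy": {"d1", "d3"}, "France": {"d2", "d4"},
}

focus = {"d1", "d2", "d3", "d4"}                   # start from the complete taxonomy
history = []
for query in ["Painting", ("and", "Baroque", ("not", "France"))]:
    focus = select(extension, focus, query)        # refine the interest set
    history.append(show(extension, focus))         # then show the reduced taxonomy
```

Each pass narrows the interest set and recomputes the reduced view from it, so the second query only ever sees documents that survived the first.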
15 Claims
1. (Claim 1 is reproduced in full above.) Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method for retrieving items from electronic catalogs, for applications such as electronic commerce or electronic auctions, wherein retrieval is performed through visual queries on dynamic taxonomies, said dynamic taxonomies being a hierarchical organization of concepts, said concepts also comprising features such as price, items in said electronic catalogs being able to be classified under different concepts, said items and their classification being called an extension, said method comprising:
displaying a taxonomy for said retrieval;
selecting a subset of interest of said taxonomy in order to refine said retrieval, said subset of interest being specified by selecting taxonomy concepts and combining said taxonomy concepts through boolean operations, or being specified through querying methods, said querying methods retrieving classified items according to different selection criteria;
displaying a reduced taxonomy for said selected subset of interest, said reduced taxonomy being derived from said taxonomy by eliminating from the extension of said taxonomy all items not in said selected subset of interest and pruning concepts under which no item in said selected subset of interest is classified; and
iteratively repeating said steps of selecting a subset and of displaying a reduced taxonomy to further refine said retrieval, wherein:
said hierarchical organization of concepts for said electronic catalogs comprises a set of features, each of said features being a descendant concept of the root concept of said organization, each of said features having as descendants in the taxonomy a set of concepts, each concept in said set of concepts representing either a single value or a set of values for said feature;
said items in said electronic catalogs are classified, for each said feature, under zero or more concepts representing either a single value or a set of values for that feature;
said step of displaying a reduced taxonomy either reports only the concepts belonging to the reduced taxonomy or, for each such concept also reports how many items in the interest set are classified under the concept; and
said step of pruning of concepts includes eliminating from the taxonomy the concepts under which no item in the selected subset of interest is classified, or preventing such concepts from being selected in order to specify interest sets.
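The feature organization of claim 13 can be illustrated with a sketch in which each feature (here "Price" and "Brand") is a child of the root and has value concepts beneath it, a price range being a concept that stands for a set of values. The price boundaries, labels, catalog entries, and function names are all hypothetical. The sketch takes the second pruning option of the claim: concepts with no item in the interest set stay visible with a count of zero but are not selectable.

```python
import bisect

# Hypothetical price ranges; each label is a concept representing a set of values.
PRICE_BOUNDS = [100, 500]
PRICE_LABELS = ["under 100", "100 to 499.99", "500 and over"]

def price_concept(price):
    """Map a price to the range concept that contains it."""
    return PRICE_LABELS[bisect.bisect_right(PRICE_BOUNDS, price)]

def classify(item):
    """Classify an item, for each feature, under zero or more value concepts."""
    concepts = {}
    if item.get("price") is not None:
        concepts["Price"] = {price_concept(item["price"])}
    if item.get("brand"):
        concepts["Brand"] = {item["brand"]}
    return concepts

def facet_view(catalog, focus):
    """For each feature and value concept, count the items that fall in the focus."""
    view = {}
    for item_id, item in catalog.items():
        for feature, values in classify(item).items():
            counts = view.setdefault(feature, {})
            for value in values:
                counts[value] = counts.get(value, 0) + (1 if item_id in focus else 0)
    return view

catalog = {
    "a": {"price": 79.0, "brand": "Acme"},
    "b": {"price": 250.0, "brand": "Acme"},
    "c": {"price": 650.0, "brand": "Zenith"},
    "d": {"price": 120.0},                        # no brand: zero concepts for that feature
}
focus = {i for i, item in catalog.items() if item.get("brand") == "Acme"}
view = facet_view(catalog, focus)
# Concepts with a zero count are kept but cannot be used to specify interest sets.
selectable = {f: {v for v, n in vals.items() if n > 0} for f, vals in view.items()}
```

To follow the first option of the claim instead, the zero-count entries would simply be dropped from `view`.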
14. A method for retrieving association rules in data mining applications, wherein retrieval is performed through visual queries on dynamic taxonomies, said dynamic taxonomies being an organization of concepts that ranges from a most general concept to a most specific concept, said concepts and their generalization or specialization relationships being an intension, items in said applications being able to be classified under different concepts, said items and their classification being called an extension, said method comprising:
displaying a taxonomy for said retrieval;
selecting a subset of interest of said taxonomy in order to refine said retrieval, said subset of interest being specified by selecting taxonomy concepts and combining taxonomy concepts through boolean operations, said selected concepts combined through boolean operations being a focus;
displaying a reduced taxonomy for said selected subset of interest, said reduced taxonomy being derived from said taxonomy by eliminating from the extension of said taxonomy all items not in said selected subset of interest and pruning concepts under which no item in said selected subset of interest is classified; and
iteratively repeating said steps of selecting a subset and of displaying a reduced taxonomy to further refine said retrieval, wherein:
said step of pruning of concepts includes eliminating from the taxonomy the concepts under which no item in the selected subset of interest is classified, or preventing said concepts from being selected in order to specify interest sets;
said intension is organized as a hierarchy of concepts or as a directed acyclic graph of concepts, thereby allowing a concept to have multiple fathers;
each of said association rules defines a probabilistic correlation relationship between the antecedent, said antecedent being a subset of the extension, and the consequent, said consequent being a subset of the extension disjoint from said antecedent;
for each concept in said reduced taxonomy, two association rules exist, the first association rule having the focus of said reduced taxonomy as the antecedent of said association rule and having said concept in the reduced taxonomy as the consequent of said association rule, the second association rule having said concept in the reduced taxonomy as the antecedent of said association rule and having the focus of said reduced taxonomy as the consequent of said association rule;
in said step of displaying a reduced taxonomy, for an association rule in the reduced taxonomy, a measure of confidence is displayed, said measure of confidence being computed as the ratio between the number of items in the intersection of the antecedent and consequent of said association rule over the number of items in the antecedent of said association rule;
in said step of displaying a reduced taxonomy, for an association rule in the reduced taxonomy, a measure of support is displayed, said support being expressed as the number of items in the intersection of the antecedent and consequent of said association rule over the total number of items, or said measure is not displayed; and
in said step of displaying a reduced taxonomy, for an association rule in the reduced taxonomy, a measure of the statistical significance of how the subordinate probability of the consequent of said association rule with respect to the antecedent of said association rule deviates from independence of said consequent and antecedent of said association rule, is displayed or said measure is not displayed.
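The confidence and support ratios recited in claim 14 can be worked through with a short sketch. It assumes, for illustration, that the antecedent and consequent are represented by the sets of items classified under the focus and under a concept of the reduced taxonomy; the function name and the sample sets are hypothetical. The optional significance measure is left out.

```python
def rule_measures(antecedent, consequent, total_items):
    """Return (confidence, support) for the rule antecedent -> consequent.

    confidence = |antecedent ∩ consequent| / |antecedent|
    support    = |antecedent ∩ consequent| / total number of items
    """
    both = antecedent & consequent
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    support = len(both) / total_items
    return confidence, support

# Hypothetical data: 10 items in total, identified by integers.
total_items = 10
focus_items = {1, 2, 3, 4}          # items selected by the focus
concept_items = {3, 4, 5, 6, 7}     # items classified under one concept

# The two rules associated with that concept in the reduced taxonomy.
focus_to_concept = rule_measures(focus_items, concept_items, total_items)
concept_to_focus = rule_measures(concept_items, focus_items, total_items)
```

Both rules share the same support, since the intersection is symmetric, while their confidence differs because each is divided by the size of its own antecedent.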
15. A method for retrieving information on large heterogeneous databases, wherein information retrieval is performed through visual queries on dynamic taxonomies, said dynamic taxonomies being an organization of concepts that ranges from a most general concept to a most specific concept, said concepts and their generalization or specialization relationships being an intension, items in said databases being able to be classified under different concepts, said items and their classification being called an extension, said method comprising:
displaying a taxonomy for said retrieval;
selecting a subset of interest of said taxonomy in order to refine said retrieval, said subset of interest being specified by selecting taxonomy concepts and combining the taxonomy concepts through boolean operations or being specified through querying methods, said querying methods retrieving classified items according to different selection criteria;
displaying a reduced taxonomy for said subset of interest, said reduced taxonomy being derived from said taxonomy by eliminating from the extension of said taxonomy all items not in said selected subset of interest and by pruning concepts under which no item in said selected subset of interest is classified; and
iteratively repeating said steps of selecting a subset of interest and of displaying a reduced taxonomy to further refine said retrieval, wherein:
said step of pruning of concepts includes eliminating from the taxonomy all the concepts under which no item in the selected subset of interest is classified, or preventing said concepts from being selected in order to specify interest sets;
said step of displaying a reduced taxonomy either reports only the concepts belonging to the reduced taxonomy or, for each such concept also reports how many items in the interest set are classified under the concept;
said intension is organized as a hierarchy of concepts or as a directed acyclic graph of concepts, thereby allowing a concept to have multiple fathers;
items in said classification are classified manually, programmatically, or automatically; and
said method is able to reconstruct relationships among concepts based on the classification, a relationship between any two concepts existing if at least one item is classified (1) under a first concept or any descendants of the first concept, and (2) under a second concept, or any descendants of the second concept.
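The relationship test in the last element of claim 15 can be sketched as a set intersection: two concepts are related when some item is classified under each of them or under their descendants. The sketch assumes the taxonomy is stored as a mapping from each concept to its sons and the classification as a mapping from each concept to the items placed directly under it; the names and sample data are hypothetical.

```python
def descendants_or_self(concept, children):
    """Return the concept together with all of its descendants."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children.get(c, []))
    return seen

def items_under(concept, children, classified):
    """Items classified under the concept or under any of its descendants."""
    return set().union(*(classified.get(c, set())
                         for c in descendants_or_self(concept, children)))

def related(a, b, children, classified):
    """True if at least one item is classified under both concepts (or their descendants)."""
    return bool(items_under(a, children, classified) & items_under(b, children, classified))

# Hypothetical taxonomy (concept -> sons) and direct classification (concept -> items).
children = {
    "Location": ["Europe", "Asia"], "Europe": ["Italy", "France"],
    "Topic": ["Wine", "Tea"],
}
classified = {
    "Italy": {"i1"}, "France": {"i2"}, "Asia": {"i3"},
    "Wine": {"i1", "i2"}, "Tea": {"i3"},
}

europe_wine = related("Europe", "Wine", children, classified)   # shared items i1, i2
europe_tea = related("Europe", "Tea", children, classified)     # no shared item
```

No explicit "Europe relates to Wine" link is stored anywhere in the taxonomy; the relationship is derived entirely from how the items are classified.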
Specification