Clustered information processing and searching with structured-unstructured database bridge
First Claim
1. A method for indexing and classifying related information, comprising:
using a computing system, measuring the similarity of or distance between a plurality of individual resources using a hybrid distance measurement, at least some of the plurality of individual resources being of different types and originating from different data sources;
clustering the plurality of individual resources into a plurality of clusters using the hybrid distance measurement; and
storing the plurality of clusters in both a structured and an unstructured data repository on the computing system or another computing system, such that the structured and the unstructured data repositories contain essentially the same information.
Abstract
Systems and methods for indexing information and for performing searches are disclosed. In these systems and methods information is “ingested” into the system by clustering the information using a clustering algorithm such as k-means or k-medoids clustering. During the clustering process, a hybrid distance measurement is used that allows the systems and methods to determine similarity across a number of different types of information. Once the information is clustered, it is stored and “mirrored” both in a structured (e.g., relational) data repository and in an unstructured data repository. Methods according to the invention allow the retrieval of both direct search results and search results including related concepts. After clustered information is stored, future searches can be performed by searching the stored results in whichever data repository is most appropriate for the context.
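The ingestion step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not define the hybrid distance measurement, so the weighted mix of a token-overlap (Jaccard) distance and a scaled year difference used here is an assumption, as are the `Resource` fields, the weights, and the sample data. The clustering loop is a plain k-medoids pass with a deterministic start (the first k items serve as the initial medoids).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # Hypothetical fields chosen for illustration only.
    rid: str
    kind: str    # resource type, e.g. "email", "document", "db_row"
    source: str  # originating data source
    text: str
    year: int


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def hybrid_distance(x: Resource, y: Resource,
                    w_text: float = 0.7, w_year: float = 0.3,
                    year_scale: float = 10.0) -> float:
    """Assumed hybrid measure: weighted sum of a text term and a numeric term."""
    d_text = jaccard_distance(x.text, y.text)
    d_year = min(abs(x.year - y.year) / year_scale, 1.0)
    return w_text * d_text + w_year * d_year


def k_medoids(items, k, dist, max_iter=100):
    """Plain k-medoids: assign to nearest medoid, then re-pick each medoid."""
    medoids = list(range(k))  # deterministic start: first k items
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(len(items)):
            nearest = min(medoids, key=lambda m: dist(items[i], items[m]))
            clusters[nearest].append(i)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep an empty cluster's medoid unchanged
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest total distance to the rest.
            best = min(members,
                       key=lambda c: sum(dist(items[c], items[j]) for j in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [[items[i] for i in members] for members in clusters.values()]


# Resources of different types from different (made-up) sources.
r1 = Resource("r1", "email", "mail_server", "quarterly budget forecast revenue", 2010)
r2 = Resource("r2", "db_row", "finance_db", "budget forecast revenue table", 2011)
r3 = Resource("r3", "document", "file_share", "hiking trail map mountains", 2010)
r4 = Resource("r4", "email", "mail_server", "mountains hiking trail photos", 2011)

clusters = k_medoids([r1, r3, r2, r4], k=2, dist=hybrid_distance)
for members in clusters:
    print([r.rid for r in members])
```

Because the distance is passed in as a function, the same loop works for any mix of resource types, provided the measure can compare every pair.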
11 Claims
1. A method for indexing and classifying related information, comprising:
using a computing system, measuring the similarity of or distance between a plurality of individual resources using a hybrid distance measurement, at least some of the plurality of individual resources being of different types and originating from different data sources;
clustering the plurality of individual resources into a plurality of clusters using the hybrid distance measurement; and
storing the plurality of clusters in both a structured and an unstructured data repository on the computing system or another computing system, such that the structured and the unstructured data repositories contain essentially the same information.
Dependent claims: 2, 3, 4, 5, 6
7. A method for indexing, searching, and retrieving information, comprising:
using a first computing system, measuring the similarity of or distance between a plurality of individual resources using a hybrid distance measurement;
clustering the plurality of individual resources into a plurality of clusters using the hybrid distance measurement;
storing the plurality of clusters in both a structured and an unstructured data repository on the computing system or another computing system, such that the structured and the unstructured data repositories contain essentially the same information;
receiving a query at the first computing system or a second computing system in communication with a first computing system;
automatically directing the query to either the structured repository or an unstructured repository;
searching the structured or the unstructured repositories in accordance with said automatically directing; and
returning a result set to the query including at least a portion of the contents of at least one of the clusters.
Dependent claims: 8, 9, 10, 11
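The storing, directing, and searching steps of claim 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: an in-memory SQLite table stands in for the structured repository, a list of JSON documents stands in for the unstructured repository, and the routing rule (queries of the form `field:value` go to the structured side, free text goes to the unstructured side) is a made-up heuristic, since the claim does not say how the automatic directing is decided. The names `store_mirrored`, `route`, and `search` and the sample clusters are hypothetical.

```python
import json
import sqlite3

# Hypothetical clustered output: cluster id -> (resource id, kind, text).
CLUSTERS = {
    0: [("r1", "email", "quarterly budget forecast"),
        ("r2", "db_row", "budget forecast table")],
    1: [("r3", "document", "hiking trail map")],
}


def store_mirrored(clusters):
    """Write every clustered resource to both repositories."""
    db = sqlite3.connect(":memory:")  # stand-in structured repository
    db.execute("CREATE TABLE resource ("
               "rid TEXT PRIMARY KEY, cluster_id INTEGER, kind TEXT, text TEXT)")
    docs = []                         # stand-in unstructured repository
    for cid, members in clusters.items():
        for rid, kind, text in members:
            db.execute("INSERT INTO resource VALUES (?, ?, ?, ?)",
                       (rid, cid, kind, text))
            docs.append(json.dumps(
                {"rid": rid, "cluster_id": cid, "kind": kind, "text": text}))
    db.commit()
    return db, docs


def route(query: str) -> str:
    """Assumed rule: 'field:value' is structured, anything else is free text."""
    return "structured" if ":" in query else "unstructured"


def search(query: str, db, docs):
    """Return ids of every resource in any cluster that the query hits."""
    if route(query) == "structured":
        field, value = query.split(":", 1)
        if field not in {"rid", "kind"}:  # whitelist before building SQL
            raise ValueError(f"unsupported field: {field}")
        rows = db.execute(
            "SELECT rid FROM resource WHERE cluster_id IN "
            f"(SELECT cluster_id FROM resource WHERE {field} = ?) ORDER BY rid",
            (value,)).fetchall()
        return [r[0] for r in rows]
    terms = set(query.lower().split())
    parsed = [json.loads(d) for d in docs]
    hits = {d["cluster_id"] for d in parsed
            if terms & set(d["text"].lower().split())}
    return sorted(d["rid"] for d in parsed if d["cluster_id"] in hits)


db, docs = store_mirrored(CLUSTERS)
print(search("kind:email", db, docs))  # structured path
print(search("hiking", db, docs))      # unstructured path
```

Returning whole clusters rather than only the matching rows is what lets a direct hit bring its related resources along with it, in line with the final step of the claim.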
Specification