Method and apparatus for indexing, searching and displaying data
Abstract
A computer research tool for indexing, searching and displaying data is disclosed. Specifically, a computer research tool for performing computerized research of data including textual objects in a database or a network and for providing a user interface that significantly enhances data presentation is described. Textual objects and other data in a database or network are indexed by creating a numerical representation of the data. The indexing technique, called proximity indexing, generates a quick reference to the relations, patterns and similarity found among the data in the database. Proximity indexing indexes the data by using statistical techniques and empirically developed algorithms.
22 Claims
1. A method related to cluster analysis of a world wide web having identifiable web pages and hyperlink relationships made up of Universal Resource Locators with pointers, wherein objects are related to the world wide web, direct non-semantic relationships relate to hyperlink relationships, and indirect non-semantic relationships relate to a series of hyperlink relationships between objects, comprising:
crawling webpages on the world wide web for information used to define a set of objects to be indexed and to collect information about the direct non-semantic relationships, wherein Universal Resource Locators that either point to or point away from one or more of the web pages are crawled;
defining the set of objects to be indexed, wherein each object in the set of objects has an identification and wherein a plurality of the objects in the set of objects have direct and indirect non-semantic relationships;
generating, using a computer processor, a numerical representation for the set of objects in the form of a series of arrays representing each of said objects in the set of objects based upon each of said object's direct non-semantic relationships, if any, with other of said objects in the set of objects, wherein generating the numerical representation for the set of objects accounts for a plurality of direct non-semantic relationships and includes quantifying the accounted for direct non-semantic relationships, wherein the quantifying includes weighting some of the direct non-semantic relationships differently than others;
generating, using a computer processor, a scalar value for each of said objects in the set of objects, wherein said scalar value accounts for direct and indirect non-semantic relationships that exist with other said objects in the set of objects and generating the scalar value includes:
quantifying, for each of said objects in the set of objects that has one or more of the indirect non-semantic relationships, said object's indirect non-semantic relationships with other objects in the set of objects, wherein a.) some of the indirect non-semantic relationships contribute greater value to the scalar value than others, b.) a plurality of different types of indirect relationships, when present, contribute to the scalar value, and c.) quantifying said object's indirect non-semantic relationships includes accounting for at least the following three indirect non-semantic relationship patterns for a given object A when present:
i) B cites f and f cites A,
ii) B cites f, f cites e, and e cites A, and
iii) B cites f, f cites e, e cites d, and d cites A,
wherein B, d, e, and f are objects in the set of objects and said accounting for indirect non-semantic relationships uses weights that are calculated using one or more of said objects' quantity of outbound direct relationships;
storing the generated scalar values in one or more computer memories as an index;
receiving search commands wherein the search commands are received from an input device, wherein the received search commands include one or more search terms;
identifying a resultant set of said objects that are associated with one or more search terms using at least a word index and the received search commands;
determining a rank for objects in the resultant set of objects using said scalar values as a factor in determining the rank; and
sending, for use by a display device, information for displaying identities of two or more objects in the resultant set of objects using the rank as a factor in determining an order of display.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
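The indexing steps recited in claim 1 (weighted arrays of direct links, then a scalar value that also reflects chains of two, three, and four links) can be illustrated with a small sketch. This is not the patented method or any implementation disclosed in the specification; it is one possible reading under stated assumptions. The function names `build_arrays` and `scalar_values`, the choice of weighting each link by 1 divided by the citing page's outbound link count, and the `decay` factor that makes longer chains contribute less are all hypothetical. Chains are counted as walks, so a page may appear more than once in a chain.

```python
from collections import defaultdict


def build_arrays(links):
    """Turn (source, target) hyperlink pairs into one weighted array per object.

    Assumption: each direct link is weighted by 1 / (number of distinct
    outbound links of the citing object), so links from pages that cite
    many others count for less.
    """
    outbound = defaultdict(set)
    nodes = set()
    for source, target in links:
        nodes.update((source, target))
        if source != target:  # ignore self-links in this sketch
            outbound[source].add(target)
    arrays = {node: {} for node in nodes}
    for source, targets in outbound.items():
        weight = 1.0 / len(targets)
        for target in targets:
            arrays[source][target] = weight
    return arrays


def scalar_values(arrays, max_links=4, decay=0.5):
    """Compute one scalar per object from direct and indirect link chains.

    A chain of k links ending at an object contributes
    decay ** (k - 1) times the product of its link weights.
    k = 2, 3, 4 correspond to patterns i), ii), iii) in the claim.
    """
    nodes = list(arrays)
    # reach[n]: total weight of chains of the current length that end at n
    reach = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        for target, weight in arrays[source].items():
            reach[target] += weight
    scores = dict.fromkeys(nodes, 0.0)
    factor = 1.0
    for _ in range(max_links):
        for node in nodes:
            scores[node] += factor * reach[node]
        # extend every chain by one more link
        extended = dict.fromkeys(nodes, 0.0)
        for source in nodes:
            if reach[source]:
                for target, weight in arrays[source].items():
                    extended[target] += reach[source] * weight
        reach = extended
        factor *= decay
    return scores


if __name__ == "__main__":
    # The longest pattern from the claim: B cites f, f cites e, e cites d, d cites A.
    chain = [("B", "f"), ("f", "e"), ("e", "d"), ("d", "A")]
    index = scalar_values(build_arrays(chain))
    for name in sorted(index, key=index.get, reverse=True):
        print(name, index[name])
```

In this sketch the returned dictionary plays the role of the stored index of scalar values; it is computed once from the crawl data and does not depend on any search terms.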
18. A data processing system for use in cluster analysis of a world wide web having identifiable web pages and hyperlink relationships made up of Universal Resource Locators with pointers, wherein objects are related to the world wide web, direct non-semantic relationships relate to hyperlink relationships, and indirect non-semantic relationships relate to a series of hyperlink relationships between objects, comprising:
one or more computer processors for producing results and sending results for display configured to execute instructions to:
control a crawl of the web pages for Universal Resource Locators, wherein the Universal Resource Locators either point to or point away from one or more of the web pages;
define, using an identification, each object in a first set of objects, wherein the objects are individually identified by the identification;
generate a numerical representation of direct non-semantic relationships in the first set of objects wherein the numerical representation quantifies into a plurality of numbers the direct non-semantic relationships between objects in the first set of objects and the quantification of the direct non-semantic relationships includes according a weight using an object's quantity of outbound direct non-semantic relationships;
calculate a scalar value for each object in the first set of objects, the scalar value valuing the direct and indirect non-semantic relationships existing among objects, wherein the calculation values at least the following indirect non-semantic relationship patterns for a given object A when present:
i) B cites f and f cites A,
ii) B cites f, f cites e, and e cites A, and
iii) B cites f, f cites e, e cites d, and d cites A,
wherein B, d, e and f are objects in the first set of objects and certain indirect non-semantic relationships contribute greater value to the scalar value than other indirect non-semantic relationships;
receive search input including one or more search terms emanating from an input device;
identify, using the one or more search terms, a word index and the assigned identifiers, a second set of objects, wherein the second set of objects is a subset of the first set of objects having fewer objects;
rank a plurality of the objects in the second set of objects, wherein the scalar value is used as a factor in performing the ranking; and
send search results about the ranked objects for display, wherein identifying information for two or more of the ranked objects is sent to be displayed in a ranked order display;
one or more computer memory devices that store data including: the identification for each object in the first set of objects, the numerical representation, the scalar values, and the word index, wherein the scalar values are stored in the one or more computer memory devices before the one or more computer processors process the received search input; and
a network for use by the one or more computer processors.
Dependent claims: 19, 20, 21, 22
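The query-time elements of claim 18 (look up search terms in a word index, obtain a smaller second set of objects, and rank that set with the previously stored scalar values as one factor) can be sketched as follows. This is a minimal illustration under assumptions, not the system described in the specification: the names `build_word_index` and `search`, the whitespace tokenization, and the linear combination controlled by `term_weight` and `link_weight` are hypothetical choices made only for the example.

```python
def build_word_index(documents):
    """Map each lowercased word to the set of object identifiers containing it.

    `documents` maps an object identifier to its text. Splitting on
    whitespace is an assumption made to keep the sketch short.
    """
    word_index = {}
    for object_id, text in documents.items():
        for word in set(text.lower().split()):
            word_index.setdefault(word, set()).add(object_id)
    return word_index


def search(query, word_index, scalar_index, term_weight=1.0, link_weight=1.0):
    """Return identifiers of matching objects in ranked order.

    `scalar_index` holds scalar values computed and stored before any
    query arrives. The rank combines the number of matched terms with
    the stored scalar value; the scalar is one factor, not the only one.
    """
    matched_terms = {}
    for term in query.lower().split():
        for object_id in word_index.get(term, ()):
            matched_terms[object_id] = matched_terms.get(object_id, 0) + 1

    def rank(object_id):
        return (term_weight * matched_terms[object_id]
                + link_weight * scalar_index.get(object_id, 0.0))

    # Highest rank first; ties are broken by identifier for a stable order.
    return sorted(matched_terms, key=lambda object_id: (-rank(object_id), object_id))


if __name__ == "__main__":
    documents = {
        "A": "patent search index",
        "B": "search engine",
        "C": "cooking recipes",
    }
    stored_scalars = {"A": 0.2, "B": 1.5, "C": 3.0}  # precomputed, hypothetical values
    print(search("search", build_word_index(documents), stored_scalars))
```

Object C has the largest stored scalar in the usage example but never appears in the results, because it is not in the second set selected through the word index; the scalar only orders objects that already match the search terms.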
Specification