Methods and apparatus for author identification of search results
First Claim
1. A method for representing, and operating upon, one or more sets of author-identifiers, comprising:
producing, as a result of computing hardware and programmable memory, a first search result of a first database, wherein the first database has been determined to have a first maximum number of unique author-identifiers;
accepting, as a result of computing hardware and programmable memory, the first search result, wherein each record of the first search result contains content and an author-identifier;
hashing, as a result of computing hardware and programmable memory, a first author-identifier, of a first record of the first search result, to produce a first hash value, wherein, for at least the first maximum number of unique author-identifiers, each author-identifier produces a hash value different from a hash value produced by any other author-identifier;
addressing, as a result of computing hardware and programmable memory, a first location of a first memory, with the first hash value, wherein the first memory has, for each possible hash value, a different addressable location;
storing, as a result of computing hardware and programmable memory, at the first location a first value, wherein the first value is indicative of the first author-identifier being present within the first search result;
performing additional steps of hashing, addressing, and storing, upon additional records of the first search result, such that, for at least the first maximum number, each unique author-identifier, appearing in the first search result, is represented by a unique location in the first memory;
performing steps of hashing, addressing, and storing, upon records of a second search result, such that, for at least the first maximum number, each unique author-identifier, appearing in the second search result, is represented by a unique location in a second memory;
comparing the first location of the first memory to a corresponding second location of the second memory, for purposes of determining whether the first and second sets of author-identifiers intersect with respect to the first author-identifier;
setting a third location of a third memory, corresponding to the first and second locations, to indicate inclusion of the first author-identifier in a third set of author-identifiers, if the comparison indicates an intersection;
performing an additional comparison for each additional pair of locations, a pair chosen from the first and second memories because its locations represent a common author-identifier, for purposes of determining whether a pair indicates an intersection, for the common author-identifier, in the respective first and second sets of author-identifiers;
selecting, for each additional pair of locations indicating intersection, a result location of the third memory, the result location chosen if it is representative of a common author-identifier, for a pair of locations indicating intersection;
storing, at each selected result location, an indication that its author-identifier is to be included in the third set of author-identifiers;
determining a first audience size, by counting a number of indicators of author-identifier inclusion, within the first memory;
determining a second audience size, by counting a number of indicators of author-identifier inclusion, within the second memory;
determining a first affinity size, by counting a number of indicators of author-identifier inclusion, within the third memory;
determining a first measure of overlap, by dividing the first affinity size by the first audience size; and
determining a second measure of overlap, by dividing the first affinity size by the second audience size.
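The steps of claim 1 can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the patentee's implementation. The names `HASH_BITS`, `hash_author`, `build_fingerprint`, `popcount`, `overlap` and `prob_all_unique` are hypothetical. A truncated SHA-256 digest stands in for the unspecified hash function. Each of the first, second and third memories is modeled as a Python `int` used as a bit array (the abstract notes that the indicator can be a single bit), so the pairwise comparison of locations reduces to one bitwise AND.

```python
import hashlib
from typing import Iterable, Tuple

# Hypothetical sizing: the hash has 2**HASH_BITS possible outputs, so each
# memory (audience fingerprint) has 2**HASH_BITS addressable locations.
HASH_BITS = 20
AF_SIZE = 1 << HASH_BITS


def hash_author(author_id: str) -> int:
    """Hash an author-identifier to a location in [0, AF_SIZE).

    Assumption: truncated SHA-256 stands in for the claimed hash function.
    Distinct identifiers get distinct locations only with high probability.
    """
    digest = hashlib.sha256(author_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % AF_SIZE


def build_fingerprint(records: Iterable[Tuple[str, str]]) -> int:
    """Hash, address, and store for every (content, author_id) record.

    The memory is an int used as a bit array: bit h is set (a single-bit
    inclusion indicator) when some author-identifier hashes to h.
    """
    af = 0
    for _content, author_id in records:
        af |= 1 << hash_author(author_id)
    return af


def popcount(af: int) -> int:
    """Count inclusion indicators (an audience size or an affinity size)."""
    return bin(af).count("1")


def overlap(af1: int, af2: int) -> Tuple[float, float]:
    """Return (affinity / first audience, affinity / second audience)."""
    af3 = af1 & af2  # third memory: set only where both locations are set
    audience1, audience2 = popcount(af1), popcount(af2)
    if audience1 == 0 or audience2 == 0:
        raise ValueError("both search results must contain at least one author")
    affinity = popcount(af3)
    return affinity / audience1, affinity / audience2


def prob_all_unique(n: int, m: int) -> float:
    """Probability that n distinct identifiers land in n distinct locations
    out of m, assuming a uniformly random hash (birthday-problem product)."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p


if __name__ == "__main__":
    first = [("post 1", "alice"), ("post 2", "bob"), ("post 3", "alice"), ("post 4", "carol")]
    second = [("post 5", "bob"), ("post 6", "dave")]
    af1, af2 = build_fingerprint(first), build_fingerprint(second)
    print("audience sizes:", popcount(af1), popcount(af2))
    print("overlap measures:", overlap(af1, af2))
```

The helper `prob_all_unique` suggests how the hash range might be sized against the first maximum number: with 2**20 locations and 1,000 unique authors it comes to roughly 0.62, so a collision-free result is far from assured at that size and a wider hash would be needed.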
Abstract
Given a search result, the set of author-identifiers appearing in it can be determined by use of a hash function and an array-type data structure called an audience fingerprint (AF). The AF has as many storage locations as the hash function has possible output values. The number of possible output values is chosen to be large enough, relative to the maximum number of unique authors expected in any one search result, that each unique author-identifier hashed has a very high probability of producing a unique output value. At the AF location addressed by a hash value, an indicator is stored showing that the corresponding author-identifier is present. The indicator can be a single bit, which simplifies set operations on AFs. When not in working memory, an AF can be stored as a compacted sparse array. The actual author-identifiers present can be recovered from an AF with an inverse hash function.
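The abstract's storage and recovery features might be sketched as follows. This is a hypothetical sketch, not the method disclosed in the specification. It assumes author-identifiers are integers below 2**K. That assumption allows swapping in a bijective hash (multiplication by an odd constant modulo 2**K, followed by an xor-shift), so no two identifiers collide and the inverse hash is exact. This is stronger than the high-probability uniqueness the abstract describes. The compacted sparse array is modeled, again as an assumption, as the list of gaps between consecutive set locations. All names (`K`, `SHIFT`, `MULT`, `af_hash`, `af_unhash`, `build_af`, `compact`, `expand`, `authors_in`) are illustrative.

```python
from itertools import accumulate
from typing import Iterable, List

# Hypothetical sizing: author-identifiers are integers in [0, 2**K) and the
# AF has 2**K single-bit locations, so the hash can be a bijection.
K = 20
MASK = (1 << K) - 1
SHIFT = 10                        # xor-shift amount; 2 * SHIFT >= K
MULT = 0x9E3779B1                 # odd, hence invertible modulo 2**K
MULT_INV = pow(MULT, -1, 1 << K)  # modular inverse (Python 3.8+)


def af_hash(author_id: int) -> int:
    """Bijective hash on [0, 2**K): distinct identifiers never collide."""
    x = (author_id * MULT) & MASK
    return x ^ (x >> SHIFT)


def af_unhash(h: int) -> int:
    """Inverse hash: recover the author-identifier stored at location h."""
    x = h ^ (h >> SHIFT)          # undoes the xor-shift because 2 * SHIFT >= K
    return (x * MULT_INV) & MASK


def build_af(author_ids: Iterable[int]) -> int:
    """Working-memory AF, modeled as a Python int used as a bit array."""
    af = 0
    for a in author_ids:
        af |= 1 << af_hash(a)
    return af


def set_positions(af: int) -> List[int]:
    """Ascending list of the AF locations whose indicator bit is set."""
    out = []
    while af:
        low = af & -af            # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        af ^= low
    return out


def compact(af: int) -> List[int]:
    """Compacted sparse form: gaps between consecutive set locations."""
    pos = set_positions(af)
    return [p - q for p, q in zip(pos, [0] + pos[:-1])]


def expand(gaps: List[int]) -> int:
    """Rebuild the working-memory AF from its compacted form."""
    af = 0
    for p in accumulate(gaps):
        af |= 1 << p
    return af


def authors_in(af: int) -> List[int]:
    """List the author-identifiers present, using the inverse hash."""
    return sorted(af_unhash(h) for h in set_positions(af))
```

For example, `authors_in(build_af([7, 42, 7]))` recovers `[7, 42]`. The compacted list holds one entry per set indicator, so its length equals the audience size rather than the full 2**K locations.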
27 Claims
Specification