Methods and apparatus for author identification of search results

US 10,380,203 B1
Filed: 05/10/2014
Issued: 08/13/2019
Est. Priority Date: 05/10/2014
Status: Active Grant

11 Assignments

0 Petitions

Abstract

Given a search result, the set of author-identifiers appearing in it can be determined by use of a hash function, and an array-type data structure called an audience fingerprint (AF). The AF has as many storage locations as the hash function has possible output values. The number of possible output values is chosen to be large enough, with respect to the maximum number of unique authors expected in any one search result, to create a very high probability of a unique output value for each unique author-identifier that is hashed. At the AF location, addressed with a hash value, is stored an indicator that the author-identifier is present. The indicator can be a single bit, simplifying set operations on AFs. When not in working memory, an AF can be stored as a compacted sparse array. The actual author-identifiers present can be determined, from an AF, with an inverse hash function.
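
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the 20-bit fingerprint width, the SHA-256-based `hash_author` function, the use of a Python integer as the bit array, and the dictionary standing in for the inverse mapping are all assumptions made for illustration, as are the function names.

```python
import hashlib

AF_BITS = 20            # assumed width: 2**20 possible hash values (locations)
AF_SIZE = 1 << AF_BITS


def hash_author(author_id: str) -> int:
    """Map an author-identifier to an AF location (hypothetical hash choice)."""
    digest = hashlib.sha256(author_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % AF_SIZE


def build_fingerprint(records):
    """Build an AF from (author_id, content) records.

    Returns the AF as an int used as a bit array (bit i set means the
    author hashed to location i is present) plus an inverse mapping
    from location back to author-identifier.
    """
    af = 0
    inverse = {}
    for author_id, _content in records:
        loc = hash_author(author_id)
        af |= 1 << loc           # store a single-bit presence indicator
        inverse[loc] = author_id
    return af, inverse


def compact(af: int):
    """Compacted sparse form: ascending list of the set locations only."""
    locations = []
    while af:
        low = af & -af                       # isolate the lowest set bit
        locations.append(low.bit_length() - 1)
        af ^= low                            # clear it and continue
    return locations


def authors_present(af: int, inverse):
    """Recover the author-identifiers represented in an AF."""
    return {inverse[loc] for loc in compact(af)}


if __name__ == "__main__":
    records = [("alice", "post 1"), ("bob", "post 2"), ("alice", "post 3")]
    af, inverse = build_fingerprint(records)
    print(sorted(authors_present(af, inverse)))
```

With one bit per location, repeated records from the same author set the same bit, so the AF represents a set rather than a count of posts. The sketch does not detect collisions; it relies, as the abstract does, on the number of locations being large relative to the number of unique authors.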

26 Citations

27 Claims

1. A method for representing, and operating upon, one or more sets of author-identifiers, comprising:
- producing, as a result of computing hardware and programmable memory, a first search result of a first database, wherein the first database has been determined to have a first maximum number of unique author-identifiers;
  
  accepting, as a result of computing hardware and programmable memory, the first search result, wherein each record of the first search result contains content and an author-identifier;
  
  hashing, as a result of computing hardware and programmable memory, a first author-identifier, of a first record of the first search result, to produce a first hash value, wherein, for at least the first maximum number of unique author-identifiers, each author-identifier produces a hash value different from a hash value produced by any other author-identifier;
  
  addressing, as a result of computing hardware and programmable memory, a first location of a first memory, with the first hash value, wherein the first memory has, for each possible hash value, a different addressable location;
  
  storing, as a result of computing hardware and programmable memory, at the first location a first value, wherein the first value is indicative of the first author-identifier being present within the first search result;
  
  performing additional steps of hashing, addressing, and storing, upon additional records of the first search result, such that, for at least the first maximum number, each unique author-identifier, appearing in the first search result, is represented by a unique location in the first memory;
  
  performing steps of hashing, addressing, and storing, upon records of a second search result, such that, for at least the first maximum number, each unique author-identifier, appearing in the second search result, is represented by a unique location in a second memory;
  
  comparing the first location of the first memory to a corresponding second location of the second memory, for purposes of determining whether the first and second sets of author-identifiers intersect with respect to the first author-identifier;
  
  setting a third location of a third memory, corresponding to the first and second locations, to indicate inclusion of the first author-identifier in a third set of author-identifiers, if the comparison indicates an intersection;
  
  performing an additional comparison for each additional pair of locations, a pair chosen from the first and second memories because its locations represent a common author-identifier, for purposes of determining whether a pair indicates an intersection, for the common author-identifier, in the respective first and second sets of author-identifiers;
  
  selecting, for each additional pair of locations indicating intersection, a result location of the third memory, the result location chosen if it is representative of a common author-identifier, for a pair of locations indicating intersection;
  
  storing, at each selected result location, an indication that its author-identifier is to be included in the third set of author-identifiers;
  
  determining a first audience size, by counting a number of indicators of author-identifier inclusion, within the first memory;
  
  determining a second audience size, by counting a number of indicators of author-identifier inclusion, within the memory;
  
  determining a first affinity size, by counting a number of indicators of author-identifier inclusion, within the third memory;
  
  determining a first measure of overlap, by dividing the first affinity size by the first audience size; and
  
  determining a second measure of overlap, by dividing the first affinity size by the second audience size.
- Dependent claims (2 through 27):
  - 2. The method of claim 1, wherein the first search result is a result of searching a social media database.
  - 3. The method of claim 2, wherein the first search result is a result of at least a first query, the first query designed to identify at least part of a first target audience.
  - 4. The method of claim 2, wherein the first search result is a result of at least a first query, the first query designed to identify at least part of a first outlet audience of a first media outlet.
  - 5. The method of claim 1, wherein, for each of the first and second memories, for each location at least one bit of information is stored, and a location with its bit set to a first value indicates that its author-identifier is present.
  - 6. The method of claim 1, further comprising:
    - compacting the first memory, into a first sparse array representation; and
      
      storing the compacted first sparse array in a non-working memory region, of the computing hardware and programmable memory.
  - 7. The method of claim 1, further comprising:
    - storing at the first location, in an inverse-mapping data structure, the first author-identifier.
  - 8. The method of claim 7, further comprising:
    - storing each additional author-identifier, of the first search result, into the inverse-mapping data structure, at a same location as indicated by each additional hashing of an author-identifier.
  - 9. The method of claim 1, further comprising:
    - producing the first search result as a result of at least a first query, the first query designed to identify at least part of a first target audience;
      
      producing the second search result as a result of at least a second query, the second query designed to identify at least part of a second outlet audience of a second media outlet;
      
      determining a first target audience size, by counting a number of indicators of author-identifier inclusion, within the first memory;
      
      determining a second outlet-audience size, by counting a number of indicators of author-identifier inclusion, within the second memory;
      
      determining a first affinity size, by counting a number of indicators of author-identifier inclusion, within the third memory; and
      
      determining a first composition value, by dividing the first affinity size by the second outlet-audience size.
  - 10. The method of claim 1, further comprising:
    - producing the first search result as a result of at least a first query, the first query designed to identify at least part of a first target audience;
      
      producing the second search result as a result of at least a second query, the second query designed to identify at least part of a second outlet audience of a second media outlet;
      
      determining a first target audience size, by counting a number of indicators of author-identifier inclusion, within the first memory;
      
      determining a second outlet-audience size, by counting a number of indicators of author-identifier inclusion, within the second memory;
      
      determining a first affinity size, by counting a number of indicators of author-identifier inclusion, within the third memory; and
      
      determining a first coverage value, by dividing the first affinity size by the first target audience size.
  - 11. The method of claim 1, further comprising:
    - producing the first search result as a result of at least a first query, the first query designed to identify at least part of a first target audience; and
      
      producing the second search result as a result of at least a second query, the second query designed to identify at least part of a second target audience.
  - 12. The method of claim 1, further comprising:
    - producing the first search result as a result of at least a first query, the first query designed to identify at least part of a first outlet-audience of a first media outlet; and
      
      producing the second search result as a result of at least a second query, the second query designed to identify at least part of a second outlet-audience of a second media outlet.
  - 13. The method of claim 1, further comprising:
    - categorizing the first measure of overlap as high, if it is above a first threshold level;
      
      categorizing the second measure of overlap as high, if it is above a second threshold level; and
      
      categorizing the first and second audiences as having a high level of overlap, if the first and second measures of overlap are categorized as high.
  - 14. The method of claim 1, further comprising:
    - comparing the first location of the first memory to the corresponding second location of the second memory, for purposes of determining a first result as a logical AND of the first and second sets of author-identifiers, with respect to the first author-identifier; and
      
      setting the third location of a third memory, corresponding to the first and second locations, to indicate according to the first result.
  - 15. The method of claim 1, further comprising:
    - producing the first search result as a result of at least a first query, the first query designed to identify at least part of a first target audience;
      
      producing the second search result as a result of at least a second query, the second query designed to identify at least part of a second outlet audience of a second media outlet;
      
      determining a first target audience size, by counting a number of indicators of author-identifier inclusion, within the first memory;
      
      determining a second outlet-audience size, by counting a number of indicators of author-identifier inclusion, within the second memory;
      
      determining a first affinity size, by counting a number of indicators of author-identifier inclusion, within the third memory;
      
      determining a first coverage value, by dividing the first affinity size by the first target audience size;
      
      determining a first composition value, by dividing the first affinity size by the second outlet-audience size;
      
      generating a first two-dimensional display, a first dimension representative of a coverage value, and a second dimension representative of a composition value;
      
      mapping a first graphical object onto the first two-dimensional display, using the first coverage value for placement of the first graphical object along the first dimension, and the first composition value for placement of the first graphical object along the second dimension.
  - 16. The method of claim 15, further comprising:
    - parameterizing at least one dimension of the first graphical object, in accordance with a first function that accepts the first target audience size as an input.
  - 17. The method of claim 15, further comprising:
    - parameterizing at least one dimension of the first graphical object, in accordance with a first function that accepts the first outlet-audience size as an input.
  - 18. The method of claim 1, further comprising:
    - determining a first audience size, by counting a number of indicators of author-identifier inclusion, within the first memory;
      
      determining a second audience size, by counting a number of indicators of author-identifier inclusion, within the second memory;
      
      determining a first affinity size, by counting a number of indicators of author-identifier inclusion, within the third memory;
      
      determining a first overlap characterization value, by dividing the first affinity size by the first audience size;
      
      determining a second overlap characterization value, by dividing the first affinity size by the second audience size;
      
      generating a first two-dimensional display, a first dimension representative of a first overlap characterization metric and a second dimension representative of a second overlap characterization metric; and
      
      mapping a first graphical object onto the first two-dimensional display, using the first overlap characterization value for placement of the first graphical object along the first dimension, and the second overlap characterization value for placement of the first graphical object along the second dimension.
  - 19. The method of claim 1, further comprising:
    - representing, on a two-dimensional display, the first set of author-identifiers as a first graphical object;
      
      representing, on a two-dimensional display, the second set of author-identifiers as a second graphical object;
      
      simulating a repulsive force, as acting between the first and second graphical objects;
      
      representing, on a two-dimensional display, the third set of author-identifiers as a third graphical object, the third graphical object appearing to connect the first and second graphical objects; and
      
      simulating the third graphical object as exerting an attractive force, acting to bring into closer proximity the first and second graphical objects.
  - 20. The method of claim 19, further comprising:
    - determining a first size of the first set of author-identifiers;
      
      determining a second size of the second set of author-identifiers;
      
      scaling at least one dimension of the first graphical object, as a function of the first size; and
      
      scaling at least one dimension of the second graphical object, as a function of the second size.
  - 21. The method of claim 19, further comprising:
    - determining a third size of the third set of author-identifiers; and
      
      scaling the attractive force, as a function of the third size.
  - 22. The method of claim 19, further comprising:
    - determining a third size of the third set of author-identifiers; and
      
      scaling at least one dimension of the third graphical object, as a function of the third size.
  - 23. The method of claim 1 further comprising:
    - producing the first search result as a result of at least a first query, the first query designed to identify at least part of a first type of audience during a first time period;
      
      producing the second search result as a result of at least a second query, the second query designed to identify at least part of a second type of audience during a first time period;
      
      producing a third search result as a result of at least a third query, the third query designed to identify at least part of a first type of audience during a second time period;
      
      producing a fourth search result as a result of at least a fourth query, the second query designed to identify at least part of a second type of audience during a second time period;
      
      performing a comparison for each pair of locations, a pair chosen from the first and second memories because its locations represent a common author-identifier, for purposes of determining whether a pair indicates an intersection, for the common author-identifier, in the respective first and second sets of author-identifiers;
      
      selecting, for each pair of locations indicating intersection, a first result location of a fifth memory, the first result location chosen if it is representative of a common author-identifier, for a pair of locations indicating intersection;
      
      storing, at each selected first result location, an indication that its author-identifier is to be included in the fifth set of author-identifiers;
      
      performing steps of hashing, addressing, and storing, upon records of a third search result, such that a third set of author identifiers, as represented by a third memory, contains an indicator for each author-identifier of the third search result;
      
      performing steps of hashing, addressing, and storing, upon records of a fourth search result, such that a fourth set of author identifiers, as represented by a fourth memory, contains an indicator for each author-identifier of the fourth search result;
      
      performing a comparison for each pair of locations, a pair chosen from the third and fourth memories because its locations represent a common author-identifier, for purposes of determining whether a pair indicates an intersection, for the common author-identifier, in the respective third and fourth sets of author-identifiers;
      
      selecting, for each pair of locations indicating intersection, a second result location of a sixth memory, the second result location chosen if it is representative of a common author-identifier, for a pair of locations indicating intersection;
      
      storing, at each selected second result location, an indication that its author-identifier is to be included in the sixth set of author-identifiers;
      
      determining a first audience size, by counting a number of indicators of author-identifier inclusion, within the first memory;
      
      determining a second audience size, by counting a number of indicators of author-identifier inclusion, within the second memory;
      
      determining a third audience size, by counting a number of indicators of author-identifier inclusion, within the third memory;
      
      determining a fourth audience size, by counting a number of indicators of author-identifier inclusion, within the fourth memory;
      
      determining a first affinity size, by counting a number of indicators of author-identifier inclusion, within the fifth memory;
      
      determining a second affinity size, by counting a number of indicators of author-identifier inclusion, within the sixth memory;
      
      determining a first overlap characterization value of a first type, by dividing the first affinity size by the first audience size; and
      
      determining a second overlap characterization value of the first type, by dividing the second affinity size by the third audience size.
  - 24. The method of claim 23, further comprising:
    - generating a first two-dimensional display, a first dimension representative of the first type of overlap characterization metric and a second dimension representative of time;
      
      mapping a first graphical object onto the first two-dimensional display, using the first overlap characterization value for placement of the first graphical object along the first dimension, and the first time period for placement of the first graphical object along the second dimension; and
      
      mapping a second graphical object onto the first two-dimensional display, using the second overlap characterization value for placement of the second graphical object along the first dimension, and the second time period for placement of the second graphical object along the second dimension.
  - 25. The method of claim 23, further comprising:
    - determining a first overlap characterization value of a second type, by dividing the first affinity size by the second audience size;
      
      determining a second overlap characterization value of the second type, by dividing the second affinity size by the fourth audience size;
      
      generating a first two-dimensional display, a first dimension representative of a first type of overlap characterization metric and a second dimension representative of a second type of overlap characterization metric;
      
      mapping a first graphical object onto the first two-dimensional display, using the first overlap characterization value of the first type, for placement of the first graphical object along the first dimension, and the first overlap characterization value of the second type, for placement of the first graphical object along the second dimension; and
      
      mapping a second graphical object onto the first two-dimensional display, using the second overlap characterization value of the first type, for placement of the first graphical object along the first dimension, and the second overlap characterization value of the second type, for placement of the second graphical object along the second dimension.
  - 26. The method of claim 1, further comprising:
    - producing the first search result as a result of at least a first query, the first query designed to identify at least part of a first type of audience during a first time period;
      
      producing the second search result as a result of at least a second query, the second query designed to identify at least part of the first type of audience during a second time period;
      
      determining a first size, by counting a number of indicators of author-identifier inclusion, within the first memory;
      
      determining a second size, by counting a number of indicators of author-identifier inclusion, within the second memory;
      
      determining a third size, by counting a number of indicators of author-identifier inclusion, within the third memory;
      
      generating, for a first two-dimensional display, a first graphical object with a first area that is a first function of the first size;
      
      generating, for the first two-dimensional display, a second graphical object with a second area that is determined by applying the first function to the second size; and
      
      positioning the first and second graphical objects, such that there is a resulting first region of overlap, wherein the area of the first region of overlap is approximately equal to a third area, the third area determined by applying the first function to the third size.
  - 27. The method of claim 26, further comprising:
    - generating the first graphical object such that its length, along a first dimension, has a first ratio, with respect to the first size;
      
      generating the second graphical object such that its length, along the first dimension, has the first ratio, with respect to the second size; and
      
      positioning the first and second graphical objects, such that the length of the resulting first region of overlap, has the first ratio, with respect to the third size.
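
The intersection and overlap measures recited in claims 1, 9, 10, 13, and 14 can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes each memory is held as a Python integer with one bit per location, and the function names, the default thresholds, and the zero-size guard are hypothetical choices.

```python
def audience_size(af: int) -> int:
    """Count the indicators of author-identifier inclusion (set bits)."""
    return bin(af).count("1")


def overlap(first_af: int, second_af: int):
    """Intersect two fingerprints and compute both measures of overlap.

    Returns (third_af, first_measure, second_measure), where third_af is
    the logical AND of the inputs, first_measure is the affinity size
    divided by the first audience size, and second_measure is the
    affinity size divided by the second audience size.
    """
    third_af = first_af & second_af          # pairwise comparison of locations
    affinity = audience_size(third_af)
    first_size = audience_size(first_af)
    second_size = audience_size(second_af)
    # Assumption: an empty audience yields a measure of 0.0 rather than an error.
    first_measure = affinity / first_size if first_size else 0.0
    second_measure = affinity / second_size if second_size else 0.0
    return third_af, first_measure, second_measure


def high_overlap(first_measure, second_measure,
                 first_threshold=0.5, second_threshold=0.5):
    """Categorize overlap as high only if both measures exceed their thresholds."""
    return first_measure > first_threshold and second_measure > second_threshold


if __name__ == "__main__":
    target = 0b01111   # four authors, at locations 0 through 3
    outlet = 0b11100   # three authors, at locations 2 through 4
    third, coverage, composition = overlap(target, outlet)
    print(bin(third), coverage, composition)
```

When the first fingerprint represents a target audience and the second an outlet audience, the first measure corresponds to the coverage value of claim 10 and the second to the composition value of claim 9.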

Current Assignee: Netbase Solutions, Inc.
Original Assignee: Netbase Solutions, Inc.
Inventors: Bowles, Mark Edward; Tellefsen, Jens Erik; Bhatia, Ranjeet Singh
Primary Examiner(s): Uddin, Md I
Application Number: US14/274,721
Time in Patent Office: 1,921 Days
Field of Search: 707/706, 707/722, 707/749, 707/736, 707/748
US Class Current
CPC Class Codes:
G06F 16/9014 hash tables
G06F 16/9535 Search customisation based ...
