DISTRIBUTED INDEX DATA STRUCTURE

US 20100114970A1
Filed: 10/31/2008
Published: 05/06/2010
Est. Priority Date: 10/31/2008
Status: Abandoned Application

Abstract

The subject matter disclosed herein relates to forming a computer generated distributed index data structure.

20 Claims

1. A method for use in forming a computer generated distributed index data structure, wherein said distributed index data structure is distributed among a set of two or more processors, the method comprising:
- determining two or more global cluster centers based at least in part on at least a portion of a set of data objects distributed to two or more processors;
  
  determining two or more global pivots based at least in part on at least a portion of said set of data objects distributed to two or more processors;
  
  associating one or more data objects with a given cluster center of said two or more global cluster centers, wherein said given cluster center may be associated based at least in part on a closeness determination between said one or more data objects and said two or more global cluster centers; and
  
  determining a table containing distances between one or more of said global pivots and said data objects associated with said given global cluster center.
- - 2. The method of claim 1, wherein said two or more global cluster centers are shared among two or more processors of said set of processors, and wherein said two or more global pivots are shared among two or more processors of said set of processors.
  - 3. The method of claim 1, wherein said determining two or more global cluster centers comprises:
    - determining one or more candidate centers based at least in part on local data objects associated with a given processor of said set of two or more processors, wherein said local data objects comprise a subset of said set of data objects distributed to said given processor;
      
      sending said one or more candidate centers from said given processor to one or more of said set of two or more processors;
      
      receiving one or more additional candidate centers from one or more of said set of two or more processors; and
      
      selecting said two or more global cluster centers from said one or more candidate centers and/or from said one or more additional candidate centers based at least in part on a sum of distances among said one or more candidate centers and said one or more additional candidate centers.
  - 4. The method of claim 1, wherein said determining two or more global pivots comprises:
    - determining one or more candidate pivots based at least in part on local data objects associated with a given processor of said set of two or more processors, wherein said local data objects comprise a subset of said set of data objects distributed to said given processor;
      
      sending said one or more candidate pivots from said given processor to one or more of said set of two or more processors;
      
      receiving one or more additional candidate pivots from one or more of said set of two or more processors; and
      
      selecting said two or more global pivots from said one or more candidate pivots and/or from said one or more additional candidate pivots.
  - 5. The method of claim 1, wherein said table comprises a local table based at least in part on local data objects associated with a given processor, wherein said local data objects comprise a subset of said set of data objects distributed to said given processor.
  - 6. The method of claim 1, further comprising:
    - arranging two or more columns of said table based at least in part on a cumulative sum of said distances between said global pivots and said data objects associated with individual columns, wherein columns in said table are associated with respective global pivots and rows in said table are associated with respective data objects; and
      
      arranging two or more rows of said table based at least in part on said distances between said global pivots and said data objects associated with a given column of said two or more columns, and wherein said given column has the lowest cumulative sum of said distances among said two or more columns.
  - 7. The method of claim 1, further comprising:
    - determining a set of one or more adjacent rows in said table with which to restrict a search for data objects corresponding to a search query, wherein said determination is based at least in part on a single column of said table, wherein columns in said table are associated with respective global pivots and rows in said table are associated with respective data objects; and
      
      determining one or more rows from said one or more adjacent rows to which to restrict said search for data objects corresponding to said search query.
  - 8. The method of claim 1, wherein said data objects comprise complex data objects.
  - 9. The method of claim 1, further comprising:
    - receiving a search query at a given processor of said set of two or more processors;
      
      sending a query plan from said given processor to at least a portion of said set of two or more processors, wherein said query plan indicates one or more clusters to be analyzed and distances between said search query and said two or more global pivots, wherein said clusters comprise portions of said set of data objects associated with respective global cluster centers; and
      
      processing said query plan by at least a portion of said set of two or more processors.
  - 10. The method of claim 1, further comprising:
    - receiving a search query at a given processor of said set of two or more processors;
      
      sending a query plan from said given processor to at least a portion of said set of two or more processors;
      
      processing said query plan by at least a portion of said set of two or more processors; and
      
      selectively switching between processing a second search query and said search query based, at least in part, on a renewable number of computations and/or communications allocated to said search query.
  - 11. The method of claim 1, further comprising:
    - receiving a search query at a given processor of said set of two or more processors;
      
      sending a query plan from said given processor to at least a portion of said set of two or more processors;
      
      selecting between synchronous-type parallel computing and asynchronous-type parallel computing based at least in part on a level of query traffic; and
      
      processing said query plan by at least a portion of said set of two or more processors based at least in part on synchronous-type parallel computing or asynchronous-type parallel computing.
  - 12. The method of claim 1, further comprising:
    - determining one or more candidate centers based at least in part on local data objects associated with a given processor of said set of two or more processors, wherein said local data objects comprise a subset of said set of data objects distributed to said given processor;
      
      sending said one or more candidate centers from said given processor to one or more of said set of two or more processors;
      
      receiving one or more additional candidate centers from one or more of said set of two or more processors;
      
      selecting said two or more global cluster centers from said one or more candidate centers and/or from said one or more additional candidate centers based at least in part on a sum of distances among said one or more candidate centers and said one or more additional candidate centers;
      
      determining one or more candidate pivots based at least in part on local data objects associated with a given processor of said set of two or more processors, wherein said local data objects comprise a subset of said set of data objects distributed to said given processor;
      
      sending said one or more candidate pivots from said given processor to one or more of said set of two or more processors;
      
      receiving one or more additional candidate pivots from one or more of said set of two or more processors;
      
      selecting said two or more global pivots from said one or more candidate pivots and/or from said one or more additional candidate pivots;
      
      wherein said two or more global cluster centers are shared among two or more processors of said set of processors, and wherein said two or more global pivots are shared among two or more processors of said set of processors;
      
      wherein said table comprises a local table based at least in part on local data objects associated with a given processor, wherein said local data objects comprise a subset of said set of data objects distributed to said given processor; and
      
      wherein said data objects comprise complex data objects.
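
The construction steps recited in claims 1 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `build_index` and `select_spread` are hypothetical, data objects are taken to be numeric tuples compared with Euclidean distance, "processors" are modeled as plain lists, and the greedy rule that favors candidates with a large sum of distances to those already chosen is one possible reading of claim 3. Reusing that rule for pivots is a further assumption, since claim 4 does not say how pivots are selected.

```python
import math
import random

def select_spread(candidates, k):
    """Greedily pick k candidates, each maximizing its summed distance to those already picked."""
    chosen = [candidates[0]]
    while len(chosen) < k:
        rest = [c for c in candidates if c not in chosen]
        chosen.append(max(rest, key=lambda c: sum(math.dist(c, s) for s in chosen)))
    return chosen

def build_index(partitions, n_centers, n_pivots, per_proc=4, seed=0):
    """partitions[i] holds the local data objects of (hypothetical) processor i."""
    rng = random.Random(seed)
    center_cands, pivot_cands = [], []
    # Each processor proposes candidates from its local objects; pooling the
    # proposals stands in for the send/receive exchange between processors.
    for local in partitions:
        center_cands += rng.sample(local, min(per_proc, len(local)))
        pivot_cands += rng.sample(local, min(per_proc, len(local)))
    centers = select_spread(center_cands, n_centers)  # global cluster centers
    pivots = select_spread(pivot_cands, n_pivots)     # global pivots
    local_tables = []
    for local in partitions:
        # One local table per cluster: rows are objects, columns are pivots.
        tables = {cid: [] for cid in range(n_centers)}
        for obj in local:
            cid = min(range(n_centers), key=lambda i: math.dist(obj, centers[i]))
            tables[cid].append((obj, [math.dist(obj, p) for p in pivots]))
        local_tables.append(tables)
    return centers, pivots, local_tables
```

Every processor ends up with the same `centers` and `pivots` but only the table rows for its own objects, which mirrors the shared global structures of claim 2 and the local tables of claim 5.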

13. An article comprising:
- a computer-readable medium comprising computer-readable instructions stored thereon, which, if executed by one or more processors, operatively enable a computing platform to:
  
  form a computer generated distributed index data structure, wherein said distributed index data structure is distributed among a set of two or more processors, comprising:
  
  determine two or more global cluster centers based at least in part on at least a portion of a set of data objects distributed to two or more processors;
  
  determine two or more global pivots based at least in part on at least a portion of said set of data objects distributed to two or more processors;
  
  associate one or more data objects with a given cluster center of said two or more global cluster centers, wherein said given cluster center may be associated based at least in part on a closeness determination between said one or more data objects and said two or more global cluster centers; and
  
  determine a table containing distances between one or more of said global pivots and said data objects associated with said given global cluster center.
- - 14. The article of claim 13, wherein said computer-readable instructions, if executed by the one or more processors, operatively enable the computing platform to:
    - arrange two or more columns of said table based at least in part on a cumulative sum of said distances between said global pivots and said data objects associated with individual columns, wherein columns in said table are associated with respective global pivots and rows in said table are associated with respective data objects; and
      
      arrange two or more rows of said table based at least in part on said distances between said global pivots and said data objects associated with a given column of said two or more columns, and wherein said given column has the lowest cumulative sum of said distances among said two or more columns.
  - 15. The article of claim 13, wherein said computer-readable instructions, if executed by the one or more processors, operatively enable the computing platform to:
    - determine a set of one or more adjacent rows in said table with which to restrict a search for data objects corresponding to a search query, wherein said determination is based at least in part on a single column of said table, wherein columns in said table are associated with respective global pivots and rows in said table are associated with respective data objects; and
      
      determine one or more rows from said one or more adjacent rows to which to restrict said search for data objects corresponding to said search query.
  - 16. The article of claim 13, wherein said computer-readable instructions, if executed by the one or more processors, operatively enable the computing platform to:
    - receive a search query at a given processor of said set of two or more processors;
      
      send a query plan from said given processor to at least a portion of said set of two or more processors;
      
      select between synchronous-type parallel computing and asynchronous-type parallel computing based at least in part on a level of query traffic; and
      
      process said query plan by at least a portion of said set of two or more processors based at least in part on synchronous-type parallel computing or asynchronous-type parallel computing.
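
The table arrangement and row restriction recited in claims 14 and 15 (and in claims 6 and 7) can be illustrated with the following sketch. It assumes, beyond what the claims state, that the search is a range query with a radius, that objects are numeric tuples under Euclidean distance, and that rows are pruned with the triangle inequality: an object within `radius` of the query must have each pivot distance within `radius` of the query's distance to that pivot. The function names `arrange_table` and `range_search` are hypothetical.

```python
import bisect
import math

def arrange_table(objects, pivots):
    """Order columns by cumulative distance sum, then sort rows on the first column."""
    raw = [[math.dist(o, p) for p in pivots] for o in objects]
    col_sums = [sum(row[j] for row in raw) for j in range(len(pivots))]
    order = sorted(range(len(pivots)), key=lambda j: col_sums[j])
    # Rows sort primarily on the column with the lowest cumulative sum;
    # ties fall back to the later columns.
    rows = sorted(([row[j] for j in order], o) for row, o in zip(raw, objects))
    return [pivots[j] for j in order], rows

def range_search(query, radius, pivots, rows):
    """Return objects within radius of query, scanning only a band of adjacent rows."""
    q = [math.dist(query, p) for p in pivots]
    first_col = [d[0] for d, _ in rows]
    # A single sorted column bounds the block of adjacent rows worth scanning.
    lo = bisect.bisect_left(first_col, q[0] - radius)
    hi = bisect.bisect_right(first_col, q[0] + radius)
    hits = []
    for d, obj in rows[lo:hi]:
        # Remaining columns discard rows before any real distance is computed.
        if all(abs(dj - qj) <= radius for dj, qj in zip(d, q)):
            if math.dist(query, obj) <= radius:
                hits.append(obj)
    return hits
```

Sorting on one column lets a binary search find the candidate band, so the cost of the first filter does not grow with the number of rows outside that band.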

17. An apparatus comprising:
- a computing environment system, said computing environment system being operatively enabled to:
  
  form a computer generated distributed index data structure, wherein said distributed index data structure is distributed among a set of two or more processors, comprising:
  
  determine two or more global cluster centers based at least in part on at least a portion of a set of data objects distributed to two or more processors;
  
  determine two or more global pivots based at least in part on at least a portion of said set of data objects distributed to two or more processors;
  
  associate one or more data objects with a given cluster center of said two or more global cluster centers, wherein said given cluster center may be associated based at least in part on a closeness determination between said one or more data objects and said two or more global cluster centers; and
  
  determine a table containing distances between one or more of said global pivots and said data objects associated with said given global cluster center.
- - 18. The apparatus of claim 17, wherein said computing environment system is further operatively enabled to:
    - arrange two or more columns of said table based at least in part on a cumulative sum of said distances between said global pivots and said data objects associated with individual columns, wherein columns in said table are associated with respective global pivots and rows in said table are associated with respective data objects; and
      
      arrange two or more rows of said table based at least in part on said distances between said global pivots and said data objects associated with a given column of said two or more columns, and wherein said given column has the lowest cumulative sum of said distances among said two or more columns.
  - 19. The apparatus of claim 17, wherein said computing environment system is further operatively enabled to:
    - determine a set of one or more adjacent rows in said table with which to restrict a search for data objects corresponding to a search query, wherein said determination is based at least in part on a single column of said table, wherein columns in said table are associated with respective global pivots and rows in said table are associated with respective data objects; and
      
      determine one or more rows from said one or more adjacent rows to which to restrict said search for data objects corresponding to said search query.
  - 20. The apparatus of claim 17, wherein said computing environment system is further operatively enabled to:
    - receive a search query at a given processor of said set of two or more processors;
      
      send a query plan from said given processor to at least a portion of said set of two or more processors;
      
      select between synchronous-type parallel computing and asynchronous-type parallel computing based at least in part on a level of query traffic; and
      
      process said query plan by at least a portion of said set of two or more processors based at least in part on synchronous-type parallel computing or asynchronous-type parallel computing.
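
The mode selection recited in claims 11, 16 and 20 could look roughly like the sketch below. The claims say only that the choice between synchronous-type and asynchronous-type parallel computing depends on a level of query traffic. The sliding-window rate estimate, the `threshold_qps` parameter, the class name `ModeSelector`, and the decision to use the synchronous mode under high traffic are all assumptions made for illustration.

```python
from collections import deque

class ModeSelector:
    """Chooses a processing mode from the query arrival rate over a sliding window."""

    def __init__(self, threshold_qps, window_s=1.0):
        self.threshold_qps = threshold_qps  # hypothetical switch-over rate
        self.window_s = window_s
        self.arrivals = deque()

    def record_query(self, t):
        """Note that a query arrived at time t (seconds)."""
        self.arrivals.append(t)

    def mode(self, now):
        # Forget arrivals that have fallen out of the window.
        while self.arrivals and self.arrivals[0] <= now - self.window_s:
            self.arrivals.popleft()
        rate = len(self.arrivals) / self.window_s
        # Assumed policy: batch queries in synchronous rounds when traffic is high.
        return "synchronous" if rate >= self.threshold_qps else "asynchronous"
```

A receiving processor would call `record_query` on each arrival and consult `mode` before dispatching the query plan to the other processors.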

Current Assignee: Oath Inc. (Verizon Communications Inc.)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Marin, Mauricio
Application Number: US12/263,393
Publication Number: US 20100114970A1
US Class Current: 707/802
CPC Class Codes: G06F 16/951 Indexing; Web crawling tech...
