HARDWARE ACCELERATOR ARCHITECTURE AND TEMPLATE FOR WEB-SCALE K-MEANS CLUSTERING

US 20180189675A1
Filed: 12/31/2016
Published: 07/05/2018
Est. Priority Date: 12/31/2016
Status: Abandoned Application

Abstract

Hardware accelerator architectures for clustering are described. A hardware accelerator includes sparse tiles and very/hyper sparse tiles. The sparse tile(s) execute operations for a clustering task involving a matrix. Each sparse tile includes a first plurality of processing units to operate upon a first plurality of blocks of the matrix that have been streamed to one or more random access memories of the sparse tiles over a high bandwidth interface from a first memory unit. Each of the very/hyper sparse tiles is to execute operations for the clustering task involving the matrix. Each of the very/hyper sparse tiles includes a second plurality of processing units to operate upon a second plurality of blocks of the matrix that have been randomly accessed over a low-latency interface from a second memory unit.

66 Citations

20 Claims

1. A hardware accelerator comprising:
- one or more sparse tiles to execute operations for a clustering task involving a matrix, each of the sparse tiles comprising a first plurality of processing units to operate upon a first plurality of blocks of the matrix that have been streamed to one or more random access memories of the one or more sparse tiles over a high bandwidth interface from a first memory unit; and
  
  one or more very/hyper sparse tiles to execute operations for the clustering task involving the matrix, each of the very/hyper sparse tiles comprising a second plurality of processing units to operate upon a second plurality of blocks of the matrix that have been randomly accessed over a low-latency interface from a second memory unit.
  - 2. The hardware accelerator of claim 1, further comprising a control unit to:
    - determine that the clustering task involving the matrix is to be performed; and
      
      partition the matrix into the first plurality of blocks and the second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the data that are very-sparse or hyper-sparse.
  - 3. The hardware accelerator of claim 2, wherein the control unit is further to:
    - cause the one or more sparse tiles to execute the operations using the first plurality of blocks and further cause the one or more very/hyper sparse tiles to execute the operations using the second plurality of blocks.
  - 4. The hardware accelerator of claim 1, wherein the one or more sparse tiles, to execute the operations, are to:
    - update center values within one or more random access memories of the one or more sparse tiles.
  - 5. The hardware accelerator of claim 4, wherein the one or more sparse tiles, to execute the operations, are further to:
    - stream, by one or more data management units of the one or more sparse tiles, values of a plurality of rows of the matrix over the high bandwidth interface from the first memory unit to local memories of the first plurality of processing elements.
  - 6. The hardware accelerator of claim 5, wherein the one or more sparse tiles, to execute the operations, are further to:
    - execute, by the first plurality of processing elements, a plurality of distance calculations using at least some of the streamed values and a clustering computation subsystem that is separate from the one or more sparse tiles.
  - 7. The hardware accelerator of claim 5, wherein the one or more sparse tiles, to execute the operations, are further to:
    - execute, by the first plurality of processing elements, one or more scale-update operations using the center values.
  - 8. The hardware accelerator of claim 1, wherein the one or more very/hyper sparse tiles, to execute the operations, are to:
    - update, during the operations, center values within the second memory unit over the low-latency interface.
  - 9. The hardware accelerator of claim 8, wherein the one or more very/hyper sparse tiles, to execute the operations, are further to:
    - retrieve, by one or more data management units of the one or more very/hyper sparse tiles through use of random access requests, values of a plurality of rows of the matrix over the low-latency interface from the second memory unit.
  - 10. The hardware accelerator of claim 1, wherein each of the one or more very/hyper sparse tiles and each of the one or more sparse tiles, while executing the respective operations, are to:
    - provide partial distance values to a clustering computation subsystem that is separate from the one or more sparse tiles and separate from the one or more very/hyper sparse tiles; and
      
      obtain nearest cluster identifiers from the clustering computation subsystem.
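
The partitioning recited in claims 2 and 3 can be pictured in software. The following is only a minimal sketch under stated assumptions: the claims do not say how a section is judged sparse versus very- or hyper-sparse, so the density cutoff `VERY_SPARSE_DENSITY`, the fixed-size row blocks, and the helper names `block_density` and `partition_blocks` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# A sparse row is modeled as {column_index: value}; a block is a list of rows.
Row = Dict[int, float]
Block = List[Row]

# Hypothetical cutoff: blocks whose fraction of nonzeros falls below this
# value are treated as very/hyper sparse. The claims state no threshold.
VERY_SPARSE_DENSITY = 0.01


def block_density(block: Block, num_cols: int) -> float:
    """Return the fraction of entries in the block that are nonzero."""
    if not block or num_cols == 0:
        return 0.0
    nonzeros = sum(len(row) for row in block)
    return nonzeros / (len(block) * num_cols)


def partition_blocks(
    rows: List[Row],
    num_cols: int,
    rows_per_block: int,
    threshold: float = VERY_SPARSE_DENSITY,
) -> Tuple[List[Block], List[Block]]:
    """Split rows into fixed-size blocks and sort the blocks by density.

    Returns (sparse_blocks, very_sparse_blocks). In the claimed design the
    first group would go to the sparse tiles and the second group to the
    very/hyper sparse tiles.
    """
    sparse_blocks: List[Block] = []
    very_sparse_blocks: List[Block] = []
    for start in range(0, len(rows), rows_per_block):
        block = rows[start:start + rows_per_block]
        if block_density(block, num_cols) < threshold:
            very_sparse_blocks.append(block)
        else:
            sparse_blocks.append(block)
    return sparse_blocks, very_sparse_blocks
```

Grouping by block rather than by individual row mirrors the claim language, which speaks of pluralities of blocks being assigned to each kind of tile.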

11. A method in a hardware accelerator for efficiently executing clustering comprising:
- executing, by one or more sparse tiles of the hardware accelerator, operations for a clustering task involving a matrix, each of the sparse tiles comprising a first plurality of processing units to operate upon a first plurality of blocks of the matrix that have been streamed to one or more random access memories of the one or more sparse tiles over a high bandwidth interface from a first memory unit; and
  
  executing, by one or more very/hyper sparse tiles of the hardware accelerator, operations for the clustering task involving the matrix, each of the very/hyper sparse tiles comprising a second plurality of processing units to operate upon a second plurality of blocks of the matrix that have been randomly accessed over a low-latency interface from a second memory unit.
  - 12. The method of claim 11, further comprising:
    - determining, by the hardware accelerator, that the clustering task involving a matrix is to be performed; and
      
      partitioning, by the hardware accelerator, the matrix into the first plurality of blocks and the second plurality of blocks, wherein the first plurality of blocks includes one or more sections of the matrix that are sparse, and wherein the second plurality of blocks includes another one or more sections of the matrix that are very- or hyper-sparse.
  - 13. The method of claim 12, further comprising:
    - causing the one or more sparse tiles of the hardware processor to perform the operations using the first plurality of blocks and further causing the one or more very/hyper sparse tiles of the hardware processor to perform the operations using the second plurality of blocks.
  - 14. The method of claim 11, wherein executing the operations comprises:
    - updating, by the first plurality of processing elements of each of the one or more sparse tiles, center values within one or more random access memories of the one or more sparse tiles.
  - 15. The method of claim 14, wherein executing the operations further comprises:
    - streaming, by one or more data management units of the one or more sparse tiles, values of a plurality of rows of the matrix over the high bandwidth interface from the first memory unit to local memories of the first plurality of processing elements.
  - 16. The method of claim 15, wherein executing the operations further comprises:
    - executing, by the first plurality of processing elements of each of the one or more sparse tiles, a plurality of distance calculations using at least some of the streamed values and a clustering computation subsystem that is separate from the one or more sparse tiles.
  - 17. The method of claim 15, wherein executing the operations further comprises:
    - executing, by the first plurality of processing elements of each of the one or more sparse tiles, one or more scale-update operations using the center values.
  - 18. The method of claim 11, wherein executing the operations comprises:
    - updating, by the second plurality of processing elements of each of the one or more very/hyper sparse tiles, center values within the second memory unit over the low-latency interface.
  - 19. The method of claim 18, wherein executing the operations further comprises:
    - retrieving, by one or more data management units of the one or more very/hyper sparse tiles through use of random access requests, values of a plurality of rows of the matrix over the low-latency interface from the second memory unit.
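
The flow in which tiles produce partial distance values, a separate clustering computation subsystem returns nearest cluster identifiers, and center values are then updated can be sketched in software as follows. This is an illustration under assumptions, not the claimed implementation: splitting the work by column range, using squared Euclidean distance, and recomputing each center as the plain mean of its assigned rows (a standard Lloyd-style update) are all choices made here, and the function names are hypothetical. The scale-update operations of claims 7 and 17 are not modeled.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[int, float]  # sparse row: {column_index: value}


def partial_distances(
    row: Row, centers: Sequence[Sequence[float]], col_range: range
) -> List[float]:
    """Squared-distance contribution of the columns in col_range, per center.

    Stands in for the work one tile does on its share of the columns.
    """
    partials = []
    for center in centers:
        total = 0.0
        for j in col_range:
            diff = row.get(j, 0.0) - center[j]
            total += diff * diff
        partials.append(total)
    return partials


def nearest_cluster(partials_per_tile: Sequence[Sequence[float]]) -> int:
    """Stand-in for the separate subsystem: sum partials, return the argmin."""
    k = len(partials_per_tile[0])
    totals = [sum(p[c] for p in partials_per_tile) for c in range(k)]
    return min(range(k), key=totals.__getitem__)


def kmeans_step(
    rows: Sequence[Row],
    centers: Sequence[Sequence[float]],
    col_ranges: Sequence[range],
) -> Tuple[List[int], List[List[float]]]:
    """Run one assignment pass followed by a mean update of the centers."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    assignments = []
    for row in rows:
        partials = [partial_distances(row, centers, r) for r in col_ranges]
        cid = nearest_cluster(partials)
        assignments.append(cid)
        counts[cid] += 1
        for j, value in row.items():
            sums[cid][j] += value
    # Clusters that received no rows keep their previous center.
    new_centers = [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centers[c])
        for c in range(k)
    ]
    return assignments, new_centers
```

For example, with `col_ranges = [range(0, 1), range(1, 2)]` each of two simulated tiles covers one column of a two-column matrix, and `nearest_cluster` combines their partial results before a row is assigned.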

20. A system comprising:
- a first memory unit;
  
  a second memory unit;
  
  one or more sparse tiles to execute operations for a clustering task involving a matrix, each of the sparse tiles comprising a first plurality of processing units to operate upon a first plurality of blocks of the matrix that have been streamed to one or more random access memories of the one or more sparse tiles over a high bandwidth interface from the first memory unit; and
  
  one or more very/hyper sparse tiles to execute operations for the clustering task involving the matrix, each of the very/hyper sparse tiles comprising a second plurality of processing units to operate upon a second plurality of blocks of the matrix that have been randomly accessed over a low-latency interface from the second memory unit.
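
The two access patterns named in the claims, streaming from the first memory unit and random access to the second, can be contrasted with a small software model. The sketch below assumes the matrix is held in compressed sparse row (CSR) form, which the claims do not specify; the `CsrMatrix` class and the `stream_rows` and `fetch_rows` helpers are hypothetical names used only for illustration, and nothing here models actual bandwidth or latency.

```python
from typing import Iterator, List, Sequence, Tuple

Entry = Tuple[int, float]  # (column_index, value)


class CsrMatrix:
    """Minimal CSR container: row i occupies indptr[i]:indptr[i + 1]."""

    def __init__(
        self,
        indptr: Sequence[int],
        indices: Sequence[int],
        data: Sequence[float],
    ) -> None:
        self.indptr = list(indptr)
        self.indices = list(indices)
        self.data = list(data)

    def row(self, i: int) -> List[Entry]:
        """Return the nonzero entries of row i as (column, value) pairs."""
        start, end = self.indptr[i], self.indptr[i + 1]
        return list(zip(self.indices[start:end], self.data[start:end]))


def stream_rows(
    matrix: CsrMatrix, first_row: int, last_row: int
) -> Iterator[Tuple[int, List[Entry]]]:
    """Sequential pattern: yield rows first_row..last_row-1 in storage order."""
    for i in range(first_row, last_row):
        yield i, matrix.row(i)


def fetch_rows(
    matrix: CsrMatrix, row_ids: Sequence[int]
) -> List[Tuple[int, List[Entry]]]:
    """Random-access pattern: fetch only the requested rows, in request order."""
    return [(i, matrix.row(i)) for i in row_ids]
```

In this model a sparse tile would consume `stream_rows` over a contiguous range, while a very/hyper sparse tile would issue `fetch_rows` requests for scattered row identifiers.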


Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Application Number: US15/396,515
Publication Number: US 20180189675A1

CPC Class Codes

G06F 16/2237   Vectors, bitmaps or matrices

G06F 16/285   Clustering or classification

G06F 17/16   Matrix or vector computatio...

G06N 20/00   Machine learning
