NOTE: All evaluations include the load time and save time on HDFS/HBase.

Hama 0.19

Matrix Multiplication

Nodes  Size           Density  Blocking Job   Multiplication
-----  -------------  -------  -------------  --------------
    5  5,000 * 5,000  100%     21mins, 32sec  13mins, 17sec
   10  5,000 * 5,000  100%     13mins, 31sec  5mins, 25sec
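To compare the 5-node and 10-node runs, the durations above can be converted to seconds and divided. The following is a minimal sketch; the helper names `to_seconds` and `speedup` are illustrative and not part of Hama.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration written like '21mins, 32sec' to seconds."""
    match = re.fullmatch(r"(\d+)mins, (\d+)sec", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def speedup(slow: str, fast: str) -> float:
    """Ratio of the slower run time to the faster run time."""
    return to_seconds(slow) / to_seconds(fast)


# 5 nodes versus 10 nodes, using the figures from the table above.
blocking = speedup("21mins, 32sec", "13mins, 31sec")
multiplication = speedup("13mins, 17sec", "5mins, 25sec")
print(f"blocking job: {blocking:.2f}x, multiplication: {multiplication:.2f}x")
# prints "blocking job: 1.59x, multiplication: 2.45x"
```

These are single-run ratios, so they should be read as rough indications rather than as measured scaling behaviour.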

Matrix Transpose

Nodes  Size           Density  Running for
-----  -------------  -------  -------------
    5  5,000 * 5,000  100%     19mins, 48sec
   10  5,000 * 5,000  100%     12mins, 24sec

Matrix Norm

Nodes  Type       Size           Density  Running for
-----  ---------  -------------  -------  ------------
    5  One        5,000 * 5,000  100%     9mins, 32sec
    5  Infinity   5,000 * 5,000  100%     8mins, 11sec
    5  Maxvalue   5,000 * 5,000  100%     8mins, 58sec
    5  Frobenius  5,000 * 5,000  100%     8mins, 42sec
   10  One        5,000 * 5,000  100%     2mins, 35sec
   10  Infinity   5,000 * 5,000  100%     2mins, 19sec
   10  Maxvalue   5,000 * 5,000  100%     2mins, 22sec
   10  Frobenius  5,000 * 5,000  100%     2mins, 25sec
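For readers unfamiliar with the norm types listed above, the sketch below shows what each name conventionally means on a small in-memory matrix. It assumes the standard textbook definitions (One = largest absolute column sum, Infinity = largest absolute row sum, Maxvalue = largest absolute entry, Frobenius = square root of the sum of squares). It is not the Hama map/reduce implementation, and the function names are illustrative.

```python
import math


def norm_one(m):
    """Largest absolute column sum."""
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))


def norm_infinity(m):
    """Largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)


def norm_maxvalue(m):
    """Largest absolute entry."""
    return max(abs(x) for row in m for x in row)


def norm_frobenius(m):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))


m = [[1.0, -2.0],
     [3.0, 4.0]]
print(norm_one(m), norm_infinity(m), norm_maxvalue(m))  # prints "6.0 7.0 4.0"
print(round(norm_frobenius(m), 3))                      # prints "5.477"
```

All four norms need a single pass over the entries, which is consistent with the four timings being close to one another for a given cluster size.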

Benchmarks

These timings include the data load and export operations.

Dependency information:

  • Hadoop 0.18.2
  • HBase 0.18.1

Hardware information:

  • 4 Intel(R) Xeon(R) CPU 2.33GHz, SATA hard disk, Physical Memory 16,626,844 KB

Operations measured:

  • Dense matrix add
  • Dense matrix multiply

NOTE: A 10,000 by 10,000 matrix takes 800MB and 1 hour on a single node.
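The 800MB figure matches a simple size estimate if each entry is stored as an 8-byte double with no per-cell overhead. Both of those are assumptions made for this sketch; actual storage in HBase will differ. The function name is illustrative.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_entry: int = 8) -> int:
    """Raw size of a dense matrix, ignoring any storage overhead."""
    return rows * cols * bytes_per_entry


size = dense_matrix_bytes(10_000, 10_000)
print(size / 1e6, "MB")  # prints "800.0 MB"
```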

Version       Operation  Cluster Size  Rows   Columns  Total Maps  Total Reduces  Time (seconds)  Bytes Written
------------  ---------  ------------  -----  -------  ----------  -------------  --------------  -------------
Trunk 712655  Add        2 node        1,000  1,000    2           2              17              66,326,104
Trunk 712658  Mult       2 node        300    300      2           2              181             5,929,512

Version       Operation  Cluster Size  Rows   Columns  Total Maps  Total Reduces  Time (seconds)  Bytes Read   Bytes Written
------------  ---------  ------------  -----  -------  ----------  -------------  --------------  -----------  -------------
Trunk 718158  Mult       2 node        300    300      2           2              12              1,464,484    2,929,092
Trunk 720735  Mult       2 node        1,000  1,000    2           2              20              16,166,452   32,333,028
Trunk 722320  Mult       2 node        3,000  3,000    4           2              124             590,672,392  872,228,808

NOTE: The following numbers were obtained by running poe+ on the entire code, which includes minimal I/O and matrix construction.

Matrix-Matrix Multiply of 5,000 by 5,000 dense matrix

Mflip/s  Wall sec   Library
-------  --------   -------------------------------------------
 8,300       30     PESSL PDGEMM (16 processors)
 7,900       32     ScaLAPACK routine PDGEMM (16 processors)
 7,900       32     ESSL-SMP routine DGEMM (16 threads)
 7,900       32     NAG-SMP routine F01CKF (16 threads)
 1,200      213     ESSL routine DGEMM

Matrix-Matrix Multiply of 20,000 by 20,000 dense matrix

Mflip/s  Wall sec   Library and configuration
-------  --------   -------------------------------------------
158,900     100     ScaLAPACK PDGEMM (256 proc, 16 nodes) 
146,200     110     PESSL PDGEMM (256 proc, 16 nodes) 
105,400     150     ScaLAPACK PDGEMM (144 proc, 9 nodes, block 128) 
100,960     160     PESSL PDGEMM (144 proc, 9 nodes, block 128) 
 79,400     200     PESSL PDGEMM (144 proc, 9 nodes, block 1024) 
 74,800     214     ScaLAPACK PDGEMM (144 proc, 9 nodes, block 1024) 
 55,000     290     PESSL PDGEMM (64 proc, 4 nodes) 
 50,000     320     ScaLAPACK PDGEMM (64 proc, 4 nodes) 
 27,160     590     PESSL PDGEMM (32 proc, 2 nodes) 
 25,630     625     ScaLAPACK PDGEMM (32 proc, 2 nodes) 
 15,800   1,010     PESSL PDGEMM (16 proc, 1 node)
 15,600   1,025     ScaLAPACK PDGEMM (16 proc, 1 node)

Matrix-Matrix Multiply of Larger Dense Matrix

Gflip/s  Wall sec     Size  Library and configuration
-------  --------  -------  -------------------------------------------
  163.6     1,529   50,000  ScaLAPACK PDGEMM (256 proc, 16 nodes)
  163.4     1,531   50,000  PESSL PDGEMM (256 proc, 16 nodes)
  179.6    11,141  100,000  PESSL PDGEMM (256 proc, 16 nodes, 128 block)
  210.7     9,495  100,000  ScaLAPACK PDGEMM (256 proc, 16 nodes, 128 block)
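The rates in these tables can be sanity-checked against the wall times. The sketch below assumes the conventional count of 2n^3 floating-point operations for a dense n by n multiply. The figures reported by poe+ come from hardware counters, so small differences from this estimate are expected. The function name is illustrative.

```python
def estimated_rate(n: int, wall_seconds: float) -> float:
    """Estimated floating-point operations per second for an n x n multiply."""
    return 2 * n**3 / wall_seconds


# 5,000 x 5,000 in 30 s (table value: 8,300 Mflip/s)
print(round(estimated_rate(5_000, 30) / 1e6))          # prints "8333"

# 50,000 x 50,000 in 1,529 s (table value: 163.6 Gflip/s)
print(round(estimated_rate(50_000, 1_529) / 1e9, 1))   # prints "163.5"
```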

...