1) Summary of timing and speedup for the large test cases.

| Test case | CPU time (8 nodes) | GPU time (2 V100) | Speedup |
|---|---|---|---|
| GaAsBi | 754.8 (LOOP+) | 1067.4 (LOOP+) | 5.6x |
| MD-example | 6889.6 (LOOP) | 18854.5 (LOOP) | 2.9x |
| 576_hh_2x2x2_pbe | 36822.2 (LOOP+) | 89387.1 (LOOP+) | 3.3x |
| 128_hh_3x3x3_hse | 151539.6 (LOOP), 161687.9 (LOOP+) (Note: on 5 nodes) | 86025.9 (LOOP), 97206.6 (LOOP+) | 8.8x (LOOP), 8.3x (LOOP+) |

The speedup is roughly 3x to 8x.

2) Summary of system size and speedup for the test cases.

| Test case | NIONS | NBANDS | ISPIN | Speedup |
|---|---|---|---|---|
| GaAsBi | 512 | 1536 | 1 | 5.6x |
| MD-example | 128 | 1536 | 2 | 2.9x |
| 576_hh_2x2x2_pbe | 574 | 1440 | 2 | 3.3x |
| 128_hh_3x3x3_hse | 126 | 340 | 2 | 8.3x |

Again, the speedup is roughly 3x to 8x.

3)
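As a cross-check on the speedup column in 1): the raw GPU times are larger than the raw CPU times, so the speedup cannot be a plain ratio of the two columns. The sketch below assumes, and the source does not state this, that the speedup is normalized per node, i.e. CPU time multiplied by the number of CPU nodes and divided by the GPU time, with the 2 V100 cards counted as a single node. The function and variable names are hypothetical and chosen only for illustration.

```python
# Assumption (not stated in the source):
#   speedup = CPU time * number of CPU nodes / GPU time
# i.e. the 2 V100 cards are treated as one node and compared with one CPU node.

# (test case, timer, CPU time, CPU nodes, GPU time, reported speedup)
ROWS = [
    ("GaAsBi", "LOOP+", 754.8, 8, 1067.4, 5.6),
    ("MD-example", "LOOP", 6889.6, 8, 18854.5, 2.9),
    ("576_hh_2x2x2_pbe", "LOOP+", 36822.2, 8, 89387.1, 3.3),
    ("128_hh_3x3x3_hse", "LOOP", 151539.6, 5, 86025.9, 8.8),
    ("128_hh_3x3x3_hse", "LOOP+", 161687.9, 5, 97206.6, 8.3),
]


def per_node_speedup(cpu_time, cpu_nodes, gpu_time):
    """Node-normalized speedup under the assumption stated above."""
    return cpu_time * cpu_nodes / gpu_time


def check_rows(rows=ROWS, tol=0.1):
    """Compare the computed speedup with the reported one for every row."""
    results = []
    for name, timer, cpu_t, nodes, gpu_t, reported in rows:
        computed = per_node_speedup(cpu_t, nodes, gpu_t)
        matches = abs(computed - reported) < tol
        results.append((name, timer, computed, reported, matches))
    return results


if __name__ == "__main__":
    for name, timer, computed, reported, matches in check_rows():
        print(f"{name:18s} {timer:5s} computed {computed:.2f}x "
              f"reported {reported}x match={matches}")
```

Under this assumption every reported value is reproduced to within 0.1x. The GaAsBi case computes to about 5.66x, so the reported 5.6x looks truncated rather than rounded.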