add CUDA implementation of pi calculation

100000000000 samples

v7 on elmo with MAX_THREADS = 32: 29.532267 s
v9 on elmo with 16384 CUDA threads: 7.638226 s

-> Speedup: 29.532267 s / 7.638226 s = 3.866 :)
Edited by Niklas Eiling