GPU performance improvement
- Measure the runtime of the right-hand-side vector calculation: is this a bottleneck? Proper profiling would be a good idea.
Speed up matrix copying to GPU
- CSR format: We use CSR now in the sparse solver (CPU and GPU)
- CUDA managed memory and CUDA async: undesirable because of incompatibility with other tools, but a performance study would nevertheless be interesting.
- Can data transfers be optimized? Probably not: we focus on latency, and that inherently requires many small transfers.
Pre and Post Steps on the GPU
- This would significantly reduce the number of memory transfers.
- However: how sequential are the pre and post steps? Solving the differential equations would probably be fast on the GPU, but how much data is there?
- Use sparse matrix types and algorithms (Eigen and CUDA)