1. Single-thread performance
2. Concurrent/Parallel programming
3. GPU programming (kinda the same as 2)
4. Distributed Computing
5. Quantum Computing
I'm mostly interested in 1-3 (and in that order). I searched HN and DDG and found this (https://theartofhpc.com/) which has been great so far, but I wanted to see if y'all had any other suggestions as I start down this path. Also any favorite profiling tools are welcome!
I'm actively learning C to do this and I have average level competence in algorithms and data structures. I know that's another area I could study but I'm mostly interested in optimizations on the programming side rather than the algorithm side.
Thanks a ton!
1. Single-thread performance
This is hardly related to HPC. The closest things are probably code optimization and competitive programming. First learn how to make anything fast in a single process, then think about scaling it up later.
2. Parallel programming
Learn MPI, specifically mpi4py. Once you go MPI, you will never use Python's multiprocessing module again (it sucks). But most of the material out there is about solving linear algebra problems, which I am not that interested in.
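To give a feel for what mpi4py code looks like, here is a minimal sketch that splits a sum of squares across ranks and combines the partial results on rank 0. The helper names (local_bounds, local_sum_of_squares) and the problem itself are made up for illustration; only MPI.COMM_WORLD, Get_rank, Get_size and reduce come from mpi4py. It falls back to a single process if mpi4py is not installed.

```python
def local_bounds(n, rank, size):
    """Return the [start, stop) part of range(n) owned by this rank.

    Hypothetical helper: the first (n % size) ranks get one extra item.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_sum_of_squares(n, rank, size):
    """Sum i*i over the slice of range(n) that belongs to this rank."""
    start, stop = local_bounds(n, rank, size)
    return sum(i * i for i in range(start, stop))


def main(n=100_000):
    try:
        # Imported lazily so the helpers above also work without MPI.
        from mpi4py import MPI
    except ImportError:
        # Assumption: no MPI available, so behave like a single rank.
        print("sum of squares:", local_sum_of_squares(n, 0, 1))
        return

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    partial = local_sum_of_squares(n, rank, size)
    # Every rank contributes its partial sum; only rank 0 receives the total.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of squares:", total)


if __name__ == "__main__":
    main()
```

You would typically launch it with something like mpiexec -n 4 python sum_mpi.py (the file name is just a placeholder). Each rank runs the same script and only differs in the value of rank, which is the main mental shift compared to multiprocessing.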
4. Distributed Computing
Related to MPI. But nowadays distributed GPU computing has become a thing; the one tool I can think of is Dask. I don't know how to use it.
5. Quantum Computing
Completely unrelated to HPC, at least in the near future (10 to 30 years). The most active area in this field is molecular computation. If you don't have a physics/chemistry PhD, you won't be able to read anything beyond college-level material in this field, and that material only covers the basics, the equivalent of elementary school arithmetic. But writing your own program to solve those quantum problems on classical computers is interesting, because people there only think about the problems that cause scaling issues, not the others. That's why some may think that quantum computers are a replacement for supercomputers. If you can find ways and invent techniques that reduce the time complexity of the classical algorithms, you are welcome. And there is literally a lot of room for anyone, because every quantum computer emulator is unoptimized and slow as hell. Google, Microsoft, and IBM all tried to make their own emulators, but none of them managed to write proper code and ship a legit program. The real problem is that their emulators have more bugs than actual code.
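If you want to see where the scaling issue comes from, a toy state-vector emulator is a reasonable starting point. The sketch below is my own illustration, not how any vendor's emulator works: it stores all 2^n amplitudes in a plain Python list and applies a Hadamard and a CNOT to build a two-qubit Bell state. The function names are hypothetical. The list doubles in length with every added qubit, which is exactly why this approach gets slow and memory-hungry.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector (list of complex)."""
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:
            # i has the target bit cleared; j is its partner with the bit set.
            j = i | stride
            a, b = state[i], state[j]
            new_state[i] = gate[0][0] * a + gate[0][1] * b
            new_state[j] = gate[1][0] * a + gate[1][1] * b
    return new_state


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    new_state = list(state)
    for i in range(len(state)):
        control_set = ((i >> control) & 1) == 1
        target_clear = ((i >> target) & 1) == 0
        if control_set and target_clear:
            j = i | (1 << target)
            new_state[i], new_state[j] = state[j], state[i]
    return new_state


INV_SQRT2 = 1.0 / math.sqrt(2.0)
HADAMARD = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, -INV_SQRT2]]


def bell_state():
    """Start in |00>, apply H to qubit 0, then CNOT with qubit 0 as control."""
    state = [1 + 0j, 0j, 0j, 0j]
    state = apply_single_qubit_gate(state, HADAMARD, 0)
    return apply_cnot(state, control=0, target=1)


if __name__ == "__main__":
    for index, amplitude in enumerate(bell_state()):
        print(format(index, "02b"), round(abs(amplitude) ** 2, 3))
```

The inner loops here are the kind of thing you would then try to optimize: rewrite them in C, vectorize them, or split the state vector across processes.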
6. Profiling:
I still prefer line_profiler.
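For anyone who has not used it, here is a small sketch of driving line_profiler from Python. The function being profiled (normalize) and the wrapper (profile_normalize) are made-up examples; LineProfiler, calling the profiler object to wrap a function, and print_stats are the parts that come from the library. If line_profiler is not installed, the sketch just prints a hint.

```python
def normalize(values):
    """Scale a list of numbers so that they sum to 1 (toy workload)."""
    total = 0.0
    for v in values:
        total += v
    result = []
    for v in values:
        result.append(v / total)
    return result


def profile_normalize(n=100_000):
    """Run normalize under line_profiler and print per-line timings."""
    try:
        from line_profiler import LineProfiler
    except ImportError:
        print("line_profiler is not installed (pip install line_profiler)")
        return None

    profiler = LineProfiler()
    # Calling the profiler on a function returns a wrapped, instrumented version.
    wrapped = profiler(normalize)
    wrapped(list(range(1, n + 1)))
    profiler.print_stats()
    return profiler


if __name__ == "__main__":
    profile_normalize()
```

The other common way to use it is to decorate the function with @profile and run the script through kernprof -l -v, which prints the same per-line table.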
7. Algorithms and data structures:
I found that textbooks are mostly useless for real-world code optimization. The examples in university textbooks are mostly under 20 lines, but just about any single function in an HPC program runs well beyond 100 lines. And most of the time the thing is slow because of program structure/architecture, which turns the problem into optimizing 2000 lines of code. I only gained some confidence after working on 10 different small projects, 20000 lines of code in total, over the span of a year. So I don't think your "average level competence in algorithms and data structures" claim matters at all. Try a bigger project.
8. Helpful Materials
Mr. Performance Ambassador(?)'s lectures
https://www.youtube.com/watch?v=Ge3aKEmZcqY