But I had never heard of this style of programming, using SIMD and “branchless” techniques. Does anyone know of a resource where I can learn how to do it?
More on this style of programming in this thread from the contributors to simdjson: https://news.ycombinator.com/item?id=22754841
Also, this will amaze you: https://github.com/komrad36/CRC
Knowing the details of your system can give you a 60x speedup over a naive implementation. That CRC32 (Castagnoli) checksum is also 29x faster than the best you can do in WebAssembly, for example.
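To make "naive" and "branchless" concrete, here is a rough sketch in C of the kind of baseline those speedups are measured against. It is not code from the linked repo. The function names are made up for illustration, and I am assuming the usual CRC-32C parameters (reflected polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF). The first version branches on every bit; the second replaces the branch with a mask, which is the basic branchless trick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical name: bit-at-a-time CRC-32C with a data-dependent branch per bit. */
uint32_t crc32c_branchy(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical name: same computation, with the branch replaced by a mask. */
uint32_t crc32c_branchless(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is 0xFFFFFFFF when the low bit is set, 0 otherwise */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Both functions should return 0xE3069283 for the ASCII string "123456789", the standard check value for CRC-32C. A modern compiler may well turn the first version into the second on its own, so do not expect a big difference between these two. The large speedups come from table lookups and hardware support such as the SSE4.2 crc32 instruction, which this sketch does not attempt.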