Mine, obvious, but switching from element wise operations to “vectorized” operation.
redis as a caching layer.
Implement high-performance code in a systems language and use bindings (i.e. ctypes in Python to call C code) for critical speedups.
Avoid lambdas and any other "serverless" code that is 10x slower and 10x more expensive than a cloud instance.
Figure out the bottleneck in a complex workflow and pre-compute as much as possible before the workflow begins, accepting stale data within defined parameters.
Manage your own memory.