Here are the complete specs from the post description… Going by the dollar value of all these parts, I’m not really losing any money… I just don’t have good enough intuition to tell whether this system is worth it for learning and practicing modern-day AI.
Specs:
- Motherboard: MSI MAG Z390 Tomahawk gaming (9th generation), with dual Ethernet ports for wiring to other servers and max memory speed 4400 MHz in overclock mode.
- CPU: Intel Core i5-9400F @ 4.10 GHz x 6 cores (overclock mode).
- RAM: 64 GB (4x16) DDR4, max speed 3600 MHz.
- Storage: one 256 GB M.2 NVMe SSD (for the operating system) + two 3 TB hard disk drives (for data storage).
- Gaming display support: 1 GTX 1660 Super graphics card with 6 GB memory and 1,408 CUDA cores, supporting up to 3 monitors at the same time; bus max transfer speed 8.0 GB/s (Gen3 mode).
- AI deep learning: 4 Tesla K40 AI accelerators, each with 12 GB memory and 2,880 CUDA cores, dedicated to machine/deep learning; bus max transfer speed 8.0 GB/s (Gen3 mode) each.
- Power supply safety: one 700 W PSU dedicated to the motherboard and the GTX 1660 display GPU; another 1,000 W PSU dedicated to the Tesla K40 AI accelerators.
- CPU cooling: Cooler Master liquid cooler with LED light control.
- AI accelerator cooling: 4 cooling fans at the front and 3 cooling fans at the back.
- Structure: open frame of high-strength aluminum alloy to safeguard the system in an intensive working environment.
- Power switch: big button switch with 5 ft flexible extension cable, and an LED indicator for the hard drive.
The latest Nvidia driver no longer supports the K40, so you’ll have to use version 470 (or lower; officially Nvidia says 460, but 470 seems to work). Driver 470 supports CUDA 11.4 natively, and newer CUDA 11.x versions are supported via the compatibility mechanism described at https://docs.nvidia.com/deploy/cuda-compatibility/index.html, though CUDA 12 is not.
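If you want to sanity-check what a given framework build can actually do on these cards, something like this helps (a rough sketch assuming an older PyTorch build installed against CUDA 11.x; the K40 is compute capability 3.5, which recent prebuilt PyTorch wheels no longer include, as far as I can tell):

```python
# Quick sanity check for old GPUs like the K40 (compute capability 3.5).
# Assumes an older PyTorch build installed against CUDA 11.x.
import torch

print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime version:", torch.version.cuda)
print("Architectures in this build:", torch.cuda.get_arch_list())

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{major}{minor})")
```

If sm_35 isn't in that architecture list, the build won't run kernels on the K40s no matter what the driver supports.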
In my testing, a system with a single RTX 3060 was faster in TensorFlow than one with 3 K40s, and probably close to the performance of 4 K40s.
If you are considering other GPUs, there are some good benchmarks here (the RTX 3060 is not included, though the GTX 1080 Ti had almost the same performance in the TensorFlow test they run): https://lambdalabs.com/gpu-benchmarks
As others have said, Google Colab is a free option you can use.
The reason I say that is, if you go with PyTorch, you basically have two options for multi-GPU training (there's a rough sketch of both after the list).
- DataParallel - where you clone your model over each GPU, but each one functions independently, and after each 'training step' they pool their data. This has downsides, in that you don't get to process intermediate layer outputs and synchronise your batch normalisation layers - so you can't use it to train 'big' models. It just makes your smaller models train more quickly. However, you can at least use these in a 'normal' training script.
- DistributedDataParallel - this is 'proper' multi-GPU training - you can now train big models and put a little bit of data on each GPU, and have them synchronise their results after each layer. However, this can be very annoying to use - each GPU runs in its own background process which is either spawned or forked (depending on Windows/Linux) and you therefore cannot run it in an IPython notebook, or an interactive Python console. It also makes tracking metrics etc. MUCH harder - because you need to reduce your metrics over each GPU process (because otherwise you get 4 accuracies, 4 mean squared errors etc. if you have 4 GPUs, and each process only sees one of them).
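For concreteness, here is a rough sketch of what those two options look like in code. The toy model and synthetic data are mine, not from the thread, and it assumes a PyTorch build old enough to still run on these cards; swap the "nccl" backend for "gloo" if NCCL won't work with them.

```python
# Minimal sketch of PyTorch's two multi-GPU options: DataParallel (one
# process, batch split across GPUs) and DistributedDataParallel (one
# process per GPU, metrics reduced across ranks). Toy model and data.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn.functional as F
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP


def make_model():
    return nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))


def train_dataparallel():
    # Option 1: DataParallel - works in a normal script or notebook.
    model = nn.DataParallel(make_model().cuda())
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x, y = torch.randn(256, 128).cuda(), torch.randn(256, 1).cuda()
    loss = F.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()


def train_ddp(rank, world_size):
    # Option 2: DistributedDataParallel - one process per GPU.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = DDP(make_model().cuda(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x, y = torch.randn(64, 128).cuda(rank), torch.randn(64, 1).cuda(rank)
    loss = F.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Reduce the metric over all ranks so you report one number, not four.
    metric = loss.detach()
    dist.all_reduce(metric, op=dist.ReduceOp.SUM)
    if rank == 0:
        print("mean loss over ranks:", (metric / world_size).item())
    dist.destroy_process_group()


if __name__ == "__main__":
    train_dataparallel()
    world_size = torch.cuda.device_count()
    mp.spawn(train_ddp, args=(world_size,), nprocs=world_size)
```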
I personally prefer having 1 GPU with 24 GB of RAM over 3 GPUs with 12 GB each - because I can have a larger batch size on each GPU, which is VERY VERY advantageous in large models where you can only have small batch sizes, and batch normalisation starts falling down. I'd rather wait 2x as long for a 'better' model to train.
I was going to say maybe it'd be worth it if they were 24 GB GPUs, but I'm not sure you can even use recent PyTorch with cards that old. You'd have to work around the limitations.
You don't even need a GPU to learn anyway; you can use tiny models that train super fast even on CPU for that. You need the beefy GPUs once you want to generate tons of content or train a modern model on big datasets.
Get a 3060 12GB or a 2080Ti 11GB and call it a day, at most.
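To put the "you don't need a GPU to learn" point in concrete terms, here's the kind of tiny model that trains in a couple of seconds on CPU (toy synthetic data, my own example, nothing from the post):

```python
# Tiny regression model that trains in seconds on CPU - plenty for learning
# the mechanics of training loops, losses and optimisers. Toy synthetic data.
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(1024, 10)
y = x @ torch.randn(10, 1) + 0.1 * torch.randn(1024, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```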
I may have to just save up and get a more capable card with more VRAM. I still want to learn how to do parallel compute, but I realize I could do that at any other time, and it doesn't have to be on hardware I necessarily own (I could rent a cloud server), even though owning it would be really nice.
The only thing that sounds exceptional about this system is the 4x12 GB GPU memory. Is that worth it over the inability to use modern CUDA? I don't know much about ML, but I doubt it. People tend to move very quickly in this field (and in others TBH), not caring much about supporting old hardware.
Worth considering just building two machines, as well; the ability to train multiple models in parallel is in many cases more valuable than the ability to train one big one.
https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...
I suspect a newer Nvidia chip manufactured on a more efficient semiconductor process will deliver the same performance for a fraction of the power consumption.
This is often a problem with old hardware. The power efficiency gains from improvements between chip process nodes are so fundamental that it’s hard for older chips to compete in total cost of ownership.
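Some back-of-the-envelope numbers make the point. The wattages below are roughly the published TDPs (~235 W per K40, ~170 W for an RTX 3060); the electricity price and hours of use are assumptions, so plug in your own:

```python
# Rough running-cost comparison: 4x Tesla K40 vs one RTX 3060 under load.
# TDPs are approximate; the $/kWh and duty cycle are assumptions.
K40_WATTS, N_K40 = 235, 4
RTX3060_WATTS = 170
PRICE_PER_KWH = 0.15              # assumed electricity price, USD
HOURS_PER_YEAR = 8 * 365          # assume ~8 hours of training per day


def yearly_cost(watts):
    return watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH


print(f"4x K40 under load:      ${yearly_cost(K40_WATTS * N_K40):.0f}/year")
print(f"1x RTX 3060 under load: ${yearly_cost(RTX3060_WATTS):.0f}/year")
# ~940 W vs ~170 W: the old cards cost several times more to run for
# similar or worse throughput.
```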
Should be enough for most smaller projects, yeah?
You probably won’t be able to use the latest versions of PyTorch because of the K40’s CUDA support, but that’s okay.
Make sure you load test it for at least 15-20 minutes to see how high the GPU temperatures get before parting with your money. Do not buy if you can’t test it - an old system like this can have all sorts of hardware problems.
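Something like this is enough to load all the cards and watch the temperatures while you wait (a rough sketch; assumes torch plus the pynvml bindings are installed, and the 8192x8192 matrix size is just a guess that fits comfortably in 12 GB):

```python
# Crude GPU load test: hammer every card with matmuls for ~20 minutes and
# print temperatures once a minute. Assumes torch and pynvml are installed.
import time
import torch
import pynvml

pynvml.nvmlInit()
n = torch.cuda.device_count()
# One big fp32 matrix per GPU (~256 MB each) keeps the SMs busy; the actual
# numbers don't matter, we just want sustained load and heat.
mats = [torch.randn(8192, 8192, device=f"cuda:{i}") for i in range(n)]

start = last_report = time.time()
while time.time() - start < 20 * 60:
    for i in range(n):
        mats[i] = mats[i] @ mats[i]
        mats[i] = mats[i] / mats[i].norm()       # keep values from blowing up
    if time.time() - last_report > 60:
        for i in range(n):
            torch.cuda.synchronize(i)
        temps = [
            pynvml.nvmlDeviceGetTemperature(
                pynvml.nvmlDeviceGetHandleByIndex(i), pynvml.NVML_TEMPERATURE_GPU
            )
            for i in range(n)
        ]
        print(f"{int(time.time() - start)} s - GPU temps (C): {temps}")
        last_report = time.time()
```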
Buy the most advanced single card out there with the largest VRAM.