Here are the complete specs from the post description… Going by the dollar value of all these parts, I’m not really losing any money… I just don’t have good enough intuition to tell whether this system is worth it for learning and practicing modern-day AI.
Specs:
- Motherboard: MSI MAG Z390 Tomahawk gaming (9th generation), with dual Ethernet ports for wiring to other servers and max memory speed 4400 MHz in overclock mode.
- CPU: Intel Core i5-9400F @ 4.10 GHz x 6 cores (overclock mode).
- RAM: 64 GB (4x16) DDR4, max speed 3600 MHz.
- Storage: one 256 GB M.2 NVMe SSD (for the operating system) + two 3 TB hard disk drives (for data storage).
- Gaming display support: 1 GTX 1660 Super graphics card with 6 GB memory and 1,408 CUDA cores, supporting up to 3 monitors at the same time; bus max transfer speed 8.0 GB/s (Gen3 mode).
- AI deep learning: 4 Tesla K40 AI accelerators, each with 12 GB memory and 2,880 CUDA cores, dedicated to machine/deep learning; bus max transfer speed 8.0 GB/s (Gen3 mode) each.
- Power supply safety: one 700 W PSU dedicated to the motherboard and the GTX 1660 display GPU; another 1,000 W PSU dedicated to the Tesla K40 AI accelerators.
- CPU cooling: Cooler Master liquid cooler with LED light control.
- AI accelerator cooling: 4 cooling fans at the front and 3 cooling fans at the back.
- Structure: open frame of high-strength aluminum alloy to safeguard the system in an intensive working environment.
- Power switch: big button switch with 5 ft flexible extension cable, and an LED indicator for the hard drive.
The latest Nvidia driver no longer supports the K40, so you’ll have to use version 470 (or lower; officially Nvidia says 460, but 470 seems to work). Driver 470 supports CUDA 11.4 natively, and newer CUDA 11.x versions are supported via the compatibility mechanism described at https://docs.nvidia.com/deploy/cuda-compatibility/index.html, though CUDA 12 is not.
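If you want to sanity-check what a given framework build can actually do on these cards, something like this helps (a rough sketch assuming an older PyTorch build installed against CUDA 11.x; the K40 is compute capability 3.5, which recent prebuilt PyTorch wheels no longer include, as far as I can tell):

```python
# Quick sanity check for old GPUs like the K40 (compute capability 3.5).
# Assumes an older PyTorch build installed against CUDA 11.x.
import torch

print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime version:", torch.version.cuda)
print("Architectures in this build:", torch.cuda.get_arch_list())

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{major}{minor})")
```

If sm_35 isn't in that architecture list, the build won't run kernels on the K40s no matter what the driver supports.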
In my testing, a system with a single RTX 3060 was faster in TensorFlow than one with 3 K40s, and probably close to the performance of 4 K40s.
If you are considering other GPUs, there are some good benchmarks here (the RTX 3060 is not included, though the GTX 1080 Ti had almost the same performance in the TensorFlow test they run): https://lambdalabs.com/gpu-benchmarks
As others have said, Google Colab is a free option you can use.
The reason I say that is, if you go with PyTorch, you basically have two options for multi-GPU training (there's a rough sketch of both after the list).
- DataParallel - where you clone your model over each GPU, but each one functions independently, and after each 'training step' they pool their data. This has downsides, in that you don't get to process intermediate layer outputs and synchronise your batch normalisation layers - so you can't use it to train 'big' models. It just makes your smaller models train more quickly. However, you can at least use these in a 'normal' training script.
- DistributedDataParallel - this is 'proper' multi-GPU training - you can now train big models and put a little bit of data on each GPU, and have them synchronise their results after each layer. However, this can be very annoying to use - each GPU runs in its own background process which is either spawned or forked (depending on Windows/Linux) and you therefore cannot run it in an IPython notebook, or an interactive Python console. It also makes tracking metrics etc. MUCH harder - because you need to reduce your metrics over each GPU process (because otherwise you get 4 accuracies, 4 mean squared errors etc. if you have 4 GPUs, and each process only sees one of them).
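For concreteness, here is a rough sketch of what those two options look like in code. The toy model and synthetic data are mine, not from the thread, and it assumes a PyTorch build old enough to still run on these cards; swap the "nccl" backend for "gloo" if NCCL won't work with them.

```python
# Minimal sketch of PyTorch's two multi-GPU options: DataParallel (one
# process, batch split across GPUs) and DistributedDataParallel (one
# process per GPU, metrics reduced across ranks). Toy model and data.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn.functional as F
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP


def make_model():
    return nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))


def train_dataparallel():
    # Option 1: DataParallel - works in a normal script or notebook.
    model = nn.DataParallel(make_model().cuda())
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x, y = torch.randn(256, 128).cuda(), torch.randn(256, 1).cuda()
    loss = F.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()


def train_ddp(rank, world_size):
    # Option 2: DistributedDataParallel - one process per GPU.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = DDP(make_model().cuda(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x, y = torch.randn(64, 128).cuda(rank), torch.randn(64, 1).cuda(rank)
    loss = F.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Reduce the metric over all ranks so you report one number, not four.
    metric = loss.detach()
    dist.all_reduce(metric, op=dist.ReduceOp.SUM)
    if rank == 0:
        print("mean loss over ranks:", (metric / world_size).item())
    dist.destroy_process_group()


if __name__ == "__main__":
    train_dataparallel()
    world_size = torch.cuda.device_count()
    mp.spawn(train_ddp, args=(world_size,), nprocs=world_size)
```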
I personally prefer having 1 GPU with 24 GB of RAM over 3 GPUs with 12 GB each - because I can have a larger batch size on each GPU, which is VERY VERY advantageous in large models where you can only have small batch sizes, and batch normalisation starts falling down. I'd rather wait 2x as long for a 'better' model to train.
I was going to say maybe it'd be worth it if they were 24 GB GPUs, but I'm not sure you can even use recent PyTorch with cards that old. You'd have to work around the limitations.
You don't even need a GPU to learn anyway; you can use tiny models that train super fast even on CPU for that. You need the beefy GPUs once you want to generate tons of content or train a modern model on big datasets.
Get a 3060 12GB or a 2080Ti 11GB and call it a day, at most.
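To put the "you don't need a GPU to learn" point in concrete terms, here's the kind of tiny model that trains in a couple of seconds on CPU (toy synthetic data, my own example, nothing from the post):

```python
# Tiny regression model that trains in seconds on CPU - plenty for learning
# the mechanics of training loops, losses and optimisers. Toy synthetic data.
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(1024, 10)
y = x @ torch.randn(10, 1) + 0.1 * torch.randn(1024, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```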
I may have to just save up and get a more capable card with more VRAM. I still want to learn how to do parallel compute, but I realize I could do that at any other time, and it doesn't have to be on hardware I necessarily own (I could rent a cloud server), even though owning it would be really nice.
The only thing that sounds exceptional about this system is the 4x12 GB GPU memory. Is that worth it over the inability to use modern CUDA? I don't know much about ML, but I doubt it. People tend to move very quickly in this field (and in others TBH), not caring much about supporting old hardware.
Worth considering just building two machines, as well; the ability to train multiple models in parallel is in many cases more valuable than the ability to train one big one.
https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...
I suspect a newer Nvidia chip manufactured on a more efficient semiconductor process will deliver the same performance for a fraction of the power consumption.
This is often a problem with old hardware. The power efficiency gains from improvements between chip process nodes are so fundamental that it’s hard for older chips to compete in total cost of ownership.
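Some back-of-the-envelope numbers make the point. The wattages below are roughly the published TDPs (~235 W per K40, ~170 W for an RTX 3060); the electricity price and hours of use are assumptions, so plug in your own:

```python
# Rough running-cost comparison: 4x Tesla K40 vs one RTX 3060 under load.
# TDPs are approximate; the $/kWh and duty cycle are assumptions.
K40_WATTS, N_K40 = 235, 4
RTX3060_WATTS = 170
PRICE_PER_KWH = 0.15              # assumed electricity price, USD
HOURS_PER_YEAR = 8 * 365          # assume ~8 hours of training per day


def yearly_cost(watts):
    return watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH


print(f"4x K40 under load:      ${yearly_cost(K40_WATTS * N_K40):.0f}/year")
print(f"1x RTX 3060 under load: ${yearly_cost(RTX3060_WATTS):.0f}/year")
# ~940 W vs ~170 W: the old cards cost several times more to run for
# similar or worse throughput.
```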
Should be enough for most smaller projects, yeah?
You probably won’t be able to use the latest versions of PyTorch because of the K40’s CUDA support, but that’s okay.
Make sure you load test it for at least 15-20 minutes to see how high the GPU temperatures get before parting with your money. Do not buy if you can’t test it - an old system like this can have all sorts of hardware problems.
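Something like this is enough to load all the cards and watch the temperatures while you wait (a rough sketch; assumes torch plus the pynvml bindings are installed, and the 8192x8192 matrix size is just a guess that fits comfortably in 12 GB):

```python
# Crude GPU load test: hammer every card with matmuls for ~20 minutes and
# print temperatures once a minute. Assumes torch and pynvml are installed.
import time
import torch
import pynvml

pynvml.nvmlInit()
n = torch.cuda.device_count()
# One big fp32 matrix per GPU (~256 MB each) keeps the SMs busy; the actual
# numbers don't matter, we just want sustained load and heat.
mats = [torch.randn(8192, 8192, device=f"cuda:{i}") for i in range(n)]

start = last_report = time.time()
while time.time() - start < 20 * 60:
    for i in range(n):
        mats[i] = mats[i] @ mats[i]
        mats[i] = mats[i] / mats[i].norm()       # keep values from blowing up
    if time.time() - last_report > 60:
        for i in range(n):
            torch.cuda.synchronize(i)
        temps = [
            pynvml.nvmlDeviceGetTemperature(
                pynvml.nvmlDeviceGetHandleByIndex(i), pynvml.NVML_TEMPERATURE_GPU
            )
            for i in range(n)
        ]
        print(f"{int(time.time() - start)} s - GPU temps (C): {temps}")
        last_report = time.time()
```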
Buy the most advanced single card out there with the largest VRAM.