HACKER Q&A
📣 andrewstuart

For AI, could 80GB GPU RAM be allocated on an integrated CPU/GPU?


I am thinking about AI applications where the amount of GPU RAM is important.

Some CPUs have a GPU built in.

I am wondering if for example you could have a PC with 128GB RAM and allocate 80GB to the GPU and use that for AI.

Or is the shared CPU/GPU memory constrained in some way?

The sort of CPU/GPU combination (APU) I am thinking about is, for example, the AMD Ryzen™ 9 7940HS Processor, 8 Cores/16 Threads, which has a Radeon 780M GPU built in.

Despite searching fairly widely, I can't find a clear answer to this.

Anyone know?


  👤 andrewstuart Accepted Answer ✓
After further research, I found:

https://gpuopen.com/ags-sdk-5-4-improves-handling-video-memo...

Which says:

"For APUs, this distinction is important as all memory is shared memory, with an OS typically budgeting half of the remaining total memory for graphics after the operating system fulfils its functional needs. As a result, the traditional queries to Dedicated Video Memory in these platforms will only return the dedicated carveout – and often represent a fraction of what is actually available for graphics. Most of the available graphics budget will actually come in the form of shared memory which is carefully OS-managed for performance."

The implication seems to be that the GPU can draw on a large fraction of system RAM as shared graphics memory, which would be appealing for AI use cases, even though the GPU itself is relatively underpowered.
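You can see what the driver actually budgets by enumerating Vulkan memory heaps: on an APU the small DEVICE_LOCAL heap is typically just the dedicated carveout, while the large host heap is the OS-managed shared budget the quote describes. A minimal C++ sketch, assuming a working Vulkan driver (error handling mostly omitted):

    // Enumerate Vulkan memory heaps: on an APU the device-local heap is
    // typically just the BIOS carveout, while the big shared budget shows
    // up as a host (system-memory) heap.
    #include <vulkan/vulkan.h>
    #include <cstdio>
    #include <vector>

    int main() {
        VkApplicationInfo app{VK_STRUCTURE_TYPE_APPLICATION_INFO};
        app.apiVersion = VK_API_VERSION_1_1;
        VkInstanceCreateInfo ici{VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
        ici.pApplicationInfo = &app;
        VkInstance instance;
        if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

        uint32_t n = 0;
        vkEnumeratePhysicalDevices(instance, &n, nullptr);
        std::vector<VkPhysicalDevice> devs(n);
        vkEnumeratePhysicalDevices(instance, &n, devs.data());

        for (VkPhysicalDevice dev : devs) {
            VkPhysicalDeviceMemoryProperties mp;
            vkGetPhysicalDeviceMemoryProperties(dev, &mp);
            for (uint32_t i = 0; i < mp.memoryHeapCount; ++i) {
                bool local = mp.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT;
                printf("heap %u: %llu MiB (%s)\n", i,
                       (unsigned long long)(mp.memoryHeaps[i].size >> 20),
                       local ? "device-local carveout" : "shared system memory");
            }
        }
        vkDestroyInstance(instance, nullptr);
        return 0;
    }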

Still, the question remains open: how do you precisely control APU graphics memory allocation on Linux, and what are the limitations?

https://github.com/GPUOpen-LibrariesAndSDKs/AGS_SDK/
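For what it's worth, on Linux the amdgpu driver exposes both budgets through sysfs: the dedicated carveout as "VRAM" and the shared system-memory window as "GTT". The GTT ceiling can apparently be raised with the amdgpu.gttsize= kernel module parameter (size in MiB), while the carveout itself is usually set in BIOS. A small C++ sketch reading the counters, assuming the APU is card0:

    // Read amdgpu's memory accounting from sysfs (Linux).
    // mem_info_vram_total = dedicated carveout (set in BIOS),
    // mem_info_gtt_total  = shared system-memory budget.
    #include <cstdio>
    #include <fstream>
    #include <string>

    static unsigned long long readBytes(const std::string& path) {
        std::ifstream f(path);
        unsigned long long v = 0;
        f >> v;
        return v;
    }

    int main() {
        const std::string base = "/sys/class/drm/card0/device/";
        printf("VRAM (dedicated carveout): %llu MiB\n",
               readBytes(base + "mem_info_vram_total") >> 20);
        printf("GTT  (shared budget):      %llu MiB\n",
               readBytes(base + "mem_info_gtt_total") >> 20);
        return 0;
    }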


👤 smoldesu
> I am wondering if for example you could have a PC with 128GB RAM and allocate 80GB to the GPU and use that for AI.

Kinda. Model tiling is pretty common from what I've seen: you keep the full model loaded in DRAM and swap pieces of it onto the GPU as needed. Even that isn't really necessary if you have a good SSD, though; in my experience it's extraordinarily fast to load straight from NVMe into DRAM.
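The tiling pattern is simple to sketch with the CUDA runtime: pin the whole model in host DRAM and stream layer-sized slices into a small resident device buffer. The sizes below are placeholders and error handling is omitted:

    // Keep the full model in pinned host DRAM; stream one tile at a time
    // into a fixed device buffer and run whatever consumes it.
    #include <cuda_runtime.h>

    int main() {
        const size_t modelBytes = 80ULL << 30;  // e.g. 80 GB of weights in DRAM
        const size_t tileBytes  = 1ULL << 30;   // 1 GB resident on the GPU

        char *hostModel, *deviceTile;
        cudaMallocHost(&hostModel, modelBytes);  // pinned => fast async copies
        cudaMalloc(&deviceTile, tileBytes);

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        for (size_t off = 0; off < modelBytes; off += tileBytes) {
            cudaMemcpyAsync(deviceTile, hostModel + off, tileBytes,
                            cudaMemcpyHostToDevice, stream);
            // ... launch kernels that consume this tile on `stream` ...
            cudaStreamSynchronize(stream);
        }

        cudaFree(deviceTile);
        cudaFreeHost(hostModel);
        return 0;
    }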

> Despite searching quite far I can't find a clear answer to this.

I doubt many people have run into it as an issue. If you have a unified memory system, your bottleneck is still the SSD or other storage medium the model lives on.

Edit: also note that this is a feature of CUDA. Page streaming from the CPU to the GPU has been working for a while now: https://developer.nvidia.com/blog/maximizing-unified-memory-...
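For illustration, oversubscribing GPU memory with managed allocations looks roughly like this (Pascal or newer assumed; the 16 GB figure is just a stand-in for "more than the GPU has"):

    // CUDA unified memory: allocate more than VRAM; pages migrate on
    // demand, and cudaMemPrefetchAsync hints where they should live.
    #include <cuda_runtime.h>

    __global__ void touch(float* data, size_t n) {
        size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;
    }

    int main() {
        const size_t n = (16ULL << 30) / sizeof(float);  // may exceed VRAM
        float* data;
        cudaMallocManaged(&data, n * sizeof(float));

        int device;
        cudaGetDevice(&device);
        cudaMemPrefetchAsync(data, n * sizeof(float), device);  // optional hint

        touch<<<(unsigned)((n + 255) / 256), 256>>>(data, n);
        cudaDeviceSynchronize();

        cudaFree(data);
        return 0;
    }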


👤 mobilio
Yes - but on Apple M1/M2, where unified memory lets the GPU use most of system RAM.