HACKER Q&A
📣 mzubairtahir

MacBook vs. Dedicated GPU for LLM


For those who are running LLMs on a MacBook: I want to understand how a MacBook differs from a dedicated GPU when running these models. And how can I tell how large a model a given MacBook is capable of running?


  👤 epsteingpt Accepted Answer ✓
Both are going to be super super slow and low payback.

You gotta really want it right now.

It's still early!


👤 JSR_FDED
MacBooks with their unified memory behave like a slow GPU with an enormous amount of video RAM, so you can run large, smart models slowly.

Dedicated GPUs have less video RAM, so they run smaller, less smart models quickly.


👤 browningstreet
It’s kind of amazing how steadily this question is asked in every forum where it can be asked. Kind of amazing that the answers previously given can’t reach the next person who’s going to ask it.

👤 cylentwolf
I asked a few of my friends who are ML engineers this question, and all of them said to run the LLMs in the cloud on their infrastructure because it was going to be way faster. If you just want to tinker around I would look at @JSR_FDED's comment.

👤 nichch
My opinion is that you should wait for 6-12 months before making a purchase either way.

Open weight models are getting good. With GLM 5.2 now chasing Opus, I'm very excited to see a smaller model's distillation.

Plus, the OLED MacBook Pro should be released by then.


👤 jpgvm
If you want a massive MacBook anyway, then it's great. They are decent for local LLMs and awesome for local image models, and it's a MacBook, so AppleCare+ has your back. IMO it's a no-brainer if you wanted a MacBook anyway, but it's a poor choice if your reason to buy it is to run LLMs.

👤 brcmthrowaway
Dual 3090 >>> Any Apple product.

👤 gizajob
Local LLMs running in LM Studio on a MacBook Pro work great if you’re prepared to wait for the answers: using an LLM locally is much, much slower than the instant results you get from an online LLM like ChatGPT or Claude. You can also run OpenClaw on the MacBook as the front end for the LLM, to get full interactivity and have it install command-line tools on your Mac to perform whatever tasks you’ve set it.

If you don’t already have a MacBook, there’s a bit of a sweet spot for the AI experimenter right now: a second-hand 16” MBP with an M1 Max chip and 64GB of shared RAM. Because these are about 5 years old now, they have depreciated to around £1100 / €1300 / $1500. They make a phenomenal platform for learning, because 64GB of shared memory means you can host models up to about 48GB in size and task them with interesting coding work without ever having to worry about token burn.
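For the "how do I know what fits" part of the question, rough arithmetic gets you most of the way. The sketch below is only a back-of-the-envelope estimate: the function names, the 4.5 bits per weight for a Q4-style quantization, and the 75% usable fraction (taken from the 64GB to about 48GB rule of thumb above) are assumptions for illustration, and it ignores the KV cache and context length.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         ram_gb: float, usable_fraction: float = 0.75) -> bool:
    """True if the weights fit in the assumed usable share of unified memory."""
    budget_gb = ram_gb * usable_fraction  # assumption: ~75% of RAM is usable for the model
    return model_size_gb(params_billion, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    # Hypothetical model sizes, all at an assumed 4.5 bits per weight.
    for params in (8, 35, 70, 120):
        size = model_size_gb(params, 4.5)
        print(f"{params}B ~ {size:.1f} GB, fits in a 64GB Mac: {fits(params, 4.5, 64)}")
```

Leave extra headroom in practice: long contexts and other running apps eat into the same shared memory.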

The downside is that they’re slow and prone to needing a nudge to stay on track, but that’s part of the fun too. Granted, the “latency” is atrocious: you ask something and the machine thinks for a few minutes before saying anything, which is a different experience from using Claude. But… it does work. You can think of yourself as a manager with a junior member of staff: set the machine running and leave it to do its thing for a couple of hours, which can produce actually useful work. This approach will likely be shouted down by some commenters here who treat Claude like some kind of expensive and quick-fire dopamine pump. You can also use a Mac like this to run diffusion models for image generation and suchlike in ComfyUI, even though, again, results will be slow. Spending more money on a more recent MBP with as much RAM as you can afford will deliver the same results faster, at a higher price.

To run the same size of model you’d have to combine a couple of Nvidia 3090 24GB cards in a decent workstation with the PCIe capacity to handle them, or hack together some kind of solution that hangs GPUs off the back of a motherboard on ribbon cables with the GPUs running on their own PSU, which is what I’m building next… The difference is that those cards have 24GB of VRAM each and cost about $1000 each second-hand, but will operate much, much faster than the M1 Max MBP, or even the most recent M5, because they have so much more bandwidth (they’re burning 350 watts on GPU compute rather than the 140 watts total that a super-efficient MBP has for the CPU/GPU/screen/everything).
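To put a rough number on the bandwidth point: when generating one token at a time, a dense model has to stream roughly all of its weights from memory for each token, so bandwidth divided by model size gives a loose ceiling on tokens per second. A minimal sketch, assuming the commonly published figures of about 400 GB/s for the M1 Max and about 936 GB/s for an RTX 3090 (check these against your own hardware), and ignoring compute limits, the KV cache and multi-GPU overhead:

```python
# Assumed memory bandwidth figures in GB/s; verify against the specs of your hardware.
ASSUMED_BANDWIDTH_GB_S = {"M1 Max": 400.0, "RTX 3090": 936.0}


def max_decode_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Loose upper bound: every generated token reads all the weights once."""
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    model_gb = 20.0  # hypothetical dense model that fits in 24GB of VRAM
    for name, bandwidth in ASSUMED_BANDWIDTH_GB_S.items():
        ceiling = max_decode_tokens_per_s(bandwidth, model_gb)
        print(f"{name}: at most ~{ceiling:.0f} tokens/s for a {model_gb:.0f} GB model")
```

Real numbers land below this ceiling, and mixture-of-experts models read only part of their weights per token, so treat it as a sanity check rather than a benchmark.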

So say you had $6000 to spend today: you could buy a second-hand workstation and craft a solution with external GPUs that would completely smoke any Mac in existence, even though Macs have the edge in the size of model you can run (slowly) thanks to their shared memory. External GPUs plus access to the Nvidia frameworks and the general CUDA ecosystem win out on the performance front, though. A real sweet spot is to buy an M1 Max MBP and use it as the front end to a Linux workstation full of GPUs.

But any Apple silicon MBP is a totally competent gateway drug to local agentic computing.

Google Gemini could give you an in-depth and useful discussion about this exact question.


👤 dust42
With an M5 16c 48GB and Qwen 3.6 35B Q4 I get up to 1900 PP/s and 80 TG/s (prompt-processing and token-generation tokens per second). With an Nvidia 5090 I get 7800 PP/s and 280 TG/s.

Together with pi mono I wouldn't want to go back to Claude & Co. Speed, quality of the answers, short answer times at any time of day - once you have eaten from the fruit your definition of SOTA will change...

For reference, I have been doing software development for 30 years; I am not vibe coding the umpteenth todo list.
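To turn the PP/s and TG/s figures quoted above into a wait time, divide the prompt length by the prompt-processing rate and the answer length by the generation rate. A quick sketch: the function name and the 8000-token prompt with a 500-token answer are made-up illustration values, and the rates are the "up to" numbers from this comment, so real waits will be longer.

```python
def response_time_s(prompt_tokens: int, output_tokens: int,
                    pp_per_s: float, tg_per_s: float) -> float:
    """Seconds to process the prompt plus seconds to generate the answer."""
    return prompt_tokens / pp_per_s + output_tokens / tg_per_s


if __name__ == "__main__":
    prompt, answer = 8000, 500  # hypothetical workload
    m5 = response_time_s(prompt, answer, pp_per_s=1900, tg_per_s=80)
    rtx5090 = response_time_s(prompt, answer, pp_per_s=7800, tg_per_s=280)
    print(f"M5: ~{m5:.1f} s, 5090: ~{rtx5090:.1f} s")
```

For agentic coding with long prompts, the PP/s term tends to dominate, which is why both numbers are worth comparing.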


👤 EagnaIonat
With a dedicated GPU, the lag is in transferring data to the GPU. You don't have that lag on Apple silicon, where the CPU and GPU share the same memory.

But it really depends on what you want to do. A recent MLX-optimised model will run fine and at decent speeds. Granite4.1 (a few months old), for example, takes up 2GB of memory, is insanely fast, and gives good results compared with much bigger models like gpt-oss-120b (a year old). It even runs at good speeds on an M1 Mac.

The models are only getting better.


👤 alecco
MacBooks have lots of RAM and no PCIe transfers, but ~10x fewer FLOP/s than a cheaper Nvidia GPU. Test LLMs on rented GPUs on vast.ai or similar services (beware of storage etc.). Don't spend thousands before trying it and knowing exactly what you get.

Also beware that local models tend to be slow. The main optimization trick for LLM inference is running large batches (concurrent users), and as a single local user you won't take advantage of this (batch=1).
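The batching point can be illustrated with a toy roofline-style model: each decoding step reads the weights once no matter how many requests share it, so total throughput grows with batch size until compute becomes the limit. This is a sketch under simplifying assumptions, not a benchmark: the function name and every hardware number below are hypothetical, it counts about 2 FLOPs per parameter per token, and it ignores KV-cache traffic.

```python
def aggregate_tokens_per_s(batch: int, model_gb: float, params_billion: float,
                           bandwidth_gb_s: float, tflops: float) -> float:
    """Total tokens/s across the batch for one decoding step."""
    weight_read_s = model_gb / bandwidth_gb_s  # paid once per step, shared by the batch
    compute_s = batch * 2 * params_billion * 1e9 / (tflops * 1e12)  # grows with batch
    step_s = max(weight_read_s, compute_s)  # the slower of the two limits the step
    return batch / step_s


if __name__ == "__main__":
    # Hypothetical 8B model at 2 bytes per weight on a 1000 GB/s, 100 TFLOP/s device.
    for batch in (1, 8, 100, 200):
        rate = aggregate_tokens_per_s(batch, 16.0, 8.0, 1000.0, 100.0)
        print(f"batch={batch}: ~{rate:.0f} tokens/s in total")
```

With these made-up numbers, batch=1 leaves almost all of the compute idle, which is the situation a single local user is in.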

IMHO using Macs for LLMs is a fad. An expensive fad.


👤 g-technology
Around February or March I started looking into hardware options to help me start learning about training models and working with them. My budget was limited, and an Apple-refurbished 32 GB Mac mini was far and away the best option for my budget. I wish it was faster, but I can let it run 24/7 with no noise and minimal power draw. I just arrange long-running tasks for when I am asleep or at work. Then, as a huge plus, I have an awesome daily driver machine for whatever else I want to do.

👤 tosh
macbooks (macs in general) are a good package for llms because they come with so much RAM

and for llms more RAM means access to better models

macbooks might not be as fast as a GPU with a similar amount of RAM, but they are more affordable and well integrated

last but not least: compared to a PC+GPU the macbook is either silent (air) or at least way less annoying when you care about noise

for ultimate flexibility and low noise: a GPU in the cloud for when you need/want it, which is probably also the most cost-effective option if you don't have workloads that need to run 24/7
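One way to sanity-check the rent-versus-buy question is a simple break-even calculation. A rough sketch: the function names are made up, the $1500 is the second-hand price mentioned earlier in the thread, the $0.50/hour cloud rate is a placeholder rather than a quote from any provider, and electricity and resale value are ignored.

```python
def break_even_hours(hardware_cost: float, cloud_rate_per_hour: float) -> float:
    """Hours of cloud GPU rental that cost the same as buying the hardware."""
    return hardware_cost / cloud_rate_per_hour


def break_even_days(hardware_cost: float, cloud_rate_per_hour: float,
                    hours_per_day: float) -> float:
    """Days of use, at a given daily usage, before buying becomes cheaper."""
    return break_even_hours(hardware_cost, cloud_rate_per_hour) / hours_per_day


if __name__ == "__main__":
    cost, rate = 1500.0, 0.50  # placeholder prices, replace with real quotes
    print(f"break-even after {break_even_hours(cost, rate):.0f} rented hours")
    for hours in (2, 24):
        days = break_even_days(cost, rate, hours)
        print(f"at {hours} h/day: {days:.0f} days")
```

Occasional use favours renting, while a machine that is busy around the clock pays for itself much sooner.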