HACKER Q&A
📣 2Gkashmiri

Any way to borrow compute from Apple M1


Hi.

I have a friend who owns an M1 Max. I would like to "borrow" his GPU for Llama 3 or SD. Is there a way for me to use his compute when it's idle? I do not want to remote into his machine; an easy local API would be fine (I could use Tailscale/ZeroTier and then reach the API that way).


  👤 enoch2090 Accepted Answer ✓
For Llama 3, just ask him to install Ollama and serve the model. Ollama has automatic memory management and will free the model when it's not in use, and whenever you make a call to the API (do let your friend know before you do this), Ollama will load the model back into memory.

Not sure whether there is anything similar for SD, though.
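
For the API call itself, here is a minimal sketch of the client side. Ollama listens on port 11434 by default, so once your friend starts it bound to the tailnet (e.g. with OLLAMA_HOST=0.0.0.0), you can hit /api/generate from your machine; the hostname and model name below are placeholders for his actual Tailscale name and whatever model he has pulled:

    # Minimal sketch: call a remote Ollama instance over a Tailscale connection.
    # Assumptions: the friend runs `OLLAMA_HOST=0.0.0.0 ollama serve` and has
    # pulled llama3; "friends-m1-max" stands in for his tailnet hostname or IP.
    import json
    import urllib.request

    OLLAMA_URL = "http://friends-m1-max:11434/api/generate"  # 11434 is Ollama's default port

    payload = {
        "model": "llama3",
        "prompt": "Summarize the plot of Hamlet in two sentences.",
        "stream": False,  # ask for a single JSON response instead of a stream
    }

    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

The first call after an idle period will be slower while Ollama loads the model back into memory, as described above.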


👤 PLG88
There are tons of options - https://github.com/anderspitman/awesome-tunneling. I will advocate for zrok.io as I work on its parent project, OpenZiti. zrok is open source and has a free SaaS.

👤 nomadic-coder
zrok (https://zrok.io/), an alternative to ngrok, does access management too. It's like Tailscale but can give access to a specific service.

👤 lordnacho
> i do not want to remote into his machine

> tailscale/zerotier

Same thing, isn't it?

In any case, it wouldn't be hard for you to just have an account on his machine, with Tailscale being perhaps the simplest setup. SSH in and cook his laptop at your leisure.


👤 condiment
I had to do this fairly recently to make krita-diffusion available for my friends and family who don't have a 3090 Ti lying around. Probably the simplest way would be to run a local HTTP service on your friend's M1 that is SSH-tunneled to a server you'll access over HTTP. On the server you'll need to reverse-proxy the tunneled port to a public address and port.

You make HTTP requests to the shared server, those get proxied via the SSH tunnel to his machine, and the client on his machine can decide when and whether to run the workload.
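
For the reverse-proxy piece, here is a rough sketch in plain Python, assuming the friend keeps a tunnel open with something like ssh -R 8188:localhost:8188 user@server and that the local service sits on port 8188 (the ports and user@server are placeholders). In practice nginx or Caddy would be the usual choice; this just shows the shape of the forwarding:

    # Sketch of a tiny reverse proxy on the shared server: listens publicly on
    # port 8080 and relays requests to the SSH-tunneled port 8188 (placeholder
    # ports). Error handling and auth are omitted for brevity.
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
    import urllib.request

    TUNNEL = "http://127.0.0.1:8188"  # where the ssh -R tunnel lands on this server

    class Proxy(BaseHTTPRequestHandler):
        def _forward(self, body=None):
            # Rebuild the request against the tunneled service, preserving path and method.
            req = urllib.request.Request(TUNNEL + self.path, data=body, method=self.command)
            if self.headers.get("Content-Type"):
                req.add_header("Content-Type", self.headers["Content-Type"])
            with urllib.request.urlopen(req) as resp:
                self.send_response(resp.status)
                self.send_header("Content-Type", resp.headers.get("Content-Type", "application/octet-stream"))
                self.end_headers()
                self.wfile.write(resp.read())

        def do_GET(self):
            self._forward()

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            self._forward(self.rfile.read(length) if length else None)

    ThreadingHTTPServer(("0.0.0.0", 8080), Proxy).serve_forever()

Worth putting HTTPS and some form of authentication in front of it before exposing the port publicly.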


👤 whywhywhywhy
The M1 can't really handle SD; inference times are closer to a minute, and with SDXL you can feel the machine straining under it. The battery depletes quickly, and the machine often freezes completely for a second if you're trying to do other things at the same time (M1 Max, 32GB).

I think you'd be way better off just paying for a service designed for this, or renting a GPU from a provider set up for it; the cost won't be that significant.


👤 neximo64
Use Ollama's API.

👤 BossingAround
An off-topic question: are Apple's M-series chips any good for current AI/ML work? How do they compare with dedicated GPUs?

👤 loktarogar
This was bouncing around the last few days and might fit if you have a few devices as well as the M1 (though I'm not sure it's able to work over the internet as opposed to a local network): https://github.com/exo-explore/exo

Otherwise, set up Ollama's API.


👤 neom
I feel like Ollama just came out, and now y'all are doing model-based laptop resource sharing.

Should I take this as an indicator that embedded GenAI is moving quite quickly?

(Also just wanted to say I find this thread incredibly cool generally, some very interesting stuff going on!!! :D )


👤 paxys
The entire point of embedded models is that you can run them locally. If it's going to take an internet round trip anyway, what's the point of connecting to your friend's laptop rather than a cloud GPU or a managed service like ChatGPT-4o?

👤 thisconnect
If you are on the same network, try https://pinokio.computer

👤 exe34
With "borrow" in scare quotes, do you intend for him to be aware of his generosity?