HACKER Q&A
📣 schappim

What is the safest way to replicate Repl.it-like sites?


I’ve often wanted to run end-user code on my platform, but I hesitate to do so out of security concerns.

I’m wondering: in 2021, what is the safest way to build something like Repl.it?


👤 arthurcolle Accepted Answer ✓
I think sandboxing is a popular way to do this. It's obviously wide open to abuse, so you would want to spin up some kind of container or VM (probably one per session?) and expose only a subset of the language's API (no networking, no OS or filesystem libraries).

So the path would look something like:

1. user makes request to the site

2. site spins up an ephemeral instance with $LANGUAGE installed

3. site renders some client-side code that looks like a shell, or a text field you can type code into (maybe with a button to "run" the code)

4. that code then gets executed as long as it doesn't import disallowed libraries, etc., or, more simply, whenever your backend logic allows it.
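A minimal sketch of the check in step 4, assuming Python is $LANGUAGE: parse the snippet with the standard-library `ast` module and reject it if it statically imports anything on a denylist. `DISALLOWED_MODULES` and `disallowed_imports` are hypothetical names, and the list itself is only an example. This is easy to get around (e.g. `__import__("os")` or `importlib`), so treat it as a friendly early error for users rather than the security boundary; the container or VM is what actually protects the host.

```python
import ast
from typing import List, Set

# Hypothetical denylist of top-level modules a user snippet may not import.
DISALLOWED_MODULES: Set[str] = {
    "os", "subprocess", "socket", "shutil", "ctypes", "requests", "urllib", "http",
}


def disallowed_imports(source: str, denylist: Set[str] = DISALLOWED_MODULES) -> List[str]:
    """Return denylisted top-level modules that `source` imports via import statements.

    Only catches static `import x` / `from x import y`; dynamic imports slip through.
    """
    found: List[str] = []
    tree = ast.parse(source)  # raises SyntaxError for invalid code
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x.y import z"
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # "os.path" -> "os"
            if top in denylist and top not in found:
                found.append(top)
    return found
```

For example, `disallowed_imports("import requests")` returns `["requests"]`, so the backend can refuse to run it and show the user a message.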

You probably also want to terminate execution if it runs longer than some preset time limit, to stop runaway code such as an infinite loop or a fork bomb.
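A rough sketch of the time-limit part, assuming a Linux/POSIX host and Python: run the snippet in a child interpreter with `setrlimit` caps on CPU time and address space, a wall-clock timeout, and its own process group so the whole group can be killed if it overruns. `run_untrusted` and `RunResult` are hypothetical names and the limits are arbitrary defaults. On its own this is not a sandbox (the child can still read files and open sockets); it is the timeout layer you would run inside the container or VM.

```python
import os
import resource  # POSIX only
import signal
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    stdout: str
    stderr: str
    returncode: Optional[int]  # None when we killed it for exceeding the wall-clock timeout
    timed_out: bool


def _limit_resources(cpu_seconds: int, memory_bytes: int) -> None:
    # Runs in the child after fork() and before exec().
    # CPU-time cap: the kernel kills the process once it has used this much CPU.
    resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    # Address-space cap: big allocations fail (MemoryError in Python) instead of eating the host.
    resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))


def run_untrusted(code: str, wall_timeout: float = 2.0, cpu_seconds: int = 2,
                  memory_bytes: int = 1024 * 1024 * 1024) -> RunResult:
    proc = subprocess.Popen(
        # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
        [sys.executable, "-I", "-c", code],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child leads a new process group we can kill as a unit
        preexec_fn=lambda: _limit_resources(cpu_seconds, memory_bytes),
    )
    try:
        out, err = proc.communicate(timeout=wall_timeout)
        return RunResult(out, err, proc.returncode, timed_out=False)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the child and anything it forked
        except ProcessLookupError:
            pass  # the group already exited
        out, err = proc.communicate()  # reap the child and collect any partial output
        return RunResult(out, err, None, timed_out=True)
```

Killing the process group rather than just the child matters because `subprocess.run(timeout=...)` only kills the direct child, so anything it forked would keep running.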

You'll notice that other sites like this don't let you just import requests and go to town; there is a relatively strict policy about what your code is allowed to do.

That's why it's so impressive that Google can run something like Colab, where the strict restrictions of other code-execution platforms are relaxed so far that you can do basically anything.


👤 bem
This post from fly.io [1] has a pretty comprehensive survey of the tech available for running users' code safely. It's a good read.

I've been investigating something similar for a feature I want to launch. I'm currently leaning towards running users' code in Kubernetes using Firecracker or gVisor.

My main takeaway has been that while there are good solutions for isolating users' code, there's going to be a lot of work involved in orchestrating it at scale, i.e. building and storing images, spinning up containers, managing storage, tracking and billing minutes and bandwidth, killing timed-out containers, etc. I have not found a good library for that. It seems like a good use case for a Kubernetes operator, so I think that's what I'll wind up building.
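As a sketch of the "spin up a container, kill it on timeout" piece, assuming plain Docker rather than Kubernetes: launch each run in a fresh, locked-down container with a known name, and `docker kill` it by name if it overruns. The function names, naming scheme, image tag, and limits are assumptions for illustration; `--runtime runsc` only works if gVisor has been installed and registered with Docker under that name. In Kubernetes the rough equivalents are a `RuntimeClass` plus resource limits on the pod.

```python
import subprocess
import uuid
from typing import List, Optional


def build_docker_command(code: str, name: str, image: str = "python:3.9-slim",
                         runtime: Optional[str] = None) -> List[str]:
    """Build the argv for one short-lived, locked-down `docker run`."""
    cmd = [
        "docker", "run",
        "--rm",                      # remove the container when it exits
        "--name", name,              # known name, so we can `docker kill` it later
        "--network", "none",         # no network access at all
        "--memory", "256m",          # memory cap
        "--cpus", "0.5",             # CPU quota
        "--pids-limit", "64",        # cap on processes, which blunts fork bombs
        "--read-only",               # read-only root filesystem
        "--cap-drop", "ALL",         # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--user", "65534:65534",     # unprivileged uid/gid ("nobody" on Debian-based images)
    ]
    if runtime:
        # e.g. "runsc" if gVisor is installed and registered with Docker under that name
        cmd += ["--runtime", runtime]
    cmd += [image, "python", "-c", code]
    return cmd


def run_in_container(code: str, timeout: float = 10.0,
                     runtime: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run `code` in a fresh container; kill the container if it exceeds `timeout` seconds."""
    name = f"sandbox-{uuid.uuid4().hex[:12]}"  # hypothetical naming scheme
    cmd = build_docker_command(code, name, runtime=runtime)
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The timeout only kills the local `docker` client, not the container itself.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise
```

The explicit `docker kill` is there because the `subprocess` timeout only terminates the local `docker` CLI process; the container keeps running unless you stop it. Note the timeout here also has to cover container startup.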

[1] https://fly.io/blog/sandboxing-and-workload-isolation/


👤 firefly284
Have you checked out Piston? https://github.com/engineer-man/piston

👤 Jugurtha
We built https://iko.ai, a machine learning platform, to speed up our work as a boutique machine learning consultancy.

It offers live collaboration on notebooks, scheduled long-running GPU notebooks that you can watch live while they run, even after closing the browser, automatic experiment tracking for notebooks, parameters, and models, plus one-click model deployment and a live dashboard to monitor models.

We use containers and Kubernetes (we have executors for Docker and Kubernetes). We currently run things on our GCP cluster, but are working to use users' clusters in a "bring your own cluster" setting, so users can choose the specs and use their billing.


👤 imhoguy
I think the killer product would be one that runs a bunch of languages in the browser through WebAssembly (WASM); see Pyodide for Python [0] as a start. No server-side infrastructure or cost, no shared environment and thus fewer security worries, unlimited user scalability, etc.

[0] https://github.com/pyodide/pyodide


👤 1cvmask
I think you meant safest from a security perspective?

Because Replit threatened to sue an open-source competitor:

https://news.ycombinator.com/item?id=27424195

https://intuitiveexplanations.com/tech/replit/