HACKER Q&A
📣 disintegore

Who operates at scale without containers?


In other words, who runs operations at a scale where distributed systems are absolutely necessary, without using any sort of container runtime or container orchestration tool?

If so, what do their technology stacks look like? Are you aware of any good blog posts?

edit: While I do appreciate all the replies, I'd like to know if there are any organizations out there that operate at web scale without relying on the specific practice of shipping software with heaps of dependencies, whether that be in a container or in a single-use VM. Thank you in advance, and sorry for the confusion.


  👤 smilliken Accepted Answer ✓
My company runs without containers. We process petabytes of data monthly, thousands of CPU cores, hundreds of different types of data pipelines running continuously, etc etc. Definitely a distributed system with lots of applications and databases.

We use Nix for reproducible builds and deployments. Containers only give reproducible deployments, not builds, so they would be a step down. The reason that's important is that it frees us from troubleshooting "works on my machine" issues, or from someone pushing an update somewhere and breaking our build. That's not important to everyone if they have few dependencies that don't change often, but for an internet company, the trend is accelerating towards bigger and more complex dependency graphs.
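To make the "reproducible builds" point concrete: what makes a Nix build independent of the build machine is pinning the entire package set, compiler included, to one revision. A minimal sketch of the pattern (the revision, hash, and `myapp` name are placeholders, not anyone's real config):

```nix
# default.nix -- every input, including the toolchain, comes from one
# pinned nixpkgs revision, so any machine produces the same build.
let
  pkgs = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<pinned-rev>.tar.gz";
    sha256 = "<pinned-sha256>";
  }) {};
in
pkgs.stdenv.mkDerivation {
  pname = "myapp";
  version = "1.0";
  src = ./.;
}
```

Nobody pushing an update to an upstream repo can change what this builds until the pin itself is changed.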

Kubernetes has mostly focused on stateless applications so far. That's the easy part! The hard part is managing databases. We don't use Kubernetes; there's little attraction because it would be addressing something that's already effortless for us to manage.

What works for us is to do the simplest thing that works, then iterate. I remember being really intimidated by all the big data technologies coming out a decade ago, thinking they were so complex that their creators must know what they were doing! But I'd so often dive in to understand the details and be disillusioned by how much complexity there was for relatively little benefit. I was in a sort of paralysis over what we'd do after we outgrew postgresql, and never found a good answer. Here we are years later, with a dozen+ postgresql databases, some measuring up to 30 terabytes each, and it's still the best solution for us.

Perhaps I've read too far into the intent of the question, but maybe you can afford to drop the research project into containers and kubernetes, and do something simple that works for now, and get back to focusing on product?


👤 toast0
I worked at WhatsApp prior to moving to Facebook infra. We had some jails for specific things, but mostly ran without containers.

Stack looked like:

FreeBSD on bare metal servers (host service provided a base image, our shell script would fetch source, apply patches, install a small handful of dependencies, make world, manage system users, etc)

OTP/BEAM (Erlang) installed via rsync etc from build machine

Application code rsynced and started via Makefile scripts

Not a whole lot else. Lighttpd and PHP for www. Jails for stud (a TLS terminator; a popular fork is called hitch) and ffmpeg (until end-to-end encrypted media made server transcoding impossible).

No virtualized servers (I ran a freebsd vm on my laptop for dev work, though).

When WA moved to Facebook infra, it made sense to use their deployment methodology for the base system (Linux containers), for organizational reasons. There was no consideration of which methodology was technically superior; both are sufficient. But running a very different methodology inside a system designed for everyone to use one methodology is a recipe for operational headaches and difficulty getting things diagnosed and fixed, because it's so tempting to jump to the conclusion that any problem found on a different setup is caused by the difference and not a latent problem. We had enough differences without requiring a different OS.


👤 wanderr
Grooveshark didn't use any of that. We were very careful about avoiding dependencies where possible and keeping our backend code clean and performant. We supported about 45M MAU at our biggest, with only a handful of physical servers. I'm not aware of any blog posts we made detailing any of this, though. And if you're not familiar with the saga, Grooveshark went under for legal, not technical, reasons. The backend API was powered by nginx, PHP, MySQL, and memcache, with a realtime messaging server built in Go. We used Redis and MongoDB for some niche things and had serious issues with both, which is understandable because they were both immature at the time, but MongoDB's data loss problems were bad enough that I still wouldn't use it today.

That said, I'm using Docker for my current side project. Even if it never runs at scale, I just don't want to have to muck around with system administration, not to mention how nice it is to have dev and prod be identical.


👤 maxk42
Back in 2010 I built and operated MySpace's analytics system on 14 EC2 instances. It handled 30 billion writes per day. Later I was involved in ESPN's streaming service, which handled several million concurrent connections with VMs but no containers. More recently I ran an Alexa top-2k website (45 million visitors per month) off of a single container-free EC2 instance. Then I spent two years working for a streaming company that used k8s + containers and would fall over if it had more than about 60 concurrent connections per EC2 instance. K8s + Docker is much heavier than advertised.

👤 tptacek
Ironically, here at Fly.io, we run containers (in single-use VMs) for our customers, but none of our own infrastructure is containerized --- though some of our customer-facing stuff, like the API server, is.

We have a big fleet of machines, mostly in two roles (smaller traffic-routing "edge" hosts that don't run customer VMs, and chonky "worker" hosts that do). All these hosts run `fly-proxy`, a Rust CDN-style proxy server we wrote, and `attache`, a Consul-to-sqlite mirroring server we built in Go. The workers also run our orchestration code, all in Go, and Firecracker (which is Rust). Workers and WireGuard gateways run a Go DNS server we wrote that syncs with Consul. All these machines are linked together in a WireGuard mesh managed in part by Consul.

The servers all link to our logging and metrics stack with Vector and Telegraf; our core metrics stack is another role of chonky machines running VictoriaMetrics.

We build our code with a Buildkite-based CI system and deploy with a mixture of per-project `ctl` scripts and `fcm`, our in-house Ansible-like. Built software generally gets staged on S3 and pulled by those tools.
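The "stage on S3, pull with a tool" flow generally reduces to fetching a versioned artifact and flipping a symlink. A hypothetical sketch of that pattern, not Fly.io's actual `ctl`/`fcm` code; a local directory stands in for the S3 bucket and all paths are made up:

```shell
set -eu

BUCKET=$(mktemp -d)   # stands in for s3://releases/myapp
APPDIR=$(mktemp -d)   # stands in for /srv/myapp on a host
VERSION=v42

# Pretend CI staged a build in the bucket.
mkdir -p "$BUCKET/$VERSION"
echo "server binary" > "$BUCKET/$VERSION/server"

# Pull the versioned artifact next to previous releases...
mkdir -p "$APPDIR/versions/$VERSION"
cp "$BUCKET/$VERSION/server" "$APPDIR/versions/$VERSION/server"

# ...and activate it by repointing a "current" symlink, so a service
# restart picks up the new version and rollback is one ln away.
ln -sfn "$APPDIR/versions/$VERSION" "$APPDIR/current"

readlink "$APPDIR/current"
```

Real deploy tools add checksum verification and an atomic rename for the symlink swap, but the shape is the same.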

Happy to answer any questions you have. I think we fit the bill of what you're asking about, even though if you read the label on our offering you'd get the opposite impression.


👤 q3k
Depends what you mean by 'container runtime' or 'container orchestration tool'...

For example, Google's Borg absolutely uses Linux namespacing for its workloads, and these workloads get scheduled automatically on arbitrary nodes, but this doesn't feel at all like Docker/OCI containers (i.e., no whole-filesystem image, no private IP address to bind to, no UID 0, no control over passwd...). Instead, it feels much closer to just getting your binary/package installed and started on a traditional Linux server.


👤 alex_duf
Hey former Guardian employee here.

The Guardian has hundreds of servers running, pretty much all EC2 instances. EC2 images are baked and derived from official images, similar to the way you bake a docker image.

We built tools before docker became the de facto standard, so we could easily keep the EC2 images up to date. We integrated pretty well with AWS so that the basic constructs of autoscaling and load balancer were well understood by everyone.

The stack is mostly JVM based so the benefits of running docker locally weren't really significant. We've evaluated moving to a docker solution a few times and always reached the conclusion that the cost of doing so wouldn't be worth the benefits.

Now, for a company that starts today, I don't think I'd recommend that; it just so happens that The Guardian invested early in the right tooling, so it's pretty much an exception.


👤 kuon
We use ansible on bare metal (no VMs) to manage about 200 servers in our basement. We use PXE booting to manage the images. We use a customized Arch Linux image, and we have a few scripts to select which features we'd like. It's "old school" but it's been working fine for nearly 20 years (we used plain scripts before ansible, so we've always used the "agentless" approach). Our networking stack uses OpenBSD.

👤 armcat
Not sure if this counts, but for more than a decade I was at a telecom vendor, working with radio base stations (3G, 4G and 5G). That, to me, is probably one of the most distributed systems on the planet - we worked across several million nodes around the globe. I've been out of the loop for a bit, but I know they now have vRAN, Cloud RAN, etc. (basically certain soft real-time functions pulled out of base stations and deployed as VMs or containers). But back then, there was no virtualization being used.

The tech stack was as follows: hardware was either PowerPC or ARM based System-on-Chip variants; we initially used our own in-house real-time OS, but later switched to a just-enough Linux distro; management functions were implemented either in IBM's "real-time" JVM (J9), or in Erlang; radio control plane (basically messages used to authenticate you, setup the connection and establish radio bearers, i.e. "tunnels" for payload) was written in C++. Hard real-time functions (actual scheduling of radio channel elements, digital signal processing, etc) were written in C and assembly.

Really cool thing - we even deployed an xgboost ML model on these (used for fast frequency reselection - it reduced your time in low coverage). The model was written in C++ (no Python runtime was allowed), and it was completely self-supervised and closed-loop (it would update/finetune its parameters during off-peak periods, typically at night).

Back then we were always self-critical, but looking back, it was an incredibly performant and robust system. We accounted for every CPU cycle and byte - at one point I was able to do a walkthrough (from memory) of every single memory allocation during a particular procedure (e.g. a call setup). We could upgrade thousands of these nodes in one maintenance window, with a few secs of downtime. The build system we always complained about, but looking back at it, you could compile and package everything in a matter of minutes.

Anyway, I think it was a good example of what you can accomplish with good engineering.


👤 mumblemumble
I don't know that I'd say "web scale", in part because I still don't think I know exactly what that means, but I used to work at a place that handled a lot of data, in a distributed manner, in an environment where reliability was critical, without containers.

The gist of their approach was radical uniformity. For the most part, all VMs ran identical images. Developers didn't get to pick dependencies willy-nilly; we had to coordinate closely with ops. (Tangentially, at subsequent employers I've been amazed to see how just a few hours of developers handling things for themselves can save many minutes of talking to ops.) All services and applications were developed and packaged to be xcopy deployable, and they all had to obey some company standards on how their CLI worked, what signals they would respond to and how, stuff like that. That standard interface allowed it all to be orchestrated with a surprisingly small volume - in terms of SLOC, not capability - of homegrown devops infrastructure.


👤 efficax
Back in 2016 at least, Stack Overflow was container-free: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

No idea how much has changed since then


👤 onebot
We use FreeBSD jails and a lightweight in-house orchestration tool written in Rust. We are running hundreds of Ryzen machines with 64 cores. Our costs compared to running the equivalent on Amazon are much lower - we estimate about 6x lower than AWS - and we have far better performance in terms of networking, CPU, and disk write speed.

Jails have been a pleasure to work with! We even dynamically scale resources up and down as needed.

We use bare metal machines on Interserver, but there are quite a few good data centers worth considering.


👤 camtarn
Don't know if they still use it (I suspect so!) but at least as of 2015 Amazon was using a homebrewed deployment service called Apollo, which could spin up a VM from an internally developed Linux image then populate it with all the software and dependencies needed for a single service. It later inspired AWS CodeDeploy which does the same thing.

I remember it being pretty irritating to use, though, since it wasn't particularly easy to get Apollo to deploy to a desktop machine in the same way it would in production, and of course you couldn't isolate yourself from the desktop's installed dependencies in the same way. I'm using Docker nowadays and it definitely feels a lot smoother.

This is a nice writeup: https://www.allthingsdistributed.com/2014/11/apollo-amazon-d...


👤 jake_morrison
AWS has a fine stack for deploying "cloud native" apps on top of EC2 instances.

Build a base AMI using Packer and launch it to an Auto Scaling Group behind a load balancer. Deploy code to the ASG using CodeDeploy. Use RDS for the database.
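The AMI-bake step looks roughly like the following Packer template. This is a generic sketch with made-up names, not taken from the repo linked below:

```hcl
source "amazon-ebs" "base" {
  ami_name      = "myapp-base-{{timestamp}}"
  instance_type = "t3.small"
  region        = "us-east-1"
  ssh_username  = "ubuntu"

  # Start from the latest official Ubuntu image.
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-jammy-22.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"] # Canonical
  }
}

build {
  sources = ["source.amazon-ebs.base"]

  # Pre-install whatever the instance needs so the ASG can launch
  # copies of this AMI directly; code arrives later via CodeDeploy.
  provisioner "shell" {
    inline = ["sudo apt-get update", "sudo apt-get install -y ruby wget"]
  }
}
```

`packer build .` emits an AMI ID that the Auto Scaling Group's launch template points at.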

This is a good match for languages that have good concurrency like Elixir. They benefit from deploying to big machines that have a lot of CPU cores, and keeping a common in-memory cache on the EC2 instance is more efficient than using an external cache like Elasticache. It also works well for resource-hungry systems with poor concurrency like Ruby on Rails. Putting these kinds of apps into big containers is just a waste of money.

Here is a complete example of that architecture using Terraform: https://github.com/cogini/multi-env-deploy

Similarly, bare metal can be really cost-effective. For $115/month, I can get a dedicated server with 24 VCPU cores (2x Intel Hexa-Core Xeon E5-2620 CPU), 64 GB RAM, 4x8 TB SATA, 30 TB traffic (see https://www.leaseweb.com/dedicated-servers#NL). That would be an order of magnitude more expensive on AWS with containers.


👤 Nextgrid
I've been at a company where they weren't (yet) using containers nor K8S.

The build process would just create VM images with the required binaries in there and then deploy that to an autoscaling group.

It worked well, and if you only ever intend to run a single service per machine, then it is the right solution.


👤 cyberge99
I believe tools like nomad and consul shine here.

Using nomad as a job scheduler and deployer allows you to use various modules for jobs: java, shell, ec2, apps (and containers).

I use it in my homelab and it’s great. That said, I don’t use it professionally.

I think Cloudflare is running this stack alongside firecracker for some amazing edge stuff.


👤 abadger9
I have a private consulting company which has delivered some pretty sizable footprints (touching most Fortune 500 companies via integration with a service), and I prefer deploying without containers. In fact, I'll say I hate deploying with containers, which is what I do at my 9-5. I've lost job opportunities at growth startups because someone was a devout follower of containers, and I would rather be honest than use a technology I didn't care for.

👤 jedberg
Netflix was container free or nearly so when I left in 2015, but they were starting to transition then and I think they are now container based.

At the time they would bake full machine images, which is really just a heavyweight way of making a container.


👤 zemo
When I was at Jackbox we ran the multiplayer servers without containers and handled hundreds of thousands of simultaneous websocket connections (a few thousand per node). The servers were statically compiled Go binaries that took care of their own isolation at the process level and didn't write to disk; they just ran as systemd services. Game servers are inherently stateful - they're more like databases than web application layer servers. For large audience games I wrote a peering protocol and implemented a handful of CRDT types to replicate the state, so it was a hand-rolled distributed system. Most things were handled with chef, systemd, terraform, and aws autoscaling groups.
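The "static binary as a systemd service" setup needs very little: one unit file per service gets you restarts, journal logging, and a fair amount of process-level isolation without any container runtime. A hypothetical sketch (paths and names invented):

```ini
# /etc/systemd/system/gameserver.service
[Unit]
Description=Game server (static Go binary)
After=network.target

[Service]
ExecStart=/opt/gameserver/gameserver --listen :8443
Restart=always
# Process-level isolation without containers:
DynamicUser=yes          # run as a throwaway unprivileged user
ProtectSystem=strict     # read-only view of the OS
NoNewPrivileges=yes      # no setuid escalation

[Install]
WantedBy=multi-user.target
```

For a server that never writes to disk, directives like these cover much of what people reach for containers to get.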

👤 locusofself
I work at Microsoft and we have a lot of big services that run on Windows Server. There is orchestration with a system called "Service Fabric" that schedules the applications and handles upgrades sort of like Kubernetes does, but for the most part there are no containers involved.

👤 popotamonga
Containers yes but nothing else.

Running a $1B-valuation service by manually going to instances and running docker pull xxx && docker-compose down && docker-compose up -d. EC2 instances created by hand. No issues.


👤 oceanplexian
I worked at (massive live-streaming website) and my ops team operated tens of thousands of bare-metal machines. Not to say we didn't have an enormous amount of containerized infrastructure in AWS, but we had both.

When the company was younger containerized networking had latency and throughput issues, especially when you were trying to squeeze every bit of traffic you can from a white-box bare-metal server, i.e. bonding together 10Gb or 40Gb network interfaces. The other thing is that the orchestration engines like K8s simply had maturity issues when not using Cloud Load Balancers.

As for the implementation details, I've worked at lots of companies doing metal and they look a lot alike. PXE and TFTP, something like Chef, Puppet, Ansible (But at a certain scale you have to transcend those tools and come up with better patterns), you need services to manage IPMI or console servers, power strips, etc., you need a team of folks to rack and stack things, you need inventory, you need network engineers, and so on. At a certain scale you can simply push code around with SSH and a build system, at a scale beyond that you need to come up with some special sauce like P2P asset distribution or an internal CDN. At the pinnacle of bare-metal, you'd ideally have a very evolved control-plane, a slimmed down OS that runs a single static binary, and a stateless application. It takes a lot of work to get there.

Of course, getting servers to run some code is scratching the surface. Service discovery, network architecture, security, etc., are all things that require specialized skill sets and infrastructure. You also need to build and maintain every "glue" service you get from a cloud provider, you need to run file servers, you need to run repositories, you need to run and manage your own databases, and so on. Sometimes you can hybridize those with cloud services but that opens up yet another can of worms and teams of people who need to answer questions like.. what if The Cloud(tm) goes down? What if there's some kind of split brain scenario? What if there's a networking issue? How does service discovery work if half of the services disappear? etc. etc.


👤 Negitivefrags
We don’t use containers with a pretty large deployment. (1k or so bare metal servers)

If you statically link all your binaries, then your deployment system can just be rsyncing a directory.

The only dependency is that the Linux kernel is new enough. Other than that the packages we deploy will run on literally any setup.


👤 cespare
We run on thousands of EC2 instances and our biggest systems operate at millions of requests/sec. No containers*. We use EC2, Route53, S3, and some other AWS stuff, plus custom tooling built on their APIs. Most of our code is Go or Clojure so deployments generally consist of self-contained artifacts (binary or jar) plus some config files; there's little to no customization of the instance for the application.

*Well we do have an in-house job queue system that runs jobs in Linux namespaces for isolation. But it doesn't use Docker or whole-OS images at all.


👤 dangus
> without relying on the specific practice of shipping software with heaps of dependencies

Many companies just rely on the practice of shipping software with heaps of dependencies.

I worked at a place that simply spun up blank AWS images in autoscaling groups and allowed configuration management to install literally everything: security/infra/observability agents, the code and dependencies (via AWS CodeDeploy), and any other needed instance configuration.

The downside of this practice was slow startup times. The upside was...I don't know, I think this pattern happened by accident. Packaging these AWS instances into images beforehand would be smarter. Newly created services were generally moved over to k8s.

These were stateless web services for the most part.

I think the lesson I learned from this was "nearly any operating paradigm can be made reliable enough to tolerate in production."


👤 boredtofears
Hashicorp Packer + AWS CDK (or Terraform) can get you a lot of the characteristics of containerized deployments without actual containers.

👤 nijave
Not sure if they still do, but GitHub used to run a decent amount of stuff on bare metal: https://github.blog/2015-12-01-githubs-metal-cloud/

Afaik Dropbox's Magic Pocket is bare metal (they have an internal bare-metal provisioning system): https://dropbox.tech/infrastructure/inside-the-magic-pocket

Fastly's edge network is, I'm pretty sure, bare metal.

JP Morgan Chase has a significant amount of non-containerized workloads, including one of the largest IBM mainframe setups in the world (can't find a source, but internally they claimed they were IBM's biggest mainframe customer). They run high-availability apps using a parallel sysplex setup where they can fail over an app at the hardware level (https://en.wikipedia.org/wiki/IBM_Parallel_Sysplex#Geographi...). Their largest apps were usually either DB2-based on the mainframe or some J2EE setup on distributed/x86 with Oracle databases (WebSphere, Tomcat, WebLogic). They also still had a pretty big HP NonStop setup afaik (they're so big they pretty much had at least one of every Big Enterprise Thing).

For Java apps, you basically just build a war/jar/ear and publish it to a Java repo, from which it gets deployed to servers. The handful of things I know about all had shell or Perl scripts and operations teams to manage deployment. It's effectively the same thing as a container, but all Java. Some of those stacks, like WebSphere, run as a compute cluster that does things similar to container orchestrators: deploy management, config management, scheduling.

> without relying on the specific practice of shipping software with heaps of dependencies

All that stuff still had heaps of dependencies, be it internal or external.


👤 freemint
Almost every HPC center. Tech stack: Linux (RHEL-like); MPI as middleware for distributed communication over vendor-specific communication hardware (also called the interconnect); a shared high-performance network filesystem, usually mounted on the login node; and a scheduler like SLURM, IBM Spectrum LSF, or others to launch jobs from the login node, which is accessed via SSH. This setup scales to tens of thousands of machines.
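A typical job on such a cluster is a batch-script fragment handed to the scheduler from the login node. The sketch below uses SLURM directives; the resource numbers and binary name are made up:

```shell
#!/bin/bash
#SBATCH --job-name=sim
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=32
#SBATCH --time=02:00:00

# srun launches one MPI rank per task across the allocated nodes,
# over whatever interconnect the site has configured.
srun ./my_mpi_app input.dat
```

Submitted with `sbatch job.sh`; the scheduler queues it until 16 nodes are free.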

👤 paxys
I work for one of the largest enterprise productivity companies out there. We have tens of thousands of self-managed EC2 VMs, no containers.

We use Chef and Terraform for the most part.


👤 oneplane
One of my customers used to deploy on "Application Servers", which were essentially just VMs with CentOS and Tomcat. You push a WAR to an HTTP endpoint and that's that. The servers themselves were just images made with Packer and Ansible (two-staged: first get a public image, update and preconfigure it; then the second stage templates Tomcat). While running ~1K nodes that way works, the problem wasn't "can it do the job" but "can we find enough people with the knowledge, and will this scale with more people". The answer was no, and a migration to first Mesos and then Kubernetes was done.

For developers, not much changed in the end (push code, receive rollout progress), technically, traffic is still the same too (external -> Load Balancer -> Application), but the glue is much more 'standard' making it way easier to support, buy support, and hire people that already know how it works.

Our "Big Data" is just a fat cluster of machines with Apache Spark, no containers. Technically we could add a container in there somewhere, but there is no benefit, so we just build server images that do exactly the same thing. I think that containers only make sense if you need to run multiple things per node, or if you want things that you can re-use (knowledge, existing software, local vs. remote environments). For everything else it doesn't really matter; a static binary does the same thing. That said, people finishing software engineering school hardly know how to compile their own code, let alone link it statically.

While containers can solve a technical problem, they are mostly solving organisational and knowledge problems. Applies to microservices in some scenarios as well.


👤 tombert
I don't think this will get me in trouble; the iTunes side of Apple, for a long time (they might have changed semi-recently), was using a home-spun, container-free distribution platform called "Carnival". It was running JVMs straight on Linux boxes, and as far as I know, was not using any kind of containers.

There was talk of moving to k8s before I left, so it's possible this is no longer true.


👤 bittermandel
My previous employer is one of the biggest game providers (in terms of daily users) in the world, and we ran an incredibly stable infrastructure based on VMs running on KVM, with a home-built bash script to deploy our application servers everywhere.

Every bare-metal server is set up essentially identically using Saltstack, and each virtual machine is likewise set up identically on start. This allowed us to spin up, spin down, or replace 1, 10, 100, or 1000 stateless VMs for each game in a very short time period, with all the servers having identical configuration and code deployed on them.

Databases had a similar setup, though a lot more complex, as they are stateful and can't be removed on short notice. Stateful workloads are really hard and require more domain knowledge than a Java application, so we decided not to virtualize the user-critical databases and kept them as they were, to avoid embracing the complexity.


👤 deckard1
Many years ago I worked at a place that deployed thousands of bare metal servers. We were effectively running our own cloud before the cloud became a thing.

The way it worked was simple. We created our own Red Hat variant using a custom Anaconda script. We then used PXE boot. During OS install, this script would call back to our central server to provision itself. You can do that a few ways. If I recall, we baked a minimal set of packages into the ISO to get us up and then downloaded a provisioning script, which was more frequently updated, to finish the job.

This is still a fine way of handling horizontal SaaS type scaling, where you do a sort of white labeling of your service with one customer per VM. Swap Postgres/MySQL for SQLite on each node and everything is just stupidly simple.


👤 seti0Cha
Place I worked until recently had (and probably still has) the majority of the site running on bare metal. Java stack, home-grown RPC framework, home-grown release system that boiled down to a whole lot of rsync and ssh commands run by one controller script, which knew how to do things in parallel. Configuration was through files that lived in source control. Our servers were hand-packed, which would have sucked except we had a fairly limited number of (gigantic) services. We handled load on the order of millions of requests per minute. It actually worked surprisingly well. Our biggest pain points were slow startup times from giant services and sometimes needing to move groups of services around when load got too heavy.
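The core of a "lots of rsync and ssh in parallel" controller is small enough to sketch. This is a generic reconstruction, not their actual script; a stub function stands in for the per-host rsync/ssh work so the sketch is self-contained:

```shell
set -eu

HOSTS="app1 app2 app3 app4"   # hypothetical host list
DONE=$(mktemp -d)

deploy_one() {
  # In the real system: rsync the release to "$1", then ssh in and
  # bounce the service. Here we just record that the host was handled.
  echo "ok" > "$DONE/$1"
}

# Fan out one background job per host, then wait for all of them.
for h in $HOSTS; do
  deploy_one "$h" &
done
wait

ls "$DONE" | wc -l
```

Real versions add a concurrency cap and per-host failure tracking, but shell's `&`/`wait` is the whole parallelism mechanism.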

👤 claytongulick
I've done relatively large scale projects without containers.

In one case, we were running something like 80% of all auto dealer websites on two bare metal web servers and one bare metal SQL Server db, with a fail-over hot replica. Quite a bit of traffic, especially on holiday weekends, and we never came close to maxing out the machines. This was in 2007, on fairly modest hardware.

I used to write fulfillment systems for Verizon, we handled about 30,000 orders and 10,000ish returns per day, with pretty complex RMA and advance-replacement logic, ILEC integration and billing in Business Basic on NCR Unix, with complex pick/pack/ship rules and validation. Again, that was a single bare metal db server, SQL Server and a web server with SOAP/XML/WSDL services (this was in early 2000's, on laughable hardware by today's standard).

I was part of writing a healthcare claims processing system that did about 1TB per day of data processing and storage, on a single bare metal SQL Server instance and OLAP cubes for analytics.

I've also been involved in projects that took the opposite approach, Kubernetes, Kafka, CQRS, etc... in order to do "massive scale" and the result was that they struggled to process a few thousand health care messages per day. Obviously the devil is in the details of implementation, but I wasn't particularly impressed with the "modern" tech stack. So many layers of abstraction, each has a performance and operational cost.

These days I mostly use Node and Postgres, so I haven't had a lot of need for containers. npm install is a pretty simple mechanism for dependencies, I try to keep the stack minimal and lean. With the current cloud offerings of hundreds of VCPUs, hundreds of gigs of memory and petabytes of storage, it's difficult for me to envision a scenario where vertical scale wouldn't meet the needs of any conceivable use case.

This works for me, partly because I'm a fair hand at sysadmin stuff on linux and prefer maintaining a well-tuned "pet" over a bunch of ephemeral and difficult to debug "cattle".


👤 20after4
Wikipedia is only recently in the process of adopting containers for a lot of services. Up until recently, pretty much everything operated on bare metal, and many things - most notably the MediaWiki web services - are still on bare metal.

All the infrastructure configuration is managed with puppet and that's all in a public git repo:

https://gerrit.wikimedia.org/g/operations/puppet/+/refs/head...

How I know: I worked for the Wikimedia Foundation for ~7 years, until February of this year.


👤 dottedmag
Our use-case might be idiosyncratic, as we try our best to have software that does the least amount of work possible. Nevertheless, we've got a sizable codebase.

We use Docker for registry and code delivery. We use Kubernetes for rollouts. These are not critical, we periodically reevaluate if we want to continue using them. We've designed our software to work well even if some components shut down, so we don't need any fancy rollout strategies: just take everything down and boot a new version.

Our strategy:

1. Go codebase. No CGo. Build artefacts are static binaries. They are copied into Docker containers.

2. That's all.
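When the artefact is a static binary, the Docker part can be as thin as it gets. A minimal sketch of this pattern (the binary name is hypothetical):

```dockerfile
# The image is just the binary: no distro, no shell, no libc.
FROM scratch
COPY myservice /myservice
ENTRYPOINT ["/myservice"]
```

Docker here is purely a registry and delivery format; nothing in the image depends on a container runtime being special.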


👤 KaiserPro
Large VFX places used to not use containers for large scale stuff.

Where I worked we had something like 40k servers running about 80 different types of software. It was coordinated by Pixar's Alfred, a single-threaded app that uses something that looks suspiciously like the Athena widget set. (https://hradec.com/ebooks/CGI/RMS_1.0/alfred/scheduling.html)

It was wrapped in a cgroup wrapper to avoid memory contention.


👤 dijit
depends on what kind of scale.

I used to make online games and our gameservers at launch were in the order of 100,000 physical CPU cores and about 640TiB of RAM spread across the world.

But we did this: on Windows, with a homegrown stack, before Kubernetes was a thing (or while it was becoming one).

With the advent of cloud we wrote a predictive autoscaler too. That was fun.

I don't work there anymore, and they moved to Linux, but they're hiring: https://www.massive.se/

You can learn a lot from a technical interview ;)


👤 blacklight
Booking.com - at least until I left in 2018, not sure about now.

Their code base is (still) mostly Perl (5), running on bare uWSGI instances on KVM virtual machines, deployed on self-hosted infrastructure.


👤 0xbadcafebee
I mean, sure, any statically compiled application can be deployed at scale without any dependencies at all. If you don't consider the application's 1GB worth of compiled-in SDKs and libraries to be bundled dependencies. :)

Back in the day we used to continuously deploy to thousands of servers without VMs or containers. Probably the largest-traffic sports site of the 2000s. But the application was mostly mod_perl, so dependencies still had to be managed, and it was intermittently a tire fire. There was no abstraction to run the applications immutably or idempotently. We actually acquired a company that had built a whole immutable build/deploy system around RPMs, so things became more predictable, but still host OS limitations would cause random bugs and the management interface was buggy. Re-bootstrapping hosts to shift load in the middle of peak traffic is a huge pain.

Containers would have been great. We could've finally ditched Cfengine2 and most of the weird custom host-level configuration magic and just ship applications without fear that something that worked in dev would break in prod due to a configuration or build issue. We also could have changed load patterns nearly instantly, which you can't do with VMs as you have to spin up new VMs to have a new host OS (not that we had VMs, what a luxury!)


👤 AtlasBarfed
I had to build bespoke orchestration and access layers.

The gulf between "parallel ssh and a bunch of bash scripts" and "Ansible/Salt/etc" is a huge one.

I wrote an access-layer framework that abstracts CLI command delivery to nodes and has a configuration layer where env / cluster / datacenter / rack / individual-node settings can be applied in a cascading/overriding fashion.
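Not the author's actual framework, but the cascading/overriding idea can be sketched in a few lines of shell (the layer names and file layout here are invented):

```shell
# Source config layers from broadest to most specific; each layer file just
# assigns shell variables, so later (more specific) layers override earlier ones.
apply_layers() {
  dir=$1
  for layer in env cluster datacenter rack node; do
    f="$dir/$layer.conf"
    [ -f "$f" ] && . "$f"   # missing layers are simply skipped
  done
  return 0
}

# Example: the env layer sets a default, the node layer overrides it.
confdir=$(mktemp -d)
echo 'HEAP_SIZE=4G'  > "$confdir/env.conf"
echo 'HEAP_SIZE=16G' > "$confdir/node.conf"
apply_layers "$confdir"
echo "$HEAP_SIZE"   # → 16G, the node-level value wins
```

The same precedence idea scales from a handful of files to a full config service; the important property is that an operator can always answer "where did this value come from?"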

I use this to do ad hoc stuff (via the Groovy shell) as well as to write backup, restore, and other automation programs. They can run from the IDE and be debugged, or run from the CLI.

I have a crude, "workflowish" directory-tree organization for spawning Cassandra, Kafka, ZooKeeper, Elasticsearch, etc. clusters and tearing them down.

It works better than parallel ssh and bash scripts by far, but I'm the only one that knows it. Is it "better" than just using Ansible or Salt or Kubernetes? Uh... don't know. Ansible got thrown out as messy (it became a massive monorepo). Our stateless servers haven't even fully jumped to k8s. Salt has too many servers.

I'll try to open source it, but I doubt it will get traction. It will be useful for certain operations in Cassandra / Kafka / etc in that it is framework agnostic and isn't trying to take over the world, so we'll see.


👤 jesterson
CTO at an e-commerce company. We operate the full e-commerce chain from customer to supplier, including backends for planning, support, marketing, operations, etc. Thousands of orders daily, millions of visitors on the webfront.

Not a single container. I don't see any possible use or benefit of it whatsoever. It's trendy and cool, but if you just want to get things done, avoid mess in your infrastructure, and avoid accumulating tech debt, it's probably not the tool to go with.


👤 chewmieser
AWS-specific, but prior to our migration to ECS containers we used their OpsWorks service. This worked reasonably well: we would set up clusters of servers with specific jobs, and auto-scaling groups would spin up servers to meet demand, using Chef cookbooks to provision them.

We used a bash script to handle what we now use GitLab's CI system for. Deployments were handled through CodeDeploy and infrastructure would be replaced in a blue/green fashion.


👤 tschellenbach
We do. We power activity feeds and chat for a billion end users using CloudFormation, cloud-init, and Puppet on the infra side. The code is based on Go, RocksDB, and Raft.

I don't see why we would want the overhead of K8s and Docker. Infrastructure is already defined in code, everything is redundant and automated; why have one extra layer that can break and cause performance issues?

We use Docker for local dev though; it's good for that.


👤 ex_amazon_sde
Amazon.

It uses an internal tool, but the implementation is not important. Applications and libraries are packaged independently, just like in any Linux distribution.


👤 freedomben
At a previous company we built a new AMI for each prod release and used EC2 auto-scaling groups. I much prefer k8s, but that method worked fine since we were already vendor-locked to AWS for other reasons.

I'm not sure what you mean by:

> without relying on the specific practice of shipping software with heaps of dependencies.

Do you mean like Heroku? That usually gets expensive really quickly as you scale.


👤 time0ut
I manage a mid-size fleet of EC2s across 4 AWS regions with several auto scaling groups per region. The VMs themselves are general purpose and the image is built with our standard set of dependencies. All of our applications are homogeneous as far as stack, so the process for managing the VM ecosystem is pretty simple.

The various application binaries are deployed through a small set of SSM documents (with an automation layer built on Jenkins in front, using S3 to shuffle things around). The design predates me, but I've made small improvements to it. It works fine and it's cheap as far as AWS stuff goes.

We are slowly migrating applications to ECS Fargate which I have yet to form a strong opinion on. Docker is a nice experience even if the leanest artifacts are still chunkier than a naked application.

For reference, I have built some larger and some smaller stuff using k8s and even docker+machine once.

May not be at the scale you are thinking, but it serves maybe 100k users spread across 10k customers working in 30 something countries.


👤 xeus2001
We currently run a Java monolith that is built on every push to master by a GitLab pipeline. When the build succeeds, the fat JAR (including all resources) is copied to S3, and then a config file is pushed to S3 to ask the DEV servers to run it. The machines are plain EC2 instances with the service registered in systemd with auto-restart after 5s. A simple shell script downloads the config file and the fat JAR and runs it. We detect the environment and machine we're on from the EC2 meta- and user-data, which is set when the instance is launched. All of this is basically plain bash scripting using jq and other standard tools.

It is a little more complicated than that, because we have graceful restarts: we first ask one specific instance in each cluster to update itself, and when it reports that it is updated, the next instance is asked to update. All the EC2 instances in a cluster sit behind a Global Accelerator and are spread across multiple regions. The service has an API that GitLab invokes with a token to request a graceful restart; the instance then reports itself unhealthy to GA for a while so it can finish pending requests (close sockets), and then it simply terminates. The rest is back up to the bash script and systemd.

Deploying or redeploying to an environment is likewise a single click in the GitLab UI, which is especially important for on-call, since you can see which version was deployed to which environment, when, and by whom. Additionally, some JIRA tickets are created automatically.

In the end this only needs GitLab, EC2 instances, Global Accelerator, and bash scripts, while allowing us to be multi-region and keep stateful connections. We can even have clients connect directly to specific instances, or ask GA to redirect certain ports to specific instances. Basically GA is our load balancer, router, and edge location. It is stable, fast, and easy, with the smallest number of pieces involved.

We can remove individual instances, update one instance in a specific region to a new version for testing, and so on.
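For the curious, the systemd-plus-S3 pattern described above might look roughly like this; it is an illustrative reconstruction, not the actual setup, and the unit name, bucket, and file paths are all invented (local directories stand in for /etc/systemd/system and /opt/app):

```shell
mkdir -p etc-systemd opt-app

# The unit: auto-restart after 5s, launch via a fetch-and-run script.
cat > etc-systemd/app.service <<'EOF'
[Unit]
Description=Fat-JAR service, artifact and config pulled from S3
After=network-online.target

[Service]
ExecStart=/opt/app/run.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# The script: discover environment from EC2 user-data, fetch config and JAR.
cat > opt-app/run.sh <<'EOF'
#!/bin/sh
set -eu
BUCKET=s3://example-deploy-bucket    # hypothetical bucket name
ENV=$(curl -s http://169.254.169.254/latest/user-data | jq -r .env)
aws s3 cp "$BUCKET/$ENV/app.conf" /opt/app/app.conf
JAR=$(jq -r .jar /opt/app/app.conf)  # the config names the fat JAR to run
aws s3 cp "$BUCKET/artifacts/$JAR" /opt/app/app.jar
exec java -jar /opt/app/app.jar
EOF
chmod +x opt-app/run.sh
```

Because systemd owns restarts and S3 owns artifact distribution, "deploy" reduces to uploading a new config file and asking instances to restart, one at a time.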

👤 jshen
I'm not fully sure I understand your question, particularly the part about scale. By scale do you mean a large organization (i.e. thousands of developers with highly heterogeneous tech stacks), or do you mean a high volume of traffic?

If you mean high volume, I’ve seen people do it with serverless stacks like Google app engine or AWS lambda and some managed databases.

If you mean large organization, then I haven't heard of anyone doing it without containers or VMs. It's not that alternatives don't exist; it's that it's nearly impossible to get everything onto an alternative, and containers are extremely flexible. For example, large companies often buy other companies, and it's prohibitively expensive to rebuild everything relative to the cost of putting it in a container and leveraging the build/deploy pipelines they already have. Same thing with ancient legacy tech: far easier to containerize it than rebuild it.


👤 musicale
1. Containers != Docker.

2. If all you need are resource limits, processes in cgroups work pretty well.

3. Networking adds complexity - avoid network namespaces if you can, and use an abstraction layer so the application doesn't have to worry about things like wire formats, encryption, TCP connections, IP addresses, and port numbers.


👤 tikkabhuna
The finance company I work at has historically been copying jars and booting them up with some scripts. Servers all bare metal and some tuned for performance. We have a lot of pet servers as different users needed different tools and asked a sysadmin to install it.

We're now moving heavily towards containers, and the primary motivator is choice of languages and standardisation. Bare metal is fine when you use a single language, but you'll find yourself shoehorning other languages into the same process. Interpreted languages (Node/Python) are a nightmare, and you'll have to find a pattern for running multiple versions on the same host.

Containers are just a really nice deployment unit and are well supported by other tools.

If you are really keen on this path, do consider up front how you will handle version upgrades of the runtime or dependencies.


👤 lowbloodsugar
Is your question "How do I write software without tons of dependencies?" or "How do I ship software that has tons of dependencies but without shipping the dependencies with them?" or "How do I ship software with tons of dependencies without using containers or VMs?"

👤 wizwit999
Most AWS dataplane services are on ALB + EC2, some of the newer higher level ones do use containers.

👤 jokethrowaway
Can't say I'm working at scale, but one of my products is latency-sensitive and we stripped Docker out because it was slowing down each request.

I never really got to the bottom of it. Someone linked me a bug in the interaction between Docker and the Linux kernel (now fixed) which could have caused it, but I don't have time to waste chasing Docker performance.

Ours is a fairly simple setup: one postgres db per machine, one python app per machine on $cheapVPSProvider; number of instances goes up and down based on traffic (basically cloning one of the machines); a load balancer in front; data gets updated once per day and replicated; auth / subscription status data is stored in redis


👤 tiffanyh
I'd imagine the answer largely depends on whether or not your company either builds or buys software.

If your company builds, there's no guarantee containers are used (but it's a choice).

If your company buys software - I highly doubt containers are used at all.


👤 nailer
Probably most people? Newer apps often use MicroVMs, older apps often use Xen VMs. Containers aren't the only containment mechanism and some implementations are known as complex time sinkholes.

👤 DeathArrow
Right now I don't see a good alternative to containers and orchestration. At my last workplace we developed microservice-based apps which ran in Kubernetes. Everything ran smoothly, it was easy to load-balance and easy to scale, and we had very good availability.

At my current workplace we develop a microservice-based app (kind of), but each service runs on the same physical server. No load balancing, no failover, and I don't want to know what the availability will be once it's exposed to the outside world.


👤 ryanjkirk
Up until recently, I was the steward of a large, distributed, and profitable app that used no containers. The infrastructure was managed with packages and puppet, and it worked well.

👤 kristjansson
> ... without relying on the specific practice of shipping software with heaps of dependencies. Whether that be in a container or in a single-use VM.

Either you package your dependencies with your software, rely on your deployment environment to have all of the dependencies available and correctly versioned, or write software without dependencies.

Since you seem to have a pretty specific pattern in mind, I'd be curious to know more about what you've envisioned or are dealing with.


👤 asciimov
I know a few places that do. Their systems were already super-reliable and the decision makers don't feel the need to change things just because new tools are available.

👤 brunojppb
Stack Overflow is known to operate their own servers in their own datacenters, and as far as I recently read, they are still running with no container tech. https://nickcraver.com/blog/2016/05/03/stack-overflow-how-we...

👤 kbsali
See Pieter Levels, who single-handedly runs (with the help of a freelance sysadmin, I believe) multiple high-traffic web projects on a single VPS with crazy numbers (and supposedly a single index.php per project): https://twitter.com/levelsio/status/1506202608104783878

👤 pdx6
When I worked at Okta, they used EC2 without containers for their primary monolith workload. Scaling was manual, via ELB. The latest hack shows they are using NLB now, but I suspect it is still EC2-only otherwise.

To distribute database load, a cell or “block” of instances would be filled until the largest database was at 80% write capacity or so, then a new account and group was spun up for newer customers.
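That cell-filling rule can be sketched in a few lines (a hypothetical illustration of the idea, not Okta's tooling; cell names and the 80% threshold are taken from the description above):

```shell
# pick_cell NAME:UTIL [NAME:UTIL ...]
# Pairs are ordered oldest-first; new customers land in the newest cell
# until its write utilization crosses ~80%, then a fresh cell is provisioned.
pick_cell() {
  newest=""
  util=0
  for pair in "$@"; do
    newest=${pair%%:*}   # cell name before the colon
    util=${pair##*:}     # write utilization percent after the colon
  done
  if [ "$util" -ge 80 ]; then
    echo "provision-new-cell"
  else
    echo "$newest"
  fi
}

pick_cell cell-1:95 cell-2:62   # → cell-2
```

The appeal of the scheme is that each cell's database load is bounded by construction, so no orchestrator needs to rebalance anything after the fact.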


👤 jdavis703
I worked one place that did so. Our traffic was sharded over a couple dozen sites, but combined we were at US top-100 scale.

Every site was on a very large bare-metal box (sites were grouped together when possible; IIRC only one required its own dedicated machine). Each box was a special snowflake.

The DBs were on separate hardware.

When I left they were starting to embrace containerized microservices.


👤 jatins
Another benefit of containers is how simple deployments and scaling are with some of the managed services like Google Cloud Run.

Reading through the comments, the options others use for deploying on VMs (Nix, rsync, etc.) seem much harder to manage. Curious if there are simpler deployment strategies, or tools/services for deploying on VMs.


👤 krageon
What does "at scale" mean? The stage at which distributed systems become absolutely necessary depends strongly on the technical culture and the expertise of the engineering team. Unless you are an absolutely giant vendor, you probably don't need such a stack to begin with.

👤 tyingq
There's certainly a lot of large-scale VMware out there, though that's somewhat the same idea.

👤 robot
We use Elastic Beanstalk with Node.js. It works seamlessly; it does use containers as the underlying infrastructure, but we don't see or manage them. Deploying takes no more than 10-15 seconds. It doesn't break. DevOps maintenance work is minimal.

👤 bullen
I use sandboxes instead of containers.

If you can stick to Java, the JVM can hot-deploy code in real time without any downtime. It leaks memory, but if your systems are minimalist you don't need anything more than one JVM process per machine for everything!


👤 pjmlp
We do in plenty of our workloads, using classical Windows VMs or AppServices.

👤 firebaze
We do. Our containers are AWS EC2 instances. We briefly used Kubernetes successfully, but decided that it's not worth the maintenance burden.

(small SaaS, max out at ~100k concurrent users right now, but growing fast)


👤 alyx
Windows, bare metal, 800+ processes, no containers. Guess the service ;)

👤 w10-1
Perhaps eBay? For a decade or so they had their own home-grown stack and ran auctions with transactional semantics worldwide on their own server farms, long before Netflix or AWS even existed.

👤 donatj
We're still nginx + PHP + Aurora something like 30 million users in and ten years later. Horizontally scales beautifully. Couple small microservices but no containers outside of CI.

👤 songeater
As a longtime pseudo-lurker on these boards, and definitely not a software dev, this seems to be the platonic ideal of an Ask HN question. I'll Ask HN whether that is the case.

👤 andrewfromx
"Container Technologies at Coinbase: Why Kubernetes is not part of our stack"

By Drew Rothstein, Director of Engineering

https://blog.coinbase.com/container-technologies-at-coinbase...


👤 anonymoushn
While I was at Cloudflare we used bare metal servers and we deployed software and configuration using internal debian repositories and saltstack.

👤 jcadam
Is using CUDA inside a container still a massive PITA?

👤 Groxx
Everything that runs the container systems does, if that counts :)

So basically everyone that's not 100% third-party hosted. If only a little in many cases.


👤 kristianpaul
Containers run on VMs, so those need to scale as well anyway. So cloud providers probably don't run everything on containers…

👤 nunez
Stack Overflow/Stack Exchange maybe?

👤 DGAP
Slack is not a fan of containers evidently.

👤 frellus
Great talk here from HCA Healthcare about their architecture and tech stack using Elixir OTP, Riak:

https://www.youtube.com/watch?v=cVQUPvmmaxQ

TL;DR: Elixir/OTP for fault tolerance and process supervision, Riak as a KV store (no other database), and an interesting process-actor model for patients that I found delightful. Bonus: hot code patching and zero downtime.


👤 zn44
King (makers of Candy Crush) ran on-prem without containers while I was there (until 2016).

👤 unixhero
The 90s and 2000s did.

👤 seb1204
Wikipedia?

👤 inopinatus
> are any organizations out there who operate at web scale without relying on the specific practice of shipping software with heaps of dependencies

Yes there are. Ultimately you want to put files on a server and start a process. The disposition of the scaffolding is a matter of taste. Aside from containers and virtualisation, there are two common alternatives I've seen:

1. deploy code from a tag or branch in a revision control system (perforce/git/hg etc), with OS-level dependencies and provisioning handled by one of the so-called "infrastructure as code" tools e.g. puppet/chef/salt/ansible etc.

2. package code for deployment using OS-native packages (e.g. .debs), with OS-level dependencies handled natively, and provisioning in a pre/post-install package script.
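As a hedged sketch of option 2 (every name here is invented): a native Debian package carrying a statically linked binary, with dependencies declared in the control file and provisioning in the post-install script.

```shell
# Lay out a minimal binary-package tree.
mkdir -p pkg/DEBIAN pkg/usr/bin
printf '#!/bin/sh\necho myapp\n' > pkg/usr/bin/myapp   # stand-in for the real binary
chmod 755 pkg/usr/bin/myapp

# OS-level dependencies are handled natively via Depends.
cat > pkg/DEBIAN/control <<'EOF'
Package: myapp
Version: 1.2.3
Architecture: amd64
Maintainer: ops@example.com
Depends: systemd
Description: Example service shipped as a native package
EOF

# Provisioning lives in the package scripts, not a config-management run.
cat > pkg/DEBIAN/postinst <<'EOF'
#!/bin/sh
set -e
systemctl daemon-reload || true
EOF
chmod 755 pkg/DEBIAN/postinst

# Build where dpkg-deb exists; installing the result resolves Depends natively.
command -v dpkg-deb >/dev/null && dpkg-deb --build pkg myapp_1.2.3_amd64.deb
echo "package tree ready"
```

Rolling back then means installing the previous version from the repository, which is exactly why broken deploys unwind so easily with this style.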

How they are similar:

* Both still allow use of userspace partitioning, whether it's a dumb chroot, jail, zone or whatever.

* Both still need some kind of workload scheduling, and a release tool.

* Both approaches are from the school of "choose boring technology".

* Compared to containers, both approaches tend to leave resources underutilized, unless the scheduler's bin packing is very good.

* Both are slightly more coupled to OS release cycles than a container or virtualisation approach to distribution.

For scaled entities the glue between the parts, any dashboards, and often the scheduler, are likely to be bespoke. Some PaaS vendors were explicitly designed around the first type (e.g. Engineyard, Opsworks) but like Mesos they're half dead now. I've seen one very large brand try to run this inside their CI/CD pipeline, which you can shoehorn in, but it doesn't fit well due to conceptual mismatch.

How they differ, in my experience:

The first is really easy to get going with, since it's barely a conceptual step beyond installing locally by hand. Your release tool may ultimately be a wrapper around git. It suffers from long deploy times (generally due to local compiles) and can be brittle, in part because config management scripts are an afterthought for many developers. Unwinding a broken deploy is (usually) horrible. Developers often like this style, but the ops tech debt builds rapidly.

The second kind (native packages) is more easily managed and certainly deploys faster, and has a lower attack surface. You need a more sophisticated and probably centralized build service+package repository (a wise architect will use whatever the OS vendor uses). Unwinding a broken deploy is (usually) easy. Developers often complain about this style because they're forced to consider operational concerns, although I personally think that's an excellent forcing function at work. If you like the idea of shipping a compiled statically-linked binary to bare metal, this is probably the best way to enable it, short of baking an entire machine image.

If it wasn't obvious from the remarks, I am personally quite fond of the second option.


👤 softwarebeware
Is this part of some new anti-container sentiment?