Any interesting stories to tell?
I'm trying to get a sense of what MySQL is like at massive scale.
Thanks ^_^
There were always concerns about the latency involved in the replication. Whenever I did large scale batch operations that needed to write lots of records in master servers, I would monitor the replication queue length and suspend operations when it got too far behind to keep things sane and healthy.
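Roughly, that throttling loop looks something like this (a minimal sketch, not the original tooling; it assumes a pymysql connection to a replica, and MAX_LAG_SECONDS / write_next_batch() are made-up placeholders):

```python
import time

import pymysql

MAX_LAG_SECONDS = 30          # arbitrary example threshold: pause batch writes beyond this lag
CHECK_INTERVAL_SECONDS = 10   # how long to sleep before re-checking the replica


def replica_lag(replica_conn):
    """Return Seconds_Behind_Master as reported by the replica, or None if unknown."""
    with replica_conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("SHOW SLAVE STATUS")
        row = cur.fetchone()
    return row["Seconds_Behind_Master"] if row else None


def run_batches(master_conn, replica_conn, batches):
    """Write batches to the master, backing off whenever the replica falls too far behind."""
    for batch in batches:
        while (lag := replica_lag(replica_conn)) is None or lag > MAX_LAG_SECONDS:
            time.sleep(CHECK_INTERVAL_SECONDS)
        write_next_batch(master_conn, batch)  # placeholder for the actual bulk INSERT/UPDATE
```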
The biggest failure in this architecture was sharing SQL queries as opposed to sharing service calls. This is something that was being fixed as I left: switching to service calls gives more flexibility in the underlying implementation and opens up more options for improving the system.
For most kinds of large systems I'd recommend thinking about the architecture in terms of distributed actors hosted in something like Dapr, Orleans, Akka, or Erlang/OTP; then the technical details of the underlying database become less of a constraint. It grants you flexibility for your underlying storage that you would not otherwise have.
Fun side fact: Bos, the CTO of Meta, was a key person working on making all the magic happen there back then.
(though it's only used as the backing data store for their graph database https://www.usenix.org/system/files/conference/atc13/atc13-b...)
At this point differences between databases start to shrink. Most of the complexity and features are being moved to the application layer, and your database is becoming dumber.
More subtle (and frankly boring) features are becoming important, like how cost-effective it is to run, observe, back up/recover, etc.
Honestly, I feel like the most interesting stories happen before "at scale": stories about how you delayed the need to "scale" for several years (and probably saved the company by letting it focus on product instead of infra). But unfortunately those aren't stories people tend to brag about. Nobody writes about how they chose to be pragmatic and not scale because they could just buy a bigger server or add a caching layer.
1) RDS/Cloud SQL is awesome but too expensive; we can't afford it. The things it provides aren't that special anyway, you can get them with plain VMs (images, backups, replication).
2) The industry has moved to Vitess, but converting to it is too daunting; we are focused on automation and cost reduction instead.
3) Use index-based sharding (no `shard = id % total` math, no consistent hashing, no drama). When you get a new user, decide where they should live on each cluster and keep that in a single table [user_id, video_cluster_id, session_cluster_id, ...]. This way moves are easy. Each video_cluster_id then points at a load-balanced host in front of a fleet of replicas (see the sketch after this list).
4) The hardest thing you will run into is write saturation: you can't write to a disk fast enough. Advanced topologies and 3) help here. Example: [MasterA] <- [slaves 0..n] <- [MasterB] <- [slaves-b 0..k]. This way you can push more writes to MasterB, and add replication filters so that the slaves-b tier doesn't need to perform all the writes MasterA does.
5) SSDs. Don't bother with anything else.
6) We would pay a million dollars for this tool: https://docs.percona.com/percona-toolkit/pt-online-schema-ch...
7) GTID took away the hard parts of MySQL replication.
8) Hardcore MySQL experience is disappearing. With RDS/Cloud SQL/Postgres, fewer and fewer developers are able to patch a production problem.
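To make 3) concrete, here is a rough sketch of what a lookup-table router can look like (not our actual code; the table and column names follow the example above, and get_connection() is a hypothetical helper that resolves the load-balanced cluster hosts):

```python
# Routing table: one row per user, one column per sharded dataset.
# Moving a user between clusters is a single-row UPDATE plus a data copy.
ROUTING_TABLE_DDL = """
CREATE TABLE user_shard_map (
    user_id            BIGINT UNSIGNED   NOT NULL PRIMARY KEY,
    video_cluster_id   SMALLINT UNSIGNED NOT NULL,
    session_cluster_id SMALLINT UNSIGNED NOT NULL
)
"""


def video_cluster_for_user(routing_conn, user_id):
    """Look up which video cluster holds this user's data (no modulo, no hashing)."""
    with routing_conn.cursor() as cur:
        cur.execute(
            "SELECT video_cluster_id FROM user_shard_map WHERE user_id = %s",
            (user_id,),
        )
        row = cur.fetchone()
    if row is None:
        raise KeyError(f"user {user_id} has no shard assignment yet")
    return row[0]


def query_user_videos(routing_conn, user_id, sql, params):
    """Send a query to the load-balanced host in front of the user's video cluster."""
    cluster_id = video_cluster_for_user(routing_conn, user_id)
    conn = get_connection(f"video-cluster-{cluster_id}")  # hypothetical LB/DNS name
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
```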
I gave a bunch of MySQL conference talks from 2011 through 2017, including several focused on scaling and sharding at social networks. Looking back through them today, a lot of content from the scaling/sharding ones honestly isn't too useful if you're trying to scale up a new MySQL deployment right now.
What are some Postgres examples?
As a Postgres fan, I'm genuinely curious. Please don't take my comments as troll-y.
Does anyone know if MySQL 9 is on the roadmap? Or are we mostly going to get small iterative and bug-fix releases only?
https://www.google.com/search?q=mysql+site%3Ashopify.enginee...
In 2021: "We saw 1.8M queries per second on our massively sharded MySQL databases." https://twitter.com/codeascraft/status/1466174452459196420
Here's a post from a decade ago: https://www.etsy.com/codeascraft/two-sides-for-salvation
Sharding is at the root of storage management; it lets users design a strategy that fits their storage and request benchmarks. The biggest difference I've seen is that you no longer get to use many of the MySQL operations and functions you may be used to, like relationships. It combines the best and the slightly painful parts of MySQL and NoSQL. Things like relationships get relegated to the application.
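To illustrate what "relegated to the application" means in practice, here is a hedged sketch (hypothetical users/orders tables and a hypothetical shard_for_user() router): instead of a SQL JOIN across shards, you fetch from each shard and stitch the rows together in code.

```python
def orders_with_user_names(user_ids, user_shards, order_shards):
    """An application-side 'join' across sharded users and orders tables."""
    # 1. Fetch user names from whichever shards hold them.
    users = {}
    for user_id in user_ids:
        conn = user_shards[shard_for_user(user_id)]   # hypothetical routing helper
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
            row = cur.fetchone()
        if row:
            users[row[0]] = row[1]

    # 2. Fetch their orders, which may live on different shards.
    results = []
    for user_id in user_ids:
        conn = order_shards[shard_for_user(user_id)]
        with conn.cursor() as cur:
            cur.execute("SELECT id, total FROM orders WHERE user_id = %s", (user_id,))
            rows = cur.fetchall()
        # 3. The relationship is resolved here, in application code, not by the database.
        for order_id, total in rows:
            results.append({"order_id": order_id, "total": total,
                            "user_name": users.get(user_id)})
    return results
```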
I _highly_ recommend having someone or some team in charge of db architecture. Use the "remarks" column to make a data dictionary that's stored right in the table. Have an _exact_ specification for table names (plural nouns), column names (agent_id (always a fk), agent_name), index names (udx-agents-agent_name), and fk names (fk-leads->agents).
But herein lies the rub: ALWAYS implement today's requirements, not next year's. If your app can be powered by a single MySQL instance, _just do it_. You'll be in a much better position to scale later than if you pick some excessive technology you _think_ you might need in the future.
It's been 9 years since I worked at deviantart.com but last I knew the site still operated on a large sharded MySQL cluster divided into something like 32 individual shards where each user's data was on a shard determined by their userid.
Essentially just `shard = userID % shardCount`;
The application would connect to all of the servers, and for many queries it had to send the query to all 32 servers (or some subset based on the records sought) and then merge the results of all of those queries.
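The routing and scatter-gather pattern looks roughly like this (a sketch of the idea, not deviantart's code; shard_conns is assumed to be a list of 32 connections, one per shard):

```python
SHARD_COUNT = 32


def shard_for(user_id):
    """Pick the shard that owns a user's rows."""
    return user_id % SHARD_COUNT


def fetch_for_user(shard_conns, user_id, sql, params):
    """Single-shard path: the user id tells us exactly which server to ask."""
    conn = shard_conns[shard_for(user_id)]
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()


def fetch_everywhere(shard_conns, sql, params, sort_key=None, limit=None):
    """Scatter-gather path: send the query to every shard, then merge in the app."""
    rows = []
    for conn in shard_conns:              # a real system would fan out concurrently
        with conn.cursor() as cur:
            cur.execute(sql, params)
            rows.extend(cur.fetchall())
    if sort_key is not None:
        rows.sort(key=sort_key)           # re-apply any ORDER BY across the merged result
    return rows[:limit] if limit is not None else rows
```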
Large but smaller ones were federated (divided by tables rather than rows) also with group replication and replicas. The most interesting case was switching over a MySQL 5.6 InnoDB to a 5.7 TokuDB storage engine. Bringing up a new replica used to take hours to catch up depending on how old a snapshot you were starting with. Using TokuDB it took 10s of minutes, well under an hour. TokuDB isn't quite as effective as LSMTrees but MyRocks/RocksDB wasn't (isn't?) ready for 5.7 and last I checked had some stricter requirements.
The best thing about MySQL is how (relatively) easy it is to set up replication with multiple writers, and even load-balance/failover behind a TCP proxy like keepalived. Backups are pretty easy to script: stop replication on an instance, stop it, copy the files, resume (let it catch up on replication, then re-add it to the query pool).
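A loose sketch of that backup flow, assuming the replica runs as a systemd service and its datadir can be copied with rsync over ssh; the credentials, hosts, and pool helpers are placeholders, not a real script:

```python
import subprocess

import pymysql

ADMIN_USER = "admin"        # placeholder credentials
ADMIN_PASSWORD = "secret"


def backup_replica(host, datadir="/var/lib/mysql", dest="/backups/mysql/"):
    remove_from_pool(host)                              # hypothetical: stop routing queries here

    conn = pymysql.connect(host=host, user=ADMIN_USER, password=ADMIN_PASSWORD)
    with conn.cursor() as cur:
        cur.execute("STOP SLAVE")                       # pause replication before shutdown
    conn.close()

    subprocess.run(["ssh", host, "systemctl", "stop", "mysql"], check=True)
    subprocess.run(["rsync", "-a", f"{host}:{datadir}/", dest], check=True)
    subprocess.run(["ssh", host, "systemctl", "start", "mysql"], check=True)

    # A real script would wait for mysqld to accept connections again before this step.
    conn = pymysql.connect(host=host, user=ADMIN_USER, password=ADMIN_PASSWORD)
    with conn.cursor() as cur:
        cur.execute("START SLAVE")                      # resume replication and catch up
    conn.close()

    add_to_pool(host)                                   # hypothetical: back into rotation once caught up
```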
When you scale vertically to 512GB memory servers using NUMA memory, things get tricky. Had to tweak the BIOSes to get the same spec'd machines to behave mostly the same w.r.t. thread utilization and settle on a memory buffer partitioning scheme that minimized lock waits. Also split off slow queries to 'reporting' read-replica instance(s).
Nowadays folks would probably prefer managed services like AWS Aurora.
Many budget hosting providers supporting CMSs like WordPress and Joomla run very large multitenant MySQL or MariaDB instances. These instances often get into performance trouble when one or more of the customer sites on them grow large or get traffic spikes. The CMSs offer elaborate caching strategies (based on redis, memcache/d) to help mitigate this. But of course the multitenant nature of these instances means that individual customers must configure the caching. And the lowest-end hosting providers don't offer redis or memcache/d.
It's possible to monitor various operational parameters (via `SHOW GLOBAL STATUS` and similar SQL statements) to detect problems. It's very hard for customers to mitigate configuration problems (like insufficient buffer pool sizes or too few maximum connections). But customers can change table indexes.
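For example, a small monitoring sketch along those lines (assuming pymysql; the status variable names are real MySQL/MariaDB counters, but the thresholds are arbitrary examples, not recommendations):

```python
import pymysql


def global_status(conn):
    """Collect SHOW GLOBAL STATUS into a dict of variable name -> value."""
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS")
        return dict(cur.fetchall())


def check_health(conn):
    status = global_status(conn)
    warnings = []

    # Connection pressure shows up as Threads_connected climbing toward max_connections.
    if int(status.get("Threads_connected", 0)) > 400:
        warnings.append("many open connections")

    # An undersized buffer pool shows up as a high share of read requests that miss
    # the pool and go to disk (Innodb_buffer_pool_reads vs. read_requests).
    misses = int(status.get("Innodb_buffer_pool_reads", 0))
    requests = int(status.get("Innodb_buffer_pool_read_requests", 0))
    if requests and misses / requests > 0.05:
        warnings.append("buffer pool miss rate above 5%")

    return warnings
```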
Other hosting providers operate large numbers of small separate MariaDB and MySQL instances, one per customer. Considering that WordPress powers something like 30 - 40% of the sites on the internet, the total number of instances is vast.
WordPress.org, a vast site, uses multiple read replicas.
Version upgrade inertia is a big problem for hosting providers. One provider (whom I will not name) runs a MySQL version that reached end-of-life well over a year ago. New-start MySQL / MariaDB project designs should make provisions for DBMS version updates. Updates are worth applying because the development teams do lots of performance-enhancement work.
Took a bit of fiddling around to get things running stably. Currently migrating to Redshift, which has stellar performance but comes with its own problems.
Newer features are being built on other data stores, and some things are being migrated away, but the core stuff is still in MySQL.
Depends what you mean by massive scale, but MySQL can certainly handle a website of, say, Stack Overflow's size on beefy hardware with read replicas. Facebook would be another story entirely.
High QPS is in the eye of the beholder, but let's say something like a few hundred thousand QPS and up.
If you have scale in just one of these 3 dimensions, the solutions are relatively easy (except maybe for high write QPS, depends on the nature of the writes). If you are at scale on all three dimensions, solutions get harder (sharding, non-innodb storage engines).
Of course, there is also the wildcard: the nature of your queries/transactions. Large transactions along with scale will make your life a living hell.
Our scale was handled by sharding based on userId and geo. We could dynamically shift the shards when scaling the cluster up and down. I wish we had open-sourced that sharding layer; it was pretty nifty.
https://www.uber.com/blog/schemaless-part-one-mysql-datastor...