If so why? They all offer almost identical services. Do small (but maybe significant?) differences or unique products (e.g. Spanner) make such a big difference that it has swayed someone to switch their cloud infrastructure?
I wonder how much these little things matter, how such a transition (partial or complete) actually went, and how key stakeholders (who were possibly heavily invested in one cloud, or felt responsible for the initial choice) were convinced to make the switch?
I'd love to hear some stories from real world experiences and crucially what it was that pushed the last domino to make the move.
Their savings from using the credits were at least 20x what the migrations cost.
We did the migration by having reverse proxies in each environment that could proxy to backends in either place, setting up a VPN between them, and switching DNS. The trickiest part was the database failover and ensuring updates would be retried transparently after switching master.
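The "retried transparently" part can be sketched as a retry wrapper around each write, so a failover-induced connection error is retried against the (new) master instead of surfacing to the caller. This is a minimal illustration, not their actual code; the exception type and `run_update` are invented stand-ins.

```python
import time

class TransientDBError(Exception):
    """Stands in for a driver's 'connection lost' error during failover."""

def with_failover_retry(fn, attempts=5, base_delay=0.01):
    """Retry fn() with exponential backoff until the failover completes."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientDBError:
            if attempt == attempts - 1:
                raise  # failover took longer than our retry budget
            time.sleep(base_delay * (2 ** attempt))

# Simulated update that fails twice (mid-failover), then succeeds.
calls = {"n": 0}
def run_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDBError("master switched")
    return "committed"

print(with_failover_retry(run_update))  # retries until the new master answers
```

The caller never sees the two transient failures; the window where writes error is absorbed by the backoff loop.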
Upside was that afterwards they had a setup that was provider agnostic and ready to do transparent failover of every part of the service, all effectively paid for through the free credits they got.
I think at some point Azure announced $X in free credits for YC members, and GitLab determined this would save us something like a year's worth of bills (quite a bit of money at the time). Moving over was rather painful, and I think in the months that we used Azure literally nobody was happy with it. In addition, I recall we burned through the free credits _very_ fast.
I don't recall the exact reasoning for switching to GCP, but I do recall it being quite a challenging process that took quite some time. Our experiences with GCP were much better, but I wouldn't call it perfect. In particular, GCP had/has a tendency to just randomly terminate VMs for no clear reason whatsoever. Sometimes they would terminate cleanly, other times they would end up in a sort of purgatory/in between state, resulting in other systems still trying to connect to them but eventually timing out, instead of just erroring right away. IIRC over time we got better at handling it, but it felt very Google-like to me to just go "Something broke, but we'll never tell you why".
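One way to handle VMs stuck in that purgatory state is to bound every call with a hard deadline, so callers error right away instead of hanging until the default TCP timeout. This is a hedged sketch of that mitigation, not GitLab's actual fix; `call_backend` is an invented stand-in for an RPC to another instance.

```python
import concurrent.futures
import time

def call_with_deadline(fn, timeout_s):
    """Run fn in a worker thread; raise TimeoutError if it exceeds timeout_s."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)  # don't block on the stuck call

def call_backend():
    time.sleep(1)  # simulates a VM stuck in that in-between state
    return "response"

try:
    call_with_deadline(call_backend, timeout_s=0.1)
except concurrent.futures.TimeoutError:
    print("backend unreachable, failing fast")
```

The caller gets a deterministic failure within `timeout_s` and can fall back or retry elsewhere, rather than waiting for the half-dead VM to time out on its own.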
Looking back, if I were to start a company I'd probably stick with something like Hetzner or another affordable bare metal provider. Cloud services are great _if_ you use their services to the fullest extent possible, but I suspect for 90% of the cases it just ends up being a huge cost factor, without the benefits making it worth it.
Running our own stuff on high powered servers is very easy and less trouble than you think. Sorted out the deploy with a "git push" and build container(s) meant we could just "Set it and forget it".
We have a bit under a terabyte of PostgreSQL data. Any cloud is prohibitively expensive.
I think some people think that the cloud is as good as sliced bread. It does not really save any developer time.
But it's slower and more expensive than your own server by a huge margin. And I can always do my own stuff on my own iron. Really, I can't see a compelling reason to be in the cloud for the majority of mid-level workloads like ours.
We made it work by sticking to the least common denominator, which was FaaS/IaaS (Lambda, S3, API GW, K8s), but it was certainly not easy. We also ignored tools that could have helped us greatly on a single cloud, in order to stay multi-cloud.
The conclusion after 2 years for us is kind of not that exciting.
[1] AWS is the most mature one, Azure is best suited for Microsoft products and old enterprise features, and IBM is best if you only use K8s.
[2] Each cloud has a lot of unique closed-source features that are amazing for certain use cases (such as Athena for S3 in AWS, or Cloud Run in GCP). But relying on them means you are trapped in that cloud. Looking back, Athena could have simplified our specific solution if we were only on AWS.
[3] Moving between clouds, given shared features, is possible, but is definitely not a couple of clicks or a couple of Jenkins jobs away. Moving between clouds is a full-time job. Finding how to do that little VM thing you did in AWS, now in Azure, will take time and learning. And moving between AWS IAM and Azure AD permissions? Time, time, and more time.
[4] Could we have known which cloud was best for us beforehand? No. Only after developing our product did we know exactly which cloud would offer us the most suitable set of features. Not to mention the different credits and discounts we got as a startup.
Hope this helps.
I went from AWS (cost ~£25/mo) to Microsoft Azure DE because I didn't want any user data to be subject to search by US law enforcement/national security. I thought the bill would be about the same, but it more than quadrupled almost overnight even though traffic levels, etc., were the same (i.e., practically non-existent).
What was happening was Azure was absolutely shafting me on bandwidth, even with Cloudflare in the mix (CF really doesn't help unless you're getting a decent amount of traffic).
In the end I moved to a dedicated server with OVH that was far more powerful and doesn't have the concerns around bandwidth charges (unless I start using a truly silly amount of bandwidth, of course).
Honestly? It's quite fun. Despite considering myself more of a programmer than devops, I really like the devops stuff - as long as I'm doing it as part of a team and I know the domain and the code - and not being that general devops guy who gets dropped into project, does devops for them and gets pulled into another one to do the same.
Working out all those little details of switching from AWS to Azure is fun and enjoyable, and I also feel like I'm doing something meaningful and impactful. Luckily there's not much vendor lock-in, as most of the stuff lives in k8s - otherwise it would be much trickier.
Anybody's cloud strategy should try to stick to the most basic services/building blocks possible (containers/VMs, storage, databases, queues, load balancers, DNS, etc.) to facilitate multi-cloud and/or easy switching.
Not that each cloud doesn't have its quirks that you'll have to wrap your head around, but if you go all in with the managed services you're eventually going to have a bad time.
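One common way to apply this advice is to hide each provider's flavour of a basic service behind a minimal interface, so swapping clouds means swapping one adapter rather than every call site. A sketch, with invented class and method names, using object storage as the example:

```python
from abc import ABC, abstractmethod

class BlobStore(ABC):
    """Minimal object-storage interface the application codes against."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(BlobStore):
    """Local stand-in; an S3 or GCS adapter would implement the same API."""
    def __init__(self):
        self._data = {}
    def put(self, key, data):
        self._data[key] = data
    def get(self, key):
        return self._data[key]

def save_report(store: BlobStore, name: str, body: bytes):
    # Application code depends only on the interface, never on boto3 etc.
    store.put(f"reports/{name}", body)

store = InMemoryStore()
save_report(store, "q1.csv", b"revenue,100")
print(store.get("reports/q1.csv"))
```

The in-memory implementation doubles as a test fixture, which is a nice side effect of keeping the provider behind an interface.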
The whole process took around 3 months, from the creation of the AWS account to the point when all production environments were running on AWS and Heroku was "shut down". There was some planning ahead of this as well, so the actual time varies.
Heroku was a heavily limiting platform (for example, they didn't and still don't support HTTP/2), and we needed more control over our infrastructure to support further growth without paying enormous costs (for example, Redis prices on Heroku are just mind-blowing).
Also, as we were about to open a few new markets, Heroku would have required a lot of manual work to get everything working, something which is really, really simple with Kubernetes.
Our monthly costs did go up vs what we had at Heroku at that time, but we're getting a lot more control and bang for the buck.
Regarding convincing stakeholders, you really need to have good reasons to do it. These kinds of switches are neither cheap nor easy and come with a bunch of risks. The easiest thing to sell is always pricing, but in that case you have to show calculations (the big guys like AWS and Google have pretty decent calculators you can use) which show the switch is worth it.
As I was moving from a small player (Heroku) to a big player (AWS), I also had other good reasons (better CI, better logging, better performance overview, more control in general). So it really improved a lot of things for the developers, devops, and users.
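The cost calculation stakeholders want to see usually boils down to: how long until the savings cover the one-off migration cost? A minimal sketch, with made-up placeholder figures:

```python
def payback_months(current_monthly, target_monthly, migration_cost):
    """Months until cumulative savings cover the migration cost."""
    monthly_saving = current_monthly - target_monthly
    if monthly_saving <= 0:
        return None  # the switch never pays for itself
    return migration_cost / monthly_saving

# Hypothetical numbers: $9k/mo on Heroku, $6k/mo on AWS, $30k migration effort.
months = payback_months(9000, 6000, 30000)
print(f"break-even after {months:.0f} months")  # break-even after 10 months
```

If the break-even horizon is longer than your planning horizon (or `None`), the pricing argument alone won't carry the decision and you need the other reasons.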
I've done AWS, Azure, Google.
My basic impression, as a software engineer/site reliability engineer, is that Google >> AWS >> Azure.
This relates to sophistication of offering and design of cloud.
The dominant questions look like this:
- What is the familiarity for the infra people?
- Can we implement appropriate governance concerns?
- How tightly bound is your important code to the specific cloud?
I have generally focused on Kubernetes in the last 5 years, to allow the service layer to be relatively portable. This is very useful in the switching/migrating question.
My general thought process is not to use cloud services unless it's very obvious (EC2, S3, etc.); I prefer to have k8s services provide that capability and use the cloud provider as portable COTS.
It was painful but Az has improved a lot of the sharp edges I encountered.
Our AWS bill was the main reason. It was far higher than it should have been for the traffic we were serving. Even after we'd halved our AWS bill (the original bills had been crazy), it was still kinda high.
Fly was a pretty clear choice when we looked at the lower costs and the ease of transitioning from single-region to multi-region infrastructure.
I'd been nudging the CEO about doing a migration for about a year before we decided to make the move. When I found that I couldn't really get our AWS costs any lower and did a full cost estimate of Fly vs AWS, the wheels moved reasonably quickly.
The CEO primarily cared about lowering our monthly costs and being able to do the migration reasonably quickly (~1 month).
Feature-wise, I'm just as happy. However, I trust Google more, but that probably boils down to my hatred again. :)
Growing out of that mode, the team is now mostly focused on a single cloud provider, with a few things remaining on alternatives because they're better suited; projects will clean up the rest over the next couple of years.
1) Under some circumstances we might want to give very stringent uptime guarantees for some systems, and I do not trust providers to have zero global (cross AZ) outages. Having a hot standby or even load balancing across clouds could be tempting.
2) One cloud provider is very keen to get into our sector and has made extremely generous overtures which we’d be silly to completely ignore.
As I say, not something we’ve followed through as yet but both are serious considerations.
We used Kafka MirrorMaker 1 with two-way sync (the new cluster has separate topics for writes that are synced to the old cluster, and all topics from the old cluster are synced to the new cluster). For Postgres, the failover switch to the new master required about 1-2 minutes of downtime.
We migrated ~80 microservices within 8 months, and now our infra costs about 1/4 of what we paid to AWS. Completely worth the effort!
I mostly use only basic services so pretty much any cloud provider can fit my use case. It took some time but I have peace of mind now.
Sometimes one adopts a technology too early...
We had a longer selection process between AWS, GCP and Azure. AWS was difficult because some of our customers see Amazon as a competitor. However today we also offer the option to run on AWS. GCP won over Azure.
It failed, horrendously. Even though multiple people in the organization were calling out how bad of an idea it was, they still moved forward. Google has some niceties with how things connect, set up, etc., but at the end of the day they are cloud providers and not everything they provide is a silver bullet.
This project was canceled after 3 years and millions spent on migrating, since the migration was not a drop-in replacement (no one thought it was, except the "tech lead").
There are a lot of things besides tech that can affect these projects. If you hired AWS experts, expect to need to hire GCP experts too.
(At one point somebody accidentally spent £30,000 in data transfer costs with one key press.)
The project was never completed and the CTO just moved on to another fancy CTO position.
Why? The incoming CTO signed a massive GCP deal, probably because it was marginally cheaper than AWS (while probably ignoring the migration costs).
Every provider has severe downtime (times when even the phone lines are not operational), so we do failover across several providers. It has saved a lot of uptime for us.
Also, we use (almost) no vendor-specific solutions. Almost everything your cloud provider upsells to you can be achieved without the provider lock-in. This will save time later when the provider's quality goes to sh*t (it eventually happens) and you have to migrate your infra somewhere else.
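The cross-provider failover itself can be as simple as probing endpoints in priority order and using the first healthy one. A sketch under the assumption that each provider exposes an equivalent endpoint; the hostnames and `check` probe here are invented (in real life `check` would be an HTTP or TCP health check):

```python
def pick_endpoint(endpoints, check):
    """Return the first endpoint whose health check passes."""
    for ep in endpoints:
        if check(ep):
            return ep
    raise RuntimeError("all providers down")

providers = [
    "primary.example.com",    # cheapest provider, preferred
    "secondary.example.com",  # different provider, different failure domain
    "tertiary.example.com",
]
down = {"primary.example.com"}          # simulate a provider-wide outage
healthy = lambda ep: ep not in down

print(pick_endpoint(providers, healthy))  # fails over to secondary
```

The key point is that the fallback endpoints live with different companies, so a global (cross-AZ) outage at one provider doesn't take out the whole list.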
Company policy change. We went from each office (or even department) basically doing their own thing and having their own billing accounts and negotiating (or not) their own deals, to one central cloud deal with central billing and administration.
The change from everybody doing their own thing to having a central devops strategy we all had to work within was a much bigger change than the actual changing of cloud providers.
The SaaS tool was mostly cloud-agnostic, so the changeover was not terrible: changing the deployments to use DO's CSI storage, setting up secrets, deploying services. I stood up the entire infra on DO, then moved the DNS over one subdomain at a time. It took about 4 days to make the move, including validating everything and finally cutting off Linode entirely.
Mainly because of price: more CPU/RAM/Storage for a lower price.
I think my previous server was underpowered, because it kept swapping. Now it runs as smoothly as it can (it never swaps). Migrating is a bit of a hassle though, and things might not work as you expect [2].
[1] https://j11g.com/2021/12/28/migrating-a-lamp-vps/
[2] https://j11g.com/2022/01/03/bypassing-hetzner-mail-port-bloc...
Also, in both cases, it was moving from AWS to GCP, and in both cases we were using Kubernetes and not really using much of the provided services of the platforms. I suspect this is the biggest reason for Google to push Kubernetes; abstract away compute so it's easier to switch.
I wrote a blog before the implementation. [move to k3s cluster](https://tim.bai.uno/how-to-deploy-a-django-application-to-a-...)
After futzing with this stuff for years though I really would only use the IaaS options in clouds if you want to consider portability. Network, storage, compute and nothing else. The neutral abstraction is Linux for me these days, not a specific vendor!
It was a chance to re-architect the platform, make things simpler and cost effective. Huge success on all of those. Costs were reduced by millions a year.
Other than GKE, there was not a significant technology reason to move to Google Cloud. AWS didn't even have a managed version of Kubernetes at the time.
Good luck searching for a solution. I just spent 2 hours trying to figure this out and it seems impossible :(
Worst problem in my experience is all the stuff that creeps upon you with time that assumes hardcoded IPs and service names.
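One defense against that creep is to route every address lookup through a single configuration layer (environment variables here), so nothing bakes in an IP or hostname. A minimal sketch; the `BILLING_URL` name and defaults are invented for illustration:

```python
import os

# Single place where endpoints are defined; nothing else hardcodes them.
DEFAULTS = {"BILLING_URL": "http://billing.internal:8080"}

def service_url(name: str) -> str:
    """Look up a service endpoint from the environment, with a default."""
    return os.environ.get(name, DEFAULTS[name])

# During a migration the environment points at the new infra and unchanged
# application code keeps working:
os.environ["BILLING_URL"] = "http://10.0.3.7:8080"
print(service_url("BILLING_URL"))
```

Anything that bypasses the lookup (a cron job with a pasted IP, a config file with a copied hostname) is exactly the stuff that surfaces mid-migration, so it helps to grep for literal addresses periodically.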
It was a single web app though. Not too complicated.
More often than not they realised a super dynamic cloud infrastructure was completely overkill for their business needs.
We also replaced our ~4y old very manual setup with a ground-up rewrite with terraform.
Zero regrets, but we didn't have a lot of vendor-lock in with <100 VMs and S3 + a couple of SSL things.
Google sales people made it worth the company's while, and it was a good time for us to refactor the orchestration stack. BigQuery is pretty sweet.
Services were done one at a time with a VPN holding the two together till everything was migrated over.
Mostly because of cost mitigation.
I talk about it in detail in a google cloud podcast: https://www.gcppodcast.com/post/episode-265-sharkmob-games-w...
The primary reasons were: Ease of Use, Support and Cost (in that order).
I had a bunch of what I call "3am topics" which inhibited our ability to perform stressful operations in the middle of the night, meaning we minimised our chances of successful outcomes when on call... I'm not a fan of that.
I also argued (quite successfully) the case that AWS was not saving us money or much time when the alternative was renting a handful of servers from a provider.
There were attempts by AWS staff to lock us into the platform, but those services (CloudFront and ECS being large ones) worked so poorly that caching/reconciliation layers were added into the product to build resiliency: all that was needed was to replace what populated the cache with something else (eventually we moved to Kubernetes, which worked much better).
Cloudformation was so hard to work with (at least our implementation) that replacing it with terraform was easy: the hard part was understanding what was needed and what was fluff.
We also had to care a _lot_ about how the network was set up in AWS; there were issues with MTUs not being aligned by default in some cases, for example, so we had to write workarounds, and the VPCs being zonal by default meant we had much more complex setups.
Another ease-of-use topic was instance sizing: instead of specifically picking whichever machine shape most closely matches your workload, in GCP you allocate a number of cores and you're kinda done. Yet another was that discounts are applied retroactively via sustained use (though they have commitments too), whereas in AWS you needed to very carefully carve up your requirements to get significant discounts, or write your application to be as stateless as possible (which you can do in GCP too). That approach is not much better than physical machines, because the upfront work of capacity planning is still there... at least if you care about cost.
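The difference between the two discount models is easiest to see with a bit of arithmetic. This is a rough sketch only: the rates and discount percentages below are hypothetical, and the reservation model is simplified (overage is billed at the discounted rate here, which real pricing doesn't do).

```python
def sustained_use_cost(hourly_rate, hours_used, discount=0.30):
    """Retroactive model: discount applied after the fact, no upfront planning."""
    return hourly_rate * hours_used * (1 - discount)

def reserved_cost(hourly_rate, hours_reserved, hours_used, discount=0.40):
    """Reservation model: bigger discount, but you pay for reserved hours
    even when actual usage falls short (the capacity-planning burden)."""
    billable = max(hours_reserved, hours_used)
    return hourly_rate * billable * (1 - discount)

# Usage came in at 500h against a 730h (full-month) reservation:
print(round(sustained_use_cost(0.10, 500), 2))   # 35.0
print(round(reserved_cost(0.10, 730, 500), 2))   # 43.8
```

With the made-up numbers above, the smaller retroactive discount still wins because the forecast was wrong; the reservation only pays off if your capacity planning is accurate, which is exactly the upfront work being described.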
Regardless, our dev cycles are much more streamlined now, it's rather easy to deploy an entirely new environment, the operations can be handled by a single individual on a part-time basis, which in my opinion is the point of a cloud provider: to save you time.
I can go into much more detail if you have any specific questions.
Saved myself a boatload and don’t regret it for a second.
and sometimes service reliability
The difference in terms of services was negligible, because they all offered almost identical services. But we did it because:
* We were looking for someone to manage the hardware and infrastructure for us.
* Rackspace's managed hardware offered higher availability than what we were able to achieve on our own.
* We had a relationship with Rackspace and they understood our needs, so we felt comfortable switching over entirely.