I'm skeptical - I was under the impression that serverless was for small "burstable" apps with relatively low traffic, or background processing.
The two products I work on are both REST APIs that send and receive data from a user interface (react) with roughly 60 API routes each. They have about 100 concurrent users but those users use the apps heavily.
The consensus on the internet seems to be "serverless has its use cases" but it's not clear to me what those use cases are. Are the apps I'm working on good use cases?
Good news: your gut feeling is correct.
Bad news: you will likely lose this battle, unless you're good at playing company politics.
Here's how it typically goes:
1. A new lead/architect/manager joins the company.
2. They push for a new hyped technology/methodology.
-> you are currently here <-
3. The team is split: folks that love new things embrace it, folks that care hate it, rest are indifferent.
4. Because the team is split, the best politician wins (usually the new hire).
5. Switch happens and internally you know it's a fucking disaster, but you're still forced to celebrate it.
6. When the disaster becomes obvious, people start getting thrown under the bus (usually junior engineers or folks that opposed the switch).
My $0.02, having used serverless before. Those use cases are:
* Very very low traffic apps. POST hooks for Slack bots, etc. Works well!
* Someone who is an "architect" can now put "experience with Serverless" on their CV and get hired somewhere else that is looking for that keyword in their CV scans.
Those are any and all use cases.
We've used AWS Lambda for about 4 years, and it's been so good and so cheap that I'm shifting literally everything (except Redis) to serverless. Also, GCP has a better serverless offering (Cloud Run, Spanner), so we're switching from AWS to GCP to take advantage of that. I bet we're going to see a massive cost reduction, but we'll see.
Things I like about serverless (again, from the perspective of a very small startup, with 5 engineers, and me being the primary architect):
* It's so liberating to not worry about EC2 servers and autoscale and container orchestration myself. All our CloudFormation templates add up to around 3,000 lines, which maybe doesn't sound like a lot, but it's a lot. There are tons of little configuration things to worry about, and it adds up. (Not to mention the sheer amount of time it took to learn.) ECS Fargate takes care of some of this, but it doesn't autoscale based on demand or anything (not without setting things up yourself). (This is a big reason why I want to switch to GCP: Cloud Run is like Fargate in that it runs containers, but unlike Fargate it autoscales from 0 based on load.)
* It's very cheap in practice, at least for loads like ours that respond to events: API services that sometimes see a lot of use and sometimes see very little use; queue consumers that sometimes have a lot to do and sometimes have very little to do. AWS Lambda bills down to the millisecond in terms of resolution, and GCP Cloud Run/Cloud Functions bills down to the next 100 milliseconds. These are very fine resolutions and for us at least, we've seen costs be small.
* For database serverless products (like DynamoDB for example), it's very liberating to never have to think "Hm, do we have enough CPU provisioned?"
Things I don't like about serverless
* Pushing source code sucks. Lambda will just one day decide your version of Python or whatever isn't good enough and force your customers to upgrade all their user-written code to the latest Python version. (But! Cloud Run supports containers, and so this won't be a problem.)
Everywhere else, I went with traditional deployment of a monorepo.
You only have 100 concurrent users. You do not need serverless. You could serve 100x that amount easily with a simple nginx reverse proxy to your webserver.
This is almost comical enough for me to suspect that it is satire, but unfortunately there are too many examples out there of this type of thinking. It's just infra bloat and a waste of money.
What problem does migrating to a new architecture solve? Does the current deployment have scaling, maintenance, or other troubles? Going from something that's broken to something that works is one thing, but going from something that works to something else that works is pointless unless there are tangible benefits.
If the only reason is to make the Cloud Architect feel better or pad people's resumes, the correct answer is no.
I think Serverless is good in some areas, S3 and Dynamo are both good products for example.
I have a few big issues with serverless:
1. It is harder to develop for. Sure, you get to ignore server configuration, but honestly a well-made infra team should be removing that concern for the development team anyway. The problem is that you can so rarely actually run the code locally. So setting things up is annoying, especially when you get into the final stage of serverless, where some object lands in SQS which fires a Lambda, which puts another object in another queue which fires another Lambda which loads S3 which writes to a DB, etc. This all ends up more complicated than just writing an application for it, and it's harder to develop for. Often the only way to actually run this stuff ends up being setting up the whole infra in the cloud and running it that way, so that means dealing with deploys, and you lose a lot of debugging ability.
2. It doesn't save money. A single lambda that runs quickly on some event does save money vs a server all of the time. But most companies seem to over-provision servers so that's easier. But once you include the prod environment, dev environment, and the serverless things running all of the time, it does not save money since often 100 lambdas could be a single instance.
3. It doesn't save time. I don't think developers messing around with setting up hundreds of new services and the corresponding rules, configurations, deployments and CI/CD pipelines saves dev time vs a normal well-maintained infra with servers and a good CI/CD pipeline. Often the time savings are measured against something like manually configured bare metal servers moving to serverless, but there are better ways to save time.
Key concerns:
1. You're locking your platform into one supplier permanently. The exit fee is starting again. Literally burn it to the ground.
2. You're going to introduce problems when you migrate it. The ROI is negative if you spend money and achieve more bugs without improving functionality.
3. The cost estimation of every pure serverless platform is entirely non-deterministic. You can't estimate it at all even with calculators galore.
4. The developer story for serverless applications is quite frankly a shit show. Friction is high, administrative cost is high and the tooling is usually orders of magnitude slower and more frustrating than local dev tools.
5. It's going to take time to migrate it which is time you're not working on business problems and delivering ROI to your customers.
As always ask yourself: is the customer benefiting from this now or in the future? If the answer is no or you don't know, don't do it!!!! Really sit down, find a sound business decision analysis framework and put all the variables in and watch it melt instantly.
All you're going to do here is put a "successful" (pah!) project under the architect's belt before he pisses off and trashes someone else's product.
As a somewhat extreme opposite of this, I would at this point never allow my cloud estate to progress past portable IaaS products and possibly Kubernetes control plane management. Anything else is a business risk.
* Scalability is real. We have some bursty traffic, sometimes with extreme burst, and we've had no problems scaling to meet that need.
* Our traffic is still predominately during business hours in the U.S. That's an extremely important point - because our site is effectively being used for only 12 hours or so per day. The remainder of the day and on weekends it's unused. We looked at the cost of using EC2 instances and Elastic Beanstalk and the full serverless is still cheaper.
What we've discovered in our cost analysis is if you have a site that's hit 24x7, 7 days per week then you'd be better off hosting on EC2. If your traffic is constant and there's not much variability over time then it may make more sense to host on-prem. In our case we have highly variable traffic during standard business hours. Serverless is the way to go for that scenario.
I think most people don't realize just how "burstable" their own traffic is. If you're looking at graphs with one-hour resolution, remember that AWS bills for Lambda in 1 ms increments. Not sure about Azure, though.
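To make the 24x7-versus-bursty comparison above concrete, here is a rough sketch of the break-even arithmetic. Every price in it is a placeholder I made up for illustration, not a real AWS or GCP rate, and the names (`Pricing`, `pay_per_use_monthly`, `break_even_requests`) are hypothetical. Plug in your provider's current numbers before drawing any conclusion.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for billing estimates


@dataclass
class Pricing:
    """Placeholder prices (assumptions, NOT real provider rates)."""
    per_gb_second: float         # pay-per-use compute price per GB-second
    per_million_requests: float  # pay-per-use request fee per million requests
    instance_per_hour: float     # price of one always-on VM per hour


def pay_per_use_monthly(requests: int, avg_ms: float, memory_gb: float, p: Pricing) -> float:
    """Monthly cost when you are billed only for the time requests actually run."""
    gb_seconds = requests * (avg_ms / 1000.0) * memory_gb
    return gb_seconds * p.per_gb_second + requests / 1_000_000 * p.per_million_requests


def always_on_monthly(instances: int, p: Pricing) -> float:
    """Monthly cost of VMs that run around the clock, busy or idle."""
    return instances * HOURS_PER_MONTH * p.instance_per_hour


def break_even_requests(avg_ms: float, memory_gb: float, instances: int, p: Pricing) -> float:
    """Requests per month above which the always-on fleet becomes the cheaper option."""
    cost_per_request = (avg_ms / 1000.0) * memory_gb * p.per_gb_second \
        + p.per_million_requests / 1_000_000
    return always_on_monthly(instances, p) / cost_per_request


if __name__ == "__main__":
    prices = Pricing(per_gb_second=0.00002, per_million_requests=0.20, instance_per_hour=0.10)
    monthly_requests = 5_000_000
    print("pay-per-use:", pay_per_use_monthly(monthly_requests, 100, 0.5, prices))
    print("always-on:  ", always_on_monthly(1, prices))
    print("break-even requests/month:", break_even_requests(100, 0.5, 1, prices))
```

The point is only the shape of the comparison: the pay-per-use line grows with traffic while the always-on line is flat, so a site that sits idle nights and weekends stays under the break-even point much longer than one that is hit around the clock.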
My company has moved several "regular" websites to serverless. In fact, we just took the existing websites (which were often Django, sometimes huge ones) and dumped them into Lambda. The exact opposite of every "what serverless architectures are for" article you've ever read. And you know what?
It's awesome.
It's way cheaper than running it on EC2, and I never have to reboot a server or worry about their disks filling up or anything. Then when traffic spiked hard during the covid lockdowns? I did nothing. Lambda just handled it.
The only serious change we made to the setup was preloading a percentage of machines at all times to remove cold starts.
I'm not saying it's trivial (zappa, serverless, CDK), but usually one guy gets it working and the rest of the dev team changes nothing at all.
Serverless functions are great if you have a lot of small services that need to be "on standby at all times".
For example, if you have 5,000 separate services, it doesn't make sense to have them all running all the time if 4,000 of them have very low traffic. So one of the main benefits is that you get the ability to "increase your library of services at a very low cost". Serverless also really shines with quick stateless actions.
However, converting an app to all serverless is a huge task and for most apps it doesn't make sense.
Two major drawbacks:
1) You're bound to only the language versions that are currently supported.
2) You're writing code specifically for the platform so without a heavy lift you're "locked in"
If the goal is to go serverless and get rid of the server management, I'd suggest looking into containerizing your existing apps and deploying on a "serverless" managed service like Fargate (or your favorite cloud provider's equivalent). This approach also lets you go to a different cloud provider if you want... or even move back to your own datacenter with no code changes.
Just don't get attached to the company and don't work late.
Whether or not it's for you has a lot to do with what's important to you. How much weight you put on runtime cost, versus ease of development / deployment, versus whatever other benefits it might bring (integrated logging, monitoring, multi-region deployment, etc). And, of course, whatever downsides it brings...like cold starts, team ramp up on how it works, etc.
You are right that your use case doesn't make sense for going full serverless. An application with heavy and predictable usage doesn't gain anything by becoming serverless. All you're doing is raising your cloud hosting bill.
You can scale down app service/VM based infra either on a timer or in response to metrics, so it's not like serverless is your only option if cost is the motivator.
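As a sketch of what "on a timer or in response to metrics" can look like, here is a toy scaling policy. The thresholds, the business-hours window and the function name `desired_instances` are all assumptions made up for illustration; real autoscalers (App Service autoscale rules, VM scale sets, etc.) are configured rather than hand-coded like this.

```python
def desired_instances(
    hour: int,
    is_weekend: bool,
    cpu_percent: float,
    current: int,
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Hypothetical policy: the schedule sets a floor, the CPU metric nudges the count."""
    # Timer part: keep at least two instances during weekday business hours.
    business_hours = (not is_weekend) and 8 <= hour < 20
    floor = 2 if business_hours else min_instances

    # Metric part: step up when hot, step down when mostly idle.
    target = current
    if cpu_percent > 70:
        target = current + 1
    elif cpu_percent < 30:
        target = current - 1

    # Clamp to the allowed range, never dropping below the scheduled floor.
    return max(floor, min(max_instances, target))
```

For example, `desired_instances(3, False, 10, 4)` steps a quiet 3 a.m. fleet down from 4 to 3, while the same low CPU at 10 a.m. on a weekday never goes below the floor of 2.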
The question I'd have is what is the driving force behind the initiative to move to Functions? If the answer is "reduce infrastructure costs", I'd ask serious questions if the "juice is worth the squeeze" for a transition, and then create estimates for the cost of transition versus cost savings. For a 100 user app it is likely not going to have much payoff unless your infrastructure bill is a lot higher than I expected ($10-12k).
However, if the answer is "to create a better integration architecture between our apps and services" then you should engage with that. Azure Functions pushes developers really hard toward creating APIs that are discoverable and reusable, especially in a Microsoft oriented enterprise where you start seeing other tools like logic apps or the power platform start being able to produce and consume for custom functions. Over time, I've watched benefits accrue from common integration points functions drive across the organization.
So, ask questions, but make sure you understand what the organization is trying to achieve with the recommendation.
Disclaimer- I'm a Microsoft employee, but opinions my own.
Some of the possible reasons why you'd go down this path would include 1. Cost optimisation, generally not a good driver unless you have very spikey workloads (which you don't) 2. Resilience/availability, this is a pretty good driver especially if you've had issues recently, moving to serverless takes away almost all maintenance tasks and solves a lot of potential problems
Some of the main trade-offs include 1. Developer velocity, generally it's pretty hard to debug locally which you will need to spend time working out how to do it or do your debugging in the cloud 2. Cold start ups, this can be largely solved with solutions such as GraalVM however you do need to invest time to implement these solutions 3. More complex internal application architecture, you need to either deploy your entire application as a single function or break it out into multiple and you'd need to do the analysis of how it should work and the performance tuning of each option
That being said, I find the best way to look at these situations from a political perspective is to have a quick chat with the architect and look to understand what problem he's trying to solve and for you to mention your costs and take it into a cost/benefit discussion
If he says it's for cost benefits, you could say it will be an x week migration timeline, which has a developer salary cost of y and will delay a new feature which is expected to bring z revenue, plus a 10-20% operational delay to pushing out new features. So what would be the total cost saving and ROI?
If there is no local state then serverless is a feasible solution, if not the best. If there is, then you need to find some substitute for that local state and the case for serverless is much worse.
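The x/y/z framing above turns into a payback calculation fairly directly. A minimal sketch, assuming the slower feature delivery can be expressed as a monthly cost (that modelling choice, the function name and all figures are mine, not from any real project):

```python
from typing import Optional


def migration_payback_months(
    migration_weeks: float,
    weekly_team_cost: float,
    delayed_feature_revenue: float,
    monthly_hosting_saving: float,
    monthly_velocity_cost: float = 0.0,
) -> Optional[float]:
    """Months until the migration pays for itself, or None if it never does.

    monthly_velocity_cost is an assumed monthly price for slower feature
    delivery after the switch.
    """
    # Up-front cost: developer time spent migrating plus revenue pushed back.
    upfront = migration_weeks * weekly_team_cost + delayed_feature_revenue
    # Ongoing benefit: hosting savings minus what the slowdown costs each month.
    net_monthly = monthly_hosting_saving - monthly_velocity_cost
    if net_monthly <= 0:
        return None
    return upfront / net_monthly


if __name__ == "__main__":
    # Illustrative numbers only.
    print(migration_payback_months(12, 10_000, 30_000, 2_000))
    print(migration_payback_months(12, 10_000, 30_000, 2_000, monthly_velocity_cost=2_500))
```

If the answer comes back as years, or `None`, that is the cost/benefit conversation to have with the architect.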
Operationally: You need a few more/different specifics to avoid talking in generalities. How many requests per second? What's the floor? What's the peak? How bad a sudden surge ("thundering herd") do you ever see? What's the heaviest request / worst case response time?
Then you can start comparing the two solutions under various scenarios. How much will our average RPS cost us? Will the service deal well under very low or very high load? What happens when your worst-case thundering herd hits? Does your heaviest request fit comfortably within limits?
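One way to turn those questions into numbers is Little's law: requests in flight is roughly arrival rate times time per request. The sketch below applies it to a traffic profile; the `TrafficProfile` fields, the `check_limits` helper and the limits passed in are hypothetical, so substitute your own measurements and your platform's documented limits.

```python
import math
from dataclasses import dataclass


@dataclass
class TrafficProfile:
    floor_rps: float           # quietest period
    average_rps: float
    peak_rps: float            # worst thundering herd you have seen
    typical_seconds: float     # typical response time
    worst_case_seconds: float  # heaviest request


def in_flight(rps: float, seconds: float) -> float:
    """Little's law: average number of requests being served at once."""
    return rps * seconds


def check_limits(profile: TrafficProfile, concurrency_limit: int, timeout_seconds: float) -> dict:
    """Compare the profile against platform limits supplied by the caller."""
    peak_concurrency = math.ceil(in_flight(profile.peak_rps, profile.typical_seconds))
    return {
        "peak_concurrency": peak_concurrency,
        "fits_concurrency_limit": peak_concurrency <= concurrency_limit,
        "heaviest_request_fits_timeout": profile.worst_case_seconds < timeout_seconds,
    }
```

With a peak of 200 requests per second at 0.25 s each, about 50 requests are in flight at once. That is the number to compare against whatever concurrency ceiling the platform gives you, alongside whether the heaviest request finishes inside the execution timeout.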
I tried hard to think of advice on how you may remove the glasses if that is the situation here, but in honesty it is a tricky one. It is akin to a little worldview bubble, and those are tricky to try and actively shift in others (an attempt at a suggestion: I think perhaps it would be best to come into team discussions around this not as being on "one side" but rather as being the reasoned, dispassionate expert on all sides of the whole question).
I say this as someone who up until recently wore the glasses (in my case it was for Kubernetes) - it took me a failed project to take them off, I hope that does not happen to you.
Make sure the serverless model includes all the gimmicks you currently have, such as firewall, WAF, cache, SSL termination, load balancer, current traffic levels, etc.
The title says it all, they might as well work for Azure.
... It's likely everything is an Azure shaped nail to them, do not trust them.
But if you compare Azure Functions with stored procedures in a DB, then it's pretty cool to have a kind of hot swapping at the function level.
I'd be cautious, but with a gradual migration there's hopefully time for reflection as well. Going 100% on anything is rarely a good idea, so hopefully your architect isn't religious about this.
Serverless can also mean something like EKS with Fargate. You get to use Kubernetes without managing any servers. Azure AKS has something similar with virtual nodes as I understand, though I haven't used them. I do think this model is better for long running services than serverless functions.
Then if you ever have a bug, you can easily set a breakpoint, step through the entire execution, and fix it.
Agree that this MUST always be possible whatever architecture changes occur.
Then have a flag/config to allow specifying certain things to communicate over network instead of as function calls, and using serverless functions.
I've NEVER seen teams do this though. It's like no one can imagine they will write a bug.
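A minimal sketch of that flag idea: business logic is registered as plain functions, and configuration decides which names are called in-process (where a breakpoint just works) and which go over the network. `Services`, `Transport` and `quote_price` are hypothetical names, and the transport is left as a callable you would supply (an HTTP client, a cloud SDK invoke call, and so on).

```python
import json
from typing import Callable, Dict, Optional, Set

Handler = Callable[[dict], dict]
# Hypothetical transport signature: (service name, request body) -> response body.
Transport = Callable[[str, bytes], bytes]


class Services:
    """Routes calls either to a local function or to a remote deployment."""

    def __init__(self, remote: Optional[Set[str]] = None,
                 transport: Optional[Transport] = None) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._remote: Set[str] = set(remote or ())  # names flagged as remote in config
        self._transport = transport

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def call(self, name: str, payload: dict) -> dict:
        if name in self._remote:
            if self._transport is None:
                raise RuntimeError(f"{name} is marked remote but no transport is configured")
            body = json.dumps(payload).encode("utf-8")
            return json.loads(self._transport(name, body).decode("utf-8"))
        # Default: an ordinary function call, fully debuggable in one process.
        return self._handlers[name](payload)


def quote_price(payload: dict) -> dict:
    """Example business logic that knows nothing about how it is deployed."""
    return {"total": payload["quantity"] * payload["unit_price"]}
```

Locally you would build `Services()` with nothing marked remote and step through the whole flow; in production the same code runs with something like `Services(remote={"quote_price"}, transport=...)`.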
We've dealt with "new guy wants to overhaul ..." scenario. When I joined this company we were a C++ shop with some Perl and bash. Multiple new recruits successfully lobbied to implement refactors, or new projects in a hot language/framework. Several of the refactors were a huge waste of resources that either didn't come to fruition, or were only partially successful.
Now, we are a Perl shop with active development in 3 other languages (not counting front end), and we're maintaining legacy apps in an additional 4 languages. And we've deprecated apps in at least 3 additional languages.
I guess I should be thankful none of them have lobbied for switching databases. :-O In any given year, we average 3-4 programmers and 2-4 contractors (mostly front end). Two of us have been there 15+ years, but the other full timers seem to move on around the three year mark. Because of that all the hot shots have left. When a major bug is discovered in their code it can take a long time to fix, and any breakage due to upgrades is quite a hassle since those of us left aren't experts at every language we have to maintain.
1. The same stocks in $cloud_platform_provider that your architect has bought.
2. A bunch of certifications for $cloud_platform_provider so you also want to lock everyone and their mother down into that platform.
What benefits does the "cloud architect" say the migration will bring? It sounds like you have a reasonable backend api setup that works. There needs to be a strong motivation to do a migration like that.
I'm also not convinced you're at a scale where you need a cloud architect, but it's hard to say from your description. I bet their main motivation is delivering a project that justifies their role.
We run a lot of services in Kubernetes, some of those services also run background jobs (same container serving both HTTP and doing bg processing). I want us to migrate background jobs from our containers to a dedicated platform (e.g. Lambdas), because we can scale to 0 when not needed, we'll offload our Kubernetes cluster (our cluster will serve only HTTP traffic that is easy to scale for us) and if done right, we should have better debuggability/observability. Also right now we orchestrate our jobs with redis which means we need a redis instance for each service with bg jobs, but I want to move orchestration to a separate service that will store the data in postgres so instead of running x redis clusters we'll just have 1 postgres.
The tricky thing is the rewrite, but frankly, we still need to do it and we don't need to rewrite whole services, just the code responsible for bg jobs.
Here are questions to ask
What is the current monthly spend? What is the estimated monthly spend in the new system?
Perhaps the new serverless system is easier for operations and deployments. Does the new system provide for better uptime/monitoring? How is monitoring done on the current system? If there is a problem, like the service returning 500s, do you have the tooling to diagnose the issue? How does this change in the new system?
What is the developer experience on the new system? Is it easy to deploy to staging and production environments? How long does it take to create a new feature? What does the develop/test/debug loop look like in this system? How does this compare to the current system?
Ask yourself and others these types of questions. Maybe migrating to serverless is better, but it should depend on the answers to the questions/concerns that I listed above.
In the Azure world, a more modern option going forward is Azure Container Apps, which just runs Docker containers, but you will still have 8+ second cold starts and will need to run at least a single instance full time, though it's cheaper than Functions Premium. I'd also suggest looking at an evented architecture using Dapr, which is built into ACA. In the GCP world, Cloud Run is frankly amazing.
If I was starting from scratch, I'd use serverless. If you're migrating everything, I think that is a giant project that needs justification. I'd ask, "What specific current problem do you have that it would solve?"
Cold boot is only a (minor) issue on the first hit, that's quickly amortized.
Some async use cases are great, but large scale apps become a clusterfuck. Experience: we made a whole feature on AWS Lambda. Sucked. 2 years later it's a Spring app in a container now.
The question is: what alternative do you propose? How does your alternative reduce hardware when load is low? How does your alternative order more hardware when load is high? How much time does it take? What's your plan if your data center is cut off from the Internet because of a bad router configuration?
A proper alternative to serverless is a Kubernetes cluster. It'll likely cost less (for a big application) but it'll require more knowledge to properly manage it.
You can use a simplistic setup with a dedicated server or virtual machines with manual operations, but at your load I'd consider that not appropriate.
Anyway, if management decided to hire an Azure Cloud Architect, the decision is already taken and I suggest you relax and enjoy the new experience.
Several folks have written about it (Architect Elevator[0] is a good blog on these types of topics, as he routinely talks about tradeoffs and ROI to the business). High Scalability's[1] "what the internet says" posts frequently highlight serverless projects (both pro and con)
_______________
[0] https://architectelevator.com/blog
[1] most recent - http://highscalability.com/blog/2022/7/11/stuff-the-internet...
It's OK... it would work.
It could be somewhat more expensive or less expensive to host, and somewhat more or less performant. (You didn't say where the data lives, but if it's in a database and you aren't doing something extra with it, then this layer might not be that important, one way or the other.)
For a 100-user app, I'm guessing the major cost here is the switchover cost. Whether it makes sense or not depends on details of where you are now and what problem(s) this is meant to solve.
No one here knows that (maybe even you don't either?) so we can't really give you an answer, just some general pros and cons of serverless.
For my own project (uptime monitoring + status pages), I got to about 500 users before serverless costs were eating enough of my profits to make me want to move to VMs. It was nice to be able to validate the idea on a service that costs zero if no one is using it.
With continuously running applications (100 concurrent users), it makes zero sense to use serverless as you're paying a high premium over a continuously running VM. I'd just use a VM and scale the number of instances serving the API.
The main issues: 1. Unpredictable performance - latency (with cold starts), concurrency limits (how quickly can we scale to X concurrent requests?), etc. We spent many hours with AWS support before moving away from Lambda. 2. Short-running processes are terrible in many ways - no DB connection pooling, no in-memory cache.
I'd be much more happy if AWS fixed scale-up speed of ECS tasks so you can scale up your services in a reasonable time, than having these one-shot tasks.
Personally, I was excited for serverless, but after using API Gateway and Lambda to serve a simple REST API it seemed like more work compared to using a load balancer to route requests to a container running in ECS. ECS can autoscale too, so you can scale up and down as required.
But if you have an API that is getting sustained traffic, Lambdas probably aren't your best bet -- you're going to want a container that is always running.
But to be honest, with 120 routes and 100 users, it sounds like Lambdas are a good way to go.
From my experience of 4 years with serverless in AWS, the following problems have been identified:
- Difficult to debug
- Difficult to collect logs - Lambda@Edge
- Slow cold starts
- Frontend and NodeJS bundling are problematic - size limits, slow and unpredictable problems
- Pricing is difficult to estimate
- Careful planning needed for network and architecture - how lambdas work together
- Workflow orchestration might be needed
What is serverless good for?
- Queue processing
- Event processing
- Internal infrastructure code
This is most likely a waste of time and the "cloud architect", like most cloud proponents, has no fucking clue what they're talking about.
It should mean “keeping state to an absolute minimum, and relying on event-based architecture.”
Are you familiar with event-based architecture? Are you familiar with functional programming?
This is your time to shine.
There’s a strong possibility you’ll end up with Lambdas (or whatever) that are just CRUD endpoints.
That would be bad.
So be prepared to fight out what comes next.
- Easily scalable/autoscaling
- Drastically reduced operations/maintenance/devops overhead
- CI/CD can be much simpler
- Observability is built in (metrics, logging, alerting is built in)
- Built in connections to other cloud products
If you can stomach the vendor lock-in then it might not be so bad.
The advantages are:
* Lower costs from much better resource utilization rates. Comparisons against a perfectly sized fleet of servers are inherently flawed. Sure, you can make sure auto-scaling happens, but that costs time and energy to get right. Even then, you're always going to have to leave some buffer room. Instead of saying serverless is good for bursty/low traffic, I'd frame it as serverless is great for any workload that isn't close to a fixed load. Dev and other non-prod environments also basically cost nothing, whereas replicating multi-AZ setups for them can be quite expensive. In practice, serverless is going to be cheaper for a lot more use cases than you may think at first.
* Tight integration with IaC. Your application and infra logic can be grouped based on logical units of purpose rather than being separated by technology limitations. This is especially true if you use things like CDKs.
* Zero need to tune until you get to massive scale. We went from our first user to hundreds of thousands of users with no adjustment needed at all. Even at millions of users, there's little you'd need to change from the infra side beyond maybe adding a cache layer and requesting limit increases. Obviously app/db optimizations might be needed, but for the most part, scaling problems become billing problems.
* A simpler threat model. If you're running servers, keeping them secure is not trivial. There's just a lot less to do to keep serverless apps secure.
* Ability to avoid Kubernetes and other complicated infra management. One could argue that you're just trading Kubernetes complexity for cloud specific complexity. That's true, but it's still a net reduction in complexity.
* Operational overhead is way down. A base level of logging/tracing/metrics comes out of the box (at least on AWS, not sure about Azure). No need to run custom agents for statsd/collectd/prometheus/opentelemetry/whatever. No need to spend any time looking at available disk space metrics or slow-building memory leaks that creep up over weeks. It just works.
* Easy integration with lots of cloud managed services. Want to deploy an API endpoint? Want to build a resolver for an AppSync GraphQL field? Want to write code that runs in response to some event or alarm going off? Want to process messages from a queue without spinning up a fleet to longpoll from it? Want to write code that applies transforms on a data stream before writing to your data warehouse? The infra definitions for all of these all share a foundation. You have a unified API for everything.
Having your APIs in a bunch of different App Services is sort of a bad idea. You can do it, but you're likely going to have "fun" with how much complexity is involved with setting up the VNETs, Private Endpoints, Custom Domains, DNS stuff and different Subnets that can't be shared across App Service Plans for all those apps and their deployment slots. You're likely also going to pay a significantly higher price for it than the alternatives, especially if you use containers, but it's "significantly higher" in a way that's "unimportant" because it's likely peanuts compared to developer salary, total IT expenses and so on.
That being said, an Azure Function App is still an Azure App Service, so unless your Architect means that you should consolidate your different backend App Services into fewer Function Apps, then I don't see the benefit. If you're unsure what I mean by this, it's that you can replace the 60 API routes with 60 functions in an Azure Function App.
> I'm skeptical - I was under the impression that serverless was for small "burstable" apps with relatively low traffic, or background processing.
You're not correct about this. They scale just fine, and they can handle huge workloads, sometimes a lot better than their alternative, though at the cost of locking yourself into your cloud provider.
> The consensus on the internet seems to be "serverless has its use cases" but it's not clear to me what those use cases are.
I can't speak for AWS, but the basic way to view an Azure Function is to use a simple Express NodeJS API as an example. In a standard Azure App Service you're going to write the Express part, you're going to write the routes and you're going to write middleware for them. In a standard Azure Function App you take the Express part out, because that part is handled by the Azure Function.
Azure Functions have the benefit of integrating really well with the rest of Azure, and in many cases can be really good. It's also much easier to work with them because you don't have to care about the "Express" part and can simply work on the business logic. The downside is that you're limited to what Microsoft puts in the Azure Function functionality, and that you lock yourself into Azure.
With C# you further have to consider whether you want to run your Azure Function as an Azure dotnet, or a dotnet-isolated. Again, this comes down to the degree to which you want to lock yourself into Azure.
> So what should you do?
I think your Cloud Architect should look into Azure Container Apps, or AKS if you want less lock-in. Both are kubernetes, but Azure Container Apps sort of handle the heavy lifting for you, again, though with some of the highest lock-in that you'll find in any Azure product.
It depends a little on your actual circumstances, but generally speaking, your backend service will have an easier life in AKS once you're up and running. I wouldn't personally touch Azure Container Apps, but I'm in a sector of EU where we might be forced to leave Azure. If you're not, it's a much easier road to kubernetes greatness than AKS.