I have 18 years of experience as a software engineer, and I'm considering a role inside Google's SRE (Site Reliability Engineering) organization. I have very little operational experience; my career so far (and certainly in recent history) has been almost pure development.
I'm wondering if anybody could share their experiences from inside the Google SRE group.
1. For people who went from being a pure developer to an SRE, how hard was the adjustment?
2. How do you feel about being on-call?
3. How much will my lack of operational experience hurt me?
4. What is the balance between operational work and project work?
5. For people who decided to leave the SRE group, why did you decide to leave?
6. Any regrets?
I have another potential match with a non-SRE team, and I'm weighing both options.
Thank you in advance for any information or advice. It's a big decision.
1. It was an interesting adjustment. The work is qualitatively different but it's still software engineering. Just at a higher level.
2. On-call is the best I've ever had at any company. It's 12h a day max and overtime is compensated. You won't be woken up in the middle of the night. There are different tiers as well. I'm tier 1, which means a five-minute response time. That's too much for some people and I don't blame them.
3. My team is ramping up people straight out of university. You'll be well trained and have half a year or longer to ramp up, depending on how critical the service is.
4. In my org it's something like 25% of your time on-call at most. Depending on the size of your team it's less.
5. Not applicable. My first SRE job.
6. None so far. Google is very nice to work for. Like all jobs it'll be stressful at times but it's better here than anywhere else I've worked.
2) I have been on call for most of my career, but Google is the only employer that has actually provided compensation. Depending on your lifestyle, the extra PTO can be really nice (for example, it’s easy for me to ski on less-crowded days), but as I’ve gotten older being on call has definitely gotten more painful (now that I have a serious partner, not being able to do things on the days she has off can be frustrating). The actual difficulty/stress of the pages you get will be highly team-dependent, but SRE does a reasonable job of training and tracking pager load.
3) This is highly team-dependent, but there are many teams where systems skills are not a big deal. You need to be able to think in a reliability-focused way, though.
4) Some teams actually develop their own code; other teams rely on SWE teams for most of the code. You will read a lot more code than you will write, and you will write less code than they say you will during the interview process. This is also true for SWEs - everyone spends more time big-company-ing than computer-ing.
5) I have not done this, but I have considered moving to a SWE role because there are more open source opportunities on that side.
6) I’ve found Google to be the best place to work of the mega corporations I’ve been at, but honestly if the money was the same I’d prefer a smaller company. I’m better at tasks like “chase this bug through a bunch of layers until you find some weird kernel behavior” than “explain why your rollout plan is compliant with our reliability directives.”
In my case, the adjustment from SWE to SRE was not very difficult since I had done a lot of operational work before.
The oncall was not very intense even though my team supported a critical product. IMO, Google handles operations and SRE much better than most other companies I have worked at. My SRE team was split between two locations in different time zones, so I never had to be woken up at night. I was also very well compensated for time spent oncall (as bonuses), which I have not seen at other companies.
I don't think your lack of operational experience will hurt if you have a learning mindset. My goal was to learn how Google operates services at scale, and even with previous operational experience, I learned a lot. I had coworkers who came from a pure development background and thrived in the role.
The split between operational work and project work can depend upon the team. The goal is to have a healthy balance and some teams manage this better than others. However, project work may not always involve development and it is easy to feel distanced from the user. This was one of the reasons I switched out of SRE after a few years - I felt like I wasn't coding enough and building features for users.
I definitely have no regrets about being on an SRE team - I learned a lot and would consider going back to join one again.
Some thoughts:
- being on-call can be anything from soul-sucking to mildly annoying depending on what team you land on, but a lot of SWEs are also on-call, including at Google
- a lot of the entry SWE positions at G (any level) are things that they have trouble recruiting for internally, so unless you know exactly what team you're joining you may not find it as exciting as you would like
- Google's SREs are top notch and you can expect to develop some new skills and learn a lot
- SRE work on a good team is actually pretty fun if you like problem solving and can deal with mild stress. AFAICT most SREs are very big into no-blame culture, at least at good organizations
- in retrospect I probably would have enjoyed Google SRE as much as, maybe even more than, being a SWE (which I mostly enjoyed)
However, your life as an SRE after Google, if you don't take a SWE role, can be very different from what you experience at Google. The reality is that at the majority of companies, system administration teams are labelled SRE as a hiring tactic, and the codebase quality, the kind of things you develop, and the ratio of project work to toil all vary massively, mostly for the worse.
Source: Did interviews in early, passed HC for senior SWE in SRE in July with a team match, and now I'm stuck in team match purgatory.
When you were doing "pure development", were you involved in the operational side of things at all?
Personally, I am also what I would call a "pure developer", and I find SRE work really stressful and a very different skillset from what I'm good at. I've often thought that calling ourselves "software engineers" is pretentious, but I would be okay with SREs calling themselves engineers because of the kind and style of work they do. I would say the things I do as a "pure developer" are more akin to creative expression than to engineering, though.