I'm wondering at what point it becomes essential for a company to give a bit of thought to incident management and start investing in a process or a solution for this.
All of the major SaaS providers have good blog posts and documents on this process. You don't need to automate and build in rostering and shit; just have a Slack channel like #on-call with a message that says "@dave is watching outages, shift finishes at 9AM and hands over to @jess". Then at 9AM @jess acks and posts the same message, naming the person who plans to take over next. It's the current on-call person's responsibility to find a replacement if the next person doesn't show up*.
The main issue you'll have is rostering. If you're super early stage, everyone will be happy to do this. Once you have 10-20 employees, though, you'll need to distribute it fairly so you don't burn anyone out - and agree on an exceptions process up front (e.g. @tu worked until 3AM last night on a feature for customer X, so @sam will take over for them).
* only do this with people who are senior, are comfortable having candid conversations, and won't martyr themselves.
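If you later want a script to nudge this along, it doesn't need to be much. Here's a minimal sketch: a cron job that rotates through a hard-coded roster and posts the 9AM handoff message via a Slack incoming webhook. The SLACK_WEBHOOK_URL variable, the roster names, and the one-slot-per-day rotation are placeholder assumptions, not a prescription.

```python
# Minimal handoff-reminder sketch: round-robin through a roster and post
# today's on-call handoff to an #on-call channel via a Slack incoming webhook.
# SLACK_WEBHOOK_URL and the roster below are placeholders, not a real setup.
import os
from datetime import date

import requests

ROSTER = ["@dave", "@jess", "@tu", "@sam"]  # keep in source control so swaps are visible

def todays_pair(roster, today=None):
    """Pick today's on-call and tomorrow's, rotating one slot per day."""
    today = today or date.today()
    i = today.toordinal() % len(roster)
    return roster[i], roster[(i + 1) % len(roster)]

def post_handoff():
    current, next_up = todays_pair(ROSTER)
    text = (
        f"{current} is watching outages. Shift finishes at 9AM "
        f"and hands over to {next_up}. {next_up}, please ack here."
    )
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": text}, timeout=10)

if __name__ == "__main__":
    post_handoff()  # run from cron at 9AM, or by hand
```

Keeping the roster in source control also means swaps (the exceptions process above) show up as reviewable diffs rather than tribal knowledge.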
edit: document your processes and keep them updated; nothing worse than waking up at 3 AM and having no clue who to contact or what to do ))
edit: update previous incidents/post-mortems
When an incident is declared, a Slack room is spun up that auto-includes links to a Zoom call and to the incident response process, and important people get paged. We have someone other than the responding engineer be the "incident commander." Their job is to make sure we dot the i's, cross the t's, follow up on action items, and generally keep the ball moving forward: "Amanda, you were going to pull the db records, how's that going? Does anyone have insights that could help John?" They work with support to get a user-facing message published, and they send out periodic updates so higher-ups know what is going on, with a focus on customer impact. The initial goal is to mitigate impact, then figure out a fix.

During the blameless postmortem, we focus on the processes that failed and aim to remove the human component from failure. We also use these to share how people found information, fixed things, etc. From there, we decide what is a system improvement in need of priority and design work vs. things we can address at an individual or team level. We then have an internal SLO that says we will address the fixes within N sprints, and higher-ups pay attention.
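Once the manual version of this works, the channel spin-up is the easiest part to script. Here's a rough sketch of the "declare incident" step, assuming the slack_sdk package, a Slack bot token, a PagerDuty Events API v2 routing key, and placeholder Zoom/runbook URLs; none of these specifics are from the process above, they're just one way to wire it up.

```python
# Rough "declare incident" sketch: open a Slack channel with the key links
# pinned, and page the on-call via PagerDuty's Events API v2.
# The token env vars, channel naming, and doc URLs are assumptions for
# illustration, not anyone's actual setup.
import os
from datetime import datetime, timezone

import requests
from slack_sdk import WebClient

ZOOM_URL = "https://example.zoom.us/j/000000000"           # placeholder
RUNBOOK_URL = "https://wiki.example.com/incident-process"  # placeholder

def declare_incident(title: str) -> str:
    slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

    # 1. Spin up a dedicated channel, e.g. #inc-20240101-0230-db-outage
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
    name = f"inc-{stamp}-{title.lower().replace(' ', '-')[:40]}"
    channel = slack.conversations_create(name=name)["channel"]["id"]

    # 2. Pin the links responders always need
    msg = slack.chat_postMessage(
        channel=channel,
        text=f"*{title}*\nZoom: {ZOOM_URL}\nIncident process: {RUNBOOK_URL}",
    )
    slack.pins_add(channel=channel, timestamp=msg["ts"])

    # 3. Page the important people
    requests.post(
        "https://events.pagerduty.com/v2/enqueue",
        json={
            "routing_key": os.environ["PAGERDUTY_ROUTING_KEY"],
            "event_action": "trigger",
            "payload": {"summary": title, "source": "incident-bot", "severity": "critical"},
        },
        timeout=10,
    )
    return channel
```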
This can all be manual to start, and then you can begin integrating with tooling. I enjoyed working with Splunk for log analysis, graphs, and alerts; Prometheus for alerting; PagerDuty for paging; and Jira for tracking betterments attached to incidents. Right before I left my last gig, we started with some SaaS incident/post-mortem tool and it was pretty decent but nothing to write home about. I've forgotten its name and can't find anything similar via Google.
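For the "betterments attached to incidents" piece, a small sketch of what that tracking could look like: each postmortem action item becomes a Jira issue, labelled with the incident ID and given a due date matching the "fix within N sprints" SLO. The JIRA_* environment variables, the OPS project key, and the two-sprint window are invented for illustration.

```python
# Sketch of the "track betterments" step: file each postmortem action item
# as a Jira issue, labelled with the incident ID and given a due date that
# reflects a "fix within N sprints" SLO. The JIRA_* env vars, the OPS project
# key, and the two-sprint window are made-up examples.
import os
from datetime import date, timedelta

import requests

SPRINT_DAYS = 14
SLO_SPRINTS = 2

def file_action_item(incident_id: str, summary: str) -> str:
    due = date.today() + timedelta(days=SPRINT_DAYS * SLO_SPRINTS)
    resp = requests.post(
        f"{os.environ['JIRA_BASE_URL']}/rest/api/2/issue",
        auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
        json={
            "fields": {
                "project": {"key": "OPS"},
                "issuetype": {"name": "Task"},
                "summary": f"[{incident_id}] {summary}",
                "labels": ["postmortem", incident_id],
                "duedate": due.isoformat(),
            }
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["key"]  # e.g. "OPS-123"
```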
As for when you should do this? As soon as your team is big enough that you need to write things down, share information with others, and have something worth improving.