Now the real problem is how to make them production ready. If we add TLS, accepting new connections becomes much slower; I think each core can only handle a few hundred new TLS connections per second. Reconnects can be faster (e.g., with TLS session resumption).
How did you solve the TLS with websocket problem? What happens when 1 million connections get disconnected and try to reconnect at the same time? What is your reconnection rate per core?
https://github.com/prettydiff/share-file-systems/blob/master...
As a bonus, it also gives you the ability to cycle (redeploy/restart) your services without your clients having to reconnect (that's where the name comes from). And as you can imagine, because communication with your services is entirely stateless, it scales like crazy.
I think the distributed-systems term for your problem is the 'thundering herd problem,' so searches involving that term would likely be fruitful. "Thundering herd websockets" would be a good starting point.
From a reliability perspective, implement exponential back-off with jitter on the client. This is a core necessity in all clients. I only skimmed this article, but it looked right: https://aws.amazon.com/blogs/architecture/exponential-backof...
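Something along these lines is the usual shape of it (a minimal sketch assuming a browser-style WebSocket client; the URL and delay constants are placeholders, not recommendations):

```typescript
const BASE_DELAY_MS = 1_000;  // first retry after roughly 1s
const MAX_DELAY_MS = 60_000;  // never wait longer than 60s

function backoffWithJitter(attempt: number): number {
  // Exponential growth capped at MAX_DELAY_MS, then "full jitter":
  // a uniform random delay in [0, cap] so clients spread themselves out
  // instead of reconnecting in lockstep.
  const cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.random() * cap;
}

function connectWithRetry(url: string, attempt = 0): void {
  const ws = new WebSocket(url);

  ws.onopen = () => {
    attempt = 0; // reset the backoff once we are connected again
  };

  ws.onclose = () => {
    const delay = backoffWithJitter(attempt);
    setTimeout(() => connectWithRetry(url, attempt + 1), delay);
  };
}

connectWithRetry("wss://example.com/socket"); // placeholder endpoint
```

The full jitter (random delay in [0, cap]) is what actually breaks up the herd; a fixed exponential schedule without randomness still has every client retrying at the same instants.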
When Signal had outages from the increased load during the WhatsApp exodus, it was due to this not being implemented in their clients.
Additionally, consider your load balancing architecture. If one machine goes down, do all of its clients reconnect to a single other machine, or do the reconnects get distributed across all the remaining machines? Can you administratively drain a machine? Can you quickly allocate some spare capacity?
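For the "administratively drain a machine" part, one common pattern (a rough sketch, assuming Node.js; the paths, port, and lack of authentication are placeholder choices) is a health endpoint the load balancer polls. Flip a flag and the machine reports unhealthy, so new connections and reconnects get routed elsewhere while established sockets stay up:

```typescript
import { createServer } from "node:http";

let draining = false;

const healthServer = createServer((req, res) => {
  if (req.url === "/healthz") {
    // The load balancer polls this; 503 means "send no new connections here".
    res.writeHead(draining ? 503 : 200);
    res.end(draining ? "draining" : "ok");
    return;
  }
  if (req.url === "/drain") {
    // Triggered by deploy tooling; a real setup would restrict access to this.
    draining = true;
    res.writeHead(200);
    res.end("drain started");
    return;
  }
  res.writeHead(404);
  res.end();
});

healthServer.listen(9090); // placeholder port for the health/admin interface
```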
Lastly, you can get into situations where your entire infrastructure is overloaded. You will need a throttling mechanism, and it can work in concert with your load balancer or your clients. If you benchmark a server and it can only handle 500 concurrent re-connections, that is a hard limit you can enforce with fail-fast behavior.
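A minimal sketch of that fail-fast admission control, assuming Node.js with the ws package (the 500 figure stands in for whatever your own benchmark produces, and a production version would also release the counter on failed handshakes):

```typescript
import { createServer } from "node:http";
import { WebSocketServer } from "ws";

const MAX_CONCURRENT_HANDSHAKES = 500; // from benchmarking, per machine
let handshakesInFlight = 0;

const httpServer = createServer();
const wss = new WebSocketServer({ noServer: true });

httpServer.on("upgrade", (request, socket, head) => {
  if (handshakesInFlight >= MAX_CONCURRENT_HANDSHAKES) {
    // Fail fast: tell the client to back off and retry instead of
    // queueing handshake work this machine cannot finish in time.
    socket.write("HTTP/1.1 503 Service Unavailable\r\nRetry-After: 5\r\n\r\n");
    socket.destroy();
    return;
  }

  handshakesInFlight++;
  wss.handleUpgrade(request, socket, head, (ws) => {
    handshakesInFlight--; // handshake done; normal connection handling begins
    wss.emit("connection", ws, request);
  });
});

httpServer.listen(8080); // placeholder port
```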
Summary:
Clients implemented with exponential backoff and jitter
Load balancer architecture
Defensive "fail fast" throttling, or the ability to administratively throttle
And if the servers are so overloaded that the load-level endpoint fails to respond? That's fine, because non-response is an answer too.