HACKER Q&A
📣 0xdeafbeef

Would you load balance traffic without a CDN?


I have an average of 5 Gbps of traffic to the service. Currently, I'm using managed Kubernetes in GCE and paying $20,000 for traffic. I'm planning to move to another provider, but the question is how to handle load balancing. My idea is to create several A DNS records and let the client choose a random IP, but this is incompatible with Cloudflare, as it uses round-robin under the hood. After testing, it seems like Cloudflare sends all the requests to the first record. So the second option is to create several subdomains and have the client choose one at random. Managed load balancers all cost a ton, so that's not the way. Is there a better solution? Ideally, I'd like to have one IP without any client-side load balancing.
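The second idea in the question, several subdomains with the client picking one at random, could look something like the following Python sketch on the client side. It is only an illustration under assumptions: the hostnames `node1.example.com` etc. and port 443 are hypothetical, and `connection_order`, `connect_any` and `tcp_connect` are made-up helper names. Shuffling per client spreads load roughly evenly across nodes, and trying the remaining hosts in order gives the client a crude failover when one node is down.

```python
import random
import socket
from typing import Callable, List, Optional, Sequence

# Hypothetical per-node hostnames; each would be its own DNS name
# pointing at a single server's IP.
HOSTS = ["node1.example.com", "node2.example.com", "node3.example.com"]


def connection_order(hosts: Sequence[str], rng: random.Random) -> List[str]:
    """Return the hosts in a random order so load spreads across clients."""
    order = list(hosts)
    rng.shuffle(order)
    return order


def connect_any(
    hosts: Sequence[str],
    rng: random.Random,
    connect: Callable[[str], bool],
) -> Optional[str]:
    """Try hosts in random order; return the first that accepts, else None."""
    for host in connection_order(hosts, rng):
        if connect(host):
            return host
    return None


def tcp_connect(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Network probe: open (and immediately close) a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A real client would call something like `connect_any(HOSTS, random.Random(), tcp_connect)`; the `connect` callback is injected so the selection logic can be exercised without touching the network. Note that this is exactly the client-side load balancing the question would prefer to avoid.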


  👤 sredevops01 Accepted Answer ✓
Currently, how are your clients set up? What are their www and root records pointed at?

For load balancing, all you need to do is have your customers CNAME to your firewall/load balancer, so you aren't using A records for this. For example, in Azure, if you spin up a Traffic Manager profile, you get a name like "mytrafficmanager.trafficmanager.net", and the CNAME for www.mysite.com would point to mytrafficmanager.trafficmanager.net.

However, in this case, you would also want your customers to point to something like customer.mysite.com, so that if you move from GCP/Azure to something else, you control that record and can migrate them during a failover, an incident, or for any other reason.

Edit: And have customer.mysite.com point to "mytrafficmanager.trafficmanager.net".
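The value of the extra customer.mysite.com hop is that every customer's record resolves through one name you control. A toy resolver over an in-memory zone, written in Python, shows the effect. It is not a real DNS client, and every name and address in it (the `.example` domains, `lb.oldprovider.example`, `lb.newprovider.example`, and the 203.0.113.x / 198.51.100.x documentation IPs) is hypothetical.

```python
from typing import Dict, Tuple

# Toy zone: name -> (record type, value). All names and IPs are hypothetical;
# 203.0.113.0/24 and 198.51.100.0/24 are reserved documentation ranges.
Zone = Dict[str, Tuple[str, str]]


def resolve(zone: Zone, name: str, max_hops: int = 8) -> str:
    """Follow CNAME records until an A record is reached; return its IP."""
    for _ in range(max_hops):
        rtype, value = zone[name]
        if rtype == "A":
            return value
        if rtype == "CNAME":
            name = value  # follow the alias to the next name
            continue
        raise ValueError(f"unsupported record type {rtype!r} for {name}")
    raise RuntimeError("CNAME chain too long (possible loop)")


zone: Zone = {
    # Each customer's own record points at a name you control...
    "www.customer-a.example": ("CNAME", "customer.mysite.com"),
    "www.customer-b.example": ("CNAME", "customer.mysite.com"),
    # ...which in turn points at the current provider's load balancer.
    "customer.mysite.com": ("CNAME", "lb.oldprovider.example"),
    "lb.oldprovider.example": ("A", "203.0.113.10"),
    # A load balancer at a new provider, ready for migration.
    "lb.newprovider.example": ("A", "198.51.100.20"),
}
```

Repointing the single record, e.g. `zone["customer.mysite.com"] = ("CNAME", "lb.newprovider.example")`, moves every customer at once without asking any of them to change their own DNS. Real-world details such as TTLs, and the rule that a CNAME cannot sit at a zone apex, are outside what this toy models.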


👤 toast0
5Gbps could be a single box, if you're OK with everything that entails.

Otherwise, if you're serving all the traffic from a single IP, you need some sort of load balancing. HAProxy + CARP + cold failover is operationally simple, but you lose sessions when your HAProxy box needs maintenance.

ECMP (equal-cost multi-path routing) works if your hosting allows it. You'll still lose sessions during changes, though.
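Why ECMP loses sessions can be seen with a simplified model: the router hashes each flow's 5-tuple and picks a path with `hash mod N`, so changing N reshuffles existing flows onto different servers. This Python sketch is an illustration under assumptions, not any router's actual hash function; SHA-256 stands in for the hardware hash, and the server names `s1`..`s4` are hypothetical.

```python
import hashlib
from typing import Sequence, Tuple

# A flow is identified by its 5-tuple: (src IP, src port, dst IP, dst port, proto).
Flow = Tuple[str, int, str, int, str]


def pick_next_hop(flow: Flow, hops: Sequence[str]) -> str:
    """Simplified ECMP: hash the 5-tuple, then take it modulo the path count."""
    key = "|".join(str(part) for part in flow).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return hops[digest % len(hops)]


def moved_fraction(
    flows: Sequence[Flow], before: Sequence[str], after: Sequence[str]
) -> float:
    """Fraction of flows that land on a different server after the change."""
    moved = sum(1 for f in flows if pick_next_hop(f, before) != pick_next_hop(f, after))
    return moved / len(flows)


if __name__ == "__main__":
    # Synthetic client flows to one public IP on port 443.
    flows = [
        (f"10.0.{i // 256}.{i % 256}", 40000 + i % 1000, "203.0.113.10", 443, "tcp")
        for i in range(10_000)
    ]
    frac = moved_fraction(flows, ["s1", "s2", "s3", "s4"], ["s1", "s2", "s3"])
    print(f"{frac:.0%} of flows moved after removing one of four servers")
```

In this model, removing one of four servers moves roughly three quarters of the flows, not just the quarter that lived on the removed server. Some routers offer "resilient hashing" to reduce this churn, but any flow that does move lands on a server with no state for its TCP connection.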

Maybe something something pfsync. Or something with Proxygen or some other load-balancing software that came out of Facebook. (I worked there, but not on their load balancers.)

More details on what you're planning to do with 5Gbps would help you get better advice. I'm assuming HTTPS, because Cloudflare. Is it mostly static content, mostly dynamic, or mostly proxying? Does it burn a lot of CPU (so you need many boxes anyway) or not? Are you likely to attract DDoS attacks, so that you need more inbound bandwidth to accept and drop abuse? Do you expect to provide users with an SLA, or what service level do you want to provide? Etc.


👤 stephenr
For something with any expectation of fairly high uptime, I think the more important aspect here is failover (due to either actual failures or maintenance), not load balancing specifically. That said, at this level of traffic it's definitely helpful to be able to spread the traffic over numerous internal resources rather than relying purely on vertical scaling of the backends.

If you're able and willing to manage the LB systems yourself, I'd generally put two balancers (running HAProxy) in front of the application servers and do IP failover between them.

If your service uses internal IP connectivity to, e.g., a database or Redis cluster or what have you, I'd generally configure each balancer as the primary for either public or private traffic, and as the backup for the other. If your service doesn't use a clustered database service or anything like it, you can obviously omit the config that balances private traffic; it's also possible this aspect is "taken care of" for you by k8s.
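The two-balancer arrangement described above, each box primary for one virtual IP and backup for the other, boils down to a priority election per VIP, as in VRRP or CARP. This Python sketch models only the outcome of that election; real VRRP/CARP also involves periodic advertisements, timeouts and gratuitous ARP, which are left out. The node names `lb1`/`lb2`, the VIP labels and the priority values are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical setup: two HAProxy boxes share two virtual IPs.
# Higher priority wins, as in VRRP/CARP; a node only counts while it is alive.
PRIORITIES: Dict[str, Dict[str, int]] = {
    "public_vip": {"lb1": 150, "lb2": 100},   # lb1 is primary for public traffic
    "private_vip": {"lb1": 100, "lb2": 150},  # lb2 is primary for private traffic
}


def vip_owners(
    priorities: Dict[str, Dict[str, int]], alive: Set[str]
) -> Dict[str, Optional[str]]:
    """For each VIP, the live node with the highest priority owns it (None if none are live)."""
    owners: Dict[str, Optional[str]] = {}
    for vip, prio in priorities.items():
        candidates = [node for node in prio if node in alive]
        owners[vip] = max(candidates, key=lambda node: prio[node]) if candidates else None
    return owners
```

When both boxes are healthy, `vip_owners(PRIORITIES, {"lb1", "lb2"})` gives each box one kind of traffic; when one fails or is taken down for maintenance, the survivor picks up both VIPs.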

The exact method of IP failover will depend on who is hosting your machines and what their network is configured to allow (e.g. Linode previously supported VRRP but no longer does, and BGP is used instead). AFAIK none of the "standard" IP failover techniques will work in GCP/etc. Whether this is due to legitimate technical limitations or business/profit-driven decisions is left as an exercise for the reader.

Also consider that if you're using managed k8s, the provider of that service almost certainly has a managed load balancer service. I'm not sure what you mean by "all managed load balancers will cost a ton": Linode's balancer, for example, has a flat $10 monthly fee.

You mention "5Gbps to the service" - does that mean the data is mostly inbound, or did you mean requests result in outbound traffic averaging 5Gbps?

It's not a coincidence that, by my calculation from their pricing, GCP's load balancer would charge you $18 per hour at 5Gbps sustained: that's the entire business model of 'big cloud' businesses. Have you looked at any providers besides GCP/AWS/Azure for hosting this service? Some guesses about what you're using, based on your post and reply below, plus some quick calculations, suggest this would be orders of magnitude cheaper to host on Linode or a similar "not AWS-alike" provider.
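The $18 per hour figure can be sanity-checked with a little arithmetic. This Python snippet converts 5 Gbps sustained into gigabytes per hour and backs out the per-GB rate that $18/hour implies; that rate is derived from the number in the comment, not taken from GCP's price list. It assumes decimal units (1 GB = 10^9 bytes), a 730-hour month, and, for the last line, that the $20,000 in the question is a monthly bill.

```python
# Assumptions: decimal units (1 GB = 1e9 bytes), a 730-hour average month,
# and that the question's $20,000 traffic bill is monthly.
GBPS = 5                   # sustained average from the question
LB_DOLLARS_PER_HOUR = 18   # figure quoted in the comment
HOURS_PER_MONTH = 730      # 8760 hours / 12 months
MONTHLY_TRAFFIC_BILL = 20_000

gb_per_hour = GBPS * 3600 / 8                         # gigabits/s -> GB per hour
implied_lb_rate = LB_DOLLARS_PER_HOUR / gb_per_hour   # $/GB implied by $18/hour
monthly_gb = gb_per_hour * HOURS_PER_MONTH
monthly_lb_cost = LB_DOLLARS_PER_HOUR * HOURS_PER_MONTH
effective_traffic_rate = MONTHLY_TRAFFIC_BILL / monthly_gb  # current $/GB

print(f"{gb_per_hour:,.0f} GB/hour, {monthly_gb / 1e6:.2f} PB/month")
print(f"LB alone: ${implied_lb_rate:.3f}/GB, ${monthly_lb_cost:,}/month")
print(f"Current traffic bill: ${effective_traffic_rate:.4f}/GB")
```

That works out to 2,250 GB per hour, an implied load balancer rate of about $0.008/GB, roughly 1.64 PB per month, and about $13,140 per month for the load balancer alone, on top of the roughly $0.012/GB the question's traffic bill already implies.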