When your data center has millions of devices, how is this problem solved at scale? Are there readily available tools? Or do you have a big operations team that does nothing but patching and rebooting?
For a sense of what companies like the FAANGs do, look at how HashiCorp's Vault models threats [1]. In High-Availability mode, the physical presence of operators is only required for sealing and unsealing [2]. In essence, there is a cluster of servers that can lose members while the remaining members retain access to the "Master Secret". Members can leave and rejoin without the "Master Secret" being lost, and the cluster will automatically distribute the secret back to members that rejoin, provided they are still included in the cluster and have their own private key. A rebooted server does need a manual step before it can access its private key again, but there is enough redundancy in the cluster that this can simply wait until normal working hours.
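To make the "a few operators are only needed to unseal" idea concrete: the seal documentation in [2] describes Vault's default mechanism as Shamir's Secret Sharing, where a key is split into shares and any threshold number of them can rebuild it. Below is a toy Python sketch of that technique, not Vault's actual implementation. The function names, the prime field, and the 3-of-5 split are all my own assumptions for illustration.

```python
import secrets

# Assumption: a toy prime field. 2**127 - 1 is a Mersenne prime; real
# implementations use different fields and encodings.
PRIME = 2**127 - 1


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, all arithmetic modulo PRIME.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key_shares = split_secret(123456789, threshold=3, shares=5)
    # Any three of the five operators can unseal.
    print(recover_secret(key_shares[:3]) == 123456789)
    print(recover_secret(key_shares[2:]) == 123456789)
```

The point of the threshold is the same as the redundancy argument above: no single operator (or single stolen share) is enough, and losing a couple of share holders does not lock you out.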
The threat model assumes that an attacker cannot break into a datacenter and physically compromise a running server without being caught, but might be able to walk away with a server, or plug in a new one carrying old leaked secrets.
[1] https://www.vaultproject.io/docs/internals/security/
[2] https://www.vaultproject.io/docs/concepts/seal/