One of the biggest challenges with cloud-based inference for LLMs is keeping user data private. Is it possible to combine local and cloud machines to solve this?
For example, could we run the first and last layers of an LLM on a local machine to protect data privacy, and use the cloud for the remaining layers to speed things up? We could fine-tune the first and last layers locally so that their weights differ from the public checkpoint and never leave the local machine.
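To make the idea concrete, here is a rough sketch of the split I have in mind, with the cloud round-trip simulated by a plain function call. The names and toy dimensions (`cloud_middle_layers`, the block sizes) are hypothetical placeholders, not a real API:

```python
import torch
import torch.nn as nn

VOCAB, DIM, N_MID = 1000, 64, 4

def make_block() -> nn.Module:
    # Stand-in for a transformer block; a real model would use attention.
    return nn.Sequential(nn.Linear(DIM, DIM), nn.GELU())

# --- Local machine: the fine-tuned "outer" layers, never uploaded --------
embedding = nn.Embedding(VOCAB, DIM)
first_block = make_block()
last_block = make_block()
lm_head = nn.Linear(DIM, VOCAB)

# --- Cloud: only the frozen middle layers --------------------------------
middle_blocks = nn.Sequential(*[make_block() for _ in range(N_MID)])

def cloud_middle_layers(hidden: torch.Tensor) -> torch.Tensor:
    """Simulated remote call: the cloud sees hidden states, never raw tokens."""
    return middle_blocks(hidden)

@torch.no_grad()
def split_inference(token_ids: torch.Tensor) -> torch.Tensor:
    h = first_block(embedding(token_ids))  # local: tokens -> hidden states
    h = cloud_middle_layers(h)             # remote: the bulk of the compute
    return lm_head(last_block(h))          # local: hidden states -> logits

logits = split_inference(torch.randint(0, VOCAB, (1, 8)))
print(logits.shape)  # torch.Size([1, 8, 1000])
```

In a real deployment the hidden states would cross the network, which is exactly the traffic a privacy analysis would need to scrutinize.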
Please let me know if there is any ongoing research on such an approach for privacy-aware inference.
Thank you.
You should instead try looking into Homomorphic Encryption:
https://huggingface.co/blog/encrypted-llm
It is resource-intensive and slower, but in my opinion it serves your purpose better: the hidden states your scheme would send to the cloud can still leak information about the input, whereas with FHE the server computes on ciphertexts and never sees your data in the clear.
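To give a feel for what that looks like in practice, here is a minimal sketch using Zama's Concrete ML, the library behind the post above. The toy model, the calibration data, and the `n_bits` value are my own illustrative assumptions; check the post and the Concrete ML docs for the real GPT-2 example:

```python
import torch
import torch.nn as nn
from concrete.ml.torch.compile import compile_torch_model

class TinyNet(nn.Module):
    """Toy MLP standing in for one model component to run under FHE."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, x):
        return self.net(x)

model = TinyNet()
calibration_data = torch.randn(100, 16)  # used to calibrate quantization

# Quantize the network and compile it into an FHE circuit; n_bits trades
# accuracy against ciphertext size and latency.
quantized_module = compile_torch_model(model, calibration_data, n_bits=6)

x = torch.randn(1, 16).numpy()
# fhe="simulate" tests the quantized circuit quickly in the clear;
# fhe="execute" encrypts the input, evaluates homomorphically, and decrypts.
print(quantized_module.forward(x, fhe="simulate"))
```

`fhe="simulate"` lets you validate the quantized circuit quickly without encryption; switching to `fhe="execute"` performs the actual encrypted run, which is where the heavy resource cost shows up.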