Apparently, you can send in a video clip of yourself, the model gets trained on it, and then you can type any text and generate a video of yourself saying those words.
Does anyone know how it works? Which foundation model might they be using? I'd love to replicate this locally.
👤 ilaksh Accepted Answer ✓
I assume it's a SOTA version of "talking head generation" or something related.