1. Build a repo of the Transformer and all its variants and benchmark the efficiency of each variant (a rough timing sketch is below).
2. Train a small transformer on a small dataset that I can extract from the internet.
3. Try to reproduce a small-scale version of RLHF (maybe not realistic even at a small scale).
I'm biased towards projects that other people would find useful, beyond just me learning more about the field. Assume I have access to a few hundred dollars of AWS GPU compute. Thanks a lot for your help.
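For the first idea, the core of the benchmark is just timing forward passes of each attention variant at a few sequence lengths. A minimal sketch, assuming PyTorch; the vanilla nn.TransformerEncoderLayer stands in for one variant, and the model size, batch, and sequence lengths are placeholders:

```python
# Rough per-variant latency benchmark (idea 1). PyTorch assumed;
# model size, batch, and sequence lengths are illustrative placeholders.
import time
import torch
import torch.nn as nn

def time_forward(layer, seq_len, d_model=512, batch=8, iters=20, device="cpu"):
    """Return mean forward-pass time in milliseconds for one layer."""
    layer = layer.to(device).eval()
    x = torch.randn(batch, seq_len, d_model, device=device)
    with torch.no_grad():
        for _ in range(3):  # warm-up so lazy init / kernel selection doesn't skew timings
            layer(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            layer(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1000

# Swap in other variants (linear attention, local attention, ...) with the same interface.
variants = {"vanilla": nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)}
for name, layer in variants.items():
    for seq_len in (128, 512, 1024):
        print(f"{name}  seq_len={seq_len}: {time_forward(layer, seq_len):.2f} ms")
```

Dropping in other attention layers with the same (batch, seq, d_model) interface lets you compare how each variant scales with sequence length on the same hardware.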
LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
https://chrisdonahue.com/LakhNES/
The NES Music Database (5k+ scores)
https://github.com/chrisdonahue/nesmdb
The Lakh MIDI Dataset v0.1 (175k+ files)
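If you go the music route for idea 2, the Lakh MIDI files are easy to inspect before committing to a tokenization. A minimal sketch, assuming the pretty_midi library is installed; the file path is a placeholder:

```python
# Quick inspection of one Lakh MIDI file (the path below is a placeholder).
# Assumes the pretty_midi library: pip install pretty_midi
import pretty_midi

midi = pretty_midi.PrettyMIDI("lakh_midi/example.mid")
print(f"duration: {midi.get_end_time():.1f} s")
for inst in midi.instruments:
    name = pretty_midi.program_to_instrument_name(inst.program)
    print(f"{name:<24} drums={inst.is_drum}  notes={len(inst.notes)}")
```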