https://huggingface.co/docs/transformers/training
with the caveat that the default learning rate is too high; you get better results with 2e-5
Google put out a paper last week that covers some of this and proposes a new method that outperforms previous approaches: https://news.ycombinator.com/item?id=35810663
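For the learning rate point: `TrainingArguments` defaults to 5e-5, and you can override it when setting up the `Trainer`. A minimal config sketch (the model/dataset setup is whatever you're following from the tutorial; `"out"` is just a placeholder output dir):

```python
from transformers import TrainingArguments

# TrainingArguments defaults learning_rate to 5e-5;
# drop it to 2e-5 for (typically) better fine-tuning results.
args = TrainingArguments(
    output_dir="out",       # placeholder; any writable path
    learning_rate=2e-5,     # override the 5e-5 default
    num_train_epochs=3,
)

# Then pass it along as usual:
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```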