learn-tilde-ath asked:
If I wanted to try fine-tuning gpt2 (possibly just the smallest version if that needs less data?), do you know a rough lower bound on how big a corpus I would need in order for that to work ok? I want to use text I've written, but I'm not sure that there's enough of it (I have a little over 2MB handy). I've tried casually to look this question up a couple times but my googling didn't find much of an answer.
possibly just the smallest version if that needs less data?
Train on the biggest version you can, actually – bigger ones are more data-efficient, not less.
do you know a rough lower bound on how big a corpus I would need in order for that to work ok?
It depends on what you’re going for, but 2MB is definitely enough to be worth trying.
The smallest corpus I’ve ever fine-tuned on (I think) was 1.8 MB of a friend’s tweets and blog posts, a long time ago, and he was impressed and amused with the results.
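For a rough sense of what 2MB buys you in model terms, you can convert bytes to tokens. A minimal sketch, assuming GPT-2’s BPE averages about 4 bytes per token on ordinary English (the real ratio depends on the text):

```python
# Rough sanity check: how many training tokens does a small corpus yield?
# bytes_per_token = 4.0 is an assumption for English prose under GPT-2's BPE;
# your actual ratio will vary with the text.

def estimate_tokens(corpus_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Estimate the token count of a corpus from its raw size in bytes."""
    return int(corpus_bytes / bytes_per_token)

corpus_bytes = 2 * 1024 * 1024  # ~2 MB, as in the question
print(estimate_tokens(corpus_bytes))  # ~500k tokens
```

Half a million tokens is small by pretraining standards, but fine-tuning starts from a model that already knows English, so a corpus that size can still noticeably shift its style.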