Some folks have observed that language models such as GPT-3, GPT-2, Megatron-LM, and Turing-NLG continue to improve on evaluation metrics the longer they are trained. Training a GPT-2 class model can cost more than $50,000, and training a GPT-3 class model to completion can cost more than $5 million in compute time. But alongside that observation has come frustration that the ultimate applications ('downstream tasks') do not necessarily get any better. That is not necessarily bad news.
I'll turn this observation on its head. It could mean that businesses do not need to train AI models to completion to get just as much value from a language model for their actual business needs. That insight could save a great deal of money and time. In other words, test your half-baked models on your actual application before you spend thousands or millions chasing training metrics that may have no business relevance.
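The idea above can be sketched as a simple checkpoint-selection loop: evaluate each intermediate checkpoint on your downstream business metric, and stop spending on further pretraining once the downstream gains fall below a threshold. This is a minimal illustration, not a real training harness; `evaluate_downstream` is a hypothetical stand-in that simulates a downstream accuracy curve saturating early, and the step counts and thresholds are made-up numbers.

```python
# Hypothetical sketch: decide when further pretraining stops paying off
# by evaluating intermediate checkpoints on the downstream task directly.

def evaluate_downstream(checkpoint_step: int) -> float:
    """Placeholder for a real downstream evaluation (e.g. accuracy on
    your actual business task). Simulated here as a curve that
    saturates early, even though pretraining loss would keep falling."""
    return round(0.80 + 0.10 * (1 - 1 / (1 + checkpoint_step / 10_000)), 4)

def pick_stopping_checkpoint(steps, min_gain=0.01):
    """Return the first checkpoint after which the downstream gain
    drops below `min_gain` -- i.e. where more training stops mattering."""
    best_step, best_score = steps[0], evaluate_downstream(steps[0])
    for step in steps[1:]:
        score = evaluate_downstream(step)
        if score - best_score < min_gain:
            return best_step, best_score  # further training not worth it
        best_step, best_score = step, score
    return best_step, best_score

# Evaluate checkpoints saved at increasing training-step counts.
checkpoints = [10_000, 20_000, 40_000, 80_000, 160_000]
step, score = pick_stopping_checkpoint(checkpoints)
print(step, score)  # the simulated curve plateaus well before the end
```

Under this simulated curve, the loop stops at the 40,000-step checkpoint: the downstream metric has effectively plateaued while pretraining would have continued, at great cost, for four times as long.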
At East Agile we are building enhanced versions of GPT-2 class language models to help businesses use their proprietary knowledge for competitive advantage.