Running out of good data
This episode of Pretrained discusses the projection that internet data will be exhausted by 2026. It also covers how the value of transcripts varies, the practice of training AI on multiple languages, and funding for the creation of new datasets.