Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...
After seizing the summer with a blitz of powerful, freely available new open source language and coding focused AI models that matched or in some cases bested closed ...