Study: Transparency is often lacking in datasets used to train large language models