By examining whether your own or familiar works are included in datasets scraped from shadow libraries like Library Genesis (LibGen), you can gain valuable insight into the impact of AI development on intellectual property rights.
This activity encourages critical engagement with questions of authorship, consent, and compensation, which are core concerns in both education and content creation. It also emphasizes the importance of transparency and ethical AI practices in shaping the future of digital scholarship.
Challenge
Generative AI models are often trained on massive datasets scraped from the internet—some of which include pirated content from shadow libraries without permission from the original authors. These practices raise serious concerns about copyright, intellectual property, and the ethical treatment of authors and educators.
One high-profile case involves the use of Library Genesis (LibGen), a well-known shadow library, in AI training datasets. With tools like the one developed by The Atlantic, you can now search these datasets to determine whether your work or works you know have been included without authorization.
For this challenge, you’ll search the LibGen dataset with The Atlantic’s tool and then use your findings to reflect on the broader ethical and legal issues surrounding generative AI and authorship.
Instructions
- Ensure you have a ChatGPT account. You’ll use ChatGPT to document your reflections after completing the search.
- Visit The Atlantic’s searchable tool to explore the dataset. Use the tool to look up your own published works or those of colleagues, instructors, or well-known authors in your field.
- Search for Publications. Enter titles, author names, or keywords to check whether specific works are included in the dataset.
- Reflect on What You Find. If you find a work that was used:
  - Was permission granted for inclusion?
  - Are there inaccuracies in how the work is listed?
  - What are the potential consequences of its use in AI training?
- Use ChatGPT to Reflect. Ask ChatGPT to help you generate a short written reflection, discussion prompt, or classroom activity based on your findings. Consider including:
  - The ethical implications of using pirated content in AI development
  - Potential legal consequences for companies and institutions
  - Ideas for how this topic could be introduced in a course on digital ethics, media studies, or intellectual property
Reflect and Share
What does this challenge reveal about the role of AI in teaching and learning? Try it out, and share what you discovered in the comment box below, whether it’s your final product, a reflection, or a surprising insight.