{"id":4082,"date":"2023-02-02T10:13:46","date_gmt":"2023-02-02T17:13:46","guid":{"rendered":"https:\/\/blogs.ubc.ca\/chendricks\/?p=4082"},"modified":"2024-08-14T15:38:07","modified_gmt":"2024-08-14T22:38:07","slug":"ethical-considerations-chatgpt-llms","status":"publish","type":"post","link":"https:\/\/blogs.ubc.ca\/chendricks\/2023\/02\/02\/ethical-considerations-chatgpt-llms\/","title":{"rendered":"Some ethical considerations in ChatGPT and other LLMs"},"content":{"rendered":"<p>Like many others, I&#8217;ve been thinking about GPT and ChatGPT lately, and I&#8217;m particularly interested in diving deeper into ethical considerations and issues related to these kinds of tools. As I start looking into this sort of question, I realize there are a <em>lot<\/em> of such considerations. And here I&#8217;m only going to be able to scratch the surface. But I wanted to pull together for myself some ethical areas that I think may be particularly important for post-secondary students, faculty, and staff to consider.<\/p>\n<p>Notes:<\/p>\n<ul>\n<li>This post will focus on ethical issues outside of academic integrity, which is certainly an important issue but not my particular focus here.<\/li>\n<li>An area I barely touch on below, but plan to look into more, is AI and Indigenous approaches, protocols, and data sovereignty. One place I will likely start is by digging into <a href=\"https:\/\/www.indigenous-ai.net\/position-paper\/\">a 2020 position paper<\/a> by an <a href=\"https:\/\/www.indigenous-ai.net\/\">Indigenous protocol and AI working group<\/a>.<\/li>\n<li>This post is quite long! I frequently make long blog posts but this one may be one of the longest. There is a lot to consider.<\/li>\n<li>I am focusing here on ethical issues and concerns, and there are quite a few. It may sound like I&#8217;m arguing we should not use AI language models like ChatGPT in teaching and learning. 
That is not my point here; rather, I think it&#8217;s important to recognize ethical issues when considering whether or how to use such tools in an educational context, and discuss them with students.<\/li>\n<\/ul>\n<p>Some of the texts I especially relied on when crafting this post, and which I recommend:<\/p>\n<ul>\n<li><a href=\"https:\/\/autumm.edtech.fm\/category\/chatgpt\/\">Autumm Caines has been writing some really insightful blog posts about ChatGPT<\/a> that include numerous ethical considerations.<\/li>\n<li>A very in-depth discussion of ethical and social risks of large language models can be found in Weidinger et al. (2021), &#8220;<a href=\"https:\/\/doi.org\/10.48550\/arXiv.2112.04359\">Ethical and social risks of harm from Language Models<\/a>.&#8221; It&#8217;s an excellent read. I&#8217;ll be pulling topic areas from there, and from other places (attributed below).<\/li>\n<li>Bender et al. (2021), &#8220;<a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3442188.3445922\">On the Dangers of Stochastic Parrots<\/a>&#8221; includes detailed analyses of several ethical concerns as well.<\/li>\n<\/ul>\n<p>And shortly before publishing I learned of this excellent <a href=\"https:\/\/leonfurze.com\/2023\/01\/26\/teaching-ai-ethics\/\">post by Leon Furze on ethical considerations regarding AI in teaching and learning<\/a>. It has many similar points to the below, along with teaching points and example ways to engage students in discussing these issues, focused on different disciplines. 
It&#8217;s very good, and comes complete with an infographic.<\/p>\n<p>My post here has been largely a way for me to think through the issues by writing.<\/p>\n<p><!--more--><\/p>\n<h2>Bias, discrimination, stereotypes, toxicity in outputs<\/h2>\n<blockquote><p>Perpetuating harmful stereotypes and discrimination is a well-documented harm in machine learning models that represent natural language (Caliskan et al., 2017). (Weidinger et al., 2021, p. 9)<\/p>\n<p>Bender et al. (p. 613) reveal that large datasets do not equally represent online users but significantly overrepresent younger users, people from developed countries, and English speakers. This means that dominant biases are disproportionately displayed including white supremacist, sexist, and ageist views. In use, GPT-3 has been shown to reproduce subtle biases and overtly discriminatory language patterns from its training data in many contexts including gender, race, religion, and disability. (Chan, 2022)<\/p><\/blockquote>\n<p>As Weidinger et al. (2021) note, &#8220;LMs are optimised to mirror language as accurately as possible, by detecting the statistical patterns present in natural language&#8221; (p. 11), so when the datasets they are trained on contain biased, discriminatory, abusive, or toxic language, these then find their way into the tools&#8217; outputs. As Abid et al. (2021) note, one method to address this problem is to carefully select training data; but this is not how GPT (on which ChatGPT is built) has been trained. Abid et al. (2021) were able to reduce some biased and discriminatory outputs by changing prompts fed to GPT, but this is a manual solution that relies on users choosing (and knowing how) to do so.<\/p>\n<p>ChatGPT does have content filters to try to avoid such issues. For example, as of August 2022, OpenAI released a Moderation endpoint to OpenAI API developers, meant to use AI to flag harmful content. According to <a href=\"https:\/\/openai.com\/blog\/new-and-improved-content-moderation-tooling\/\">the announcement about this tool from OpenAI<\/a>, &#8220;When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm\u2014content prohibited by our <a href=\"https:\/\/beta.openai.com\/docs\/usage-guidelines\/content-policy\" target=\"_blank\" rel=\"noopener\">content policy<\/a>.&#8221; In their <a href=\"https:\/\/openai.com\/blog\/chatgpt\/\">Nov. 30, 2022 announcement about ChatGPT<\/a>, OpenAI notes:<\/p>\n<blockquote><p>While we\u2019ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We\u2019re using the <a href=\"https:\/\/openai.com\/blog\/new-and-improved-content-moderation-tooling\/\">Moderation API<\/a> to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. 
We\u2019re eager to collect user feedback to aid our ongoing work to improve this\u00a0system.<\/p><\/blockquote>\n<p>You can <a href=\"https:\/\/beta.openai.com\/docs\/guides\/moderation\/overview\">read more about the Moderation endpoint<\/a> in OpenAI&#8217;s API documentation, including the kinds of content it is meant to address: hate, violence, and sexual content (among others).<\/p>\n<p>Several news reports of biased, racist, discriminatory outputs show that the filter still has a ways to go, such as those reported in <a href=\"https:\/\/www.insider.com\/chatgpt-is-like-many-other-ai-models-rife-with-bias-2023-1\">a January 16, 2023 article by Hannah Getahun for <em>Insider<\/em><\/a>, and <a href=\"https:\/\/www.bloomberg.com\/news\/newsletters\/2022-12-08\/chatgpt-open-ai-s-chatbot-is-spitting-out-biased-sexist-results\">a December 8, 2022 article by Davey Alba for <em>Bloomberg<\/em><\/a>. The above quote, and a quote from OpenAI CEO Sam Altman in the <em>Bloomberg<\/em> article, indicate that the company is relying at least in part on user feedback (such as using the &#8220;thumbs up&#8221; or &#8220;thumbs down&#8221; functionality to rate outputs) to improve the filters. And in the meantime, biased, discriminatory, and harmful content is being created.<\/p>\n<p>OpenAI notes, in their documentation on their API, that &#8220;From hallucinating inaccurate information, to offensive outputs, to bias, and much more, language models may not be suitable for every use case without significant modifications&#8221; (under <a href=\"https:\/\/beta.openai.com\/docs\/guides\/safety-best-practices\">&#8220;Safety Best Practices&#8221;<\/a>).<\/p>\n<p>Bender et al. 
(2021) note a downstream effect as well: when people use these kinds of language models and add biased or discriminatory text to the internet, this not only spreads such content further, it can also contribute to perpetuating the problem in training data for later language models (p. 617).<\/p>\n<p>For the educational context, it&#8217;s important for those who might want to use content generated by ChatGPT in teaching and learning to recognize that what they or their students produce using the tool can contain overt or (also dangerous) subtle biases and stereotypes, potentially reinforcing those that have been fed into it through training data, unless this issue is directly addressed and discussed.<\/p>\n<h2>Relying on exploitative labour practices for content moderation<\/h2>\n<p>One way that AI companies address problems such as the above is to have humans label text, images, videos, etc. containing violence, abuse, hate speech, and other toxic content, so as to build AI models that can find and remove such content automatically.<\/p>\n<p><a href=\"https:\/\/time.com\/6247678\/openai-chatgpt-kenya-workers\/\">A recent article in <em>Time<\/em> magazine<\/a> explained how OpenAI relied on underpaid workers in Kenya to do this labeling for a tool that ended up as part of ChatGPT. Not only were those workers paid extremely low wages (&#8220;between around $1.32 and $2 per hour depending on seniority and performance,&#8221; according to the <em>Time<\/em> article), they were also exposed on a daily basis to material that can be horrific and traumatic.<\/p>\n<p>This issue is not just limited to OpenAI&#8217;s work, of course. 
An article in <em>No\u0113ma<\/em> in October 2022 (&#8220;<a href=\"https:\/\/www.noemamag.com\/the-exploited-labor-behind-artificial-intelligence\/\">The Exploited Labour Behind Artificial Intelligence<\/a>&#8221;) points out that these kinds of practices are relied on by many companies, including those running major social media platforms that rely on automated content moderation:<\/p>\n<blockquote><p>Every murder, suicide, sexual assault or child abuse video that does not make it onto a platform <a href=\"https:\/\/www.behindthescreen-book.com\/\" target=\"_blank\" rel=\"external noopener noreferrer\" data-wpel-link=\"external\">has been viewed and flagged<\/a> by a content moderator or an automated system trained by data most likely supplied by a content moderator. Employees performing these tasks <a href=\"https:\/\/www.theverge.com\/2019\/2\/25\/18229714\/cognizant-facebook-content-moderator-interviews-trauma-working-conditions-arizona\" target=\"_blank\" rel=\"external noopener noreferrer\" data-wpel-link=\"external\">suffer<\/a> from anxiety, depression and post-traumatic stress disorder due to constant exposure to this horrific content.<\/p><\/blockquote>\n<p>As one grapples with the lingering issues of bias and harmful content in platforms such as ChatGPT, it would be useful to also talk with students about the ethical problems involved in attempts to label and remove such content.<\/p>\n<h2>Data and privacy<\/h2>\n<h3>Privacy of those who make accounts and use the tools<\/h3>\n<p>As Autumm Caines notes, &#8220;Anytime you use a tool that needs an account the company now has an identifier in which they can track your use of the site to your identity&#8221; (<a href=\"https:\/\/autumm.edtech.fm\/2023\/01\/18\/prior-to-or-instead-of-using-chatgpt-with-your-students\/\">&#8220;Prior 
to (or instead of) using ChatGPT with your students&#8221;<\/a>). In its current form, ChatGPT requires a name, email address, and a mobile phone number. One can provide a pseudonym and an email address that doesn&#8217;t include one&#8217;s name; a mobile phone number could be more challenging to avoid tying to oneself, though one could perhaps use a pre-paid &#8220;burner&#8221; phone. The point is that, for example, if one wanted to ask one&#8217;s students to create an account and use a tool like this in teaching, it is important to let them know that the work they do in the tool can be tied to their identity through their account unless they take steps to anonymize it.<\/p>\n<p>It&#8217;s also important, of course, to take a careful look at the Terms of Service and Privacy Policies to get as much of a sense as possible (which isn&#8217;t always as clear as one might like) as to how one&#8217;s data will be used.<\/p>\n<p>OpenAI notes in their <a href=\"https:\/\/help.openai.com\/en\/articles\/6783457-chatgpt-faq\">FAQ about the ChatGPT research release<\/a> that they may review the content of conversations one has with the tool, &#8220;to improve our systems and to ensure the content complies with our policies and safety requirements.&#8221; The FAQ notes that the contents of conversations may also be used for future training of the model, which means, <a href=\"https:\/\/autumm.edtech.fm\/2023\/01\/18\/prior-to-or-instead-of-using-chatgpt-with-your-students\/\">as Caines notes<\/a>, that &#8220;Using ChatGPT is providing free labor to OpenAI in their product development&#8221;: by making the tool freely available, OpenAI invites large numbers of people to participate in the work needed to make the model better.<\/p>\n<p>In <a href=\"https:\/\/help.openai.com\/en\/articles\/5722486-how-your-data-is-used-to-improve-model-performance\">their documentation about their GPT API<\/a>, OpenAI notes:<\/p>\n<blockquote><p>As part of . . . 
continuous improvement, when you use OpenAI models via our API, we may use the data you provide us to improve our models. Not only does this help our models become more accurate and better at solving your specific problem, it also helps improve their general capabilities and safety.<\/p><\/blockquote>\n<p>On that page they also describe a procedure to opt out of having your data used to improve models.<\/p>\n<p>Finally, it&#8217;s also not clear what happens to one&#8217;s data once one deletes one&#8217;s account. <a href=\"https:\/\/help.openai.com\/en\/articles\/6783457-chatgpt-faq\">The FAQ for ChatGPT<\/a> says:<\/p>\n<blockquote>\n<p class=\"no-margin\"><b>Can you delete my data?<\/b><\/p>\n<p class=\"no-margin\">Yes, please follow the data deletion process here: <a class=\"intercom-content-link\" href=\"https:\/\/help.openai.com\/en\/articles\/6378407-how-can-i-delete-my-account\" data-is-internal-link=\"true\">https:\/\/help.openai.com\/en\/articles\/6378407-how-can-i-delete-my-account<\/a><\/p>\n<\/blockquote>\n<p>That takes you to the process for deleting an account, but what happens to one&#8217;s chat history, one&#8217;s specific prompts, and how all of that may be tied to oneself as an individual? I haven&#8217;t been able to find that information in their documentation.<\/p>\n<p>So of course I asked ChatGPT what happens to my chat history after I delete my account.<\/p>\n<blockquote><p>When you delete your account, your chat history with me may be deleted, depending on the service provider&#8217;s policy. It&#8217;s important to check the terms of service and privacy policy of the platform you&#8217;re using to interact with me to understand how they handle user data and chat history.<\/p>\n<p>For OpenAI, if you delete your account, the company will destroy any data associated with your account, including your chat history. This process is subject to OpenAI&#8217;s data retention policy, and will be done in a way that makes the data irretrievable. Additionally, OpenAI also has technical and organizational measures in place to protect data against accidental or unauthorized access, alteration, or destruction.<\/p>\n<p>It&#8217;s important to keep in mind that, even if the data is deleted, there may be copies of the data that exist in backups or other storage systems. These copies will be deleted over time in accordance with OpenAI&#8217;s data retention policy.<\/p><\/blockquote>\n<p>I can&#8217;t find OpenAI&#8217;s data retention policy anywhere on their website, unless it&#8217;s buried in the <a href=\"https:\/\/openai.com\/privacy\/\">Privacy Policy<\/a> somewhere that I couldn&#8217;t find.<\/p>\n<p>It may seem like I&#8217;m digging into this tool more than others, but really, these are questions that one should ask before using tools oneself, or asking students to do so. For tools that have a license contract with the university, it is often the case that these sorts of questions are asked and answered, and certain privacy requirements must be met (depending on the context). In British Columbia, a <a href=\"https:\/\/privacymatters.ubc.ca\/privacy-impact-assessment\">Privacy Impact Assessment<\/a> is required.<\/p>\n<h3>Potential privacy leaks by the tools themselves<\/h3>\n<p>Weidinger et al. 
(2021) note a different privacy concern with LLMs, namely the possibility of the models providing private information that was part of their training data:<\/p>\n<blockquote><p>Current large-scale LMs rely on training datasets that contain information about people. Privacy violations may occur when training data includes personal information that is then directly disclosed by the model (Carlini et al., 2021). Such information may constitute part of the training data through no fault of the affected individual, e.g. where data leaks occur or where others post private information about them on online networks (Mao et al., 2011) (p. 19)<\/p><\/blockquote>\n<p>I couldn&#8217;t find any clear information in OpenAI&#8217;s documentation or blog posts about this (though there are quite a few blog posts and publications, and I haven&#8217;t reviewed them all!). But I did ask ChatGPT about steps that OpenAI is taking to address this kind of problem. (I also asked it about myself&#8211;the Christina Hendricks who teaches philosophy at the University of British Columbia&#8211;and it couldn&#8217;t find me or any private info about me).<\/p>\n<p>After a little back-and-forth in which ChatGPT said that &#8220;my training data is periodically updated, and any private information would have been removed during the process of data preprocessing and cleaning,&#8221; I asked the following question and received a helpful (?) 
answer.<\/p>\n<blockquote>\n<p>ME: How does open AI clean training data to remove private information?<\/p>\n<p>CHATGPT: OpenAI uses a combination of techniques to clean and preprocess the training data to remove any private information. Some of these techniques include:<\/p>\n<ul>\n<li>Data Anonymization: This involves replacing any personally identifying information, such as names, addresses, and phone numbers, with generic placeholders.<\/li>\n<li>Data Masking: This involves obscuring sensitive information, such as financial data, with a symbol or character.<\/li>\n<li>Data De-identification: This involves removing or altering any data elements that could be used to identify an individual, such as dates of birth or social security numbers.<\/li>\n<li>Data Filtering: This involves removing any data that is deemed unnecessary or irrelevant for training the model, such as duplicate or low-quality data.<\/li>\n<li>Regular Auditing: OpenAI regularly audits the training data to ensure that it is clean and free of private information.<\/li>\n<\/ul>\n<\/blockquote>\n<blockquote><p>These techniques are applied to the data before it is used 
to train the model to ensure that the model does not have access to any private information. Additionally, OpenAI follow the industry standard and comply with GDPR, CCPA and other data protection regulations.<\/p><\/blockquote>\n<p>When I asked if ChatGPT would give me the address of a famous person (the actress Christina Hendricks, because why not), it said:<\/p>\n<blockquote><p>As a language model, I don&#8217;t have access to any personal information and I don&#8217;t have the capability to access it. Additionally, it would be illegal and unethical to share someone&#8217;s personal information without their consent. It&#8217;s not appropriate to share or ask for someone&#8217;s address without a valid reason and proper authorization.<\/p><\/blockquote>\n<p>Nice one, ChatGPT. Your filters and guardrails are working here.<\/p>\n<h3>Entering private data<\/h3>\n<p>Weidinger et al. (2021) note that there is a danger of users inputting private information, which may be exacerbated when one is chatting with a seemingly somewhat human-like bot and engaging in a conversation like one might with another human.<\/p>\n<blockquote><p><span dir=\"ltr\" role=\"presentation\">In conversation, users may reveal private information that would otherwise be difficult to access, such as <\/span><span dir=\"ltr\" role=\"presentation\">thoughts, opinions, or emotions. Capturing such information may enable downstream applications that violate <\/span><span dir=\"ltr\" role=\"presentation\">privacy rights or cause harm to users, such as via surveillance or the creation of addictive applications. (p. 
30)<br \/>\n<\/span><\/p><\/blockquote>\n<p><a href=\"https:\/\/help.openai.com\/en\/articles\/6783457-chatgpt-faq\">In their FAQ on ChatGPT<\/a>, OpenAI notes that they can&#8217;t delete specific prompts from one&#8217;s history, and they say, &#8220;Please don&#8217;t share any sensitive information in your conversations.&#8221;<\/p>\n<p>It is important to share with students that they should not reveal private information about themselves or others through their prompts.<\/p>\n<h2>Reinforcing existing power structures<\/h2>\n<p>Due to the nature of training data, LLMs can reproduce, and reinforce, larger social, political, and other power relationships. As Leon Furze puts it, &#8220;The power structures reflected in the datasets become encoded in the models, meaning that any output reinforces those structures&#8221; (<a href=\"https:\/\/leonfurze.com\/2023\/01\/17\/chatgpt-in-education-back-to-basics\/\">&#8220;Chat GPT in Education: Back to Basics&#8221;<\/a>). Furze notes as an example that if the training data is heavily skewed towards English language resources with particular cultural practices or references, the tool will likely not do as well with outputs in other languages or with other cultural references. Explaining further, Chan (2022) points out that<\/p>\n<blockquote><p>GPT-3\u2019s dataset contained 93% English text and only 7% in other languages reflecting that GPT-3 is made for English-speaking (predominantly Western) countries in mind (Brown et al. [<a id=\"ref-link-section-d270981840e761\" title=\"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, p. , Neelakantan, A., Shyam, p. , Sastry, G., Askell, A., et al.: Language models are few-shot learners.\u00a0arXiv preprint arXiv:2005.14165, pp. 1\u201375. 
https:\/\/arxiv.org\/abs\/2005.14165 (2020)\" href=\"https:\/\/link.springer.com\/article\/10.1007\/s43681-022-00148-6#ref-CR5\" data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 5\">5<\/a>], p. 14). Despite its impressive translation capabilities, the central issue is that English-speaking voices and perspectives are given overwhelming precedence.<\/p><\/blockquote>\n<p>In OpenAI&#8217;s own documentation on their API, they note that &#8220;our support for non-English languages is currently limited&#8221; (under <a href=\"https:\/\/beta.openai.com\/docs\/guides\/moderation\/overview\">&#8220;Moderation&#8221;<\/a>).<\/p>\n<p>Weidinger et al. (2021) note one potential concern in this arena (among others):<\/p>\n<blockquote><p>In the case of LMs where great benefits are anticipated, lower performance for some groups risks creating a distribution of benefits and harms that perpetuates existing social inequities (Bender et al., 2021; Joshi et al., 2021). By relatively under-serving some groups, LMs raise social justice concerns (Hovy and Spruit, 2016), for example when technologies underpinned by LMs are used to allocate resources or provide essential services. (p. 16)<\/p><\/blockquote>\n<p>There are also broader issues, such as the fact that all LLMs can do is repeat patterns that already exist in language, focusing on those that appear most often to determine the likelihood of what words and phrases come next. It makes sense to think that they can thus perpetuate beliefs, values, processes, knowledges, etc. that are currently dominant.<\/p>\n<p>Bender et al. 
(2021) explain, for example, that even though crawling the internet means there is a large amount of data in training sets, that doesn&#8217;t mean there is diversity amongst the viewpoints included. Differential access to and participation in generating content on the internet, as well as content moderation decisions, ways that content tends to propagate and get shared, and choices of what to include and exclude in the training data can mean that<\/p>\n<blockquote><p>&#8230; white supremacist and misogynistic, ageist, etc. views are overrepresented in the training data, not only exceeding their prevalence in the general population but also setting up models trained on these datasets to further amplify biases and harms. (p. 613)<\/p>\n<p>In accepting large amounts of web text as \u2018representative\u2019 of \u2018all\u2019 of humanity we risk perpetuating dominant viewpoints, increasing power imbalances, and further reifying inequality. (p. 614)<\/p><\/blockquote>\n<h2>Intellectual property &amp; Indigenous data sovereignty<\/h2>\n<p>I am not a lawyer, and the issues here are complex, but there have been concerns raised and lawsuits begun against AI companies that scrape large amounts of data from the internet, including text and images, without permission from content creators. 
An article from <em>The Verge<\/em> in November 2022 (&#8220;<a href=\"https:\/\/www.theverge.com\/23444685\/generative-ai-copyright-infringement-legal-fair-use-training-data\">The Scary Truth About AI Copyright is No One Knows What Happens Next<\/a>&#8221;) discusses this issue from the perspective of copyright laws, particularly in the US, and how use of such scraped data for training purposes may be considered &#8220;fair use&#8221;&#8211;though the question is far from settled. See also &#8220;<a href=\"https:\/\/techcrunch.com\/2023\/01\/27\/the-current-legal-cases-against-generative-ai-are-just-the-beginning\/\">The Current Legal Cases Against Generative AI are Just the Beginning<\/a>&#8221; (<em>Techcrunch<\/em>, January 2023).<\/p>\n<p>There are already at least a few legal cases happening in this area, including <a href=\"https:\/\/www.theverge.com\/2022\/11\/8\/23446821\/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data\">a class action lawsuit around the creation of GitHub Copilot<\/a>, in which, the lawsuit alleges, code was used to train an AI model without crediting its creators, in violation of the licenses on that code; and <a href=\"https:\/\/www.theverge.com\/2023\/1\/17\/23558516\/ai-art-copyright-stable-diffusion-getty-images-lawsuit\">an upcoming lawsuit by Getty Images against Stable Diffusion<\/a> for scraping its copyrighted images and using them to train an image-generation tool.<\/p>\n<p>This is also an issue to pay attention to around Indigenous data sovereignty, which Tahu Kukutai and John Taylor explain as &#8220;the inherent and inalienable rights and interests of indigenous peoples relating to the collection, ownership and application of data about their people, lifeways and territories&#8221; (<a href=\"https:\/\/library.oapen.org\/handle\/20.500.12657\/31875\"><em>Indigenous Data Sovereignty: Towards an Agenda<\/em><\/a>, 2016, p. 2). 
According to <a href=\"https:\/\/www.lib.sfu.ca\/help\/publish\/research-data-management\/indigenous-data-sovereignty\">an SFU Library Guide on Indigenous data sovereignty<\/a>, it means that<\/p>\n<blockquote><p>Indigenous Peoples have the right to own, control, access, and steward data about their communities, lands, and culture. Information management and data collection strategies must align with the practices and culture of the Indigenous Nation, community or Peoples who are represented in the data.<\/p><\/blockquote>\n<p>One set of principles I&#8217;ve seen referenced a number of times in this area is <a href=\"https:\/\/fnigc.ca\/ocap-training\/\">the OCAP\u00ae principles developed by the First Nations Information Governance Centre<\/a> (quotes below are from this page):<\/p>\n<blockquote>\n<ul>\n<li><strong>Ownership<\/strong> refers to the relationship of First Nations to their cultural knowledge, data, and information. This principle states that a community or group owns information collectively in the same way that an individual owns his or her personal information.<\/li>\n<li><strong>Control<\/strong> affirms that First Nations, their communities, and representative bodies are within their rights to seek control over all aspects of research and information management processes that impact them. &#8230;<\/li>\n<li><strong>Access<\/strong> refers to the fact that First Nations must have access to information and data about themselves and their communities regardless of where it is held. The principle of access also refers to the right of First Nations\u2019 communities and organizations to manage and make decisions regarding access to their collective information. &#8230;<\/li>\n<li><strong>Possession<\/strong> &#8230; refers to the physical control of data. 
Possession is the mechanism by which ownership can be asserted and protected.<\/li>\n<\/ul>\n<\/blockquote>\n<p>Depending on the information in the training datasets for LLMs, and how output is used, Indigenous data sovereignty rights may be violated.<\/p>\n<h2>Disparate access<\/h2>\n<p>This is an issue with many technologies, but still worth pointing out. Weidinger et al. (2021) explain clearly:<\/p>\n<blockquote><p>Due to differential internet access, language, skill, or hardware requirements, the benefits from LMs are unlikely to be equally accessible to all people and groups who would like to use them. Inaccessibility of the technology may perpetuate global inequities by disproportionately benefiting some groups. (p. 34)<\/p><\/blockquote>\n<p>At the time of writing I couldn&#8217;t find any information on what the experience would be like trying to use ChatGPT on a low-bandwidth connection, but I expect there would be more slowness and errors. And as noted above, this particular tool works best with English. Students who speak English well and have the time and wherewithal to develop skills in prompt engineering will be able to produce better results.<\/p>\n<p>I would add to the above quote the issue of financial access: while ChatGPT is free of cost to use for now, that could change at any moment. 
OpenAI note in their <a href=\"https:\/\/help.openai.com\/en\/articles\/6783457-chatgpt-faq\">FAQ on ChatGPT<\/a> that it is free to use during &#8220;the initial research preview.&#8221; They have started making plans for ChatGPT Pro, and <a href=\"https:\/\/www.theverge.com\/2023\/1\/23\/23567317\/chatgpt-pro-tier-42-month-pricing-test-report\">those with early access report a price of $42 USD a month<\/a>&#8211;clearly out of reach for many.<\/p>\n<p>Just before publishing this post, OpenAI announced <a href=\"https:\/\/openai.com\/blog\/chatgpt-plus\/\">ChatGPT Plus<\/a>, at $20 USD a month. That subscription provides access to ChatGPT even during peak times, faster performance, and early access to new features and improvements. They say in the announcement that they will continue to offer a free option, but it seems reasonable to expect that those on the free tier will be at a disadvantage in terms of access during peak times and access to features and improvements. Thus, in classes there may be some students who have access to the paid version and many who don&#8217;t. Some students may pool resources so that a group of them shares the same Pro or Plus account, while others can only access the free version.<\/p>\n<p>For educators, it is important to understand that while the tool is free right now, that could change after you have incorporated it into activities. It is also important to recognize that students will have differential access, which affects their ability to use it effectively for course activities and assignments.<\/p>\n<h2>Environmental impact<\/h2>\n<p>This is also not limited to LLMs, but it&#8217;s still worth considering. Bender et al. 
(2021) note the significant energy usage involved in training language models, and point to arguments that energy efficiency should be an evaluation metric for success (along with other metrics):<\/p>\n<blockquote><p>As shown in [5],* the amount of compute used to train the largest deep learning models (for NLP and other applications) has increased 300,000x in 6 years, increasing at a far higher pace than Moore\u2019s Law. To promote green AI, Schwartz et al. argue for promoting efficiency as an evaluation metric &#8230;. (p. 612)<\/p><\/blockquote>\n<p>* Amodei &amp; Hernandez, 2018<\/p>\n<p>The authors note further that the costs of these environmental impacts are borne most heavily by people in marginalized groups and poorer nations, while the benefits accrue to those whose language and culture are represented in the models, and who have the financial means to access and use them.<\/p>\n<h2>Conclusion<\/h2>\n<p>There are, I expect, other important ethical topics to be considered, but this post is already over 4000 words!<\/p>\n<p>I want to reiterate that my purpose with this post is to catalogue and think through some ethical considerations around LLMs like ChatGPT; I am not concluding from all of this that they are so problematic that we ought not to use them at all in teaching and learning. There may be good cases to be made for that, but it&#8217;s not what I&#8217;m doing here. 
I&#8217;m trying instead to raise ethical considerations that those involved in teaching and learning should be aware of if they choose to use such tools.<\/p>\n<h2>Works Cited<\/h2>\n<div class=\"csl-bib-body\">\n<div class=\"csl-entry\">Abid, A., Farooqi, M., &amp; Zou, J. (2021). Large language models associate Muslims with violence. <i>Nature Machine Intelligence<\/i>, <i>3<\/i>(6), Article 6. <a href=\"https:\/\/doi.org\/10.1038\/s42256-021-00359-2\">https:\/\/doi.org\/10.1038\/s42256-021-00359-2<\/a><\/div>\n<div class=\"csl-entry\">Amodei, D., &amp; Hernandez, D. (2018). AI and Compute. <a href=\"https:\/\/openai.com\/blog\/ai-and-compute\">https:\/\/openai.com\/blog\/ai-and-compute<\/a><\/div>\n<div class=\"csl-entry\">Bender, E. M., Gebru, T., McMillan-Major, A., &amp; Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In <i>Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency<\/i>, Virtual Event, Canada. Association for Computing Machinery. <a href=\"https:\/\/doi.org\/10.1145\/3442188.3445922\">https:\/\/doi.org\/10.1145\/3442188.3445922<\/a><\/div>\n<div class=\"csl-entry\">Caliskan, A., Bryson, J. J., &amp; Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. <i>Science<\/i>, <i>356<\/i>(6334), 183\u2013186. <a href=\"https:\/\/doi.org\/10.1126\/science.aal4230\">https:\/\/doi.org\/10.1126\/science.aal4230<\/a>. <a href=\"http:\/\/arxiv.org\/abs\/1608.07187\">http:\/\/arxiv.org\/abs\/1608.07187<\/a><\/div>\n<div class=\"csl-entry\">Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, U., Oprea, A., &amp; Raffel, C. (2021). Extracting Training Data from Large Language Models. arXiv:2012.07805 [cs]. <a href=\"http:\/\/arxiv.org\/abs\/2012.07805\">http:\/\/arxiv.org\/abs\/2012.07805<\/a><\/div>\n<div class=\"csl-entry\">Chan, A. (2022). GPT-3 and InstructGPT: Technological dystopianism, utopianism, and \u201cContextual\u201d perspectives in AI ethics and industry. <i>AI and Ethics<\/i>. <a href=\"https:\/\/doi.org\/10.1007\/s43681-022-00148-6\">https:\/\/doi.org\/10.1007\/s43681-022-00148-6<\/a><\/div>\n<div class=\"csl-entry\">Hovy, D., &amp; Spruit, S. L. (2016). The Social Impact of Natural Language Processing. In <i>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics<\/i> (Volume 2: Short Papers), Berlin, Germany. Association for Computational Linguistics. doi: 10.18653\/v1\/P16-2096. <a href=\"https:\/\/aclanthology.org\/P16-2096\">https:\/\/aclanthology.org\/P16-2096<\/a><\/div>\n<div class=\"csl-entry\">Joshi, P., Santy, S., Budhiraja, A., Bali, K., &amp; Choudhury, M. (2021). The State and Fate of Linguistic Diversity and Inclusion in the NLP World. arXiv:2004.09095 [cs]. <a href=\"http:\/\/arxiv.org\/abs\/2004.09095\">http:\/\/arxiv.org\/abs\/2004.09095<\/a><\/div>\n<div class=\"csl-entry\">Mao, H., Shuai, X., &amp; Kapadia, A. (2011). Loose tweets: An analysis of privacy leaks on Twitter. In <i>Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society<\/i>, Chicago, Illinois, USA. Association for Computing Machinery. <a href=\"https:\/\/doi.org\/10.1145\/2046556.2046558\">https:\/\/doi.org\/10.1145\/2046556.2046558<\/a><\/div>\n<div class=\"csl-entry\">Schwartz, R., Dodge, J., Smith, N. A., &amp; Etzioni, O. (2020). Green AI. <i>Communications of the ACM<\/i>, <i>63<\/i>(12), 54\u201363. <a href=\"https:\/\/doi.org\/10.1145\/3381831\">https:\/\/doi.org\/10.1145\/3381831<\/a><\/div>\n<div class=\"csl-entry\">Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., \u2026 Gabriel, I. (2021). Ethical and social risks of harm from Language Models. arXiv. <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2112.04359\">https:\/\/doi.org\/10.48550\/arXiv.2112.04359<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Like many others, I&#8217;ve been thinking about GPT and ChatGPT lately, and I&#8217;m particularly interested in diving deeper into ethical considerations and issues related to these kinds of tools. As I start looking into this sort of question, I realize there are a lot of such considerations. 
And here I&#8217;m only going to be able [&hellip;]<\/p>\n","protected":false},"author":665,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2324452,1383,4816],"tags":[2324448,2324450],"class_list":["post-4082","post","type-post","status-publish","format-standard","hentry","category-ai","category-edtech","category-general-issues-in-higher-education","tag-ai_writing","tag-gpt"],"_links":{"self":[{"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/posts\/4082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/users\/665"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/comments?post=4082"}],"version-history":[{"count":23,"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/posts\/4082\/revisions"}],"predecessor-version":[{"id":4105,"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/posts\/4082\/revisions\/4105"}],"wp:attachment":[{"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/media?parent=4082"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/categories?post=4082"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/chendricks\/wp-json\/wp\/v2\/tags?post=4082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}