This article is part of our coverage of the latest in AI research.
One of the growing concerns regarding large language models (LLMs) like ChatGPT is their ability to generate convincing yet false information. Despite the increasing discourse, our understanding of the potential impact and our readiness to mitigate these effects remain incomplete.
A recent collaborative study by scientists from various universities and institutions illuminates the issue of information veracity with LLMs. The study underscores two significant aspects. First, even well-intentioned LLMs can inadvertently propagate unreliable information. Second, a bigger threat is the potential misuse of AI chatbots for malicious purposes, such as writing scam emails, disseminating disinformation, or manipulating bot feeds.
The research highlights the limitations of current strategies in addressing these challenges and suggests directions for a future where ChatGPT and similar models are ubiquitous in many aspects of everyday life.
LLM hallucinations
While developed for beneficial tasks, LLMs can inadvertently cause harm by generating factually incorrect statements, or “hallucinations.” The authors of the study warn that “discerning fact from fiction in LLM-generated content is difficult, and it is further complicated by the use of eloquent language and confident tone.”
The researchers categorize hallucinations into two types: faithfulness and factualness. Faithfulness hallucinations occur when the generated text deviates from the input context, while factualness hallucinations occur when the output contradicts established world knowledge. For example, a summary that misstates a claim made in the source document is a faithfulness hallucination, whereas an answer asserting a plainly wrong date or fact is a factualness hallucination. Each type of hallucination can lead to its own set of negative consequences.
The researchers describe LLMs as “authoritative liars” due to their tendency to maintain a confident tone, even when the information they provide is factually incorrect. This characteristic can make it even more challenging for users to discern the veracity of the information.
While the internet has long been struggling with false information, LLMs introduce a new layer of complexity due to their human-like interactions and ease of use. The researchers warn that the implications of this are particularly concerning in critical and sensitive fields. For instance, health chatbots may become increasingly popular for information-seeking and question-answering tasks, as many people find it easier to receive health advice from a chatbot than from a physician.
Moreover, the researchers caution that a model’s proficiency in one topic might lead users to overestimate its capabilities in other open-domain conversations, known as the “halo” effect. This could result in users placing undue trust in the model’s responses, regardless of the topic.
The researchers also highlight that the response formats of LLMs differ significantly from traditional methods of online information retrieval. They write, “Responses presented in the form of answers, rather than a collection of diverse links offering varying perspectives, may produce either inaccurate or biased information.” Therefore, it is crucial for users to understand how LLMs function and not blindly trust the responses they generate.
Malicious use of LLMs
The more sinister threat of LLMs is their misuse for spreading misinformation or executing social engineering attacks. LLMs can use the content of an ongoing conversation as context when generating new text. Malicious actors can exploit this ability by inserting a user’s prior emails or social media posts into a prompt to generate disinformation, phishing messages, harassment, or other harmful content on a large scale.
Moreover, these actors can fine-tune LLMs to generate text that mimics the style of any individual. They can then distribute this LLM-generated content on social media platforms in an attempt to undermine the credibility of specific users.
LLMs can also thwart fact-checkers, who typically focus on monitoring and verifying widely circulated claims. Generative models can produce many variations of the same content, effectively bypassing traditional checks. The researchers warn, “Even if each variant reaches a small number of people, the cumulative impact could add to that of a highly viral piece while remaining invisible to fact-checkers.”
Another area of concern is automated agents that use LLMs to imitate specific personalities. Such agents can be deployed en masse with relative ease, instigating large-scale online discussions of sensitive topics and making it harder to distinguish genuine conversations from AI-generated ones.
Ensuring information veracity with LLMs
In their paper, the researchers evaluate solutions to mitigate the potential risks associated with LLMs. Chief among them are alignment techniques such as reinforcement learning from human feedback (RLHF), which train the model to avoid hallucinations and the generation of malicious content. However, the authors warn that “the availability of open-source LLMs suggests that the effectiveness of alignment efforts, as well as other countermeasures, such as watermarking by large AI companies may be severely limited in mitigating the potential looming threats.”
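To make the alignment step more concrete, here is a minimal sketch, not taken from the paper, of the pairwise preference loss commonly used to train the reward model that RLHF relies on. The scores below are toy values, and a full pipeline would add a policy-optimization stage (such as PPO) on top of the trained reward model.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry) loss used to train an RLHF reward model.

    reward_chosen / reward_rejected are scalar scores the reward model assigns
    to the human-preferred and the rejected response for the same prompt.
    Minimizing this loss pushes the model to score the preferred response higher.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example: scores for a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.4, 0.9, -1.0])
print(preference_loss(chosen, rejected))  # lower when chosen responses score higher
```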
Another proposed solution is retrieval-augmented generation (RAG), in which relevant documents are retrieved and their content added to the prompt to ground the model’s output and improve its factual accuracy. The researchers note, “RAG mitigates the challenge of LLMs producing inaccurate content by enhancing their capabilities with external data. However, it requires efficient retrieval of grounded text at scale and robust evaluation.”
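As a rough illustration of the RAG pattern, and not of the researchers’ system, the sketch below retrieves the most relevant documents from a toy in-memory store using a simple bag-of-words similarity and splices them into the prompt. A production setup would use an embedding model and a vector index, and the assembled prompt would then be sent to the LLM.

```python
import math
from collections import Counter

# Toy document store; in practice this would be a search index or vector database.
DOCUMENTS = [
    "Aspirin is commonly used to reduce fever and relieve mild pain.",
    "The Eiffel Tower was completed in 1889 and stands in Paris.",
    "Large language models are trained on large text corpora.",
]

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = bow(query)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

print(build_prompt("When was the Eiffel Tower completed?"))
# The assembled prompt is sent to the LLM in place of the bare question.
```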
The advent of evaluation metrics such as GPTScore and G-Eval, which use LLMs themselves to judge generated text, is also significant. These metrics have shown reasonable correlations with human assessments in various tasks, including consistency, accuracy, and correctness. Yet, the researchers point out that “weak correlations (around 20–25%) remain in factuality assessments, suggesting that there is room for improvement.”
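To illustrate how such LLM-as-judge metrics work in general, rather than the exact GPTScore or G-Eval implementations, the sketch below assembles an evaluation prompt and parses the returned score; `call_llm` is a hypothetical placeholder for whichever judge-model API is used.

```python
def build_eval_prompt(source: str, output: str, criterion: str = "factual consistency") -> str:
    """Assemble a judge prompt that asks an LLM to rate an output on one criterion."""
    return (
        f"You will rate a model response for {criterion} on a scale of 1 to 5.\n"
        "5 means every claim is supported by the source; 1 means most claims are not.\n\n"
        f"Source:\n{source}\n\nResponse:\n{output}\n\n"
        "Reply with a single integer between 1 and 5."
    )

def parse_score(judge_reply: str) -> int | None:
    """Extract the first integer in the judge's reply, if any."""
    for token in judge_reply.split():
        if token.strip(".").isdigit():
            return int(token.strip("."))
    return None

# call_llm() is a placeholder for the judge model's API:
# score = parse_score(call_llm(build_eval_prompt(source_text, model_output)))
```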
The researchers also suggest customizing factuality instructions for specific domains, such as medicine or law. They write, “Similar adjustments have demonstrated improved factuality assessment in the case of SelfCheckGPT, which is based on the idea that consistently replicable responses are rooted in factual accuracy, as opposed to those generated through stochastic sampling or hallucination, which tend to exhibit more variation.”
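A rough sketch of the consistency idea behind SelfCheckGPT is shown below. It substitutes a crude word-overlap measure for the method’s actual scoring components, so it is only meant to convey the principle: sample several responses to the same prompt and treat low agreement between them as a sign of possible hallucination.

```python
from itertools import combinations

def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two responses (a crude consistency proxy)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def consistency_score(samples: list[str]) -> float:
    """Average pairwise overlap across stochastically sampled responses.

    The premise: answers grounded in facts the model actually knows tend to be
    reproduced consistently across samples, while hallucinated details vary.
    """
    pairs = list(combinations(samples, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)

samples = [
    "The Eiffel Tower was completed in 1889.",
    "The Eiffel Tower was finished in 1889.",
    "The Eiffel Tower opened in 1975.",  # inconsistent detail lowers the score
]
print(round(consistency_score(samples), 2))  # low scores flag possible hallucination
```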
Another recommendation is the creation of standards for proving the provenance and authenticity of text content, similar to those being developed for images and videos. They argue, “Since AI-generated content can cause harm when it spreads on social media, provenance proofs could be imposed to limit the spread of fake content before it has reached many people.”
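To illustrate the general idea of a provenance proof, without assuming anything about the eventual standard, here is a toy sketch that signs a piece of text together with its metadata and later verifies that the record has not been tampered with. A real scheme would use public-key signatures and standardized metadata rather than a shared demo key.

```python
import hashlib
import hmac
import json
import time

SECRET_KEY = b"demo-key"  # illustration only; a real standard would use public-key signatures

def attach_provenance(text: str, producer: str) -> dict:
    """Bundle content with a provenance record and a signature over both."""
    record = {"text": text, "producer": producer, "timestamp": int(time.time())}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_provenance(record: dict) -> bool:
    """Recompute the signature and check it matches, proving the record is untampered."""
    claimed = record.get("signature", "")
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected)

signed = attach_provenance("Model-generated summary of the report.", producer="example-llm")
print(verify_provenance(signed))   # True
signed["text"] = "Altered claim."
print(verify_provenance(signed))   # False: tampering breaks the proof
```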
Regulation is another potential solution, albeit a complex one. The researchers acknowledge the difficulty of this approach, writing, “On the one hand, controlling LLMs and their users can be as challenging as handling individuals engaged in phishing and misinformation. On the other hand, bad actors using open-source models will not be bound by regulation.”
The researchers also emphasize the importance of educating both the public, who will be exposed to AI-generated content, and professionals who will be using these models: “To promote AI literacy globally, we recommend three key actions: (i) conducting AI literacy programs for people of all ages, (ii) incorporating AI education with an emphasis on ethics into the graduate-level curricula, and (iii) sensitizing digital consumers to the potential harms and root causes of GenAI.”