GenAI before the courts: the legal risks in using artificial intelligence
Good governance is critical for firms that want to avoid legal risks and penalties with the use of generative AI, writes UNSW Business School’s Peter Leonard
Artificial intelligence (AI) and other advanced digital technologies are rapidly changing legal practice. Many law firms now have innovation teams that support litigators using a variety of AI-enabled tools.
Most lawtech AI to date has been developed for a particular application, tested and evaluated by both the vendor and the law firm for suitability and reliability for that application, and used by specialist, specially trained personnel. Today, that is generally not the case with generative AI (genAI).
The new challenges of genAI
We have been developing governance frameworks and assessment processes for safe and responsible use of many forms of AI over recent years. The NSW Government AI Assessment Framework, now being adopted by other governments across Australia, is an excellent example.
However, these frameworks and processes were not developed to address risk assessment of genAI, or mitigation of the harms that may arise from use of the new forms of large language models that power genAI. These risks, and how they may be assessed and mitigated, are addressed at length in the author’s submission Safe and responsible AI in Australia after generative AI, made in response to the Australian Department of Industry, Science and Resources Discussion Paper on Safe and Responsible AI in Australia, June 2023. In the following comments we follow the Discussion Paper’s approach of using the term generative AI or genAI to refer to both (1) underlying large language (foundation) models (e.g., GPT-4, Claude 2, Llama 2, Orca, Cohere, PaLM-2), and (2) applications and consumer utility services built upon those models (e.g., OpenAI's ChatGPT, Google Bard and Microsoft's M365 Copilot).
The biggest challenge with utilising genAI today is that appropriate governance frameworks and assessment processes for its safe and responsible use are not yet mature: that is, not yet well understood, systematic and comprehensive (and accordingly reliable), and readily applied. Today, it is very easy for people to place excessive reliance upon unreliable and unverified genAI outputs in circumstances where those outputs may cause harm or lead to legal liability (e.g., for misleading or deceptive conduct, product liability, or breach of confidentiality or copyright).
As compared to special purpose AI products, genAI introduces new and quite different challenges, principally for four reasons.
One, by its nature as a word-association tool, text-based genAI will produce outputs that are sometimes wrong, at unpredictable times, yet presented in a highly convincing manner. The benefits for many tasks are so compelling that genAI should and will be used in myriad ways, but the risk that humans will place excessive reliance upon unreliable outputs is very great.
Two, prompting genAI tools may involve entering confidential information, which may result in improper disclosure and compromise of confidential or trade secret information. This risk is likely to decline over time as in-house (controlled) implementations of pre-trained LLMs become more common, and as users of genAI applications become more familiar with how to change the privacy settings of consumer genAI tools to keep prompting data confidential, and how to prompt without making improper disclosures.
Three, it is very easy for anyone to use genAI for myriad tasks, so users may do so without proper evaluation or oversight at the organisational level. GenAI is the ultimate self-serve, easy-to-use, all-purpose ‘Swiss army knife’, so it will be used in unexpected, and potentially irresponsible, ways. The universal utility and self-serve nature of genAI carries the risk that humans will place inappropriate reliance upon outputs that are not reliable enough to bear that reliance.
Four, quite different risks arise from the use of open foundation models than from closed models (confined and evaluated within organisations and refined using that entity’s data). Many discussions about genAI governance focus on risks arising from uncontrolled and reckless prompting of open models. Over the next 12 months, many businesses and government agencies will deploy and use controlled and closed genAI systems, addressing some of the current risks associated with the use of open foundation models.
Getting uses of genAI by lawyers right, and wrong
Contrast two cases.
In the first known use of genAI by a British appeal judge, Lord Justice Birss of the Court of Appeal used an AI chatbot to write part of a judgment. As he described it, genAI was “jolly useful”. In a speech at a Law Society of England and Wales conference, Lord Justice Birss highlighted the factors that made the use of such a tool potentially useful and appropriately risk mitigated. He prompted ChatGPT to provide a summary of the relevant law: a narrow, well-defined aspect of the draft judgment within the scope of the tool’s capabilities. As he noted, this task was well within his own area of expertise, such that he could evaluate the text it generated and identify potential issues. He reviewed the AI output as he incorporated it into his draft judgment, which he “could recognise as being acceptable”. He noted, “most importantly”, that he retained “full personal responsibility” for the draft, and for the prompting that led to its creation.
Contrast the extensively reported case in which Steven Schwartz, a New York attorney with over 30 years of post-admission experience, represented Roberto Mata in an action against Avianca Airlines for injuries sustained from a serving cart while on board the airline in 2019. At least six of the cases cited by Schwartz in a brief to the Southern District of New York “appear to be bogus judicial decisions with bogus quotes and bogus internal citations,” said Judge Kevin Castel in a May 2023 order. In a June 2023 ruling, Judge Castel ordered Schwartz and his law firm to pay US$5,000 for submitting, without checks, a brief with fake cases and then standing by the research.
Judge Castel stated: “Technological advances are commonplace and there is nothing inherently improper about using a reliable artificial intelligence tool for assistance. However, existing rules impose a gatekeeping role on attorneys to ensure the accuracy of their filings. [The Respondents] abandoned their responsibilities when they submitted non-existent judicial opinions with fake quotes and citations created by the artificial intelligence tool ChatGPT, then continued to stand by the fake opinions after judicial orders called their existence into question.”
The sanctions did not relate to the use of genAI itself, but to the professional conduct missteps downstream of the genAI error. Under most judicial systems, a lawyer is answerable for their research, arguments and representations under their core duties to the court and their client. These duties continue to apply when utilising any form of legal technology, including genAI.
One view might be that the Steven Schwartz example illustrates a transitional problem, not a lacuna in regulation. Regardless of any view as to the small penalty imposed, no sensible lawyer would wish to suffer the reputational damage flowing from global reporting of that lawyer’s failure to understand and mitigate the misinformation risks of relying upon incorrect genAI outputs. Smart people often do dumb things when they don’t know that they are doing a dumb thing. Having read a media report of Mr Schwartz’s failure to be the responsible human in the loop, other lawyers are unlikely to replicate this error, regardless of whether a legislature creates a relevant prohibition, or a professional standards body makes a code about use of genAI or a ruling of professional misconduct.
However, inappropriate reliance upon genAI may also arise outside court systems, and where non-lawyers draft submissions. Erroneous outputs may arise through prompting for myriad tasks unrelated to the drafting of court documents, for which generative AI is now being used by individuals without those individuals knowing to exercise:
• appropriate caution as to the possibility of such errors, and
• responsibility to take appropriate steps to mitigate such risks.
On 3 November 2023, Guardian Australia reported that a group of Australian accounting academics had offered an unreserved apology to the big four consultancy firms after admitting they had used the Google Bard genAI tool to make incorrect allegations of serious wrongdoing in a submission to a parliamentary inquiry into the ethics and professional accountability of the consultancy industry. Part of the submission relied on outputs from Google Bard, which the academic responsible for the relevant drafting had only begun using the week the submission was made, and apparently before the widespread reporting of the Steven Schwartz case referred to above. As in the Schwartz case, the genAI outputs included several case studies about misconduct that were then taken up in the submission. Some of these case studies were simply wrong, so the submission was wrong. The academics’ error in reliance was downstream of the erroneous genAI outputs, and explicable, and arguably understandable, given limited public knowledge of such risks at the time the submission was drafted.
And the courts continue to suffer problems arising from reliance by self-represented litigants upon fictitious case references: for example, in a recently reported tax tribunal judgment in England, the unfortunate party had cited nine cases in her submissions, all of them generated by a hallucinating ChatGPT. The citations appear convincing to the unresearched eye. The litigant admitted that she had no idea that she could have checked, and should have checked, these cases on publicly available and free judgment databases.
Meanwhile, the capabilities of applications rapidly evolve. In late November 2023, it was reported that an ‘AI paralegal’ aced the English Solicitors Qualifying Exam (SQE), demonstrating functioning legal knowledge in line with the Solicitors Regulation Authority’s expectations of junior solicitors. ‘Lawrence’ scored 74 per cent, as compared to the recent human pass mark of 53 per cent. Notably, 'Lawrence’ was a trained specialist legal AI, and not genAI built on a general purpose open LLM.
The New Zealand ‘best practice’ approach
Some court systems have issued judicial guidance to judicial officeholders or lawyers as to the uses of AI. In the United Kingdom, on 12 December 2023 a group of senior judges headed by the Lady Chief Justice, Baroness Carr, issued guidance for the judiciary on the responsible use of AI in courts and tribunals. As the guidance notes: “Public AI chatbots do not provide answers from authoritative databases. They generate new text using an algorithm based on the prompts they receive and the data they have been trained upon. This means the output which AI chatbots generate is what the model predicts to be the most likely combination of words (based on the documents and data that it holds as source information). It is not necessarily the most accurate answer.”
New Zealand leads the way for both creativity and comprehensiveness in coverage: the NZ guidance applies across all NZ courts and tribunals. In April 2023, NZ Chief Justice Dame Helen Winkelmann commissioned a judicial Artificial Intelligence Advisory Group, which consulted with relevant parties on draft guidelines on generative AI in the context of New Zealand courts.
The final guidelines were issued on 7 December 2023. Creatively, the guidelines are tailored for three different audiences:
- Judges, Judicial Officers, Tribunal Members and Judicial Support staff
- Lawyers
- Non-lawyers (i.e., self-represented litigants and persons giving expert evidence)
The New Zealand approach has much to commend it. The guidance:
• focusses upon the different needs of the different audiences for guidance: court officers, lawyers and non-lawyers,
• is for uniform adoption jurisdiction-wide and across all courts and tribunals,
• focusses upon uplift of skills as to good practice, so as to mitigate risks of improper or unreliable information being placed before a court or tribunal, and
• recognises that risks will evolve over time, and that guidance should be adaptive to changing circumstances and promptly updated to reflect relevant changes in risks and possible mitigations.
Pending guidance or other pronouncements by courts and tribunals as to uses of genAI, the best cautions for lawyers are:
• genAI will be used within most organisations and professions, so don’t ban it, but set up the right governance frameworks and assurance review processes to facilitate its careful and appropriate use,
• be careful about what information is entered into prompts of public (outside the organisation) genAI applications, as you may be exposing confidential information or breaching privacy law,
• devote time and resources to training your people about how to use genAI appropriately, and
• ensure that genAI outputs are checked and verified by appropriately skilled and experienced lawyers before reliance is placed upon those outputs.
GenAI is already rapidly improving. It is already ready for use as an aid for myriad tasks. The productivity benefits are such that organisations, including law firms, should be exploring those uses, with appropriate cautions.
Safe and responsible use of genAI is all about good governance to ensure that people using genAI know how to prompt it well, and when and how to check outputs from those prompts for errors or illegal content.
Because good governance of genAI is new, many organisations will get it wrong. This is a global problem; no country is unique or different. The problem will decline over time as genAI improves (hallucinates less) and people become more familiar with its safe and responsible use. However, failures in the governance of genAI are a significant problem today.
Changes to laws may assist in creating the right incentives for organisations to be careful about how they use genAI. But even with changes in laws, many organisations will get it wrong. Care, and good governance, are key.
Peter Leonard is a Professor of Practice for the School of Information Systems & Technology Management and the School of Management and Governance at UNSW Business School and immediate past chair of the Australian Computer Society’s AI Ethics Committee and a member of the NSW Government’s AI Review Committee.