Out-Law / Your Daily Need-To-Know

Large language models and the law: data sourcing considerations

05 Apr 2023, 2:23 pm

ChatGPT, Bard and other large language models (LLMs) are nascent, but their widespread use is already demonstrating the potential for future tools that will provide accurate real-time information useful for solving business and personal issues.

For financial services, this technology could change every aspect of doing business, from the use of data to support personal finance through to complex investment decisions, with interfaces giving users the information they need when they need it. It is a compelling proposition. So too is having technology that can write code in response to human instructions alone.

Every forward-looking business is investigating what this technology could mean for its future. However, before trying or buying an LLM, or services from an LLM provider or developer, organisations should be aware of the legal landscape within which this technology currently operates.

Data protection concerns about the use of LLMs have been well documented. Compliance measures should be put in place to ensure that legal grounds for processing personal data, whether based on consent, legitimate business interests or other permitted grounds, can be evidenced.

Copyright and database right infringement policies should address the risk of an LLM infringing third party proprietary rights, particularly those which relate to datasets which have been compiled for exclusive purposes.

In many jurisdictions there are also criminal laws which prohibit access to information stored on ICT systems without permission. In the UK, the main legislation that applies in this regard is the Computer Misuse Act.

The Computer Misuse Act prohibits “unauthorised” access to data stored on third party systems but does not define when access will be considered unauthorised. Determining whether any process for accessing data may be considered “unauthorised” is therefore an important factor for all businesses to consider when evaluating the risks associated with LLMs trained on third party data.

The UK is consulting on extending the application of the Computer Misuse Act. Under the plans, not only would it be an offence to access systems without consent, it would also be an offence to use data obtained from a person who has accessed data without appropriate permission. If the Computer Misuse Act is extended in this way, businesses intending to use LLMs, in addition to those which develop them, will want to ensure that the technology providers they work with have appropriate permissions in place to access all data used to train models or otherwise relevant to their use.

Risk assessment and due diligence activities should be revisited too, to account for the specific risks which arise when a business develops an LLM or incorporates one into the technology it uses to conduct business or provide products or services to customers. A series of questions should be asked about the sources from which the data has been taken, the licensing arrangements which attach to that data, and the methods used to source the data.

Responses to risk assessment and due diligence questionnaires may state that some of the data is publicly available and that the platforms from which the data has been obtained allow for, and even encourage, public access. However, even when the data processed is publicly available and access is enabled by technical means, processes should be put in place to ensure that legal and regulatory risks are addressed.

Lists of data sources should be comprehensive and the lineage of the data sourced should also be understood to determine whether any licensing arrangements subsequent to its creation may have an impact on its future use. EIOPA, the EU's insurance and occupational pensions supervisory authority, has referenced the potential for technology supply chain complexities to “create challenges for the commissioning companies to obtain information needed to assess system compliance”, including about data provenance. Financial institutions therefore have added reason to ensure that they are obtaining sufficient information regarding the provenance and lineage of data used by LLMs.

Thorough reviews of legal terms attaching to the data should be undertaken. Reviewing legal terms is particularly important to confirm that the permissions which a data provider or platform owner has attached to the use of its data have not been set out narrowly, or in a way that specifically prohibits use of the data to train LLMs.

Case law has highlighted that where a person could have sought authority to access data but elected not to do so, the person whose systems were accessed lost the opportunity to refuse access, and that this is a factor relevant to determining whether unauthorised access occurred.

Contractually, steps can be taken to reduce the risks. Warranties can be sought regarding the comprehensiveness of lists of data sources and licensing arrangements. Protections can also be put in place which require notifications to be made on becoming aware of potential infringements relating to data sources or which require action to be taken after concerns have been identified.

For financial institutions regulated in the UK by the Prudential Regulation Authority (PRA), data provenance is also a regulatory matter. The PRA has said they must ensure that they “obtain appropriate assurance and documentation from third parties on the provenance or lineage of the data to satisfy themselves that it has been collected and processed in line with applicable legal and regulatory requirements”.

Asking the right due diligence and risk assessment questions is the starting point. Acquiring an understanding of when the responses provided indicate high risk or low risk may provide a level of assurance appropriate in some circumstances. In others, robust contractual protections will need to be put in place and internal governance structures, policies, processes and controls will be necessary to take advantage of the huge potential LLMs have to transform business.

Written by

Luke Scanlon

Head of Fintech Propositions

+44 (0) 20 7490 6597