Copyright and the News

Aggregation, AI and the Challenges of Technological Innovation

Apr 29, 2025

Preface

This article was inspired by a Stanford Technology Law Review piece by Olivia S. Hiltbrand entitled “Guarding the News Media’s Intellectual Property in the Age of Generative AI” (2025) 28 Stan Tech L Rev 35. The article explores the challenges and implications of generative AI on journalism's intellectual property. It highlights concerns about AI diminishing work opportunities for journalists and infringing on copyrighted content used to train AI models. The article traces the evolution of copyright law in journalism, examines the impact of content aggregation, and discusses ongoing legal battles against AI companies. It suggests legislative action, regulation, and public funding as potential solutions to protect journalists' work and sustain the industry. Ms Hiltbrand’s article emphasizes the critical role of journalism in democracy and the need for safeguards to ensure AI does not undermine this essential function.

Ms. Hiltbrand’s article views the issue through the lens of United States copyright law, which has developed in a different manner from copyright law elsewhere. What is known as “fair use” in United States copyright debates is known as “fair dealing” in New Zealand copyright law. Fair dealing shares a number of elements with fair use but, on closer examination, differs in a number of respects.

The question occupying my mind was whether Artificial Intelligence (AI) might have an impact on journalism and the news, but the enquiry had to traverse the thickets of copyright principles before addressing that question.

I prefer this type of enquiry because it serves to focus thinking and return to first principles before diving into the main question and making a number of assumptions which may or may not be justifiable.

By way of general observation the intersection of law and technology has been a study of mine for the last 25 – 30 years. It resulted in my teaching Law and IT at Auckland University Law School for 19 years and has given rise to five editions of a text on Internet law and one monograph entitled “Collisions in the Digital Paradigm – Law and Rule-making in the Internet Age.”

Curiously enough, many of the cutting-edge developments in law and digital technologies involve copyright – I recall the early days of Napster and music file sharing which in New Zealand resulted in a law change.

The challenge to copyright posed by the Digital Paradigm is that “copying is” in digital systems. Copying enables digital systems to work. Now this is different from copying other people’s work which is what the current problems are about.

So I shall start with a very brief and superficial look at some principles of copyright law in the context of news before getting into the issues of aggregating news content and using news content to train Large Language Model (LLM) generative Artificial Intelligence (AI) systems.

Introduction – A Copyright Overview

Copyright law is fiendishly complex. It would be beyond me to write a full treatise of the copyright issues surrounding journalistic products and news.

What I will try to do is provide a brief summary of the issues surrounding copyright in news. This may help explain why I am of the view that an alternative to the Fair Digital News Bargaining Bill may lie in copyright law, a view I have written upon elsewhere.

So I shall start with some basics, using journalism and news to provide examples.

General Propositions

I shall start with some very general propositions.

Facts in and of themselves do not attract copyright. However, if a journalist takes those facts and incorporates them into a story, he or she has created a literary work.

Literary works are protected by copyright. Section 14(1)(a) of the Copyright Act 1994 creates a special property right in an original literary work.

Copyright protects expression. If a person has an idea and writes it down that person holds the copyright, not in the idea, but in the way that it is expressed.

A journalist may ascertain a set of facts and incorporate them into a story. The facts themselves exist outside of their expression but the way that the journalist incorporates them into a story and expresses them is subject to copyright.

When it is published this material becomes publicly available. Someone else might gather the same facts and write a story incorporating them in a different way. That story, too, would be subject to copyright even though the same facts were used.

It is important to note that the discovery of facts is not copyrightable but it is the originality of the expression that protects writings and other creations that are the products of intellectual labour.

Thus, copyright neither protects nor rewards the discovery of facts. While much of journalism involves finding and reporting facts, journalists also make highly creative decisions on sentence structure, word choice, and organization to deliver news with impact and accuracy.

So far, so good. Now things start to get complicated.

Infringement and Fair Dealing

If I take an article by someone else and republish it as my own I have infringed copyright. That means that I could suffer a number of rather expensive consequences depending on the nature and purpose of the infringement.

Copyright law recognises that although the intellectual property right may lie with the creator of the content, there is a benefit to the wider community in allowing some of that material to be shared.

That is what is known in general copyright terms as fair use. In the language of the New Zealand Copyright Act it is a permitted use, and the term used in New Zealand law is fair dealing.

Fair use allows limited use of copyrighted material without needing permission from the copyright holder. It’s designed to balance the interests of creators with the public interest in the free flow of information and ideas.

Criticism and commentary are occasions where fair use may be justified. If I am writing a critique or a review of “The Lord of the Rings” it may help my argument if I can include quotes from the text of the book. But I can’t take too much and I certainly cannot reproduce the entire book.

Fair use/dealing is a very complex and nuanced aspect of copyright law, and I emphasise that I am giving a very broad-brush outline.

Permitted Uses – Criticism, Review and News Reporting

Part 3 of the New Zealand Copyright Act 1994 sets out the acts that may be permitted in relation to works protected by copyright. Section 42 sets out the permitted uses in the area of criticism, review and, importantly for this discussion, news reporting.

The section reads as follows:

42 Criticism, review, and news reporting
(1) Fair dealing with a work for the purposes of criticism or review, of that or another work or of a performance of a work, does not infringe copyright in the work if such fair dealing is accompanied by a sufficient acknowledgement.
(2) Fair dealing with a work for the purpose of reporting current events by means of a sound recording, film, or communication work does not infringe copyright in the work.
(3) Fair dealing with a work (other than a photograph) for the purposes of reporting current events by any means other than those referred to in subsection (2) does not infringe copyright in the work if such fair dealing is accompanied by a sufficient acknowledgement.

Let’s look at each subsection.

Subsection 1 – I can use copyright material for the purposes of criticism or review if my use is accompanied by a sufficient acknowledgement. And my use has to amount to fair dealing.

Subsection 2 – I do not infringe copyright in a work if I am reporting current events by means of a sound recording, film, or communication work. And again, my use has to amount to fair dealing.

Subsection 3 – covers the reporting of current events by any other means, such as print. Here the fair dealing must be accompanied by a sufficient acknowledgement, and the subsection does not extend to photographs.

Sufficient acknowledgement requires that the title of the work be identified and (save in certain specific circumstances) the author. The same applies to fair dealing with any work other than a photograph for the purposes of reporting current events by some means other than a sound recording, film, or communication work.

Where the fair dealing is for the purposes of reporting current events by means of a sound recording, film, or communication work no acknowledgement is needed.

Fair Dealing – An Expanded Discussion

A further common feature in all those subsections is “fair dealing” which, as I have already said, is termed fair use in US copyright law. Fair dealing does not have a specific definition in the Copyright Act. As is the case with so many aspects of the law, it depends on the circumstances. And the definition of fair dealing has been developed by the Courts.

In the United States the Courts consider four factors when evaluating fair use. These are:

Purpose and character of the use
– Is the use commercial or for nonprofit/educational purposes? Transformative uses (that add new expression or meaning) are more likely to be fair.
Nature of the copyrighted work
– Using factual or published material is more likely to be fair than using creative or unpublished works.
Amount and substantiality of the portion used
– Using a small excerpt may favor fair use, unless it’s considered the "heart" of the work.
Effect on the market
– If the use could harm the market or potential market for the original work, it's less likely to be fair.

The New Zealand approach to fair dealing has been influenced by UK law because the UK copyright legislation is similar to ours. The tests that have been developed are similar to those applied in the United States.

The factors that the Courts in New Zealand may consider are:

Purpose of the dealing
– Whether it falls under the specified permitted purposes such as:
research or private study (s43)
criticism or review (s42)
reporting current events (s42)
Nature of the work
– Some works may have stronger protection, such as unpublished or confidential material.
Amount and substantiality of the portion used
– Both the quantity and the quality (i.e. the “heart” of the work) are relevant.
Effect of the use on the potential market
– If the dealing could substitute for the original or harm its market, it’s less likely to be fair.
Availability of alternatives
– If the user could have used something else or less of the work, that may affect fairness.

Thus the focus is upon whether the use:

· Falls within a permitted statutory purpose,

· Uses no more than necessary,

· Has minimal market harm,

· Is contextually fair.

The following checklist may summarise and help explain the situation.

1. Purpose of the Use

Must fall under a permitted purpose: research/private study, criticism/review, or reporting current events.

✔ If for one of the permitted purposes and non-commercial.

2. Nature of the Work

Consider if the work is published, creative, confidential, etc.

✔ If the work is factual or published. ✖ If unpublished or sensitive.

3. Amount and Substantiality

Looks at how much of the work was used and whether the "heart" of the work was taken.

✔ If only a small, non-essential portion is used. ✖ If a substantial or key part is used.

4. Effect on Market Value

Does the use act as a substitute or harm the market for the original work?

✔ If minimal or no economic impact. ✖ If it harms potential sales or licensing.

5. Availability of Alternatives

Could the user have conveyed their point using something else?

✔ If no reasonable alternative exists. ✖ If other options were available.

6. Commercial vs Non-commercial Use

Whether the use was for profit or private purposes.

✔ Non-commercial/private study. ✖ Commercial exploitation.

7. Transformative Nature (influenced by US law)

Does the use add new expression, meaning, or message?

✔ If it comments, critiques, or transforms the original. ✖ If it merely copies without change.

Having discussed the broad principles surrounding fair dealing I will now turn to look at the question of copyright and news in a little more detail.

What happens when one news agency uses material published by another news agency?

Examples from the Cases

A couple of examples from decided cases are helpful. Both involve television broadcasts.

The first case is MediaWorks NZ Ltd v Sky Television Network Ltd (2007). MediaWorks – the owner of TV3 – successfully tendered to telecast the 2007 Rugby World Cup. As broadcaster it had the copyright in the broadcast.

Sky was an unsuccessful tenderer. It had a subscriber audience. Sky began showing excerpts from TV3’s footage. These excerpts were of short duration - usually no more than a minute. TV3 sought an injunction to restrain any use by Sky of these excerpts outside particular news programmes. The principal issue in the proceedings was whether Sky’s use of the footage fell within fair dealing for the purpose of reporting current events under s42 Copyright Act 1994. For the purpose of the hearing Sky did not dispute that use of the excerpts was use of a substantial part of the work.

The threshold issue was this: did Sky's use amount to fair dealing for the purpose of reporting current events? The Court observed that in some cases it may be clear that the material is not being used for that purpose and is therefore infringing. In less clear cases the factors relevant to both this threshold issue and whether something is a fair dealing may be so indistinguishable or so connected that it is necessary to step back and consider the matter as a composite test.

The term “for the purpose of reporting current events” involved an objective test. The fact that “news” coverage (the statutory term in Australia) is interesting or even entertaining to some people does not negate the fact that it may be news.

There can be an overlap between news coverage and entertainment but at a certain point, use of footage will cross the line into entertainment and therefore fall outside the fair dealing exception.

The content of the report had to be considered in the context of the nature of the programme incorporating the material. In other words the purpose for which the footage was being used fell to be determined by reference to the content of the programme and the context in which the material was used.

Sky had used the footage in a number of different programme formats: on its separate channel, Prime; in a Sky News programme; on Sky Sport channels 1, 2 and 3; and on the Rugby Channel.

No objection was taken by TV3 to the use of the footage in Prime News or Sky News but objection was taken to the programmes 365 Headlines, The Cup, Reunion and The Crowd Goes Wild. The Court accepted that Sky Sport 1, 2, and 3 and Sky Sport Highlights were to be taken as a single offering.

The Cup, The Crowd Goes Wild and Reunion were magazine programmes and the footage used in them could not be described as occurring in the context of current events. By contrast, 365 Headlines was a sports news update in format.

Fair dealing was not available in the case of The Cup, The Crowd Goes Wild and Reunion. The footage used in these programmes was not the reporting of current events.

When considering whether the use was fair, the Court found that the rate of repetition of TV3’s footage across all of the programmes, and the use of that footage in magazine style programmes, unfairly undermined TV3’s ability to exploit its copyright. That factor was not outweighed by the public interest.

Sky was achieving an intensive level of broadcasting of TV3 sourced footage. The use of the material in magazine programmes would inevitably compete with TV3 in a key area that TV3 was targeting. Sky’s coverage went far beyond what was necessary to meet the public interest.

In the second example (Sky Network TV Ltd v Fairfax New Zealand Ltd [2016] 3 NZLR 854; [2016] NZHC 1883), Sky was the plaintiff. The facts were these.

Sky entered into a NZ Media Rights Agreement (Agreement) with the International Olympic Committee (IOC) and purchased exclusive broadcast and exhibition rights to the Rio Olympic Games.

Sky and Fairfax attempted to negotiate NZ Supplementary News Access Rules (SNAR) to establish the extent to which Fairfax could use Sky's video material when reporting on the Olympics, but negotiations broke down.

Fairfax streamed footage from Sky's Olympic broadcasts on its Stuff website. Sky alleged that the streaming breached the Copyright Act 1994 and commenced urgent proceedings three days after the Games' opening ceremony.

Fairfax conceded that it operated outside the SNAR and the News Access Rules (NAR) issued by the IOC. It also conceded, prior to the hearing, that an automatic playlist function was not within the fair dealing exception under s 42 of the Act. This function allowed viewers to watch several uploads of Olympic material simply by waiting through advertisements. It meant Fairfax had free and greater access than even the supplementary licensees were contractually permitted.

One of the principal issues was whether or not the conduct by Fairfax could be justified as fair dealing with Olympic broadcast material for the purpose of reporting current events.

The Court held that Fairfax's conduct in relation to the automatic playlist function was outside any sensible interpretation of fair dealing under s 42(2) of the Copyright Act 1994. The automatic playlist could not be defended as fair dealing and Sky had a reasonable argument of a significant infringement of the Act.

There was gross divergence between the standards in the supplementary news access scheme and Fairfax's automatic playlist feature. The online streaming feature was a significant and blatant deviation from what was considered acceptable under the supplementary news schemes and agreements. Whilst it was not appropriate for the court to create a rule demarcating what amounted to fair dealing, it was appropriate to identify egregious conduct.

The Judge in this case made some observations about the decision in the MediaWorks case. He said:

“The reason I am not going to refer to her decision is that it was decided nine years ago, which is a long time in the digital world. What may be understood as a reasonable time for news coverage in 2007 may not be the same now. Indeed, there is expert evidence filed in the Court that public expectations to see news is much higher now than they were then. And I recognise it is inevitable, and indeed reflected by all the NARs, that on-line reporting of news events is typically these days, where possible, accompanied by a short stream of film or photographs. Indeed, I think that can be accepted as a notorious fact without the need for proof by experts.”

Following from this very general discussion I now turn to the question of whether or not the aggregation of news content infringes copyright.

Aggregation of News Content

What do I mean by the aggregation of news content?

The term aggregation of news content refers to the process of collecting, organizing, and republishing news stories and related media from various sources into a centralized platform or location for easy access and viewing. This practice is commonly carried out by news aggregators, which are websites, apps, or software tools designed to gather news content from multiple publishers and present it in one place.

News aggregation has evolved from traditional syndication practices where newspapers republished articles from other publishers. In the digital age, it has become more sophisticated with automated tools and algorithms that enable real-time updates and broader coverage.

The key features of news aggregation are:

[1] Centralized Access: Aggregators compile news articles, headlines, and updates from diverse sources into a single interface, making it easier for users to find relevant information without visiting multiple sites.
[2] Automation and Syndication: Many aggregators use automated technologies like RSS feeds or algorithms to continuously pull updates from news websites. Some platforms also rely on human curation to select and organize content.
[3] Content Summarization: News aggregation often involves reshaping or summarizing content for quick consumption while crediting the original sources. This allows users to scan key points without reading full articles.
[4] Formats and Types: Aggregated content can include text articles, videos, podcasts, images, and social media posts, depending on the aggregator's focus.

There are a number of benefits in this process.

· News aggregation is efficient in that it saves time by consolidating updates from multiple sources into one location.
· Aggregation helps users find relevant news on specific topics or trends without manually searching through numerous websites.
· Finally, aggregators make information readily available on platforms like Google News or Yahoo News, often tailored to individual preferences.

What is important is the amount of content that is aggregated from the source news website. If the entire article appears on the aggregator’s site, arguably that would not be fair dealing even if there is attribution.

But if the content that is aggregated is a “snippet”, like the results of a Google search, with a link to the full article on the source website, that would amount to fair dealing in my view.
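
To make the “snippet” idea concrete, the following is a minimal sketch in Python of what a simple aggregator might do with an RSS feed: keep the headline, a short excerpt, the name of the source and a link back to the full article. The sample feed, the news.example address, the function names and the 12-word limit are all assumptions made for illustration. The word limit in particular is arbitrary and is not a legal threshold for fair dealing.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS feed from an invented publisher (illustration only).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Daily</title>
    <item>
      <title>Council approves new harbour crossing</title>
      <link>https://news.example/harbour-crossing</link>
      <description>The council voted on Tuesday to approve a new harbour crossing after months of public consultation and debate over its cost.</description>
    </item>
  </channel>
</rss>"""


def make_snippet(text, max_words=12):
    """Return at most max_words words, adding an ellipsis if text was cut."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "..."


def aggregate(feed_xml, max_words=12):
    """Build attributed snippet entries that link back to the source."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    entries = []
    for item in channel.findall("item"):
        entries.append({
            "source": source,                  # attribution
            "headline": item.findtext("title"),
            "snippet": make_snippet(item.findtext("description", default=""), max_words),
            "link": item.findtext("link"),     # sends the reader to the publisher
        })
    return entries


entries = aggregate(SAMPLE_FEED)
for entry in entries:
    print(f"{entry['headline']} ({entry['source']})")
    print(f"  {entry['snippet']}")
    print(f"  Read more: {entry['link']}")
```

An aggregator that copied the whole description, or the whole article, rather than truncating it would sit at the other end of the spectrum described above.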

One of the problems that aggregators face is the different approaches to copyright law in different jurisdictions.

A common feature is that facts and ideas in and of themselves are not protected by copyright. However, the way facts are expressed (e.g., specific language, structure, or creative elements) is protected under copyright law. Aggregators that use headlines, ledes (opening sentences), and excerpts may infringe if these elements contain original expression.

In the U.S., fair use may apply if the aggregation transforms the content (e.g., by adding commentary or analysis) or uses only a small portion for purposes like criticism or education. However, this is a defense raised in court rather than a clear-cut permission.

In other jurisdictions, like the EU, "fair practice" under copyright laws may impose stricter limits on reuse without explicit permission.

A further complication arises where publisher websites may encourage link backs and canonical tags acknowledging them as the content generator.

In this way linking drives traffic back to the original source and canonicals allow webmasters to prevent content duplication, thus aiding search engine optimization. News aggregators rely on fair use or fair dealing principles and the fact that they are driving traffic back to the source.
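
As an illustration of how a canonical tag works in practice, here is a short Python sketch that reads a republished page and finds the address the publisher has declared as the original. The page content, the news.example address and the CanonicalFinder class name are hypothetical; only the rel="canonical" link element itself is the standard mechanism.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_canonical = (attr.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attr.get("href")


# Hypothetical republished copy of an article (illustration only).
page = """<html><head>
<title>Council approves new harbour crossing</title>
<link rel="canonical" href="https://news.example/harbour-crossing">
</head><body>Republished copy of the article.</body></html>"""

finder = CanonicalFinder()
finder.feed(page)
canonical_url = finder.canonical
print(canonical_url)
```

A search engine that honours the tag can treat the publisher's address, rather than the republished copy, as the primary version of the article.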

News aggregation impacts copyright in journalism in the following ways:

· Unauthorized Use of Content: Aggregators often use excerpts or summaries of news articles without proper licensing or compensation, raising concerns about copyright infringement.
· Erosion of Revenue: By republishing content, aggregators divert traffic and advertising revenue away from original news publishers, undermining their financial sustainability.
· Blurred Lines Between Use and Infringement: Courts have struggled to determine whether aggregation constitutes fair use or copyright infringement. Factors like the extent of content used and whether it substitutes for the original source are key considerations.
· Reduced Incentives for Original Reporting: If aggregators profit from republished content without compensating journalists, it discourages investment in original reporting, which is costly and resource-intensive.
· Legal Precedents: Cases like Associated Press v. Meltwater (US) have found that some aggregators act as closed systems that replace the need for original subscriptions, which courts have ruled as infringing. However, fair use has been upheld in cases where aggregation drives traffic to original sources.
· Bargaining Power Disparities: News publishers often lack the leverage to negotiate fair compensation with aggregators, leaving them at a disadvantage.

Thus, while aggregation can provide benefits like faster access to news, it poses significant challenges to copyright protections and the financial viability of journalism.

Even so the position of news aggregation remains fuzzy. Most cases of any significance have been settled out of Court.

Google signed a licensing agreement with Agence France Presse (AFP) consequent to an out-of-court settlement over a copyright infringement dispute. AFP had taken Google to Court for indexing AFP content without prior authorization. The licensing agreement was seen as a way forward. Licensing allows both publisher and aggregator to maximise their reach by leveraging each other's strengths without being encumbered with the litigation process.

This approach, with a few changes to our Copyright Act, would be an alternative path to compensating publishers rather than the Fair Digital News Bargaining Bill.

Training AI with News Content

The copyright implications of training AI with news content are complex and evolving, as they intersect with issues of fair use, authorship, and liability.

Training AI models often involves scraping large volumes of data, including copyrighted works such as news articles, images, and videos. This process may infringe copyright laws if done without permission.

For instance, Getty Images has sued Stability AI for using its copyrighted photos to train AI models, highlighting potential liability for developers using unlicensed data. Getty argued that Stability AI unlawfully copied and processed millions of images protected by copyright and the associated metadata owned or represented by Getty Images without a license to benefit Stability AI’s commercial interests and to the detriment of the content creators.

Getty Images provided licenses to leading technology innovators for purposes related to training artificial intelligence systems in a manner that respects personal and intellectual property rights. Stability AI did not seek any such license from Getty Images and instead, according to Getty, chose to ignore viable licensing options and long‑standing legal protections in pursuit of their stand‑alone commercial interests.

On 14 January 2025 Justice Joanna Smith in a decision on a procedural matter surrounding the case expressed concern at the delays experienced in the case. The matter has a trial date in June 2025.

Across the Atlantic the New York Times has brought proceedings against OpenAI and Microsoft alleging that the defendants were responsible for inducing users to infringe its copyrights. The Times sued OpenAI and Microsoft in 2023, accusing them of using millions of its articles without permission to train the large language model behind OpenAI's popular chatbot ChatGPT.

Judges are beginning to consider whether the tech companies are immune from the main allegations based on U.S. copyright law's fair use doctrine, which allows for the unauthorized use of copyrighted works in some circumstances.

The Judge in The New York Times Co v. Microsoft Corp allowed the Times to continue pursuing claims that OpenAI's output contained copyrighted material and led to user infringement, breaking with California judges who have dismissed related allegations.

However, Courts have not yet addressed the core question of whether tech companies' unauthorized use of material scraped from the internet to train AI infringes copyrights on a massive scale.

The New York Times said: “All of our copyright claims will continue against Microsoft and OpenAI for their widespread theft of millions of The Times's works, and we look forward to continuing to pursue them.”

In the case of Bartz v Anthropic PBC brought by some authors against an AI company Anthropic, it was argued by Anthropic that its training was protected by the copyright doctrine of fair use, which allows for the unauthorized use of copyrighted material under certain circumstances. Anthropic said its AI training was a "transformative" act that U.S. copyright law "not only allows but encourages" because it promotes human creativity.

The company said its system copies the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." The case is proceeding.

The “transformative” argument may have merit.

How do generative AI models use the text that has been scraped?

LLMs are trained on massive corpora of text (books, websites, articles, forums, code, etc.) to learn the general structure and use of language.

What happens in pretraining is that the model gets a chunk of text and is tasked with predicting the next word (or token).

It doesn’t "know" grammar or facts—it just learns patterns by adjusting billions of parameters to reduce its prediction error.

Example: Given “The lawyer wrote a detailed…”, the model might learn to predict “brief”.

This process is self-supervised, meaning the labels (the next word to predict) are naturally part of the text itself.
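
A toy illustration in Python may help show what “self-supervised” means here. It is a deliberately simplified stand-in: it predicts the next word by counting which word most often follows the previous one, whereas a real LLM adjusts the parameters of a neural network and looks at far more context. The three-sentence corpus and the function names are my own assumptions for the example.

```python
from collections import Counter, defaultdict


def make_training_pairs(text):
    """Turn raw text into (previous word, next word) pairs.

    The "label" for each example is simply the word that follows,
    so no human annotation is needed.
    """
    words = text.lower().split()
    return list(zip(words, words[1:]))


def train(pairs):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in pairs:
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequently observed follower of prev, or None."""
    followers = counts.get(prev.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented miniature corpus (illustration only).
corpus = (
    "the lawyer wrote a detailed brief . "
    "the judge read a detailed brief . "
    "the reporter filed a detailed story ."
)

pairs = make_training_pairs(corpus)
model = train(pairs)
prediction = predict_next(model, "detailed")
print(prediction)
```

In this corpus “brief” follows “detailed” twice and “story” only once, so the toy model predicts “brief”. The labels came from the text itself, which is the point of the self-supervised approach.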

Thus, given that LLMs work by predicting the “next word” from learned patterns, the “fair use” argument is developed in this way.

Fair use applies to AI training by allowing the use of copyrighted works without express authorization under certain conditions.

In the context of AI training, fair use, using the test developed in the US, is evaluated based on four factors:

1. Purpose and Character of the Use: AI training is often considered highly transformative because it involves analyzing existing works to derive metadata or uncopyrightable abstractions and associations, rather than communicating the original expression to a new audience. This transformative nature can favor a finding of fair use, especially if the AI model does not produce outputs that are substantially similar to the training data.
2. Nature of the Copyrighted Work: This factor typically has less impact in cases involving nonexpressive use, such as AI training. Courts have generally placed less weight on whether the work is more creative or factual when the use is transformative.
3. Amount and Substantiality of the Portion Used: Although AI training often involves copying entire works, this is considered reasonable if it is necessary to extract unprotected elements like facts and ideas. The complete reproduction is justified as an intermediate technical step in an analytical process that does not lead to communicating the underlying original expression to a new audience.
4. Effect of the Use on the Potential Market: Using copyrighted works for AI training is unlikely to have a significant impact on the market for the original works, as long as the use is non-expressive. Courts have generally rejected arguments that copyright holders have a right to charge for non-expressive uses, focusing instead on whether the use serves as a substitute for the original work.

I emphasise that this analysis is based on US copyright principles and is speculative only.

Once again different approaches to copyright law complicate the matter. For example the NY Times Case has a cause of action in contributory copyright infringement which is not available under New Zealand law.

The European Union, on the other hand, allows rights holders to object to the use of their works for AI training.

In China, courts have upheld copyright protection for human-authored components but denied it for fully automated outputs.

In New Zealand, there is no case law yet on AI copyright infringement, but companies like Stuff have proactively blocked AI scraping of their content. As AI becomes more prominent, companies appear to be taking copyright and intellectual property issues into their own hands.

Copying someone else’s copyrighted works in order to ‘train’ AI does not appear to fall under any of the exceptions and may be a breach of New Zealand copyright laws.

This is different from the United States, which has much broader “fair use” exceptions to copyright protection and allows copying of works where it is not for a commercial purpose and it is used to create something new (or for transformative purposes).

Generative AI impacts upon journalism in a number of ways. These can be summarised as follows:

· Unauthorized Use of Copyrighted Content: Generative AI models are often trained on large datasets, including copyrighted journalistic work, without permission or compensation. This infringes on journalists' intellectual property rights and deprives news outlets of critical revenue.
· Fair Use Challenges: AI companies argue that using copyrighted content for training constitutes "fair use," but this remains legally uncertain. Courts have yet to establish clear standards for whether such use is transformative or infringes on copyright.
· Difficulty in Tracing Outputs: Generative AI creates new outputs based on training data, making it challenging to trace specific outputs back to original copyrighted sources. This complicates efforts to prove infringement.
· Impact on Revenue and Viability: By generating content that mimics or summarizes news articles, AI tools can reduce traffic to original news sources, undermining their financial sustainability.
· Barriers to Legal Action: Journalists face obstacles in pursuing copyright claims, such as the need for copyright registration, which is impractical for the high volume and time-sensitive nature of news content.
· Misinformation Risks: Generative AI can distort or misrepresent journalistic content, further complicating the role of journalism in providing accurate and reliable information.

The main problem is that the use of the copyrighted material lies in the training of generative AI or Large Language Models (LLMs). The result is that when a prompt or question is put to an LLM, it is rare that the language of the article involved will be reproduced verbatim, although in the New York Times case it is alleged that text has been reproduced. Putting that to one side, the ultimate use of the material is generally transformative in that the content is articulated in a different way.

More significantly, the content is used to train the AI model to make associations between words rather than to reproduce content. The model analyses the relationships between words in an effort to provide meaning in response to a prompt or query. It is in this respect that the use of content from journalistic (and other) sources may not be infringing.
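The idea of learning associations between words can be illustrated with a deliberately tiny sketch. It is a toy, not a description of how any real LLM is built: real models use neural networks with a very large number of learned numerical weights, whereas this example merely counts which word follows which in a made-up sentence. The function name and the sample sentence are assumptions for illustration.

```python
from collections import Counter, defaultdict


def next_word_counts(text):
    """Count, for every word, how often each other word immediately follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


# Made-up training text (an assumption for illustration only).
corpus = "the court held the use was fair and the court dismissed the claim"
model = next_word_counts(corpus)

# The word the toy model most strongly associates with "the".
most_likely = model["the"].most_common(1)[0][0]
print(most_likely)
```

What the toy model holds after "training" is a table of word-to-word statistics, and its answer to a prompt is drawn from those statistics. Whether the same can fairly be said of a full-scale model is part of what the cases discussed here will have to decide.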

The Getty case is different because it involves the use of images to train an AI model, and those familiar with AI will know that an AI prompt can produce images as well as language. In addition, an image generated from an AI prompt may be requested in the form or style of a particular artist. Thus, while not reproducing an actual Dali painting, the AI may use motifs from the Dali paintings that were used to train the model, resulting in a “Daliesque” product.

In Conclusion

Content creators should be mindful when using AI to create works. AI-generated content may be a derivative of a copyrighted work, depending on the sources and information used to train the AI software.

Content creators should also consider whether information or works can be easily scraped from their websites by AI companies and whether copies or derivatives of their works are already being generated by AI.

Anyone using AI should be conscious of the terms and conditions that apply to each AI tool and of who owns the content it generates.

The AI and intellectual property landscape is quickly unfolding, and these issues will likely become a normal consideration for all content creators in the near future. What will be of interest will be the outcome of the Getty case in England and the New York Times case in the US.

Very much a case of “watch this fascinating space”.

