A Hybrid Path: How India Might Balance AI Innovation and Creative Rights
Background
Imagine teaching a machine to paint like Raja Ravi Varma, write like Arundhati Roy, or compose music in a style reminiscent of A.R. Rahman. To achieve that, the machine must absorb thousands of hours of music, millions of pages of text, and vast collections of visual art. This is the central reality of generative AI. It also brings us to one of the most consequential legal questions of our time: can AI models lawfully train on copyrighted material without the creator’s permission?
For several years, this issue appeared to be a remote controversy unfolding primarily before courts in the United States and Europe. That is no longer so. With the Delhi High Court having reserved judgment in ANI Media Pvt. Ltd. v. OpenAI, India’s first significant AI-copyright dispute, the question has now come squarely before Indian courts. The Court’s decision is likely to play an important role in shaping the initial contours of India’s approach to AI regulation and copyright policy. It will indicate whether India is prepared to leave the matter largely to market forces and litigation, or whether legislative intervention will be needed to redraw the boundaries of creativity and technological development.
The Global Playbook: How Other Jurisdictions Are Responding
To understand why the decision in ANI v. OpenAI matters, it is useful to begin with the approaches emerging in other parts of the world:
- United States: the “fair use” argument. Technology companies in the US rely heavily on the flexible doctrine of fair use. Their position is that AI training is transformative, comparable to a person reading books in order to learn how to write. Creators, unsurprisingly, characterize the same process as large-scale unauthorized copying.
- European Union: the “opt-out” model. The EU’s copyright framework includes text and data mining exceptions. In practical terms, AI developers may scrape lawfully accessible material unless rights holders take affirmative steps to opt out, typically by reserving their rights through machine-readable means.
- Japan: the “AI-friendly” model. Japan has taken perhaps the most generous view of AI training. Its copyright law broadly allows works to be used for information analysis, including text and data mining, so long as the AI developer is not trying to enjoy or reproduce the creative expression in the work. In simple terms, Japan treats AI training more like machine reading than human copying, while still leaving room to object where the use unfairly harms rights holders or results in substitutional reproduction.
- India: the “evolving model”. India’s position is still evolving. Unlike the US, India does not have a broad, open-ended fair use doctrine; it relies instead on the narrower concept of fair dealing. This means that mass scraping of copyrighted works for commercial AI training does not comfortably fit within any existing exception. The result is legal uncertainty: AI developers need clarity to innovate, while creators and publishers want assurance that their work will not be absorbed into training datasets without consent or compensation.
The Dispute Before the Delhi High Court
In ANI v. OpenAI, news agency ANI alleged that OpenAI unlawfully scraped and used its journalistic archive to train large language models. OpenAI’s defence follows a familiar pattern. It argues that it relied only on publicly available material, that its systems process facts rather than protected expression, and that ANI’s domain has since been blocked from future scraping.
The hearings, however, exposed how difficult it is to apply twentieth-century copyright concepts to twenty-first-century AI systems. Three issues stand out in particular:
- Transient storage. Indian copyright law provides a limited exception for transient or incidental copies created in the technical process of transmission. OpenAI argued that its processing falls within this safe zone. The publishers and intervenors, including the Digital News Publishers Association, countered that storing vast volumes of content during pre-training is neither temporary nor incidental, but a form of large-scale infringement.
- Facts versus expression. OpenAI maintained that its models merely learn statistical relationships between words, much as humans learn by reading. ANI, by contrast, pointed to instances of apparent memorization in which copyrighted news content was reproduced almost verbatim, potentially bypassing paywalls and reducing traffic to the original publisher.
- Reputational harm. ANI also emphasized hallucinated outputs falsely attributed to it. This argument broadens the dispute beyond copyright and suggests that flawed or unauthorized data inputs may also affect reputation, credibility, and brand value.
The Policy Turn: Could Compulsory Licensing Be the Answer?
The ANI litigation has highlighted sharply different views on whether machine learning amounts to copyright infringement. That uncertainty has pushed policymakers to consider structural solutions rather than leaving the entire question to case-by-case adjudication. Against this backdrop, the Department for Promotion of Industry and Internal Trade (DPIIT) has reportedly floated an alternative: a compulsory licensing framework for AI training data.
The idea is to avoid two extremes: unrestricted scraping on the one hand, and impractical one-to-one negotiations with every rights holder on the other. A statutory framework could create a regulated middle path. Under such a model, AI developers would receive a legal right to use proprietary datasets for training without prior permission, but would be required to pay regulated royalties to the relevant rights holders in return.
The Core Tension: Innovation Versus Creative Autonomy
This proposal has divided stakeholders into two broad camps:
- The innovation-first view. Supporters of broader access argue that without affordable access to local and culturally relevant datasets, Indian startups will struggle to build truly domestic AI systems. If licensing costs are too high, only large multinational players may remain competitive.
- The creator-rights view. Publishers, authors, musicians, and other rights holders respond that compulsory licensing undermines a fundamental aspect of ownership: the right to refuse. For many creators, being compelled to license their works to technologies that may eventually compete with them is not a compromise, but a forced surrender.
Moving Forward: Designing India’s Hybrid Path
India’s copyright framework was designed for a world in which humans used computers as tools, not for one in which generative systems ingest and recombine enormous bodies of creative material. That mismatch is now impossible to ignore. India’s long-term solution is unlikely to lie in simply importing Western models. A blunt compulsory licensing regime could dilute creator control, while an overly rigid prohibition could choke domestic innovation before it matures. A more balanced framework, in our view, may combine mandatory transparency obligations, clear disclosure of training datasets, and a functioning market for voluntary commercial licensing. Such an approach is likely to better align innovation incentives with the legitimate interests of creators.
