StartupSuperb

OpenAI’s GPT-4o Model Trained Using Subscription-Based Data Sources, According to Latest Report

by Akash Das
April 3, 2025
in Tech

OpenAI Data Practices and Allegations Regarding GPT-4o

OpenAI is once again under scrutiny over its data practices. New allegations suggest that the company may have trained its GPT-4o model on copyrighted, paywalled content without permission. The claims come from the AI Disclosures Project, a non-profit organisation that monitors the AI industry, founded in 2024 by O'Reilly Media founder Tim O'Reilly and economist Ilan Strauss.

Allegations of Unauthorised Training Data Use

The AI Disclosures Project has released a study asserting that OpenAI's GPT-4o model shows strong recognition of copyrighted books from O'Reilly Media, even though OpenAI has no licensing agreement with the publisher. According to the report, GPT-4o recognises paywalled O'Reilly book content far more readily than earlier models such as GPT-3.5 Turbo.

The researchers used DE-COP, a type of "membership inference attack" that tests whether a model can reliably distinguish a verbatim, human-written passage from AI-paraphrased versions of the same text. If a model can consistently pick out the original, that suggests it has seen the text before, most likely in its training data. The study analysed 13,962 paragraph excerpts from 34 O'Reilly books and found that GPT-4o recognised far more paywalled content than GPT-3.5 Turbo: GPT-4o achieved an AUROC score of 82%, while GPT-3.5 Turbo scored only slightly above 50%, which is roughly the level of random guessing.
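To make the method and the AUROC figure more concrete, here is a minimal Python sketch. It is not the study's code. It assumes that a DE-COP-style test works as a multiple-choice quiz (the verbatim passage shuffled in among paraphrases), that a book's score is how often the model picks the verbatim option, and that AUROC compares books the model could have seen (published before its training cutoff) with books it could not have seen (published after). The helper names `build_quiz`, `guess_rate` and `auroc` and all of the data are hypothetical, and the step of actually querying a model is omitted.

```python
"""Hypothetical sketch of a DE-COP-style membership test and its AUROC.

All quiz outcomes below are made-up toy data, not figures from the study.
"""
import random
from typing import Sequence


def build_quiz(original: str, paraphrases: Sequence[str],
               rng: random.Random) -> tuple[list[str], int]:
    """Shuffle the verbatim passage in among its paraphrases.

    Returns the answer options and the index of the verbatim passage,
    which is the 'correct' answer the model would have to pick.
    """
    options = [original, *paraphrases]
    rng.shuffle(options)
    return options, options.index(original)


def guess_rate(picked_original: Sequence[bool]) -> float:
    """Fraction of quiz questions on which the model chose the verbatim passage."""
    return sum(picked_original) / len(picked_original)


def auroc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney formulation.

    It is the probability that a randomly chosen 'member' item scores higher
    than a randomly chosen 'non-member' item, with ties counted as half a win.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    options, answer = build_quiz("verbatim text", ["para 1", "para 2", "para 3"],
                                 random.Random(0))
    print(options, "-> correct option index:", answer)

    # Toy per-book quiz outcomes (True = the model picked the verbatim passage).
    pre_cutoff = [[True, True, False, True], [True, False, True, True],
                  [False, True, False, False]]
    post_cutoff = [[False, True, False, False], [False, False, False, True],
                   [True, False, False, False]]
    member = [guess_rate(book) for book in pre_cutoff]
    nonmember = [guess_rate(book) for book in post_cutoff]
    print(f"AUROC = {auroc(member, nonmember):.3f}")  # → AUROC = 0.833
```

An AUROC of 0.5 means the scores separate the two groups no better than a coin flip, while 1.0 means perfect separation. That is why a score just above 50% offers little evidence of exposure, whereas 82% points to substantially better-than-chance recognition. The pairwise loop is quadratic, which is fine for a handful of books; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice at scale.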

Despite these striking results, the co-authors, including AI researcher Sruly Rosenblat, acknowledged possible limitations in their approach. For example, users may have copied and pasted paywalled passages into ChatGPT, which could have led to the content being included indirectly. The study also did not evaluate OpenAI's newer models, such as GPT-4.5 and the reasoning models o3-mini and o1, so it remains unclear whether those models were trained on similar data.

A Broader Industry Issue

The report adds to the mounting legal challenges facing OpenAI, which is defending several lawsuits alleging copyright infringement and misappropriation of data. OpenAI and other prominent AI companies have pushed for more lenient rules on using copyrighted material for model training, arguing that the practice should qualify as fair use. Notably, OpenAI has already signed licensing agreements with news publishers, social media platforms, and stock media libraries to secure training data, and it has hired journalists to help refine its models' outputs.

The AI Disclosures Project argues that the problem is systemic and could affect the quality and diversity of online content. The study contends that using copyrighted data without compensation could reduce income for professional content creators, which in turn threatens the variety of content available on the internet. The project calls for greater accountability and transparency in how AI firms train their models, and advocates policies that ensure content creators are compensated when their work is used.

While OpenAI continues to defend its data practices, the AI Disclosures Project's findings have intensified the debate over copyright and data ethics in the fast-moving AI sector. As the legal disputes continue, how to reconcile innovation with intellectual property rights remains an open question.

Akash Das

Hi, I'm Akash, an entrepreneur, tech enthusiast, digital marketer, and content creator on a mission to inspire innovation and drive transformation through technology and creativity. My expertise extends to digital marketing, where I craft data-driven strategies for SEO, social media, and branding to empower businesses and creators to grow their online presence. Alongside my entrepreneurial journey, I share my insights and discoveries through engaging blogs, tutorials, and YouTube content.

©️ All rights reserved startupsuperb