Steve Huffman, the CEO of Reddit, has stated that large language models (LLMs) as we know them today would not exist without the data from his platform.
Huffman made the assertion at Fast Company’s Most Innovative Companies Summit. He did not mince words, suggesting that Reddit is one of the most significant training sources for LLMs. He cited data from Profound indicating that Reddit is the most cited site across all models. The claim serves Reddit’s business strategy, but it is also difficult to dispute.
The Modern Oil Theory
Huffman used a striking phrase to describe how much the models depend on his data: “There is no artificial intelligence without real intelligence.” He believes that LLMs primarily regurgitate, on a large scale, what they have previously consumed.
A significant portion of this data comes from human conversations on Reddit. The site’s thematic coverage is extensive, with virtually every topic on the planet discussed by real people. This corpus is indispensable for AI laboratories.
Contracts on One Hand, Lawsuits on the Other
In 2024, Reddit signed two licensing agreements, one with Google and one with OpenAI. Huffman presents these as the cornerstone of the company’s business strategy and says Reddit is open for business when it comes to new partnerships. Details of other ongoing deals were not disclosed. Against companies that have not negotiated, Reddit has turned to legal measures.
Anthropic is being sued in the Superior Court of California for unauthorized use of Reddit’s data. Perplexity and three scraping companies are facing a federal lawsuit in New York, primarily for DMCA violations. Huffman draws a clear line: companies that agree to collaborate receive negotiated terms, while those that take without asking end up in court.
An Acknowledged Paradox
Huffman admits a contradiction in his stance: Reddit feeds other companies’ AI while developing its own. Reddit Answers, the platform’s conversational search engine, responds only with actual user quotations.
The idea is to provide multiple human perspectives rather than a synthesized algorithmic output. AI also plays a role in moderation, particularly in detecting harassment.
The Issue with Posts Written by ChatGPT
The topic that irritates Huffman the most is users composing their messages with ChatGPT before posting them on Reddit. He does not intend to create a special policy for this. The community already takes care of it, quickly identifying content that appears to be AI-generated.
Huffman prefers not to replace this social filtering with an automated detection system.

Samantha Klein is a seasoned tech journalist with a sharp focus on Apple and mobile ecosystems. With over a decade of experience, she brings insightful commentary and deep technical understanding to the fast-evolving world of consumer technology.