Teaching AI together: a plain guide to federated learning
Today's most powerful AI models are trained on enormous clusters of specialized chips, housed in a handful of data centers and owned by a handful of companies. Training costs run into the tens or hundreds of millions of dollars. That price tag decides who gets to build foundational AI and, just as importantly, who gets to own it.
There is another way to gather computing power, and it is hiding in plain sight: the billions of laptops, desktops, and phones that sit idle for most of the day. The research question Lupus is built around is simple to state and hard to answer — can many small, ordinary devices, working together, train a model that normally needs a supercomputer?
What is federated learning?
Federated learning is a way of training a shared model across many separate devices without gathering everyone's data into one place. Instead of sending raw data to a central server, each device trains on what it has locally and sends back only what it learned — a small mathematical summary of its progress. A coordinator combines these summaries into an improved shared model, then sends it back out. The cycle repeats.
Think of it like a study group where everyone reads a different chapter at home, then meets to pool their notes. No one has to hand over their book; they just share the insights. The group's collective understanding improves with each meeting.
Learn locally
Each device improves the model a little, using only its own slice of work.
Share the lesson
Devices send back a compact summary of their progress — not raw data.
Combine & repeat
A coordinator merges the lessons into a better shared model, then sends it out again.
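For readers who want to see the idea in code, here is a minimal Python sketch of one common approach: averaging locally trained models, weighted by how much data each device holds. It is an illustration under simplifying assumptions, not the Lupus implementation. The model is a single number w in y = w * x, and the names local_train, federated_round, and train, along with the learning rate and step counts, are made up for this example.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one device


def local_train(w: float, data: Dataset, lr: float = 0.05, steps: int = 5) -> float:
    """Learn locally: a few gradient-descent steps on one device's own data."""
    for _ in range(steps):
        # Gradient of the mean squared error of y = w * x with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, devices: List[Dataset]) -> float:
    """One round: every device trains locally, then the coordinator averages."""
    # Share the lesson: each device returns only its trained w and a sample count.
    updates = [(local_train(w, d), len(d)) for d in devices]
    total = sum(n for _, n in updates)
    # Combine: weighted average, so devices with more examples count for more.
    return sum(w_i * n for w_i, n in updates) / total


def train(devices: List[Dataset], rounds: int = 30) -> float:
    """Repeat the round many times, starting from an untrained model."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, devices)
    return w


if __name__ == "__main__":
    # Three hypothetical devices, each holding different samples of y = 3x.
    devices = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(0.5, 1.5)],
        [(1.5, 4.5), (3.0, 9.0), (2.5, 7.5)],
    ]
    print(round(train(devices), 3))
```

Run as a script, this prints a value close to 3.0, the slope that all three devices' data share, even though no device ever sends its (x, y) pairs to the coordinator.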
How the research has progressed
Federated learning began as a way to train modest models across phones while keeping personal data private. Over the past decade, a series of advances has pushed the idea toward something far more ambitious: training large models across slow, scattered, and unreliable connections. A few milestones tell the story.
- 2016: The starting point. Researchers show a shared model can be trained across many phones by averaging their progress, without ever centralizing the data.
- 2017–2019: Talking less. A wave of work tackles the biggest obstacle: devices have to communicate constantly, and that's slow. New methods let devices do far more work on their own between check-ins, and compress what they send.
- 2020–2022: Handling the real world. Methods mature for messy conditions: devices that drop offline, have wildly different speeds, or hold very different data. The field learns to keep training stable despite the chaos.
- 2023: Big models, low bandwidth. A breakthrough shows that even large language models can be trained across distant machines while communicating a tiny fraction of what was previously thought necessary, opening the door to training over ordinary internet.
- 2024–2025: Proof at scale. Independent groups successfully train billion-parameter models across machines on different continents, demonstrating that decentralized training of serious models is no longer just theory.
- 2026 onward: The Lupus direction. Bringing this to everyone: training that runs in a plain browser tab, open to untrusted volunteers, with a fair way to reward the people who contribute their spare computing power.
What makes Lupus different
The research above mostly assumes trusted machines in controlled settings. Opening the doors to the public, meaning anyone on any device, raises three new challenges that define our work:
Zero friction
Contributing should be as easy as opening a web page. No downloads, no setup.
Trust without trusting
When contributors are strangers, the system must confirm that the work submitted is real — without slowing everyone down.
Fair reward
People who lend their computing power earn a stake in what they help create.
System architecture
At a high level, Lupus connects three kinds of participants in a continuous loop. The diagram below shows the shape of the system — the inner workings of each piece are part of our ongoing research.
The loop, step by step
- Tasks go out. The coordinator splits the work and sends a small piece to each volunteer device, along with the current shared model.
- Devices learn. Each device improves the model on its piece, entirely on its own, for a while.
- Lessons come back. Devices return a compact summary of what they learned — never raw data.
- The work is checked. The system confirms contributions are genuine before accepting them.
- Lessons combine. Verified summaries are merged into a better shared model.
- Repeat. The improved model goes back out, and the cycle continues — thousands of times.
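The steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Lupus design, whose inner workings the text describes as ongoing research. In particular, the check used here (giving each piece of work to two volunteers and accepting it only if their answers agree) is a simple stand-in chosen for illustration. The names honest_worker, lazy_worker, and run_round, the one-number model y = w * x, and the tolerance are all hypothetical.

```python
from typing import Callable, List, Tuple

Shard = List[Tuple[float, float]]         # one piece of work: (x, y) pairs
Worker = Callable[[float, Shard], float]  # takes (model, shard), returns an update


def honest_worker(w: float, shard: Shard, lr: float = 0.05, steps: int = 5) -> float:
    """Train y = w * x on the shard and return the change in w (the 'lesson')."""
    start = w
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad
    return w - start


def lazy_worker(w: float, shard: Shard) -> float:
    """A hypothetical bad actor: returns a made-up update without training."""
    return 100.0


def run_round(w: float, shards: List[Shard], workers: List[Worker],
              tol: float = 1e-9) -> Tuple[float, int]:
    """One pass through the loop. Returns (new model, number of accepted shards)."""
    accepted: List[Tuple[float, int]] = []
    for i, shard in enumerate(shards):
        # Tasks go out: each shard is sent to two different workers.
        a = workers[(2 * i) % len(workers)]
        b = workers[(2 * i + 1) % len(workers)]
        # Devices learn, and lessons come back.
        update_a, update_b = a(w, shard), b(w, shard)
        # The work is checked: accept only if the two results agree.
        if abs(update_a - update_b) <= tol:
            accepted.append((update_a, len(shard)))
    if not accepted:
        return w, 0  # nothing verified this round, so keep the old model
    total = sum(n for _, n in accepted)
    # Lessons combine: weighted average of the verified updates.
    w += sum(u * n for u, n in accepted) / total
    return w, len(accepted)


if __name__ == "__main__":
    shards = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(0.5, 1.5)],
        [(1.5, 4.5), (3.0, 9.0), (2.5, 7.5)],
    ]
    workers = [honest_worker, honest_worker, honest_worker, lazy_worker]
    w = 0.0
    for _ in range(30):  # Repeat.
        w, n_ok = run_round(w, shards, workers)
    print(round(w, 3), n_ok)
```

In this setup the second shard is always paired with the lazy worker, so its result is rejected every round, and the model still settles near 3.0 using the two verified shards. Duplicating every task doubles the computing cost, which is one reason cheaper ways of checking work remain an open question.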
Where this is headed
The near-term goal is a working demonstration — a small but real model trained end-to-end by volunteers in their browsers. From there, the path scales up in stages, with each step gated on results, safety review, and community input.
The longer aim is straightforward: if a community can sustain free encyclopedias and open-source software, it can train a foundation model that belongs to all of its contributors. That is the bet Lupus is exploring.