The Data That Wasn't There — How CKAN Changed Everything

How a teenager's burning questions about the world's future became the open data infrastructure powering governments on every continent. Story 01 of 20.

Yoana Popova
CKAN, CKAN@20
17 Apr 2026

There is a book called The Collapse of Complex Societies. Written by archaeologist Joseph Tainter in 1988, it was fairly obscure when a curious British teenager stumbled across a secondhand copy in the late 1990s. The book asks a disquieting question: why do great civilizations fall? Rome. The Maya. The Western Han. Over and over, societies that had solved enormous problems — feeding millions, building cities, maintaining order — simply... unraveled.

The teenager who read it was Rufus Pollock. And the book didn't frighten him. It made him want data.

"I was asking these big questions," Rufus told the CKAN community in 2025, nearly three decades later. "How many people can the Earth support? Are we going to run out of fossil fuels? What's the population going to be in 2050?"

He reached for books. Joel Cohen's How Many People Can the Earth Support? Václav Smil's Feeding the World. These were rigorous, data-heavy works — the kind that tabulate civilizational futures in appendices. But when Rufus went looking for the underlying datasets, something strange happened.

They weren't there.

Or rather: they existed, but you couldn't get to them. Industry reports on fossil fuel reserves cost $10,000. $20,000. The population tables in the back of Cohen's book were "kind of typed pages in the old days — tabular tables." Raw data that should have been the foundation of public reasoning was locked behind prices that made it accessible only to corporations and governments. For a teenager trying to think seriously about the future of human civilization, the data was functionally invisible.

That invisible wall — between questions that mattered and the information needed to answer them — is where CKAN begins.

The Open Source Revelation

Fast-forward a few years. Rufus arrives at Cambridge University and, for the first time, has real internet access. A friend introduces him to Linux.

"It just blew my mind," he said. "So amazing that I could download an entire operating system and mess around with it."

But it wasn't just Linux that struck him. It was Debian, and specifically Debian's package management system. In the Debian model, software wasn't just written by individuals and left in isolation. It was curated, packaged, made interoperable, and distributed through a central registry. You didn't have to build everything from scratch. You could say apt-get install postgresql and the whole ecosystem would resolve dependencies, verify integrity, and hand you working software.
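
For readers who have not used a package manager, the "resolve dependencies" step can be pictured with a small sketch. This is only an illustration under stated assumptions: the registry contents and the function name install_order are hypothetical, and real tools such as apt also handle versions, conflicts, and signature checks that are left out here.

```python
from typing import Dict, List, Set

# Hypothetical toy registry: package name -> packages it depends on.
REGISTRY: Dict[str, List[str]] = {
    "postgresql": ["libssl", "libc"],
    "libssl": ["libc"],
    "libc": [],
}


def install_order(package: str, registry: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every dependency comes before its dependents."""
    order: List[str] = []
    visiting: Set[str] = set()  # packages on the current dependency path

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled for installation
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        if name not in registry:
            raise KeyError(f"unknown package {name!r}")
        visiting.add(name)
        for dependency in registry[name]:
            visit(dependency)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


print(install_order("postgresql", REGISTRY))
```

The point of the sketch is the shape of the idea: because the registry records how packages relate to one another, a single request can be expanded into everything it needs.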

Rufus had a thought that would quietly reshape global data infrastructure: Wouldn't that be possible for data?

Not just open data as a political principle. Not just transparent government as a moral argument. But data as a managed, discoverable, interoperable ecosystem — the same way Linux had become a managed, discoverable, interoperable ecosystem for software.

"This was the original 'wow.' Open source, incredible, really productive environment for basically collaborating and just breaking problems up. I don't have to build the whole of an operating system. People can build lots of different parts and put them together. Wouldn't that be possible for data?"

— Rufus Pollock, CKAN Monthly Live #33, May 2025

CPAN for Data

One piece of software gave the project its name. CPAN — the Comprehensive Perl Archive Network — was the package registry for the Perl programming language: a central catalog where developers could publish libraries, and anyone in the world could discover and reuse them. It was elegant, practical, and deeply generative. Perl developers didn't just write code. They contributed to a commons.

"CKAN is named CKAN because of CPAN," Rufus said, smiling at the history of it. "Nowadays I don't know how many people have heard of Perl, but once upon a time it was a big deal — bigger even than Python."

The name was deliberately chosen: the Comprehensive Knowledge Archive Network. Where CPAN was a package registry for software, CKAN would be a package registry for knowledge — for datasets, for the raw material of evidence and inquiry.

The first version was not glamorous. It was a wiki. Built in MoinMoin (a Python-based wiki engine that has since faded into the history books), it powered ckan.net — today reborn as DataHub.io. Rufus's rule of thumb, then as now, was never to build from scratch if something already exists. "One of my big rules in life," he said, "is never build something from scratch. You must always use someone else's tool till destruction, till it doesn't work for you anymore."

The wiki gave way to a proper Python application, built on the Pylons framework. And in 2007 — at the Creative Commons International Summit in Dubrovnik, Croatia — CKAN was officially launched.

"Dubrovnik, because it's an amazing city," Rufus recalled, with the tone of someone who had been to many conferences in worse places.

The Timing No One Planned For

Here is what Rufus did not plan: for CKAN to become the world's leading open data platform.

"At the beginning, there was no real desire to build software in the sense of a product," he explained. "It was open source, of course, people could use it. But basically it was to run ckan.net — to run this thing."

Then 2008 happened. Then 2009. Then Barack Obama was elected in the United States on a platform that included, among many things, a commitment to government transparency. In the United Kingdom, a coalition government came to power with similar commitments. Suddenly, governments that had been vaguely interested in open data needed open data infrastructure — immediately, yesterday, now.

"Those governments came along," Rufus said, "and were like: 'We really want this. We know you already have something working.'"

The UK Government launched data.gov.uk in 2009. It ran on CKAN. Canada, Finland, Australia, and dozens more followed — "not quite one a week, but suddenly one a month." In 2013, the United States migrated Data.gov — one of the largest open data portals in the world — to CKAN, consolidating hundreds of thousands of federal datasets onto a single platform. By 2018, more than 1,000 active data portals worldwide were running on CKAN.

None of this was in the original plan. The plan was to solve a teenager's frustrated question about how many people the Earth could support.

But it turned out the teenager's problem was everyone's problem.

What Made It Last

Many software projects surge with a wave and recede when the wave breaks. CKAN didn't. By 2024, after nearly two decades, it remained the world's leading open-source data management system — used not just for government portals but for academic data sharing, humanitarian data workflows, enterprise data governance, and machine learning pipelines.

There are several reasons for this. The plugin architecture introduced in CKAN 2.0 (released in 2013) was probably the most important. Rather than building a monolithic system, the core team (including key contributors like Adrià Mercader, Ian Ward, and Steven De Costa) created a framework extensible by anyone. The community built things the core team had never imagined, from visualisation plugins to harvesting connectors to domain-specific metadata schemas.

The other reason is subtler: CKAN solved a problem that never goes away. The need to find, understand, and trust data doesn't expire. As Rufus put it: "You'd never say you've got enough books in your library or enough areas of study to follow everyone's curiosities. We do need more data. And we always need better infrastructure to find it."

In 2019, a new stewardship model was established, with Datopian and Link Digital joining as co-stewards alongside the Open Knowledge Foundation. In 2023, CKAN was recognized as a Digital Public Good by the Digital Public Goods Alliance (DPGA), formally acknowledging its role in supporting 9 of the 17 UN Sustainable Development Goals.

The Question That Started It All, Revisited

Rufus Pollock is now in his mid-40s. He runs Datopian, co-leads Life Itself (a research collective focused on systemic change), and still thinks, obsessively, about the relationship between information and human decision-making.

He has come to believe that open data — as transformative as it has been — is not sufficient on its own.

"One of the things was this dream: if we had more data, we had more tools for working with data, we could make knowledge and insight, and from that would come action," he said. "And that just isn't quite so."

Data doesn't automatically translate into understanding. Understanding doesn't automatically translate into action. He cites the sociologist Karl Weick and the 1949 Mann Gulch fire in Montana, where thirteen firefighters were killed partly because they couldn't make sense of what their foreman, one of only three survivors, was doing. The foreman had lit an escape fire to burn away the grass around him. It was the right call. But in the chaos and terror, his colleagues couldn't update their mental models fast enough to follow him.

"We're always making sense — especially in times of crisis or change," Rufus said. "And our tools need to support that."

This is why CKAN's next chapter is not just about storing data. It's about storytelling. About context. About the narrative layer that sits between a dataset and a decision.

"How, more and more, can we build into CKAN tools for telling data stories, for setting data in context? Not just a dataset, but some context for that dataset, or a narrative behind it, documentation — all of these things that show up maybe in the technical architecture but which relate to sensemaking."

— Rufus Pollock, CKAN Monthly Live #33, May 2025

Twenty Years On

The frustrated teenager who couldn't find the data behind a book about Earth's carrying capacity has, in the decades since, helped build the infrastructure that makes that data findable for governments, researchers, and citizens on every continent.

But Rufus doesn't tell the story of CKAN as a triumph. He tells it as an ongoing project — one that has made enormous progress and still has enormous distance to travel.

"The dream," he said, near the end of his talk, "was to democratize the power of data — through quality open data and data infrastructure. That's still the dream."

What CPAN did for Perl programmers in the 1990s — creating a commons where knowledge compounds — CKAN set out to do for human knowledge itself. Twenty years in, that project is very much alive.

The data is more findable now. The wall is thinner. But it hasn't come down yet.

That's the work of the next twenty years.

Get Involved

We are publishing 20 stories throughout 2026 — one for each year of CKAN's existence. Do you have a story to tell? A portal you're proud of? A moment where open data made a difference? We want to hear from you.

ckan.org/20-years-of-ckan

Rufus Pollock is the original creator of CKAN, founder of the Open Knowledge Foundation, and president of Datopian. His book The Open Revolution is free to read at openrevolution.net. This story is the first in CKAN's 20 Stories for 20 Years series, publishing throughout 2026. The next story is coming in March.
