I have been thinking a lot lately about the increasing importance of the “public data layer”: the data that we (“we” applied broadly, meaning the general public, NGOs, government, scientists, and journalists) will need to make sense of what’s going on in an increasingly busy, but also increasingly quantifiable, world.
First, some of the drivers here. In general, more data is being generated than ever before, and much of it has a bearing on “public” issues. A few of the specific drivers include:
- Increasing role of “platforms” in regulated spaces (transportation, health, finance, education, etc.): these are enormous generators of data with direct and indirect bearing on public issues.
- Sensors & IoT (publicly and privately owned) — same as above.
- Abundance of media — as we have seen with the recent US election, the rise of social & independent media is democratizing but also problematic.
- Personal health data — the cost of gene sequencing is dropping like a rock, which will lead to an explosion of health data. This data will provide personal value but can also provide enormous societal value.
Why will this be important? Because all of these data have the potential to increase collective intelligence and societal knowledge. More specifically, these inputs give us the chance to redesign the way we make policy and handle regulation. If we do this right, we can get smarter at policymaking and design regulatory systems that are both more effective and cheaper to implement and comply with.
So, what infrastructure will we need to handle and process all of this public data? This seems to be forming into a few broad categories:
- Data pooling & analysis platforms — tools and APIs that make sense of these data — generic/foundational tools like Composable Analytics and Stae, and more specific, vertically-oriented projects & tools, like OpenTraffic and Aerostate.
- “Regulation 2.0” platforms — specifically designed to facilitate a data-driven policymaking and regulatory process — for example, MeWe, Airmap, SeamlessGov.
- Foundational and application-layer blockchains: on the pure tech side, this is the most interesting area of development. Blockchains give us both public data access and data integrity in a way that hasn’t been possible before. Much of the focus is still on “foundational” blockchains like Bitcoin, Ethereum, Tezos and Zcash, but eventually this technology will reach the application layer, and we’ll see more explicitly “public” applications. I also expect blockchains and Regulation 2.0 platforms to grow ever closer and ultimately merge.
- Mechanisms for identifying and amplifying truth: this is a tough but important one. We have two problems in parallel. First, how do we discern truth from untruth? And second, how do we give truth the attention it needs to “win”? The big platforms like Facebook are experimenting with this now, and we’ll likely see more tools and services that help with this.
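To make the data-integrity point a little more concrete, here is a minimal Python sketch of the one property that makes tampering detectable: each record stores the hash of the record before it, so quietly altering any past entry breaks the chain. This is only a hash-linked log, assumed for illustration; it leaves out everything that makes a real blockchain public and decentralized (replication across many parties, consensus, signatures). The function names, the `GENESIS` convention, and the sample sensor readings are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical convention: the first entry "points back" to an all-zero hash.
GENESIS = "0" * 64


@dataclass(frozen=True)
class Entry:
    index: int
    payload: dict
    prev_hash: str
    digest: str


def entry_digest(index: int, payload: dict, prev_hash: str) -> str:
    """SHA-256 over a canonical (key-sorted) JSON encoding of the entry."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> Entry:
    """Add a record that commits to the digest of the record before it."""
    prev = log[-1].digest if log else GENESIS
    index = len(log)
    entry = Entry(index, payload, prev, entry_digest(index, payload, prev))
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every link; editing or reordering entries, or deleting
    one before the last, makes this return False."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry.index != i or entry.prev_hash != prev:
            return False
        if entry.digest != entry_digest(entry.index, entry.payload, entry.prev_hash):
            return False
        prev = entry.digest
    return True


if __name__ == "__main__":
    log = []
    append(log, {"sensor": "air-quality-07", "pm25": 12})
    append(log, {"sensor": "air-quality-07", "pm25": 14})
    print(verify(log))  # True

    log[0].payload["pm25"] = 3  # quietly rewrite history
    print(verify(log))  # False
```

On its own, a chain like this only lets someone who already holds the latest digest detect tampering; roughly speaking, what a public blockchain adds is many independent parties holding, and agreeing on, that digest.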
That’s the vision: where it seems clear that we are heading, and where we need to head. So the more important question is how we will actually get there. A few questions and thoughts on my mind:
- Broad vs narrow? Strikes me that we will see the most traction in narrow applications first — the thin edge of the wedge, that solves a concrete problem. Also, the “personal data layer” hasn’t arrived in one broad platform either.
- Open standards + distribution magnets: dating back to my work around open transit data, a key learning was that open standards need distribution magnets. The thing that got transit agencies to publish data in the open GTFS format was Google Maps.
- Portal access vs. real access: the natural tendency of data owners is to offer access via silos and portals (e.g., Uber Movement). This is something, but it’s not the real thing; the more important question is how to get actual data moving.
- Government isn’t the only audience: public data is of course useful for policymaking and regulation, but it’s equally important for scientific research and journalism. These areas could end up being the initial leaders.
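On the open-standards point, part of the appeal of a format like GTFS is that one consumer can read every publisher’s data with the same code: a GTFS feed is a set of plain CSV text files with standardized column names (`stops.txt`, `routes.txt`, `trips.txt`, `stop_times.txt`, and so on). The minimal Python sketch below reads a tiny `stops.txt`; the column names `stop_id`, `stop_name`, `stop_lat` and `stop_lon` come from the GTFS spec, while the stop data and the `load_stops` helper are hypothetical.

```python
import csv
import io

# Hypothetical two-stop excerpt; real feeds ship stops.txt inside a zip
# alongside the other GTFS files.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St & 1st Ave,40.7128,-74.0060
S2,Main St & 2nd Ave,40.7138,-74.0050
"""


def load_stops(text: str) -> dict:
    """Map stop_id -> (stop_name, lat, lon) using the standard GTFS columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["stop_id"]: (
            row["stop_name"],
            float(row["stop_lat"]),
            float(row["stop_lon"]),
        )
        for row in reader
    }


if __name__ == "__main__":
    for stop_id, (name, lat, lon) in load_stops(STOPS_TXT).items():
        print(stop_id, name, lat, lon)
```

Because the column names are fixed by the standard, the same `load_stops` would work unchanged on any agency’s `stops.txt`, which is exactly what makes a single large consumer worth publishing for.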