The Public Data Layer

I have been thinking a lot lately about the increasing importance of the “public data layer” — meaning, the data that we (“we” applied broadly: the general public, NGOs, governments, scientists, journalists) will need to make sense of what’s going on in an increasingly busy, but also increasingly quantifiable, world.

First, some of the drivers here. In general, more data is being generated than ever before, much of which has a bearing on “public” issues.  A few of the specific drivers include:

  • Increasing role of “platforms” in regulated spaces (transportation, health, finance, education, etc) — these are enormous generators of data with direct and indirect bearing on public issues.
  • Sensors & IoT (publicly and privately owned) — same as above.
  • Abundance of media — as we have seen with the recent US election, the rise of social & independent media is democratizing but also problematic.
  • Personal health data — the cost of gene sequencing is dropping like a rock, which will lead to an explosion of health data. This data will provide personal value but can also provide enormous societal value.

Why will this be important?  Because all of these data have the potential to increase collective intelligence and societal knowledge.  More specifically, we have the potential to redesign the way we make policy and handle regulation given these inputs.  If we do this right, we can get smarter at policymaking, and design regulatory systems that have both greater effectiveness and lower costs of implementation and compliance.

So, what infrastructure will we need to handle and process all of this public data?  This seems to be forming into a few broad categories:

  • Data pooling & analysis platforms — tools and APIs that make sense of these data — generic/foundational tools like Composable Analytics and Stae, and more specific, vertically-oriented projects & tools, like OpenTraffic and Aerostate.
  • “Regulation 2.0” platforms — specifically designed to facilitate a data-driven policymaking and regulatory process — for example, MeWe, Airmap, SeamlessGov.
  • Foundational and application-layer blockchains — on the pure tech side, this is the most interesting area of development.  Blockchains give us both public data access and data integrity in a way that’s not been possible before.  Much of the focus is still on “foundational” blockchains like Bitcoin, Ethereum, Tezos and Zcash, but eventually this technology will reach the application layer and we’ll have more explicitly “public” applications.  I also expect that blockchains and Regulation 2.0 platforms will get ever closer and ultimately merge.
  • Mechanisms for identifying and amplifying truth — this is a tough, but important one.  We have two problems, in parallel: First, how do we discern truth from untruth?  And second: how do we give truth the attention it needs to “win”?  The big platforms like Facebook are experimenting with this now, and we’ll likely see more tools and services that help with this.

That’s the vision — where it seems clear that we are heading, and where we need to head.  So, the more important question is, how will we actually get there?  A bunch of questions/thoughts on my mind are:

  • Broad vs narrow?  Strikes me that we will see the most traction in narrow applications first — the thin edge of the wedge that solves a concrete problem.  Also, the “personal data layer” hasn’t arrived in one broad platform either.
  • Open standards + distribution magnets: dating back to my work around open transit data, a key learning was that open standards need distribution magnets.  The thing that got transit agencies to publish data in the open GTFS format was Google Maps.
  • Portal access vs. real access — the natural tendency of data owners is to offer access via silos and portals (e.g., Uber Movement).  This is something, but it’s not the real thing — the more important question is how to get actual data moving.
  • Government isn’t the only audience: public data is of course useful for policymaking and regulation, but it’s equally important for scientific research and journalism.  These areas could end up being the initial leaders.
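Part of why GTFS worked as a distribution magnet is how low the barrier to entry is: a feed is just a zip archive of plain CSV files that any agency can publish and any tool can parse. A minimal sketch of reading the required `stops.txt` file (the sample rows below are invented for illustration, not from a real feed):

```python
import csv
import io

# GTFS feeds are zip archives of plain CSV files; stops.txt is one of the
# required files. These rows are invented sample data, not a real feed.
stops_txt = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St & 1st Ave,40.7128,-74.0060
S2,Main St & 2nd Ave,40.7140,-74.0048
"""

def parse_stops(text):
    """Parse GTFS stops.txt into a list of dicts with float coordinates."""
    stops = []
    for row in csv.DictReader(io.StringIO(text)):
        row["stop_lat"] = float(row["stop_lat"])
        row["stop_lon"] = float(row["stop_lon"])
        stops.append(row)
    return stops

stops = parse_stops(stops_txt)
print(stops[0]["stop_name"])  # Main St & 1st Ave
```

That simplicity — comma-separated text, no special tooling — is a big part of what let a consumer application like Google Maps become the incentive for hundreds of agencies to publish.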

That’s it for now.  More to come.  For some more context on my thinking here, see Regulating with Data and Alternative Compliance.


  • awaldstein

    Great stuff and thanks.

    One comment.

    Both blockchain and IoT (presuming devices everywhere) are, honestly, a mess standards-wise, and in a strategic leadership void from where I sit. And I know this better on the IoT side.

    So yes, I agree with these at a high level.

    I have concerns that they are built on or dependent on two communities of technology that are honestly very poor at aggregating the opportunity and making decisions that benefit all.

    • Yeah, I agree that’s true. Couple of thoughts: 1) historically we’ve been pretty good at aggregating data even when the underlying sources are messy (e.g., Google) and 2) it’s possible that we don’t need **all** the data to get something useful. It may just be a matter of pinpointing a smaller number of much more targeted pieces of data — so that would reduce this challenge as well. But I agree — and even getting to that problem requires defining the right mechanisms to get data open and accessible in the first place.

      • awaldstein

        good point and i must repeat that i’m a believer in both platforms.

        just don’t want to embrace important progress that is based on the belief that we need to wait till their post-adolescent stage to get there.

        • I think the last part of your comment got cut off?

          • awaldstein

            added above

  • Citizens of all countries, but most certainly the United States right now, must pay close attention to the sources of public data, and the veracity of public data. Many tax dollars are working (appropriately) to generate the common good (e.g., weather data). The last two decades have seen much of this information made publicly accessible, often in formats which make sense to both machines and humans. That must continue…and we need to be able to trust the data.

    Hoping we can share facts to generate discussion, rather than playing games with information and access to that information to slant debate.

  • Hey Nick – Cool observations. Thank You. One area that I draw a lot of parallels from is the banking sector and how they got their “data” act together fairly early on.

    I argue that this is/was largely self-serving: the basic script is that the quicker money-data moved — or, in Albert Wenger speak, the lower the marginal cost of financial data — the more money could be made.

    I riff about how such concepts can be translated to the public services realm here –

    It is interspersed with some personal experience narrative, but the larger goal is to address what data infrastructures in the public interest may look like, based on what already drives global industry and commerce.

    While there may be great market opportunities in all of these spaces – we believe that a non-profit mandate better serves the many bottom lines that you have articulated.

  • Eyal Feder

    Hey Nick – thanks for the super interesting read!
    I think the “Public data layer” (loved the term by the way) is one of the most important discussions of the upcoming years – the first level of which is how to leverage this data to power better decision making.
    But a crucial question that soon follows is whether we are heading towards an open or closed data future — or, in other words, will data be used to democratize and reimagine public decision-making processes, or will it be used to increase the gap and the control of the public and private organizations that own the data?
    Therefore, I think an important point to discuss is the openness of the public data layer. How do we ensure that the data used to power decisions is open for review by the public, and that the decision-making processes (meaning algorithms and results) are open for review, at least on some level? What should be done in terms of regulation and standards to ensure data is open by default and not closed?
    To relate to your last point — this data is important for government, scientific research and journalism, but also for civil society organizations, advocacy and the general public — and its openness will ensure it is used to enable more co-creation and collaborative governance, while increasing trust between stakeholders.

  • Jonathon Ende

    Thanks for the shout out Nick! We could not agree more that government and regulations are deeply in need of a modern platform to help manage and run their processes.