Towards data abundance and self-sovereignty

The world's best data is locked away. People shouldn't have to choose between progress and privacy.

We're building a different approach.

Mission

Open Data Labs studies how data is produced, valued, and used in AI systems. We conduct foundational research on data economics while building the technical infrastructure that makes it work at scale.

Through Vana, an open-source protocol we developed, 1M+ users now pool data across networks they control, spanning health, behavioral, and conversational data. We partner with organizations to source, evaluate, and structure datasets from this ecosystem for AI research and development.

Our work links foundational research with real-world systems, establishes frameworks for data as an emerging asset class, and cultivates the networks needed to strengthen the training data ecosystem as it evolves. Better mechanisms create more efficient systems, clear attribution and data licensing, and stronger incentives for knowledge production.

The idea is to give everyone a stake in the AI systems that will increasingly shape our society while also unlocking new pools of data to advance the technology.

Scaling this approach to frontier models would let the AI industry draw on vast amounts of decentralized, privacy-sensitive data in areas such as health care and finance.

Research Areas

Data Sovereignty

How do individuals maintain ownership of their data while contributing to AI?

We're building portability standards and provenance systems that let data move between systems without losing track of its origin or control.

Privacy

How do we enable data use while preserving user privacy?

We study privacy-preserving training methods, from federated learning to secure computation environments, and their practical trade-offs at scale.
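As a rough illustration of the federated learning idea mentioned above, here is a minimal sketch of federated averaging on a toy one-parameter model. Everything in it (the function names, the learning rate, the toy client data) is an assumption made for illustration, not a description of our systems. Each client trains on its own data and shares only its updated parameter, which the server averages by sample count.

```python
# Minimal federated averaging sketch (illustrative assumptions throughout).
# Model: y = w * x, trained with squared-error loss.

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient steps on one client's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(w_global, clients, rounds=50):
    """Average client parameters, weighted by each client's sample count."""
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        # Only the updated parameter leaves each client, never the raw data.
        updates = [(local_update(w_global, data), len(data)) for data in clients]
        w_global = sum(w * n for w, n in updates) / total
    return w_global

# Hypothetical clients whose private data all follow y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(1.5, 3.0), (2.5, 5.0), (0.5, 1.0)],
]
w = federated_average(0.0, clients)
```

Sharing parameters instead of raw records is not a privacy guarantee on its own; in practice it is usually combined with techniques such as secure aggregation or differential privacy, which is where the trade-offs at scale come in.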

Economics

How should data be valued and exchanged?

We develop measurement methods for data contribution, pricing mechanisms for compositional goods, and market designs that account for data's unique properties: it is non-rival, highly variable in quality, and compositional.
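One common way to measure data contribution is the Shapley value, which averages each contributor's marginal gain over every order in which contributors could join. The sketch below is a toy example under stated assumptions: the contributor names, the datasets, and the coverage-based utility are hypothetical, and a real system would use a model-performance metric and an approximation, since exact computation grows factorially.

```python
from itertools import permutations
from math import factorial

def shapley_values(contributors, utility):
    """Exact Shapley values: average marginal gain over all join orders."""
    values = {c: 0.0 for c in contributors}
    for order in permutations(contributors):
        coalition = frozenset()
        for c in order:
            with_c = coalition | {c}
            values[c] += utility(with_c) - utility(coalition)
            coalition = with_c
    n_orders = factorial(len(contributors))
    return {c: v / n_orders for c, v in values.items()}

# Hypothetical contributors, each holding a set of record IDs.
DATASETS = {"alice": {1, 2, 3}, "bob": {3, 4}, "carol": {1, 2}}

def coverage(coalition):
    """Toy utility: number of distinct records the coalition covers."""
    covered = set()
    for name in coalition:
        covered |= DATASETS[name]
    return float(len(covered))

scores = shapley_values(list(DATASETS), coverage)
```

In this toy setup carol's records are fully duplicated by alice's, so her score is lower than the others', and the scores sum to the total utility of the pooled data. The example also hints at why data is compositional: a dataset's value depends on what it is combined with.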

Research

Open Problems in AI Data Economics

October 30, 2025

In our new paper, we introduce data economics as a coherent field and define open problems that have not yet been formalized. Most AI economics research focuses on downstream effects such as productivity and labor displacement rather than on how AI systems are produced. We argue that understanding AI's economic impact requires studying how data, compute, and labor interact to create AI systems.


Model Influence Functions: Measuring Data Quality

August 1, 2024

As AI plays a larger economic role in society, a critical question emerges: who should own AI? Recent controversies, such as YouTube creators discovering that their videos had been used to train leading AI video models, highlight the urgent need for data ownership and transparency in AI.


Datasets Playground

Personal, health, behavioral, financial, and biometric data from data cooperatives, verified and sourced directly from users.

Access user-generated datasets.

Explore playground