Towards data abundance and self-sovereignty

The world's best data is locked away. People shouldn't have to choose between progress and privacy.

We're building a different approach.

Mission

Open Data Labs studies how data is produced, valued, and used in AI systems. We conduct foundational research on data economics while building the technical infrastructure that makes it work at scale.

Through Vana, an open-source protocol we developed, more than 1 million users now pool health, behavioral, and conversational data across networks they control. We partner with organizations to source, evaluate, and structure datasets from this ecosystem for AI research and development.

Our work links foundational research with real-world systems, establishes frameworks for data as an emerging asset class, and cultivates the networks needed to strengthen the training data ecosystem as it evolves. Better mechanisms create more efficient systems, clearer attribution and data licensing, and stronger incentives for knowledge production.

Research Areas

Data Sovereignty

How do individuals maintain ownership of their data while contributing to AI?

We're building portability standards and provenance systems that let data move between systems without losing track of its origin or control.

Privacy

How do we enable data use while preserving user privacy?

We study privacy-preserving training methods, from federated learning to secure computation environments, and their practical trade-offs at scale.
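As a rough illustration of the federated learning idea mentioned above, here is a minimal sketch of federated averaging on a toy one-parameter model. It is not a description of our systems: the function names (`local_step`, `federated_average`, `train`), the learning rate, and the toy data are all assumptions made for this example. Each client takes a gradient step on its own samples, and only the updated weight leaves the client, never the raw data.

```python
# Toy federated averaging sketch (hypothetical names, illustrative only).
# Model: y ≈ w * x with a single scalar weight w.

def local_step(w, samples, lr=0.05):
    """One gradient step on mean squared error, using only this client's data."""
    grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
    return w - lr * grad

def federated_average(updates):
    """Average client weights, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

def train(clients, rounds=50, lr=0.05):
    """Run several rounds; the server only ever sees weights and sample counts."""
    w = 0.0
    for _ in range(rounds):
        updates = [(local_step(w, samples, lr), len(samples)) for samples in clients]
        w = federated_average(updates)
    return w

# Two clients whose private data both follow y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
learned_w = train(clients)
```

One practical trade-off is that sharing model updates alone does not guarantee privacy, since updates can still leak information about the underlying data. Deployments typically combine this pattern with additional protections such as secure aggregation or noise injection.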

Economics

How should data be valued and exchanged?

We develop measurement methods for data contribution, pricing mechanisms for compositional goods, and market designs that account for data's distinctive properties: it is non-rival, highly variable in quality, and compositional (its value depends on what it is combined with).
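One well-known way to measure data contribution is the Shapley value, which averages each contributor's marginal gain over every order in which contributors could join. The sketch below is illustrative only and is not presented as our own measurement method: the `coverage` utility (the number of distinct items in the pooled data), the contributor names, and the toy datasets are assumptions chosen to show how overlapping data earns less credit.

```python
from itertools import permutations

def coverage(datasets):
    """Hypothetical utility: number of distinct items in the pooled datasets."""
    return len(set().union(*datasets))

def shapley_values(contributors, utility):
    """Exact Shapley values by enumerating all join orders (small inputs only)."""
    names = list(contributors)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        pooled = []
        previous = utility(pooled)
        for name in order:
            pooled.append(contributors[name])
            current = utility(pooled)
            totals[name] += current - previous  # marginal gain of this contributor
            previous = current
    return {name: total / len(orders) for name, total in totals.items()}

# Toy example: bob and carol overlap with others, so their credit is reduced.
contributors = {
    "alice": {1, 2, 3},
    "bob": {3, 4},
    "carol": {4},
}
values = shapley_values(contributors, coverage)
```

The values sum to the utility of the full pool, which makes them a natural basis for splitting a payment. Exact enumeration grows factorially with the number of contributors, so realistic settings need sampling-based approximations.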

Datasets Playground

Personal, health, behavioral, financial, and biometric data from data cooperatives, verified and sourced directly from users.

Access user-generated datasets.