Research infrastructure for queryable, AI-ready biomedical data.
WOBD is supported by the U.S. National Science Foundation under award #2535091 and is part of the Proto-OKN federation.
The value proposition
Biomedical data resources are often funded, curated, and queried separately. WOBD makes those investments work together by giving datasets and knowledge graphs a shared query plane that both humans and AI assistants can use.
Why continued support compounds
WOBD is not a single-purpose application. Each new graph, repository, identifier bridge, and metadata extension expands the set of questions that can be asked across the entire federation, without requiring another bespoke integration.
Growth plan
The next stage of WOBD should focus on durable infrastructure: more sources, richer metadata, stronger cross-graph identity, and evaluation that keeps AI-assisted discovery inspectable.
Broader metadata ingestion
Extend beyond the current NDE-centered layer into additional domain-specific and generalist repositories, especially clinical, environmental, and multi-omic resources that users otherwise search separately.
Richer dataset descriptions
Capture sample-level annotation, study provenance, assay context, contrasts, and analysis-ready relationships so WOBD can return more actionable answers without sending users back to each source portal.
Cross-graph identifiers and evaluation
Strengthen mappings across genes, diseases, chemicals, organisms, datasets, and publications, then benchmark recurring workflows so AI-mediated answers remain auditable and reproducible as the federation grows.
What support unlocks
Current user-facing surfaces
Guided query templates
Researchers fill in terms in templates that cover dataset discovery, drug-related datasets, and gene expression. WOBD generates validated graph queries and returns results as tables or dataset cards, each with a query trace.
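As a rough illustration of how a guided template could turn a researcher's term into a query plus a trace, here is a minimal Python sketch. The template text, the schema.org pattern, the trace fields, and the function name fill_template are assumptions made for illustration; they are not WOBD's actual templates or code.

```python
import re
from string import Template

# Hypothetical template; the real WOBD templates and graph schema may differ.
DATASET_TEMPLATE = Template("""\
PREFIX schema: <http://schema.org/>
SELECT ?dataset ?name WHERE {
  ?dataset a schema:Dataset ;
           schema:name ?name .
  FILTER(CONTAINS(LCASE(?name), LCASE("$term")))
}
LIMIT $limit""")


def fill_template(term: str, limit: int = 25) -> dict:
    """Validate user input, fill the template, and return the query with a trace."""
    term = term.strip()
    # Accept only a conservative character set so the term cannot break out
    # of the SPARQL string literal.
    if not re.fullmatch(r"[A-Za-z0-9 _.\-]{1,100}", term):
        raise ValueError(f"unsupported search term: {term!r}")
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    query = DATASET_TEMPLATE.substitute(term=term, limit=limit)
    # The trace records which template and inputs produced the query.
    return {
        "template": "dataset_discovery",
        "inputs": {"term": term, "limit": limit},
        "query": query,
    }
```

Calling fill_template("influenza") returns the filled query together with the inputs that produced it, which is one simple way a result could stay inspectable.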
Unified MCP server
AI assistants can discover relevant graphs, inspect schemas, bridge identifiers, run SPARQL, and synthesize answers across the wider Proto-OKN federation from one conversation.
Team
Scripps Research
- Trish Whetzel
- Ben Good
- Andrew Su
- Ginger Tsueng
RENCI
- Chris Bizon
- Jim Balhoff
- Yaphet Kebede
UCSD / UCSF
- Peter Rose
Technical foundation
The primary structured dataset metadata layer comes from the NIAID Data Ecosystem Discovery Portal (NDE), which harmonizes dataset records from domain-specific and generalist repositories. Metadata harvested from that pipeline is published as the NDE graph and loaded alongside other graphs in the federation.
WOBD also draws on the EMBL-EBI Gene Expression Atlas (GXA), publishing its study metadata, contrasts, genes, and pathway enrichment results as linked data so that expression evidence can be queried alongside dataset metadata and other knowledge graphs.
Knowledge graphs are listed in the OKN Registry, and the OKN Fabric exposes those graphs through a SPARQL federation so they can be queried individually or together.
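To make "individually or together" concrete, the following Python sketch builds a SPARQL 1.1 federated query with one SERVICE clause per graph and prepares, but does not send, an HTTP request following the SPARQL 1.1 Protocol. The endpoint URLs are example.org placeholders and the graph patterns are generic; the real OKN Fabric endpoints and graph schemas are not shown here.

```python
from urllib.request import Request

# Placeholder endpoints, assumed for illustration only.
NDE_ENDPOINT = "https://example.org/nde/sparql"
GXA_ENDPOINT = "https://example.org/gxa/sparql"
FEDERATION_ENDPOINT = "https://example.org/federation/sparql"


def federated_query(patterns: dict[str, str], limit: int = 10) -> str:
    """Wrap each graph pattern in a SERVICE clause for its endpoint."""
    blocks = [
        f"  SERVICE <{endpoint}> {{\n    {pattern}\n  }}"
        for endpoint, pattern in patterns.items()
    ]
    return "SELECT * WHERE {\n" + "\n".join(blocks) + f"\n}}\nLIMIT {limit}"


def sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL 1.1 Protocol POST request without sending it."""
    return Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Variables shared between SERVICE blocks (here ?dataset) join across graphs.
query = federated_query({
    NDE_ENDPOINT: "?dataset ?p ?o .",
    GXA_ENDPOINT: "?study ?q ?dataset .",
})
request = sparql_request(FEDERATION_ENDPOINT, query)
```

Passing a single endpoint to federated_query yields a query against one graph, while several endpoints yield a cross-graph join; sending the request (for example with urllib.request.urlopen) is left out because the endpoints above are not real.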
Discuss WOBD growth or collaboration
For programmatic questions, collaboration opportunities, or support discussions, contact the WOBD team.