Diabetic nephropathy

End-to-end differential expression from a single question

One natural-language ask, an orchestrated bioinformatics pipeline (ontology resolution → sample discovery → differential expression → enrichment), and a method comparison that surfaced biology a simpler analysis would have missed.

Key finding

An end-to-end DE workflow — disease ontology resolution, ARCHS4 sample classification, statistical testing, enrichment — executed from one chat. Comparing pooled and study-matched modes revealed an interferon signaling signal (OAS2, RSAD2) that the simpler pooled analysis would have missed.

Graphs and tools

MONDO disease resolution, ARCHS4 metadata, differential-expression tooling, and enrichment analysis.

WOBD contribution

Turns disease selection, sample discovery, statistics, and interpretation into a single inspectable workflow.

Audit trail

Disease identifiers, sample counts, studies, parameters, and method choices are carried through the answer.

The question

Run a differential expression analysis of diabetic nephropathy using the public ARCHS4 bulk RNA-seq archive. Find disease and control samples, compute DE, and interpret the results.

Without an OKN-backed MCP server, this is days of manual work for a bioinformatics postdoc: resolve the disease term to a MONDO concept, download ARCHS4 metadata, classify thousands of samples as test or control, write the statistical pipeline, decide between pooled and per-study designs, and run enrichment. The MCP server does it in one chat — and runs both methods so the user sees what the simpler analysis would have missed.

One question, orchestrated pipeline

Natural-language query

Run a pooled differential expression analysis of diabetic nephropathy using ARCHS4. Then try the study-matched meta-analysis.

MCP server orchestrates four tools

  • resolve_disease_ontology — MONDO term lookup
  • find_samples — ARCHS4 metadata + LLM sample classification
  • differential_expression — pooled and study-matched meta-analysis
  • enrichment_analysis — g:Profiler over GO, KEGG, Reactome
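
The chaining of the four tools above can be sketched roughly as follows. This is a minimal illustration, not the server's actual implementation: the function signatures, the `PipelineResult` container, and the toy stubs are all assumptions made for the example, and the stubs return placeholder values rather than real ARCHS4 output. Only the tool names and the MONDO identifier come from this write-up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class PipelineResult:
    """Hypothetical container that carries every intermediate through the answer."""
    disease_id: str
    samples: Dict[str, List[str]]
    significant_genes: Dict[str, List[str]]   # mode -> gene symbols
    enriched_terms: Dict[str, List[str]]      # mode -> enriched terms
    log: List[str] = field(default_factory=list)


def run_pipeline(
    query: str,
    resolve: Callable[[str], str],
    find_samples: Callable[[str], Dict[str, List[str]]],
    differential_expression: Callable[[Dict[str, List[str]], str], List[str]],
    enrichment: Callable[[List[str]], List[str]],
    modes: Sequence[str] = ("pooled", "study_matched"),
) -> PipelineResult:
    """Chain the four steps and run DE plus enrichment once per mode."""
    log: List[str] = []
    disease_id = resolve(query)
    log.append(f"ontology: {query!r} -> {disease_id}")
    samples = find_samples(disease_id)
    log.append("samples: " + ", ".join(f"{k}={len(v)}" for k, v in samples.items()))
    genes: Dict[str, List[str]] = {}
    terms: Dict[str, List[str]] = {}
    for mode in modes:
        genes[mode] = differential_expression(samples, mode)
        terms[mode] = enrichment(genes[mode])
        log.append(f"{mode}: {len(genes[mode])} significant genes")
    return PipelineResult(disease_id, samples, genes, terms, log)


# Toy stubs standing in for the real tools (placeholder data only).
def _resolve(query: str) -> str:
    return "MONDO:0005016"  # identifier reported in this case study


def _find(disease_id: str) -> Dict[str, List[str]]:
    return {"disease": ["disease_1", "disease_2"], "control": ["control_1"]}


def _de(samples: Dict[str, List[str]], mode: str) -> List[str]:
    return ["GENE_A"] if mode == "pooled" else ["GENE_A", "GENE_B"]


def _enrich(genes: List[str]) -> List[str]:
    return [f"term_for_{g}" for g in genes]


result = run_pipeline("diabetic nephropathy", _resolve, _find, _de, _enrich)
# Genes that only the study-matched mode reports as significant.
only_matched = set(result.significant_genes["study_matched"]) - set(
    result.significant_genes["pooled"]
)
print(result.log)
print(only_matched)
```

Passing the tools in as callables keeps the sketch independent of any particular MCP client, and running every mode inside one loop is what makes the pooled versus study-matched comparison a by-product of the run rather than a separate analysis.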

Integrated answer

Five immediate-early response genes (FOS, FOSB, EGR1, NR4A1, DUSP1) are strongly downregulated. Study-matched mode additionally reveals two upregulated interferon-stimulated genes (OAS2, RSAD2) that pooling missed, pointing to two coordinated biological processes in DN kidney.

  • 174 DN samples discovered
  • 13 ARCHS4 studies
  • 7 significant genes
  • 2 DE methods compared

What the chat looks like

Excerpt from a conversation that produced these findings:

Run a differential expression analysis of diabetic nephropathy using ARCHS4 and try both pooled and study-matched modes. Walk me through the steps.

Resolving disease term, querying ARCHS4 metadata, running DE…

Workflow complete. Two modes — two stories.

  • Ontology: “diabetic nephropathy” → MONDO:0005016 (exact match).
  • Samples: 174 disease samples across 13 ARCHS4 studies; 555 kidney controls; 9 studies with matched controls.
  • Pooled DE (100 vs. 53 samples, 10 studies): five immediate-early genes strongly downregulated (FOS, FOSB, EGR1, NR4A1, and DUSP1; the first four encode transcription factors, DUSP1 a MAPK phosphatase).
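  • Study-matched DE (71 vs. 27 samples, 3 studies; disease and control samples are compared within each study to control for batch effects, and per-study results are combined with Stouffer's Z): the same five, plus two upregulated interferon-stimulated genes, OAS2 and RSAD2.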
  • Enrichment (g:Profiler): PPAR signaling, MAPK pathway, NGF-stimulated transcription.

Two coordinated processes in DN kidney: suppressed immediate-early response plus activated interferon signaling. The pooled analysis would have missed the interferon signal entirely.

Illustrative excerpt; phrasing is paraphrased and lightly edited from the upstream analysis. Live results may vary as graphs are updated.
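
For readers unfamiliar with the combination step named in the excerpt, here is a minimal sketch of Stouffer's Z method for one gene. Each study contributes a signed z-score from its own disease versus control comparison, and the combined statistic is the weighted sum divided by the square root of the sum of squared weights. Whether the tool weights studies (for example by sample size) is not stated here, so the weights argument and the example z-scores are assumptions for illustration only.

```python
import math
from statistics import NormalDist
from typing import Optional, Sequence


def stouffer_z(z_scores: Sequence[float],
               weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-study z-scores: Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2))."""
    if not z_scores:
        raise ValueError("need at least one z-score")
    if weights is None:
        weights = [1.0] * len(z_scores)  # unweighted form
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a z-score under the standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Made-up z-scores for one gene in three studies (not values from this analysis).
per_study_z = [2.0, 1.5, 2.5]
combined = stouffer_z(per_study_z)
print(combined, two_sided_p(combined))
```

Because each z-score is computed inside a single study, study-level batch differences do not enter the comparison, and consistent modest effects across studies can reach significance together while effects of opposite sign cancel.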

What this query unlocks

  • End-to-end automation. Disease ontology resolution, sample discovery and per-study LLM classification, statistical testing, and enrichment all run in a single chat session — the kind of workflow that previously demanded a bioinformatics postdoc with R/Bioconductor experience.
  • Method comparison by default. The MCP server runs both the pooled analysis (fast, cross-study) and the study-matched meta-analysis (controls for batch effects per study), making the methodology choice transparent and surfacing signal that one method alone would miss.
  • Reproducibility built in. Every parameter — FDR threshold, fold-change cutoff, sample IDs, contributing studies, statistical test — is recorded in the answer; rerunning the analysis is a copy-paste, not a re-derivation.
  • Biology, not just statistics. The combined picture (a suppressed immediate-early gene response plus activated interferon signaling) is consistent with established disease biology and is anchored to specific GEO accessions the user can drill into.
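
To make the reproducibility point concrete, a run record of the kind described above could look something like the sketch below. The `DERunRecord` class and its field names are hypothetical, not the server's actual output schema. The counts and identifier are the ones reported for the study-matched run in this write-up; the FDR threshold, fold-change cutoff, sample IDs, and GEO accessions are not stated here, so they are left unset rather than guessed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DERunRecord:
    """Hypothetical record of everything needed to rerun one DE analysis."""
    disease_id: str
    mode: str
    statistical_test: str
    n_disease: int
    n_control: int
    n_studies: int
    fdr_threshold: Optional[float] = None   # not stated in this write-up
    log2fc_cutoff: Optional[float] = None   # not stated in this write-up
    sample_ids: List[str] = field(default_factory=list)
    study_accessions: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys give a stable text form that is easy to diff between runs.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "DERunRecord":
        return cls(**json.loads(text))


record = DERunRecord(
    disease_id="MONDO:0005016",
    mode="study_matched",
    statistical_test="Stouffer's Z over per-study results",
    n_disease=71,
    n_control=27,
    n_studies=3,
)
restored = DERunRecord.from_json(record.to_json())
print(record.to_json())
```

Serializing the record as plain JSON is one way to make a rerun a copy-paste: the same text can be stored next to the results and fed back in to repeat or compare an analysis.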

Why this matters — research productivity

Differential expression is among the most common bioinformatics analyses in biomedical research. Every disease produces its own variant, every variant can take days to weeks of postdoc time to assemble, and the methodological choices (pooled vs. matched, FDR cutoffs, sample inclusion) are buried in supplementary methods that rarely make it back into the next study. An MCP server that runs the workflow end-to-end, with both methods, with classification logged, and with parameters preserved, turns the analysis itself into a portable, comparable artifact instead of an undocumented one-off.

Diabetic nephropathy is one disease in ARCHS4's ~1M-sample archive. The same tool stack runs an analogous analysis for any condition with samples in the archive, and the underlying pattern — ontology resolution, automated sample classification, two-method DE, enrichment — transfers to other transcriptomic archives as they are integrated. The compounding investment is in the workflow scaffolding, not the per-disease scripting.

Read the full analysis

Sample discovery, pooled and study-matched DE results, enrichment, and the pooled-vs-matched comparison.