Designing a microbial host for terpene biomanufacturing
A walk-through of how the unified MCP server supports a plant and microbial terpene engineering question. The assistant pulls candidate enzymes from pathway and expression graphs, then leans on the NDE/WOBD metadata layer to surface the experimental datasets that make engineering decisions defensible.
20+ datasets surfaced spanning seven plant species and three microbial hosts, with a candidate enzyme panel anchored across pathway and expression evidence — usable starting material for a synthetic biology team within minutes.
The scenario
A team of synthetic biologists is designing a microbial host for sustainable biomanufacturing of a high-value terpene. They want to identify promising genetic parts (enzymes from plant and microbial systems) and the experimental datasets — RNA-seq, proteomics, fermentation studies — that would support those choices.
How the assistant approached it
The proto-OKN stack does not have a single “terpene pathway graph.” The effective workflow is a federation of three layers:
- Pathway and gene-part discovery: prokn, gene-expression-atlas-okn, spoke-genelab
- Experimental metadata discovery: nde, the WOBD metadata layer, with dataset records under okn.wobd.org/dataset/...
- Cross-graph integration: joins between gene-expression-atlas-okn and spoke-genelab on NCBI_Gene, GeneSymbol, and UBERON
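The cross-graph join can be pictured as a match on shared identifiers. The sketch below is a minimal illustration under stated assumptions. The `GeneHit` record, its field names, and the fallback rule are hypothetical stand-ins, not the real schemas of gene-expression-atlas-okn or spoke-genelab. The rule is to trust NCBI_Gene when both sides carry it, fall back to GeneSymbol otherwise, and optionally require the same UBERON term.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneHit:
    """Hypothetical, simplified gene record returned by one expression KG."""
    graph: str                    # source graph name
    ncbi_gene: Optional[str]      # NCBI_Gene identifier, if the graph provides one
    gene_symbol: str              # GeneSymbol as reported by the graph
    uberon: Optional[str] = None  # UBERON anatomy term, if the assay is tissue-resolved


def same_gene(a: GeneHit, b: GeneHit) -> bool:
    # Trust NCBI Gene IDs when both sides have them; otherwise fall back to a
    # case-insensitive symbol match, which can conflate genes across species.
    if a.ncbi_gene and b.ncbi_gene:
        return a.ncbi_gene == b.ncbi_gene
    return a.gene_symbol.upper() == b.gene_symbol.upper()


def cross_graph_join(left, right, match_anatomy=False):
    """Return (left, right) pairs naming the same gene, optionally in the same UBERON term."""
    return [
        (a, b)
        for a in left
        for b in right
        if same_gene(a, b) and (not match_anatomy or a.uberon == b.uberon)
    ]


# Toy inputs: symbols come from this walk-through, IDs are left empty on purpose.
atlas = [GeneHit("gene-expression-atlas-okn", None, "PSY"),
         GeneHit("gene-expression-atlas-okn", None, "DXS")]
genelab = [GeneHit("spoke-genelab", None, "Psy"),
           GeneHit("spoke-genelab", None, "LCYB")]

pairs = cross_graph_join(atlas, genelab)
print([(a.gene_symbol, b.gene_symbol) for a, b in pairs])  # [('PSY', 'Psy')]
```

Symbol-only matching is the weak point. Wherever both graphs expose NCBI_Gene, that key should win.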
Findings
The terpene ontology space is well covered
Search expansion is grounded in terpenoid biosynthetic process (GO:0016114) and isoprenoid biosynthetic process (GO:0008299). Their descendants include monoterpenoid, diterpenoid, sesquiterpenoid, triterpenoid, carotenoid, hopanoid, paclitaxel, and menthol biosynthetic processes. That is enough scaffolding to scope queries by terpene class.
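Scoping a query by terpene class amounts to taking a class term plus everything beneath it. Below is a minimal sketch over a hand-written ontology fragment. The labels come from the expansion above, but the parent-child edges are simplified: they skip intermediate GO terms (for example, the tetraterpenoid level above carotenoids). They are an illustrative assumption, not the actual GO graph.

```python
from collections import deque

# Illustrative fragment only: simplified edges that skip intermediate GO terms.
CHILDREN = {
    "isoprenoid biosynthetic process": ["terpenoid biosynthetic process"],
    "terpenoid biosynthetic process": [
        "monoterpenoid biosynthetic process",
        "sesquiterpenoid biosynthetic process",
        "diterpenoid biosynthetic process",
        "triterpenoid biosynthetic process",
        "carotenoid biosynthetic process",
    ],
    "monoterpenoid biosynthetic process": ["menthol biosynthetic process"],
    "diterpenoid biosynthetic process": ["paclitaxel biosynthetic process"],
    "triterpenoid biosynthetic process": ["hopanoid biosynthetic process"],
}


def descendants(term):
    """Breadth-first walk returning every term below `term`, without duplicates."""
    found, seen = [], set()
    queue = deque(CHILDREN.get(term, []))
    while queue:
        child = queue.popleft()
        if child in seen:
            continue
        seen.add(child)
        found.append(child)
        queue.extend(CHILDREN.get(child, []))
    return found


# Scope a query to one class: the class term plus everything beneath it.
diterpenoid_scope = ["diterpenoid biosynthetic process"] + descendants("diterpenoid biosynthetic process")
print(diterpenoid_scope)
# ['diterpenoid biosynthetic process', 'paclitaxel biosynthetic process']
```

In practice the edges would come from the ontology service rather than a literal dict. The traversal logic stays the same.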
Plant and microbial coverage in the expression graphs
Taxon coverage relevant to terpene engineering:
| Taxon | gene-expression-atlas-okn | spoke-genelab |
|---|---|---|
| human | 1671 | 43 / 1530 |
| mouse | 1323 | 104 / 4898 |
| Arabidopsis thaliana | 638 | 40 / 3356 |
| Saccharomyces cerevisiae | 54 | 1 / 26 |
spoke-genelab counts are studies / assays.
Canonical pathway genes are present
gene-expression-atlas-okn returns the precursor-supply panel: DXS, DXR, HDR, HMG1/HMG2/HMGR, IDI1/IDI2, ERG20/FDPS, GGPS1, PSY. spoke-genelab adds Arabidopsis-specific hits, including PSY and LCYB. This is not enough on its own to reconstruct a full specialized-terpene branch, but it is enough to anchor candidate-part searches around precursor-supply modules.
Plant datasets surfaced via NDE
- GSE102404 — Artemisia argyi transcriptome naming HMGR, MVD, DXS, DXR, HDS, HDR
- GSE175645, GSE28539, GSE121523, GSE121831 — Taxus taxoid / paclitaxel biosynthesis (incl. female-specific MYB-bHLH regulation)
- GSE103181 — Crocus sativus apocarotenoid biosynthesis: 41 pathway genes + 5 TF hubs
- GSE120135, GSE96954 — maize / Isodon diterpenoid defense pathways (kauralexin, kaurene synthase-like)
- GSE243419 — single-cell RNA-seq of cotton secretory glandular cells
- GSE109299 / GSE109303 / GSE288025 / GSE287659 — rice diterpenoid phytoalexin biosynthesis
Microbial / host-engineering datasets
- GSE102672 — IPP toxicity in isoprenoid-producing E. coli; RNA-seq + proteomics through onset and recovery, with PMK reduction implicated as a recovery mechanism
- GSE84255 — balancing IspG / IspH to reduce toxic HMBPP accumulation in E. coli
- GSE29267, GSE30403 — FPP toxicity in E. coli (LB and M9), showing rescue by channeling FPP into product
- GSE34665 — D-limonene response in S. cerevisiae (monoterpene tolerance)
- GSE225783 — taxadiene-producing S. cerevisiae evolved for oxidative robustness
- GSE10712 — Aspergillus nidulans response to farnesol (fungal isoprenoid-alcohol stress)
A starting candidate panel
Combining the discovery layers gives a first-pass plant and microbial terpene engineering panel organized in three modules:
Core precursor-supply module
HMGR / HMG1 / HMG2, MVD, DXS, DXR, HDS, HDR, IDI1 / IDI2, FDPS / ERG20, GGPS1
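A plain set comparison shows which members of this module have expression-graph support and which rest on dataset evidence alone. The sketch below assumes the gene lists quoted earlier in this walk-through are the complete hit sets. The spoke-genelab set is known to be partial, since the text only names PSY and LCYB. Exact-symbol comparison also misses aliases.

```python
# Core precursor-supply module as listed above (aliases kept as separate symbols).
PANEL = {"HMGR", "HMG1", "HMG2", "MVD", "DXS", "DXR", "HDS", "HDR",
         "IDI1", "IDI2", "FDPS", "ERG20", "GGPS1"}

# Hits quoted earlier for each expression graph. The spoke-genelab set is
# partial: only the two hits named in the text are included.
ATLAS_HITS = {"DXS", "DXR", "HDR", "HMG1", "HMG2", "HMGR", "IDI1", "IDI2",
              "ERG20", "FDPS", "GGPS1", "PSY"}
GENELAB_HITS = {"PSY", "LCYB"}

graph_supported = PANEL & (ATLAS_HITS | GENELAB_HITS)
dataset_only = sorted(PANEL - graph_supported)   # panel genes with no graph hit
genelab_extra = sorted(GENELAB_HITS - ATLAS_HITS)  # hits only spoke-genelab adds

print(dataset_only)   # ['HDS', 'MVD']
print(genelab_extra)  # ['LCYB']
```

Both gaps, MVD and HDS, are named in the Artemisia argyi transcriptome (GSE102404). This is one concrete place where the NDE layer fills in what the expression graphs do not.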
Product-branch module (pick by target class)
- Taxoid / taxane: Taxus datasets (GSE175645, GSE28539, GSE121523, GSE121831) plus the taxadiene yeast adaptation set (GSE225783)
- Defense diterpene: maize and Isodon (GSE120135, GSE96954) for kaurene/kauralexin scaffolds and recruited P450 chemistry
- Carotenoid / apocarotenoid: Crocus (GSE103181), cotton single-cell (GSE243419), and Arabidopsis spoke-genelab hits, for PSY/LCYB and their downstream regulators
Host-hardening module
- PMK balancing from IPP toxicity (GSE102672)
- IspG / IspH balancing from HMBPP accumulation (GSE84255)
- FPP-to-product sink logic (GSE29267, GSE30403)
- Monoterpene tolerance programs (GSE34665)
- Oxidative robustness in taxadiene yeast (GSE225783)
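Keeping the panel as a plain module-to-accession mapping makes it easy to spot datasets that serve more than one module. The sketch below uses only the accessions listed in the modules above; the module labels are illustrative. Arabidopsis evidence enters through spoke-genelab hits rather than a GEO accession, so it does not appear here.

```python
from collections import defaultdict

# Module labels are illustrative; accessions are the ones listed in the panel.
MODULE_DATASETS = {
    "branch: taxoid/taxane": ["GSE175645", "GSE28539", "GSE121523", "GSE121831", "GSE225783"],
    "branch: defense diterpene": ["GSE120135", "GSE96954"],
    "branch: carotenoid/apocarotenoid": ["GSE103181", "GSE243419"],
    "host-hardening": ["GSE102672", "GSE84255", "GSE29267", "GSE30403", "GSE34665", "GSE225783"],
}


def modules_by_dataset(mapping):
    """Invert module -> accessions into accession -> modules, preserving order."""
    index = defaultdict(list)
    for module, accessions in mapping.items():
        for acc in accessions:
            index[acc].append(module)
    return dict(index)


index = modules_by_dataset(MODULE_DATASETS)
shared = {acc: mods for acc, mods in index.items() if len(mods) > 1}
print(len(index))  # 14
print(shared)      # {'GSE225783': ['branch: taxoid/taxane', 'host-hardening']}
```

A dataset that supports both a product branch and host hardening, as GSE225783 does, is a natural candidate for early re-analysis.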
Suggested ranking
A practical priority score combines pathway centrality (precursor-supply outranks peripheral responders), plant and microbial evidence (Arabidopsis, Taxus, Artemisia, maize, yeast, E. coli), dataset richness (multi-omics, perturbation, time course over static single-condition), and engineering transferability (microbial toxicity / tolerance gets extra weight because it de-risks host design).
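One way to turn these four criteria into a single number is a weighted sum. The walk-through does not give weights, so everything numeric below is a hypothetical assumption: the weight values, the 0 to 1 feature scales, and the 1.5x multiplier used to give toxicity/tolerance evidence extra weight. The candidates `part-A` and `part-B` are placeholders with made-up scores, not results from the datasets.

```python
from dataclasses import dataclass

# Hypothetical weights (sum to 1.0); the criteria come from the text, the numbers do not.
WEIGHTS = {"centrality": 0.35, "evidence": 0.25, "richness": 0.25, "transferability": 0.15}
TOLERANCE_BONUS = 1.5  # hypothetical multiplier for microbial toxicity/tolerance evidence


@dataclass
class Candidate:
    name: str
    centrality: float        # 0-1; precursor-supply genes near 1
    evidence: float          # 0-1; breadth of plant and microbial support
    richness: float          # 0-1; multi-omics, perturbation, time course near 1
    transferability: float   # 0-1; how directly the evidence carries to the host
    tolerance_evidence: bool = False


def priority(c: Candidate) -> float:
    """Weighted sum; transferability is boosted when tolerance evidence exists."""
    t = c.transferability * (TOLERANCE_BONUS if c.tolerance_evidence else 1.0)
    return (WEIGHTS["centrality"] * c.centrality
            + WEIGHTS["evidence"] * c.evidence
            + WEIGHTS["richness"] * c.richness
            + WEIGHTS["transferability"] * t)


def rank(candidates):
    return sorted(candidates, key=priority, reverse=True)


# Placeholder candidates with made-up scores, purely to show the mechanics.
a = Candidate("part-A", centrality=0.9, evidence=0.6, richness=0.4, transferability=0.5)
b = Candidate("part-B", centrality=0.5, evidence=0.6, richness=0.8,
              transferability=0.8, tolerance_evidence=True)
print([c.name for c in rank([a, b])])  # ['part-B', 'part-A']
```

A multiplier keeps the tolerance boost tied to how transferable the evidence already is. A flat additive bonus would be the simpler alternative if that coupling is unwanted.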
Bottom line
The proto-OKN environment supports the scenario, but the most effective execution splits the work across layers: ontology terms define the terpene biology space, the expression KGs confirm plant and microbial coverage and candidate-gene presence, and the NDE/WOBD metadata layer carries the plant and microbial datasets that make engineering decisions defensible. The natural next step is a ranked candidate table with graph-specific query templates for one terpene class: taxanes, carotenoids/apocarotenoids, or defense diterpenes.