Terpene biosynthesis

Designing a microbial host for terpene biomanufacturing

A walk-through of how the unified MCP server supports a plant and microbial terpene engineering question. The assistant pulls candidate enzymes from pathway and expression graphs, then leans on the NDE/WOBD metadata layer to surface the experimental datasets that make engineering decisions defensible.

Key finding

20+ datasets surfaced spanning seven plant species and three microbial hosts, with a candidate enzyme panel anchored across pathway and expression evidence — usable starting material for a synthetic biology team within minutes.

The scenario

A team of synthetic biologists is designing a microbial host for sustainable biomanufacturing of a high-value terpene. They want to identify promising genetic parts (enzymes from plant and microbial systems) and the experimental datasets — RNA-seq, proteomics, fermentation studies — that would support those choices.

How the assistant approached it

The proto-OKN stack does not have a single “terpene pathway graph.” The effective workflow is a federation of three layers:

  1. Pathway and gene-part discovery — prokn, gene-expression-atlas-okn, spoke-genelab
  2. Experimental metadata discovery — nde, the WOBD metadata layer, with dataset records under okn.wobd.org/dataset/...
  3. Cross-graph integration — joins between gene-expression-atlas-okn and spoke-genelab on NCBI_Gene, GeneSymbol, and UBERON
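
The cross-graph join in layer 3 can be pictured with a small Python sketch. It assumes each graph's query results have already been flattened into lists of dictionaries; the record shapes, the placeholder identifiers, and the sample rows are illustrative assumptions, not actual results from either graph. Only the join key names (NCBI_Gene, GeneSymbol) come from the workflow above.

```python
from collections import defaultdict


def join_on_key(atlas_rows, genelab_rows, key="NCBI_Gene"):
    """Inner-join two lists of dict records on a shared identifier field."""
    # Index the right-hand side by the join key, skipping rows without one.
    index = defaultdict(list)
    for row in genelab_rows:
        if row.get(key) is not None:
            index[row[key]].append(row)

    joined = []
    for row in atlas_rows:
        for match in index.get(row.get(key), []):
            merged = {key: row[key]}
            # Prefix the remaining fields so their origin stays visible.
            merged.update({f"atlas_{k}": v for k, v in row.items() if k != key})
            merged.update({f"genelab_{k}": v for k, v in match.items() if k != key})
            joined.append(merged)
    return joined


# Hypothetical rows: identifiers and study names are placeholders.
atlas_rows = [
    {"NCBI_Gene": "placeholder-1", "GeneSymbol": "PSY", "taxon": "Arabidopsis thaliana"},
    {"NCBI_Gene": "placeholder-2", "GeneSymbol": "DXS", "taxon": "Arabidopsis thaliana"},
]
genelab_rows = [
    {"NCBI_Gene": "placeholder-1", "GeneSymbol": "PSY", "study": "hypothetical-study"},
]

shared = join_on_key(atlas_rows, genelab_rows)
```

Joining on NCBI_Gene first and falling back to GeneSymbol only when identifiers are missing is one reasonable choice, since symbols are more likely to collide across species.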

Findings

The terpene ontology space is well covered

Search expansion is grounded in two Gene Ontology terms: terpenoid biosynthetic process (GO:0016114) and isoprenoid biosynthetic process (GO:0008299). Their descendants include monoterpenoid, diterpenoid, sesquiterpenoid, triterpenoid, carotenoid, hopanoid, paclitaxel, and menthol biosynthetic processes, which is enough scaffolding to scope queries by terpene class.
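
As a rough illustration of how descendant terms could be enumerated, the sketch below builds a SPARQL query string for each root term. It assumes the ontology is available as RDF with rdfs:subClassOf edges and standard OBO PURL identifiers; which endpoint hosts it is not stated here, so the code only constructs the query text and does not send it anywhere. The helper names are hypothetical.

```python
# Root terms named in the text above.
GO_ROOTS = {
    "GO:0016114": "terpenoid biosynthetic process",
    "GO:0008299": "isoprenoid biosynthetic process",
}


def go_iri(curie):
    """Convert a CURIE such as GO:0016114 into its OBO PURL form."""
    prefix, local_id = curie.split(":")
    return f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"


def descendant_query(curie):
    """Build a SPARQL 1.1 query listing a term and all of its subclasses.

    Assumes the target store models GO hierarchy with rdfs:subClassOf.
    """
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?term ?label WHERE {\n"
        f"  ?term rdfs:subClassOf* <{go_iri(curie)}> .\n"
        "  ?term rdfs:label ?label .\n"
        "}"
    )


queries = {curie: descendant_query(curie) for curie in GO_ROOTS}
```

The `rdfs:subClassOf*` property path matches zero or more hops, so the root term itself is returned along with its descendants.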

Plant and microbial coverage in the expression graphs

Taxon coverage relevant to terpene engineering:

Taxon                      gene-expression-atlas-okn   spoke-genelab
human                      1671                        43 / 1530
mouse                      1323                        104 / 4898
Arabidopsis thaliana       638                         40 / 3356
Saccharomyces cerevisiae   54                          1 / 26

spoke-genelab counts are studies / assays.

Canonical pathway genes are present

gene-expression-atlas-okn returns the precursor-supply panel: DXS, DXR, HDR, HMG1/HMG2/HMGR, IDI1/IDI2, ERG20/FDPS, GGPS1, PSY. spoke-genelab adds Arabidopsis-specific hits including PSY and LCYB. This is not enough on its own to reconstruct a full specialized-terpene branch, but it is enough to anchor candidate-part searches around precursor-supply modules.

Plant datasets surfaced via NDE

  • GSE102404 — Artemisia argyi transcriptome naming HMGR, MVD, DXS, DXR, HDS, HDR
  • GSE175645, GSE28539, GSE121523, GSE121831 — Taxus taxoid / paclitaxel biosynthesis (incl. female-specific MYB-bHLH regulation)
  • GSE103181 — Crocus sativus apocarotenoid biosynthesis: 41 pathway genes + 5 TF hubs
  • GSE120135, GSE96954 — maize / Isodon diterpenoid defense pathways (kauralexin, kaurene synthase-like)
  • GSE243419 — single-cell RNA-seq of cotton secretory glandular cells
  • GSE109299 / GSE109303 / GSE288025 / GSE287659 — rice diterpenoid phytoalexin biosynthesis

Microbial / host-engineering datasets

  • GSE102672 — IPP toxicity in isoprenoid-producing E. coli; RNA-seq + proteomics through onset and recovery, with PMK reduction implicated as a recovery mechanism
  • GSE84255 — balancing IspG / IspH to reduce toxic HMBPP accumulation in E. coli
  • GSE29267, GSE30403 — FPP toxicity in E. coli (LB and M9), showing rescue by channeling FPP into product
  • GSE34665 — D-limonene response in S. cerevisiae (monoterpene tolerance)
  • GSE225783 — taxadiene-producing S. cerevisiae evolved for oxidative robustness
  • GSE10712 — Aspergillus nidulans response to farnesol (fungal isoprenoid-alcohol stress)

A starting candidate panel

Combining the discovery layers gives a first-pass plant and microbial terpene engineering panel organized in three modules:

Core precursor-supply module

HMGR / HMG1 / HMG2, MVD, DXS, DXR, HDS, HDR, IDI1 / IDI2, FDPS / ERG20, GGPS1

Product-branch module (pick by target class)

  • Taxoid / taxane: Taxus datasets (GSE175645, GSE28539, GSE121523, GSE121831) plus the taxadiene yeast adaptation set (GSE225783)
  • Defense diterpene: maize and Isodon (GSE120135, GSE96954) for kaurene/kauralexin scaffolds and recruited P450 chemistry
  • Carotenoid / apocarotenoid: Crocus and cotton single-cell datasets (GSE103181, GSE243419) plus the Arabidopsis expression-graph hits, pointing to PSY/LCYB and downstream regulators

Host-hardening module

  • PMK balancing from IPP toxicity (GSE102672)
  • IspG / IspH balancing from HMBPP accumulation (GSE84255)
  • FPP-to-product sink logic (GSE29267, GSE30403)
  • Monoterpene tolerance programs (GSE34665)
  • Oxidative robustness in taxadiene yeast (GSE225783)
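
One way to make the panel usable downstream is to encode the product-branch and host-hardening modules as plain lookups. The sketch below does that with the accessions listed above; the dictionary keys and the `reading_list` helper are hypothetical names chosen for illustration, not part of any proto-OKN tool.

```python
# Product-branch datasets keyed by target terpene class (accessions from the panel above).
PRODUCT_BRANCH = {
    "taxoid": ["GSE175645", "GSE28539", "GSE121523", "GSE121831", "GSE225783"],
    "defense_diterpene": ["GSE120135", "GSE96954"],
    "carotenoid": ["GSE103181", "GSE243419"],
}

# Host-hardening datasets keyed by the engineering lesson they support.
HOST_HARDENING = {
    "PMK balancing (IPP toxicity)": ["GSE102672"],
    "IspG / IspH balancing (HMBPP accumulation)": ["GSE84255"],
    "FPP-to-product sink": ["GSE29267", "GSE30403"],
    "monoterpene tolerance": ["GSE34665"],
    "oxidative robustness (taxadiene yeast)": ["GSE225783"],
}


def reading_list(target_class):
    """Return branch datasets followed by host-hardening datasets, without duplicates."""
    branch = PRODUCT_BRANCH[target_class]
    hardening = [acc for accs in HOST_HARDENING.values() for acc in accs]
    seen, ordered = set(), []
    for accession in branch + hardening:
        if accession not in seen:
            seen.add(accession)
            ordered.append(accession)
    return ordered


taxoid_list = reading_list("taxoid")
```

Host-hardening datasets are appended for every target class because toxicity and tolerance lessons apply regardless of which product branch is chosen; a team could equally filter them by host organism.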

Suggested ranking

A practical priority score combines pathway centrality (precursor-supply outranks peripheral responders), plant and microbial evidence (Arabidopsis, Taxus, Artemisia, maize, yeast, E. coli), dataset richness (multi-omics, perturbation, time course over static single-condition), and engineering transferability (microbial toxicity / tolerance gets extra weight because it de-risks host design).
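
The score described above could be sketched as a weighted sum over the four criteria. The weights, the 0 to 1 scale, the field names, and the two example candidates are all assumptions made for illustration; the text does not specify numeric values, and a real ranking would need the team to calibrate them.

```python
from dataclasses import dataclass

# Hypothetical weights; they sum to 1.0 and favor pathway centrality.
WEIGHTS = {
    "pathway_centrality": 0.35,
    "cross_kingdom_evidence": 0.25,
    "dataset_richness": 0.20,
    "engineering_transferability": 0.20,
}


@dataclass
class Candidate:
    name: str
    pathway_centrality: float           # 1.0 = core precursor supply, 0.0 = peripheral
    cross_kingdom_evidence: float       # support across plant and microbial systems
    dataset_richness: float             # multi-omics / perturbation / time course
    engineering_transferability: float  # relevance to host toxicity or tolerance


def priority_score(candidate, weights=WEIGHTS):
    """Weighted sum of the four criteria, each expected in the range 0 to 1."""
    return sum(weight * getattr(candidate, field) for field, weight in weights.items())


def rank(candidates):
    """Sort candidates from highest to lowest priority score."""
    return sorted(candidates, key=priority_score, reverse=True)


# Illustrative inputs only; these scores are not derived from the datasets.
core = Candidate("precursor-supply example", 1.0, 1.0, 0.5, 0.5)
peripheral = Candidate("peripheral responder example", 0.2, 0.5, 0.5, 0.0)
ranked = rank([peripheral, core])
```

A linear sum keeps the ranking easy to audit; if one criterion should act as a hard gate (for example, no microbial evidence at all), a filter before scoring may be a better fit than a low weight.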

Bottom line

The proto-OKN environment supports the scenario, but the most effective execution splits the work across layers: ontology terms define the terpene biology space, the expression KGs confirm plant and microbial coverage and candidate-gene presence, and the NDE/WOBD metadata layer carries the plant and microbial datasets that make engineering decisions defensible. The natural next step is a ranked candidate table with graph-specific query templates for one terpene class — taxanes, carotenoid/apocarotenoids, or defense diterpenes.