The World List of Cycads Phylogenomic Backbone

Maximum likelihood phylogeny of 346 cycad species with Ginkgo biloba as outgroup, inferred from 1,409 orthologs (411,345 amino acid sites) using IQ-TREE v3.0.1 with an edge-linked proportional partition model. Branch support was assessed using ultrafast bootstrap (1,000 replicates), gene concordance factors (gCF) from 1,409 gene trees, and site concordance factors (sCF) from quartet sampling. The time-calibrated tree was estimated using penalized likelihood in TreePL v1.0 with 12 fossil calibration constraints; 95% confidence intervals were derived from 100 bootstrap replicates dated under a fixed topology.

Loading phylogeny...

# Methods — Cycad Phylogenomic Timetree

## Taxon Sampling

The phylogeny includes 346 cycad species spanning all 10 extant genera across two families — Cycadaceae (*Cycas*) and Zamiaceae (*Bowenia*, *Ceratozamia*, *Dioon*, *Encephalartos*, *Lepidozamia*, *Macrozamia*, *Microcycas*, *Stangeria*, *Zamia*) — with *Ginkgo biloba* as the outgroup for phylogenetic inference. In recent years, transcriptome resources have become available for most cycad species through a combination of genomic and phylotranscriptomic studies (Liu et al. 2022; Habib et al. 2022, 2023; Lindstrom et al. 2024), enabling taxon sampling that captures deep evolutionary diversity across Cycadales. Protein sequences for *Cycas panzhihuaensis* were obtained from its reference genome; all other taxa are represented by transcriptome assemblies.

## Transcriptome Assembly and Processing

Paired-end Illumina RNA-seq reads for 355 cycad species were downloaded from the NCBI Sequence Read Archive. *De novo* transcriptome assembly was performed with Trinity v2.15.1 (Grabherr et al. 2011), with integrated quality trimming via Trimmomatic. To reduce redundancy, the longest isoform per Trinity gene was retained using a bundled Trinity utility script. Protein-coding regions were then predicted using TransDecoder (Haas et al. 2013), which identifies open reading frames (ORFs) of at least 100 amino acids and applies a machine-learning classifier to distinguish coding from non-coding sequences. Where multiple ORFs were predicted per transcript, the longest protein was selected. After quality filtering and taxonomic curation, 346 cycad species were carried forward into the phylogenomic analysis.

## Ortholog Identification

Orthologous gene families were identified from predicted protein sequences of 347 taxa (346 cycads + *Ginkgo biloba*) using OrthoFinder v2.5 (Emms & Kelly 2019). A relaxed single-copy ortholog strategy was employed: loci were retained if present in at least 75% of cycad species, with no more than 10% of species permitted to carry a second copy. This approach yielded 1,409 loci with an average taxon occupancy of 94.9% — a 12-fold increase over strictly single-copy orthologs (115 loci) — while maintaining high orthology confidence.

## Sequence Alignment and Trimming

Each of the 1,409 orthogroups was independently aligned at the amino acid level using MAFFT v7.526 (Katoh & Standley 2013) with automatic algorithm selection. Poorly aligned and gap-rich regions were removed with trimAl v1.4 (Capella-Gutierrez et al. 2009) using the automated1 heuristic. For loci containing multiple copies per species, a phylogenetically informed pruning step selected the copy with the shortest average distance to all other taxa in the gene tree, preferentially retaining orthologs over paralogs.

## Phylogenetic Inference

A partitioned maximum likelihood analysis was conducted in IQ-TREE v3.0.1 (Minh et al. 2020), treating each of the 1,409 loci as an independent partition with its own best-fit substitution model selected by ModelFinder (Kalyaanamoorthy et al. 2017). The concatenated alignment comprised 411,345 amino acid sites. Branch support was assessed using 1,000 ultrafast bootstrap replicates (UFBoot2; Hoang et al. 2018). The tree was rooted on *Ginkgo biloba* during inference.

## Concordance Factors

Two complementary measures of topological support were calculated in IQ-TREE. Gene concordance factors (gCF) quantify the percentage of 1,409 individual gene trees that recover each branch in the species tree, revealing the extent of gene-tree conflict attributable to incomplete lineage sorting or other processes. Site concordance factors (sCF) quantify the percentage of alignment sites supporting each branch via likelihood-based quartet sampling under a WAG+G4 substitution model. Together with bootstrap values, these metrics provide a multi-layered assessment of phylogenetic support.

## Divergence Time Estimation

Divergence times were estimated using TreePL v1.0, a penalized likelihood method that relaxes the strict molecular clock while accounting for rate heterogeneity across lineages. *Ginkgo biloba* was pruned prior to dating, as its extreme divergence from cycads destabilises rate smoothing. Twelve fossil calibrations were applied as minimum and maximum age constraints, spanning from the Cycadales crown node (291–359 Ma; Coiro et al. 2023) to genus-level divergences informed by recent phylotranscriptomic studies (Habib et al. 2022, 2023, 2025; Lindstrom et al. 2024; Gutierrez-Ortega et al. 2024). The optimal smoothing parameter was determined by cross-validation.

## Bootstrap Confidence Intervals

Uncertainty in divergence time estimates was quantified through a gene-wise bootstrap resampling approach. In each of 100 replicates, the 1,409 loci were resampled with replacement, branch lengths were re-optimised on the fixed ML topology using IQ-TREE, and the resulting trees were dated with TreePL under the same calibration scheme. Node ages from all replicates were compiled and 95% confidence intervals computed as the 2.5th and 97.5th percentiles. This procedure captures uncertainty from both branch-length estimation and molecular dating, yielding confidence intervals for all 345 internal nodes (mean CI width: 3.97 Ma).

---

**Software citations:** Capella-Gutierrez et al. (2009) *Bioinformatics* 25:1972; Coiro et al. (2023); Emms & Kelly (2019) *Genome Biol.* 20:238; Grabherr et al. (2011) *Nat. Biotechnol.* 29:644; Gutierrez-Ortega et al. (2024); Haas et al. (2013) *Nat. Protoc.* 8:1494; Habib et al. (2022, 2023, 2025); Hoang et al. (2018) *Mol. Biol. Evol.* 35:518; Kalyaanamoorthy et al. (2017) *Nat. Methods* 14:587; Katoh & Standley (2013) *Mol. Biol. Evol.* 30:772; Lindstrom et al. (2024); Minh et al. (2020) *Mol. Biol. Evol.* 37:1530.