Association and sensorimotor gene module curation
Independently generated spatiotemporal human brain exon microarray data and RNA-seq datasets were obtained from ref. 26 (accessed through https://hbatlas.org/) and ref. 28 (accessed through http://development.psychencode.org/#) and the developing macaque brain RNA-seq data across brain regions were obtained from ref. 29 (accessed through http://www.evolution.psychencode.org/#). The estimated correspondence between macaque developmental timepoints and equivalent human developmental periods has been reported previously29. We split association areas representative of pericentral programs into two groups: Af, which included the OFC, MFC, DFC/dlPFC and VFC; and At, which included the posterior STC and ITC. Sensorimotor groups representative of central programs contained M1C, S1C, A1C and V1C. Areas dissected during fetal development are depicted in Fig. 1a. When defining our GMs, we used human samples from period 3 to period 7 and macaque samples from predicted period 5 to predicted period 7 with at least two biological replicates for each predicted period (the earliest macaque sample corresponded to approximately human period 4 and there were no biological replicates). For each of the three sequencing datasets (human microarray26, human RNA-seq 28 and macaque RNA-seq 29), we conducted differential expression analysis using all samples across ten regions for each period. For RNA-seq differential expression analysis, we retained genes with sufficiently large counts using the filterByExpr function from the edgeR80 package and conducted trimmed mean of M values normalization using the normalizeCounts function from the tweeDEseq81 package. We then applied RNentropy82 to identify genes differentially expressed among the above ten neocortical regions in each developmental period. The resulting DEGs from RNentropy were further selected using criteria adapted from a previous study41 in each period. In brief, a gene to be considered as an upregulated DEG in a certain A/S subgroup, (1) there is at least one region in that subgroup where the gene is significantly upregulated; (2) the gene is not upregulated in any region of the opposing (S/A) groups; (3) the gene is underexpressed in at least 30% of the areas in the opposing (S/A) group; and (4) the gene is not in a module gene list of any of the opposing (S/A) subgroups. For exon microarray data in each period, to conduct an equivalent analysis to the RNA-seq data, we used limma83 to compare each cortical region within a subgroup independently as treatment to all other cortical regions of the opposing group as replicates of control for each period. DEGs were selected with FDR < 0.05 and absolute foldchange (FC) > 1 (the same FC threshold used in RNentropy)84. We then defined the overexpressed DEGs in a certain S/A subgroup in a certain period following the same criteria as we described for the RNA-seq data. Lastly, we merged the genes across all prenatal periods of each gene module within each dataset independently as the final module gene list respectively. To identify shared genes, we found the overlap of each concatenated module gene lists among three datasets.
To show the module gene expression in heat maps, we averaged the gene expression across genes in each module for each sample and took the median of the averaged expression values among samples from a given region and period. To generate continuous expression trajectories for each cortical region across developmental periods (log 2 [time (days)]), we applied locally weighted scatterplot smoothing (LOESS). For each region, the median of the sample-level mean log 2 -transformed expression values were modelled as a function of developmental period using the loess() function in R (stats package). The fitted LOESS models were then used to predict smoothed expression values at 1,000 equally spaced points spanning the observed developmental time range for each region.
Aggregated module PCA and Euclidean distance plots
PCA for plots in Figs. 1d and 2a and Extended Data Figs. 1d,h and 9c was performed using the prcomp function in R, using the expression matrix of either the combined shared sensorimotor and association GMs or combined SEMA7A and PLXNC1 expression. Ellipses for each region were centred on the mean of the points within the region, with their axes sized according to the s.d. values on each component. Euclidean distances between different peripheral and central groups were calculated based on the full module gene list for each developmental period. Smooth average lines were generated with a 95% confidence interval.
Gene set enrichment analysis for GMs
Gene set enrichment analysis (GSEA) of each GM was conducted using ClusterProfiler85 based on reference databases including GO, Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome and WikiPathways.
Human diffusion MRI analysis
Diffusion MRI (dMRI) data were acquired from ex vivo human fetal brains at PCW13, PCW15 and PCW17 (refs. 86,87,88). Cortical masks were generated using ITK-Snap (v.4.0.2)89. Whole-brain tractography was performed using Diffusion Toolkit with a deterministic tracking algorithm, reconstructing fibre pathways based on dMRI data90. To isolate corticocortical connections, only streamlines with both end points located within the manually delineated cerebral wall mask were retained. For region of interest (ROI)-based analysis, tracts were further filtered to include only those with at least one end point in a given cortical ROI. Surface ROIs were converted to volumetric space and slightly dilated to compensate for partial volume effects and minor misalignments between the segmentation and underlying diffusion data. Streamlines were visually inspected and cleaned to remove short and anatomically incorrect fibres. Tractograms were visualized in DSI Studio with brains aligned to a consistent orientation for cross-subject comparison91. To complement this, whole-brain corticocortical tracts were filtered by length using the anterior–posterior extent (L) of each brain as a reference. Thresholds of 0.25L and 0.50L were applied, in addition to unfiltered tracts (0L), to highlight the developmental progression of long-range association pathways. Tractograms were colour coded by principal diffusion orientation (red, left–right; green, anterior–posterior; blue, superior–interior) and visualized using TrackVis.
Individual gene module PCA
... continue reading