Illustration of *Drosophila melanogaster*. Embryo illustrations adapted from (Hartenstein 1993).

Abstract

Drosophila melanogaster is a powerful, long-standing model for metazoan development and gene regulation. We profiled chromatin accessibility in almost one million, and gene expression in half a million, nuclei from overlapping windows spanning the entirety of embryogenesis. Leveraging developmental asynchronicity within embryo collections, we applied deep neural networks to infer the age of each nucleus, resulting in continuous, multimodal views of molecular and cellular transitions in absolute time. We identify cell lineages, infer their developmental relationships, and link dynamic changes in enhancer usage, transcription factor (TF) expression and the accessibility of TFs’ cognate motifs. With these data, the dynamics of enhancer usage and gene expression can be explored within and across lineages at the scale of minutes, including for precise transitions like zygotic genome activation.

Raw data download from GEO

The raw sci-ATAC-seq and sci-RNA-seq fastqs generated by this study are all available from GEO organized under the superseries with accession no. GSE190149, with sci-ATAC-seq in the subseries GSE190130 and sci-RNA-seq in the subseries GSE190147.

Genome browser tracks

To visualize chromatin accessibility as genome browser tracks, use the following (link) to add the tracks to a UCSC browser session, or click the following (link) to open the UCSC browser with the tracks already added. Each time window has a multiwig container that holds bigwigs of cells pseudobulked by Leiden-based clustering. Each cluster can also be visualized individually per time window (the putative annotation should appear as the label once a track is made visible). Raw bigwig tracks are also available for direct download (link).
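
As a rough illustration of what "pseudobulked by cluster" means for these tracks, the sketch below sums per-cell count vectors within each cluster. It is a minimal example on made-up data, not the pipeline used to build the bigwigs: the function name `pseudobulk`, the toy cell identifiers, the cluster labels, and the four-bin layout are all hypothetical.

```python
def pseudobulk(cell_counts, cell_clusters):
    """Sum per-cell count vectors within each cluster.

    cell_counts:   dict mapping cell id -> list of counts (one per genomic bin)
    cell_clusters: dict mapping cell id -> cluster label
    Returns a dict mapping cluster label -> summed count vector.
    """
    totals = {}
    for cell, counts in cell_counts.items():
        cluster = cell_clusters[cell]
        if cluster not in totals:
            # First cell seen for this cluster: start from zeros.
            totals[cluster] = [0] * len(counts)
        totals[cluster] = [a + b for a, b in zip(totals[cluster], counts)]
    return totals


# Hypothetical toy data: three cells, four genomic bins, two clusters.
toy_counts = {
    "cell_a": [1, 0, 2, 0],
    "cell_b": [0, 1, 1, 0],
    "cell_c": [3, 0, 0, 1],
}
toy_clusters = {"cell_a": "0", "cell_b": "0", "cell_c": "1"}

toy_pseudobulk = pseudobulk(toy_counts, toy_clusters)
```

Each resulting vector corresponds conceptually to one cluster-level track within a time window; normalization and bigwig export are outside the scope of this sketch.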

Web app for visualizing sci-ATAC-seq and sci-RNA-seq summary statistics across clusters

Please try our newly developed web app (link) to interactively visualize RNA and ATAC UMAPs for each time window.

Supplemental tables

All supplementary tables are aggregated in the following Excel file, with one table per sheet: SupplementaryTables.xlsx.

Additional processed files

We generated many additional intermediate processed files that might be helpful for further analyses:

  1. Full list of identified peaks (link).
  2. ATAC bigwigs split by time window and cluster (link).
  3. ATAC fragment files split by time window and experiment (link).
  4. Mesoderm HOMER-based motif enrichments (link).
  5. ATAC peak matrices split by experiment and time window (link).
  6. Filtered ATAC Seurat objects for peak counts, gene activity, and motif activity (link). NNv1 is the best-performing neural network model with an MSE loss (used for downstream analyses), NNv2 is the best-performing neural network model with the custom loss, and lasso refers to the lasso model. [model]_age is the model's age prediction, whereas [model]_shift is the error outside the cell's collection window.
  7. RNA Seurat objects (link). The main Seurat object includes all cells that passed QC filters, along with the time prediction associated with each cell. NNv1 is the best-performing neural network model with an MSE loss (used for downstream analyses), NNv2 is the best-performing neural network model with the custom loss, and lasso refers to the lasso model. [model]_age is the model's age prediction, whereas [model]_shift is the error outside the cell's collection window. The time-split objects divide cells either by collection window (windows) or by predicted time window (pred_windows).
  8. Annotation of cells from ATAC data that are likely XX or XY and counts of chrX- and chrY-mapped reads in peaks (link). Note that this is in Rds format.
  9. Preliminary set of time-specific DA peaks based on the inferred XX/XY genotypes (link).
  10. For ATAC, a data frame in Rds format that can be used to link unique cell identifiers to the annotations used in Fig. 3 (link).
  11. For RNA, a data frame in Rds format that can be used to link unique cell identifiers to the annotations used in Fig. 3 (link).
  12. For ATAC, bams of reads mapped to dm6 (link).
  13. For RNA, bams of reads mapped to dm6 with the BDGP6 gene annotations (link).
  14. RNA-based lineage information used to generate the tree in Fig. 3C. This is the same RNA annotation table as in the supplementary information, but with an added column for lineage. To help decode the lineage annotations, refer to this additional file, which translates each lineage into the corresponding germ layer annotation.

Note: Files with the .Rds or .rds suffix can be read in R with: df <- readRDS('some_file.Rds')
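
Items 6 and 7 above describe a [model]_shift value as the prediction error outside a cell's collection window. The sketch below shows one plausible reading of that definition: zero when the predicted age falls inside the window, otherwise the signed distance to the nearest window edge. The sign convention, the use of hours as the unit, the function name `shift_outside_window`, and the 4 to 8 hour window are assumptions made for illustration, not details confirmed by the files themselves.

```python
def shift_outside_window(predicted_age, window_start, window_end):
    """Signed distance from a predicted age to a collection window.

    Returns 0.0 if the prediction lies inside [window_start, window_end],
    a negative value if it is earlier than the window, and a positive
    value if it is later. All values are assumed to be in hours.
    """
    if window_start > window_end:
        raise ValueError("window_start must not exceed window_end")
    if predicted_age < window_start:
        return predicted_age - window_start  # predicted too young
    if predicted_age > window_end:
        return predicted_age - window_end  # predicted too old
    return 0.0


# Hypothetical collection window of 4 to 8 hours.
inside = shift_outside_window(5.0, 4.0, 8.0)
too_early = shift_outside_window(3.5, 4.0, 8.0)
too_late = shift_outside_window(9.25, 4.0, 8.0)
```

Under these assumptions, a cell with a nonzero shift is one whose inferred age disagrees with the window in which its embryo was collected, which can be a useful filter before downstream analyses.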

Data processing scripts

We have prepared a folder that includes the scripts used to perform these analyses (link). They are generally organized into an RNA folder for the RNA code, an ATAC folder for the ATAC code, and a separate folder for the NNLS code. Note that we provide these mostly as guidelines showing how we performed the analyses and as examples for your own analyses. Most will require installing the several tools that these scripts depend on, plus minor edits (e.g., adjusting local paths and folder structure).

Citation

If you use this resource in your research, please cite:

Calderon, Diego+, Ronnie Blecher-Gonen+, Xingfan Huang+, Stefano Secchia+, James Kentro, Riza M. Daza, Beth Martin, Alessandro Dulja, Christoph Schaub, Cole Trapnell, Erica Larschan, Kate M. O’Connor-Giles, Eileen E. M. Furlong*, Jay Shendure*. "The continuum of Drosophila embryonic development at single-cell resolution." Science 377, 620 (2022). DOI: https://doi.org/10.1126/science.abn5800

+These authors contributed equally to this work.
*Corresponding authors.

Contact

For questions please contact Diego Calderon ().