
Omics Data Standards WG - Minutes 11-13-2017

Agenda

  1. DNase-HiC protocol (Vijay Ramani from Jay Shendure lab)
  2. Discussion
  3. Single cell combinatorial indexing Hi-C protocol (sciHi-C)


Slides will be available via the link on the 4DN wiki > Omics Working Group page. There will be a 30-day period during which people can comment on the protocol, and Vijay will be able to address those comments.

DNase-HiC protocol (by Vijay Ramani from Shendure lab)

Metadata Considerations

Metadata for both DNase-HiC and sciHi-C should be the same as required by the DCIC sample submission standards. However, in sciHi-C, the formaldehyde concentration during library preparation should be higher.

For all protocols adhering to the SOP, no extra metadata is needed; other protocols can provide additional metadata in a PDF file.

For replicates, there are two ways of providing metadata: 1) submit replicates together in one batch; 2) group different submissions together. The DCIC standard for replicates is described at https://data.4dnucleome.org/help/getting-started#notes-on-experiments-and-replicate-sets

Quality Control

Breakdown of ligation types

3x enrichment for cis reads >= 10kb inferred distance

1.5x enrichment for cis reads vs. interchromosomal ligation

Detailed QC metrics will be released later
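
Until the detailed metrics are released, the ligation-type breakdown above could be tallied along the following lines. This is only a minimal sketch: the input layout (chrom1, pos1, chrom2, pos2), the function names, and in particular the two ratio definitions (cis >= 10kb over cis < 10kb, and all cis over interchromosomal) are assumptions made for illustration, since the minutes do not state what the 3x and 1.5x enrichments are measured against.

```python
from collections import Counter


def classify_pair(chrom1, pos1, chrom2, pos2, long_range_bp=10_000):
    """Assign one valid pair to a ligation category (hypothetical categories)."""
    if chrom1 != chrom2:
        return "trans"
    if abs(pos2 - pos1) >= long_range_bp:
        return "cis_long"
    return "cis_short"


def ligation_breakdown(pairs, long_range_bp=10_000):
    """Count ligation categories and derive two illustrative ratios.

    ASSUMPTION: the ratio definitions below are placeholders, not the
    official QC definitions, which have not been released.
    """
    counts = Counter(
        classify_pair(*pair, long_range_bp=long_range_bp) for pair in pairs
    )
    cis_long = counts["cis_long"]
    cis_short = counts["cis_short"]
    trans = counts["trans"]
    return {
        "counts": {"cis_long": cis_long, "cis_short": cis_short, "trans": trans},
        "cis_long_vs_cis_short": cis_long / cis_short if cis_short else float("inf"),
        "cis_vs_trans": (cis_long + cis_short) / trans if trans else float("inf"),
    }


if __name__ == "__main__":
    example_pairs = [
        ("chr1", 100, "chr1", 50_000),  # cis, >= 10kb
        ("chr1", 0, "chr1", 10_000),    # cis, exactly 10kb counts as long range
        ("chr1", 100, "chr1", 500),     # cis, < 10kb
        ("chr1", 100, "chr2", 100),     # interchromosomal
    ]
    print(ligation_breakdown(example_pairs))
```

A dataset would then be compared against the 3x and 1.5x thresholds once the reference for each enrichment is defined.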

Number of PE reads: ~85M unique valid pairs.

  • This number was chosen because several previous samples generated 85M to 200M valid pairs.
  • This is not a requirement; it reflects the yield of the experiments completed so far.
  • Depending on the type of features desired, the requirement for number of reads may differ.
  • Should we expect some parallel to Hi-C as to how many reads will be needed to call which type of features?
  • From empirical data, it appears that the number of reads required would be the same across technologies for the same type of broad features (Gürkan).

Analysis

DCIC will contact Shendure lab to facilitate data analysis of DNase-HiC data within DCIC.

DCIC does not require any wet-lab-level QC yet; however, some dataset-level metrics will be generated.

SciHi-C (by Vijay Ramani from Shendure lab)

Metadata

Replicates are defined differently because every single cell is an independent observation. Therefore, replicates are defined as post-dilution plates.

Multiple libraries were generated to sequence the sample to saturation.

Quality Control

Barcodes should be identical, so reads with mismatched barcodes will be discarded.

~50% of all 250-bp reads have barcodes that can be ascertained.

Other reads can be used for bulk-level analyses.
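
The barcode rule above can be sketched as a three-way split of read pairs. This is a rough illustration, not the Shendure lab pipeline: the input layout (pair ID plus one barcode per read, with None for a barcode that could not be ascertained) and the function name are hypothetical, and it assumes that "other reads" refers to pairs without ascertainable barcodes.

```python
def split_by_barcode(read_pairs):
    """Split read pairs into single-cell, bulk-only, and discarded sets.

    read_pairs: iterable of (pair_id, barcode_read1, barcode_read2), where
    a barcode of None means it could not be ascertained (assumed layout).
    """
    single_cell = []  # identical barcodes on both reads: usable per cell
    bulk_only = []    # barcode not ascertained: usable for bulk-level analysis
    discarded = []    # both barcodes read, but they disagree
    for pair_id, barcode1, barcode2 in read_pairs:
        if barcode1 is None or barcode2 is None:
            bulk_only.append(pair_id)
        elif barcode1 == barcode2:
            single_cell.append((pair_id, barcode1))
        else:
            discarded.append(pair_id)
    return single_cell, bulk_only, discarded


if __name__ == "__main__":
    example = [
        ("p1", "ACGT", "ACGT"),
        ("p2", "ACGT", "TTGA"),
        ("p3", None, "ACGT"),
    ]
    print(split_by_barcode(example))
```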

Mouse and human cells are mixed so that cell purity can be checked.

  • Bimodal distribution of reads expected
  • Location of the interactions
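
The species-mixing check can be illustrated with a per-cell human read fraction: pure cells should sit near 0 or 1 (the bimodal distribution), while mixed barcodes fall in between. The sketch below is an assumption-laden example; the input layout, the function name, and the 0.95 purity cutoff are hypothetical and were not discussed in the meeting.

```python
def species_purity(cell_counts, min_fraction=0.95):
    """Label each cell barcode as human, mouse, mixed, or empty.

    cell_counts: dict mapping cell_id -> (human_reads, mouse_reads).
    min_fraction: ASSUMED purity cutoff, chosen only for illustration.
    Returns a dict mapping cell_id -> (label, human_fraction).
    """
    calls = {}
    for cell_id, (human, mouse) in cell_counts.items():
        total = human + mouse
        if total == 0:
            calls[cell_id] = ("empty", None)
            continue
        human_fraction = human / total
        if human_fraction >= min_fraction:
            label = "human"
        elif human_fraction <= 1 - min_fraction:
            label = "mouse"
        else:
            label = "mixed"
        calls[cell_id] = (label, human_fraction)
    return calls


if __name__ == "__main__":
    example = {"cellA": (99, 1), "cellB": (2, 98), "cellC": (50, 50)}
    for cell_id, (label, fraction) in species_purity(example).items():
        print(cell_id, label, fraction)
```

The fraction of barcodes labeled "mixed" gives a simple estimate of how often two cells share a barcode.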

Submission file format

  • Raw FASTQ (paired)
  • Filtered FASTQ (paired) with matched barcodes and cellular index information
  • Mapped BAM, PAIRIX files

DCIC would need to coordinate with other single cell groups to decide on the formats.

DNase Hi-C and single cell Hi-C

It appears that DNase Hi-C has not provided an advantage for single cell Hi-C experiments. One possible reason is that the short fragment length of DNase Hi-C interferes with single cell Hi-C.