==== Omics Data Standards WG - Minutes 11-13-2017 ==== |~~TABLE_CELL_WRAP_START~~

===== Agenda =====

  - DNase-HiC protocol (Vijay Ramani from Jay Shendure lab)
  - Discussion
  - Single cell combinatorial indexing Hi-C protocol (sciHi-C)

Slides will be available via the link on the 4DN wiki > Omics Working Group page. There will be a 30-day period during which people can comment on the protocol and Vijay can address those comments.

===== DNase-HiC protocol (by Vijay Ramani from Shendure lab) =====

==== Metadata Considerations ====

Metadata for both DNase-HiC and sciHi-C should be the same as required by the DCIC sample submission standards. However, in sciHi-C the formaldehyde concentration during library preparation should be higher.

Protocols that adhere to the SOP need no extra metadata; other protocols can provide additional metadata in a PDF file.

For replicates, there are two ways of providing metadata:

  - submit replicates together in one batch;
  - group different submissions together.

The DCIC standard for replicates can be seen at [[https://data.4dnucleome.org/help/getting-started#notes-on-experiments-and-replicate-sets]]

==== Quality Control ====

  * Breakdown of ligation types:
    * 3x enrichment for cis reads >= 10kb inferred distance
    * 1.5x enrichment for cis reads vs. interchromosomal ligation
  * Detailed QC metrics will be released later.

Number of PE reads: ~85M unique valid pairs.

  * This number was chosen because several previous samples generated 85M to 200M valid pairs.
  * This is not a required number; it reflects the status of the completed experiments.
  * Depending on the type of features desired, the required number of reads may differ.
  * Should we expect some parallel to Hi-C as to how many reads will be needed to call which type of features?
  * From empirical data, it appears that the number of reads required would be the same across technologies for the same type of broad features (Gürkan).

==== Analysis ====

DCIC will contact the Shendure lab to facilitate analysis of DNase-HiC data within DCIC. DCIC does not yet require any wet-lab-level QC; however, some metrics will be generated for datasets.

===== SciHi-C (by Vijay Ramani from Shendure lab) =====

==== Metadata ====

Replicates are defined differently because every single cell is an independent observation. Therefore, replicates are defined as post-dilution plates. Multiple libraries were generated to sequence the sample to saturation.

==== Quality Control ====

Barcodes should be identical, so mismatched reads will be discarded. ~50% of all 250-bp reads have barcodes that can be ascertained. Other reads can be used for bulk-level analyses.

Mouse and human cells are mixed to verify cell purity.

  * Bimodal distribution of reads expected
  * Location of the interactions

==== Submission file format ====

  * Raw FASTQ (paired)
  * Filtered FASTQ (paired) with matched barcodes and cellular index information
  * Mapped BAM, PAIRIX files

DCIC would need to coordinate with other single cell groups to decide on the formats.

==== DNase Hi-C and single cell Hi-C ====

It appears that DNase Hi-C has not provided an advantage for single cell Hi-C experiments. One possible reason is that the short fragment length of DNase Hi-C interferes with single cell Hi-C.

~~TABLE_CELL_WRAP_STOP~~|
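
The sciHi-C barcode rule noted under Quality Control (barcodes should be identical, mismatched reads are discarded from single-cell analysis, and the remaining reads may still serve bulk-level analyses) could be sketched roughly as follows. This is only an illustration, not the Shendure lab pipeline: the ''ReadPair'' layout, the field names ''barcode_r1'' and ''barcode_r2'', and both function names are hypothetical, and the sketch assumes the barcode has already been extracted from each mate.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """One paired-end read with the barcode recovered from each mate.

    Hypothetical layout: an empty string means no barcode could be ascertained.
    """
    name: str
    barcode_r1: str
    barcode_r2: str


def split_by_barcode_match(
    pairs: Iterable[ReadPair],
) -> Tuple[List[ReadPair], List[ReadPair]]:
    """Separate pairs whose two barcodes are present and identical from the rest."""
    matched: List[ReadPair] = []
    unmatched: List[ReadPair] = []
    for pair in pairs:
        if pair.barcode_r1 and pair.barcode_r1 == pair.barcode_r2:
            matched.append(pair)    # usable for single-cell analysis
        else:
            unmatched.append(pair)  # excluded from single-cell analysis
    return matched, unmatched


def reads_per_cell(matched: Iterable[ReadPair]) -> Counter:
    """Count matched pairs per barcode, treating each barcode as one cell."""
    return Counter(pair.barcode_r1 for pair in matched)


if __name__ == "__main__":
    example = [
        ReadPair("r1", "ACGT", "ACGT"),
        ReadPair("r2", "ACGT", "TTGA"),  # mismatched barcodes
        ReadPair("r3", "GGCC", "GGCC"),
        ReadPair("r4", "ACGT", "ACGT"),
        ReadPair("r5", "", ""),          # barcode not ascertained
    ]
    kept, rest = split_by_barcode_match(example)
    print(len(kept), len(rest))
    print(reads_per_cell(kept))
```

Returning the unmatched pairs instead of dropping them keeps them available for bulk-level use, in line with the note above; whether they are kept in practice is a pipeline decision not recorded in these minutes.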