Omics Data Standards WG - Minutes 07-24-2017

Agenda:

  • Discussion of an updated ChIA-PET protocol and guideline (Yijun Ruan)
  • Nomination of additional omics protocols to be discussed in the Omics WG.
  • Update on Hi-C normalization test (Burak Alver)
  • If time permits, continue to discuss compartments.

Discussion of an updated ChIA-PET protocol and guideline

The updated ChIA-PET protocol has been forwarded to WG members by email.

QC steps of ChIA-PET

  • Several QC steps on DNA fragment size were added to ChIA-PET library construction.
  • There are 5 QC steps (Li et al., Nature Protocols 2017).
  • The size requirements (for QC1, chromatin fragments should be around 3 kb) may differ from those used for HiC, since ChIA-PET uses sonication and HiC uses restriction enzymes.
  • All the steps where QC is needed are described in the updated protocol.

Number of QCs needed at the intermediate steps, and QC data to include in data submission

  • ENCODE does not require all intermediate QC results to be included. However, such intermediate QCs are useful for the technicians in the labs to make sure everything is running correctly.
  • DCIC may need to provide fields and procedures to store and evaluate such intermediate QC data, which may be important for further data analysis. A single number indicating whether a QC passes may be easier to implement than storing the whole size distribution curve.
  • The benefit of including such QC data may be increased confidence in the submitted data, but there are costs involved as well (manpower, etc.).
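
The idea of storing a single pass/fail number rather than the whole size distribution could look roughly like the sketch below. The ~3 kb target for QC1 comes from the discussion above; the use of the median, the tolerance window and the function name are assumptions made for illustration only and are not part of the protocol.

```python
from statistics import median


def qc_fragment_size(sizes_bp, target_bp=3000, tolerance=0.5):
    """Summarise a fragment size distribution as one pass/fail record.

    target_bp=3000 reflects the ~3 kb guideline mentioned for QC1.
    tolerance (a fraction of the target) is a hypothetical threshold.
    """
    if not sizes_bp:
        raise ValueError("no fragment sizes supplied")
    med = median(sizes_bp)
    low = target_bp * (1 - tolerance)
    high = target_bp * (1 + tolerance)
    # Only the median and the flag are kept, not the full curve.
    return {"median_bp": med, "passed": low <= med <= high}
```

A record such as `{"median_bp": 3000, "passed": True}` would then be the only thing stored per QC step.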

Antibody coating: coat the beads with nonspecific antibodies to cover all the spaces left.

Tagmentation will generate smaller tags, but the streptavidin pull-down should filter the tags, so there won’t be many extra tags coming from longer fragments being further fragmented.

Two-step sequencing

  • The first step uses MiSeq (2 million reads), combining 4 libraries (~10 million reads in total).
  • The second step is done on a HiSeq 2500 or 4000 to generate ~200 million reads.
  • Reads are sequenced as paired-end 150 bp, so the average read length will be ~100 bp. Shorter reads were used previously; with 100 bp reads, better inference can be made.
    • There may be arguments for a shorter read length in HiC, because the longer reads that are rescued may not be very helpful.

The data processing pipeline can generate data for three types of features: binding peaks, interaction clusters and singletons

  • Binding peaks work similarly to ChIP-seq, and the ENCODE pipeline is currently used for peak calling.
  • For PET clusters, reads are extended by 500 bp / 1000 bp to reflect the fragment lengths, and a custom scheme is used to call clusters where more reliable contacts are detected.
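
As a rough illustration of the read extension step, the sketch below extends a read by 500 bp and tests whether two extended anchors overlap. The custom cluster-calling scheme itself was not described in the meeting, so this is not that scheme. Extending in the 3' direction of the read's strand, the 0-based half-open coordinates and the function names are all assumptions.

```python
def extend_read(start, end, strand, extension=500, chrom_len=None):
    """Extend a read interval by `extension` bp in its 3' direction.

    Coordinates are assumed 0-based, half-open. Extending along the
    strand is an assumption, not taken from the ChIA-PET pipeline.
    """
    if strand == "+":
        end = end + extension
        if chrom_len is not None:
            end = min(end, chrom_len)  # do not run past the chromosome
    elif strand == "-":
        start = max(0, start - extension)  # do not go below position 0
    else:
        raise ValueError("strand must be '+' or '-'")
    return start, end


def overlaps(a, b):
    """True if two half-open intervals (start, end) share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]
```

With these helpers, two PETs could be treated as candidates for the same cluster when both pairs of extended anchors overlap on the same chromosomes.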

Signal and noise in PET clusters

  • Proximity ligation between PET clusters is probably the major source of noise, and the question is where it takes place.
  • In the protocol, two different tags are used in different aliquots; after ligation, homo-dimers and hetero-dimers (noise) are measured.
  • Hetero-dimers will be the noise background.
  • There are two peaks in the homo-dimer distribution: the one with the shorter interaction length appears to be signal, while the one with the longer interaction length appears to be noise (similar to the distribution of hetero-dimers).
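
The homo-dimer / hetero-dimer bookkeeping described above can be sketched as follows. The input representation (one pair of tag labels per PET, e.g. "A" and "B") and the function name are hypothetical; the sketch only counts the two classes and reports the hetero-dimer fraction as the noise background.

```python
from collections import Counter


def linker_composition(pets):
    """Count homo- and hetero-dimer PETs.

    pets: iterable of (tag1, tag2) labels, e.g. ("A", "B").
    The label scheme is an assumption made for illustration.
    """
    counts = Counter("homo" if t1 == t2 else "hetero" for t1, t2 in pets)
    total = counts["homo"] + counts["hetero"]
    if total == 0:
        raise ValueError("no PETs supplied")
    return {
        "homo": counts["homo"],
        "hetero": counts["hetero"],
        "hetero_fraction": counts["hetero"] / total,
    }
```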

Interspecies experiments that include fly and human data: based on the sequence data, the mixed (hybrid) reads reflect ligations between chromatin complexes. The higher the dilution, the lower the noise. In this particular experiment a 1:1 ratio was prepared.

How uniquely do reads map to human or fly? How many reads can be random ligations? With a 1:1 mix, a 50% noise rate is expected from purely random ligations. Random ligation to another species can therefore be used as a quality control metric in this experiment. However, mixing the control species 1:1 with the sample may cost too much, and Dr. Ruan’s lab is currently exploring a 10:1 mixture of human sample with fly control.
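
The 50% figure for a 1:1 mix can be generalised with a small calculation. The sketch assumes that, in a purely random ligation, each partner is drawn independently in proportion to the mixing ratio, so a fraction 2p(1-p) of random ligations joins the two species (p being the sample share). That model, and both function names, are assumptions for illustration rather than anything presented at the meeting.

```python
def expected_interspecies_fraction(sample_parts, control_parts):
    """Share of purely random ligations expected to be human-fly hybrids.

    Assumes ligation partners are drawn in proportion to the mix.
    """
    if sample_parts <= 0 or control_parts <= 0:
        raise ValueError("both parts must be positive")
    p = sample_parts / (sample_parts + control_parts)
    return 2 * p * (1 - p)


def estimate_random_ligation_rate(observed_hybrid_fraction,
                                  sample_parts, control_parts):
    """Scale the observed hybrid fraction up to all random ligations."""
    expected = expected_interspecies_fraction(sample_parts, control_parts)
    return observed_hybrid_fraction / expected
```

Under this assumption a 1:1 mix gives 0.5, matching the figure quoted above, while a 10:1 mix gives 20/121 (about 16.5%), so a smaller share of the random ligations is directly visible as hybrids.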

ENCODE-4 data processing pipeline:

Data generation > data mapping > PET file ready for downstream analysis. The PET data can then be fed into Juicebox for visualization, classified into inter-ligation PETs, run through peak calling, or used for samtools pile-up analysis.

Intra PETs are more likely to have lower coverage and inter PETs are more likely to have higher coverage

All the mappable PETs and self-/inter-ligation PETs are unbinned data and can be available to DCIC.

The experiment and data analysis protocols can be approved separately, and when discussing data analysis protocols it is possible to relate them to HiC analysis (file format, mapping algorithm, conversion to Juicer format, etc.).

The Data Standards group should also release the standard data format soon. Since Dr. Ruan’s group is the primary contributor, DCIC may need to discuss with him how to incorporate much of what’s in the workflow.

Most of the ChIA-PET pipeline is already using common methods from the other pipelines (mapping, peak calling and others).

Conversion between the contact file format from Dr. Ruan’s group and the DCIC format (the one readable by Juicebox and Cooler) may need to be done.
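
A conversion of this kind might look like the sketch below. The actual file format used by Dr. Ruan’s group was not specified in the meeting, so the input is assumed to be a tab-separated BEDPE-like line (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2). The output is an eight-column line intended to follow the Juicer "short format" (str1 chr1 pos1 frag1 str2 chr2 pos2 frag2); that layout, the strand encoding and the dummy fragment indices should be checked against the Juicer documentation before use. The function name is hypothetical, and sorting of the output is not handled.

```python
def bedpe_to_juicer_short(line):
    """Convert one BEDPE-like PET line to a Juicer short-format line.

    Input layout is an assumption; anchors are reduced to midpoints.
    """
    f = line.rstrip("\n").split("\t")
    chrom1, start1, end1 = f[0], int(f[1]), int(f[2])
    chrom2, start2, end2 = f[3], int(f[4]), int(f[5])
    # Strands are optional in this sketch; default to forward.
    strand1 = f[8] if len(f) >= 10 else "+"
    strand2 = f[9] if len(f) >= 10 else "+"
    pos1 = (start1 + end1) // 2
    pos2 = (start2 + end2) // 2
    s1 = 0 if strand1 == "+" else 16  # 0 = forward, non-zero = reverse
    s2 = 0 if strand2 == "+" else 16
    # Dummy fragment indices 0 and 1 (no restriction fragments here).
    return f"{s1} {chrom1} {pos1} 0 {s2} {chrom2} {pos2} 1"
```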

ChIA-PET data visualization:

SNP-based validation of CTCF binding and looping

Allele-specific analysis is very important for the 4DN. This topic needs to be brought up for discussion soon.

Last modified: 2025/04/22 16:21 (external edit)