
Omics Data Standards WG - Minutes 09-26-2016

Omics Year 1 Report by Bing Ren

Achievements

  • Common experiment protocol
  • Data file formats

Goals for Year 2

  • Data standards for Hi-C and ChIA-PET
  • Quality metric for chromatin organization features

Main challenges and obstacles for OMICS group

  • Gold standards for chromatin features
  • Definition of a “reference 4D genome” (probably a reference sweep list of terms, etc., rather than a detailed “reference genome”)

Build consensus on common terminology

Data standards for Hi-C:

  • Needed when external experimental groups submit their own Hi-C data
  • The data standards should reflect the types of features the datasets are trying to resolve, because different features require vastly different resolutions (low for compartments/domains but high for loops)
    • Do we need to enforce a minimum sequencing depth for Hi-C data? ENCODE has a requirement of 20M reads for phase II and 40M for phase III to ensure the number of binding sites (peaks) does not appear limited.
    • Datasets may be divided into two (or more) categories by resolution.
  • Data will have a large amount of heterogeneity
  • Standard libraries can provide a good “sanity test” for newly generated data for better aggregation analysis
  • Categorize quality control libraries
    • Should groups run their internal controls individually, or should a QC standard be mandated for all datasets?
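
The link between resolution and sequencing depth discussed above can be illustrated with simple arithmetic. The sketch below is not a WG standard: the function names and the example numbers are hypothetical, and it assumes contacts are spread uniformly over the genome, which real Hi-C data are not. It only shows that the number of bin pairs grows roughly with the square of the resolution, so a fixed number of contacts is spread much thinner at loop-scale bin sizes than at compartment-scale bin sizes.

```python
def number_of_bins(genome_size: int, bin_size: int) -> int:
    """Number of fixed-size bins needed to cover the genome (ceiling division)."""
    return -(-genome_size // bin_size)


def mean_contacts_per_bin_pair(total_contacts: int, genome_size: int, bin_size: int) -> float:
    """Average contacts per unordered bin pair (diagonal included).

    Assumes a uniform spread of contacts, which is only a rough
    back-of-the-envelope approximation.
    """
    n_bins = number_of_bins(genome_size, bin_size)
    n_pairs = n_bins * (n_bins + 1) // 2
    return total_contacts / n_pairs


if __name__ == "__main__":
    # Hypothetical example: ~3.1 Gb genome, 1 billion contacts.
    genome_size = 3_100_000_000
    total_contacts = 1_000_000_000
    for bin_size in (1_000_000, 100_000, 10_000, 1_000):
        mean = mean_contacts_per_bin_pair(total_contacts, genome_size, bin_size)
        print(f"bin size {bin_size:>9,} bp: {mean:.6g} contacts per bin pair")
```

Under these assumptions, a tenfold finer bin size needs roughly a hundredfold more contacts to keep the same average coverage per bin pair, which is one argument for dividing datasets into categories by resolution.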

Determine the minimum numbers of reads required for datasets

  • The number of distinct reads also appears to be very important, because a library can have many reads but very low molecular complexity, which wastes reads
  • Reproducibility analysis
  • However, saturation analysis by the Erez group showed that there may be less benefit once the number of unique reads reaches a certain level (~2B-3B contacts, i.e. read pairs in the library, assuming a high-quality library)
  • Inter-chromosomal and intra-chromosomal data may need to be considered separately because of the underlying biological process.
  • Define a minimal standard with the minimal read counts, read depths and quality control of the libraries.
  • Pushing out a standard early may be more beneficial (it can be raised later) so that people don’t have to regenerate data once a standard is implemented in the future for the “reference genome”.
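
The distinct-read, saturation and inter/intra-chromosomal points above could be checked with something like the following minimal sketch. It is not the Erez group's saturation analysis or any agreed WG pipeline: the `ReadPair` layout, the function names and the subsampling fractions are all assumptions made for illustration, and read pairs are assumed to be already mapped and stored in a canonical end order.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (chrom1, pos1, chrom2, pos2)
ReadPair = Tuple[str, int, str, int]


def summarize_library(pairs: Sequence[ReadPair]) -> Dict[str, float]:
    """Count total and distinct read pairs and the intra-chromosomal share."""
    total = len(pairs)
    distinct = set(pairs)
    n_distinct = len(distinct)
    n_cis = sum(1 for c1, _, c2, _ in distinct if c1 == c2)
    return {
        "total_pairs": total,
        "distinct_pairs": n_distinct,
        # High values indicate low molecular complexity (wasted reads).
        "duplicate_fraction": 1 - n_distinct / total if total else 0.0,
        # Intra-chromosomal share of the distinct pairs.
        "cis_fraction": n_cis / n_distinct if n_distinct else 0.0,
    }


def saturation_curve(
    pairs: Sequence[ReadPair],
    fractions: Sequence[float] = (0.25, 0.5, 0.75, 1.0),
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Return (reads sampled, distinct pairs seen) at each subsampling fraction."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for fraction in fractions:
        n = int(len(shuffled) * fraction)
        curve.append((n, len(set(shuffled[:n]))))
    return curve


if __name__ == "__main__":
    toy = [
        ("chr1", 10, "chr1", 500),
        ("chr1", 10, "chr1", 500),  # duplicate of the first pair
        ("chr1", 20, "chr2", 30),   # inter-chromosomal
        ("chr2", 5, "chr2", 900),
    ]
    print(summarize_library(toy))
    print(saturation_curve(toy))
```

A curve whose distinct-pair count flattens as more reads are sampled suggests that additional sequencing of the same library would yield diminishing returns.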
4dn/phase1/working_groups/omics_data_standards/minutes-09-26-2016.1600883225.txt.gz · Last modified: 2025/04/22 16:21 (external edit)