



Omics Data Standards WG - Minutes 09-26-2016

Omics Year 1 Report by Bing Ren

Achievements

Common experiment protocol

Data file formats

Goals for Year 2

Data standards for Hi-C and ChIA-PET

Quality metrics for chromatin organization features

Main challenges and obstacles for OMICS group

Gold standards for chromatin features

Definition of “reference 4D genome” (probably a reference sweep list for terms, etc., rather than a detailed “reference genome”)

Build consensus on common terminology

Data standards for Hi-C:

Needed when external experimental groups submit their own Hi-C data

The data standards should reflect the types of features the datasets are trying to resolve, because different features require vastly different levels of resolution (low for compartments/domains but high for loops)

Do we need to enforce a minimum sequencing depth for Hi-C data? ENCODE has a requirement of 20M reads for phase II and 40M for phase III to ensure the number of binding sites (peaks) does not appear limited.

Datasets may be divided into two (or more) categories by resolution.

Data will have a large amount of heterogeneity

Standard libraries can provide a good “sanity test” for newly generated data for better aggregation analysis

Categorize quality control libraries

Should groups run their internal controls individually, or should a QC standard be mandated for all datasets?

Determine the minimum number of reads required for datasets

The number of distinct reads also appears to be very important, because a library can have many reads but very low molecular complexity, which wastes reads.

Reproducibility analysis

However, saturation analysis by the Erez group showed that there may be less benefit once the number of unique reads reaches a certain level (~2B-3B contacts, i.e. read pairs in the library, assuming a high-quality library).

Inter-chromosomal and intra-chromosomal data may need to be considered separately because of the underlying biological processes.

Define a minimal standard with the minimal read counts, read depths and quality control of the libraries.
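As an illustration of the read-count points above, the following minimal sketch computes the fraction of distinct read pairs in a library, splits the distinct contacts into intra- and inter-chromosomal, and estimates a simple saturation curve by subsampling. It was not presented at the meeting and is not a proposed standard; the contact representation `(chrom1, pos1, chrom2, pos2)` and the function names `library_stats` and `saturation_curve` are assumptions made for the example.

```python
import random

# Assumed (hypothetical) representation: each read pair is a tuple
# (chrom1, pos1, chrom2, pos2). Identical tuples are treated as duplicates.

def library_stats(contacts):
    """Summarize total vs. distinct read pairs and the cis/trans split."""
    total = len(contacts)
    distinct = set(contacts)
    n_distinct = len(distinct)
    # Intra-chromosomal (cis) contacts have both ends on the same chromosome.
    cis = sum(1 for c in distinct if c[0] == c[2])
    return {
        "total": total,
        "distinct": n_distinct,
        "distinct_fraction": n_distinct / total if total else 0.0,
        "cis": cis,
        "trans": n_distinct - cis,
    }

def saturation_curve(contacts, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Return (reads sampled, distinct pairs seen) at increasing depths."""
    rng = random.Random(seed)
    shuffled = list(contacts)
    rng.shuffle(shuffled)
    curve = []
    for f in fractions:
        n = int(len(shuffled) * f)
        curve.append((n, len(set(shuffled[:n]))))
    return curve

# Toy input: four read pairs, one of which is an exact duplicate.
contacts = [
    ("chr1", 100, "chr1", 5000),
    ("chr1", 100, "chr1", 5000),
    ("chr1", 200, "chr2", 300),
    ("chr2", 10, "chr2", 900),
]
stats = library_stats(contacts)
curve = saturation_curve(contacts)
print(stats)
print(curve)
```

On the toy input, three of the four read pairs are distinct, two of them intra-chromosomal and one inter-chromosomal. A curve that flattens as depth increases would indicate that additional sequencing mostly yields duplicates.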

Releasing a standard early may be more beneficial (it can be raised later) so that people don’t have to regenerate data once a standard is implemented in the future for the “reference genome”.

4dn/phase1/working_groups/omics_data_standards/minutes-10-24-2016.1550249998.txt.gz · Last modified: 2025/04/22 16:21 (external edit)