Omics Year 1 Report by Bing Ren

Achievements
* Common experiment protocol
* Data file formats

Goals for Year 2
* Data standards for Hi-C and ChIA-PET
* Quality metric for chromatin organization features

Main challenges and obstacles for OMICS group
* Gold standards for chromatin features
* Definition for “reference 4D genome” (probably a reference sweep list for terms, etc. rather than a detailed “reference genome”)
* Build consensus in common terminology

Data standards for Hi-C
* Needed when external experimental groups submit their own Hi-C data.
* The data standards should reflect what types of feature the datasets are trying to resolve, because different features require vastly different levels of resolution (low for compartments/domains but high for loops).
* Do we need to enforce a minimum sequencing depth for Hi-C data? ENCODE has a requirement of 20M reads for phase II and 40M for phase III to ensure the number of binding sites (peaks) does not appear limited.
* Datasets may be divided into two (or more) categories by resolution.
* Data will have a large amount of heterogeneity.
* Standard libraries can provide a good “sanity test” for newly generated data and support better aggregation analysis.
* Categorize quality control libraries.
* Should groups run their internal controls individually, or should a QC standard be mandated for all datasets?

Determine the minimum number of reads required for datasets
* Distinct reads also appear to be very important, because a library can have many reads but very low molecular complexity, which wastes reads.
* Reproducibility analysis
* However, saturation analysis by the Erez group showed that there may be less benefit once the number of unique reads reaches a certain level (~2B-3B contacts, i.e. read pairs in the library, assuming a high-quality library).
* Inter-chromosomal and intra-chromosomal data may need to be considered separately because of the underlying biological processes.
* Define a minimal standard with the minimal read counts, read depths and quality control of the libraries. Pushing out a standard early may be more beneficial (it can be raised later on), so that people don’t have to regenerate data once a standard is implemented in the future for the “reference genome”.
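
The read-count points above (distinct reads vs. total reads, saturation, and separate treatment of intra- and inter-chromosomal contacts) can be sketched in a few lines of Python. This is only an illustrative sketch, not an agreed QC procedure: the function names, the `(chrom1, pos1, chrom2, pos2)` contact representation, and the threshold arguments are all assumptions made for the example, and no threshold values have been decided by the group.

```python
import random


def library_stats(read_pairs):
    """Summarize a list of contacts given as (chrom1, pos1, chrom2, pos2) tuples.

    Hypothetical helper: 'complexity' here is simply distinct / total read pairs.
    """
    total = len(read_pairs)
    distinct = set(read_pairs)
    # Count intra-chromosomal contacts among the distinct pairs only.
    intra = sum(1 for c1, _, c2, _ in distinct if c1 == c2)
    return {
        "total": total,
        "distinct": len(distinct),
        "complexity": len(distinct) / total if total else 0.0,
        "intra": intra,
        "inter": len(distinct) - intra,
    }


def saturation_curve(read_pairs, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Return (reads sampled, distinct reads seen) for growing random subsamples.

    A curve that flattens suggests additional sequencing adds few new contacts.
    """
    rng = random.Random(seed)
    shuffled = list(read_pairs)
    rng.shuffle(shuffled)
    curve = []
    for fraction in fractions:
        n = int(len(shuffled) * fraction)
        curve.append((n, len(set(shuffled[:n]))))
    return curve


def passes_minimum(stats, min_distinct, min_complexity):
    """Check a library against placeholder thresholds (values are NOT from the notes)."""
    return (stats["distinct"] >= min_distinct
            and stats["complexity"] >= min_complexity)


if __name__ == "__main__":
    # Toy library: one contact sequenced three times, plus two unique contacts.
    pairs = [("chr1", 100, "chr1", 5000)] * 3 + [
        ("chr1", 200, "chr2", 300),
        ("chr2", 10, "chr2", 900),
    ]
    stats = library_stats(pairs)
    print(stats)
    print(saturation_curve(pairs))
    print(passes_minimum(stats, min_distinct=3, min_complexity=0.5))
```

If a standard with resolution-based categories is adopted, the two threshold arguments could be set per category (for example, stricter for loop-resolution datasets than for compartment/domain datasets).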