This is an old revision of the document!


Omics Data Standards WG - Minutes 02-27-2017

Summary

  • Recommendations for Hi-C standards were approved by DCIC
  • ChIA-PET and PLAC-seq protocols have been drafted and will need to be approved by the SC
  • Finding high-order interactions requires new methods and further discussion
  • ChIA-PET experiments need to be reviewed. Time should also be set aside to review the internal standards for Hi-C

RAPID-seq will be presented at the next meeting (by Dr. Gilbert’s lab, March 13); other new technologies will follow in future meetings (a spreadsheet will be used for scheduling).

  • Data standards for high-order contacts would be valuable
  • Common protocols across 4DN for assays such as ChIP-seq
  • Genome assemblies used for 4DN, especially for cell lines with chromosomal rearrangements (the human reference assembly will not reflect these cell lines). Are individual cell-line assemblies needed?
  • Numbers in the standards (e.g. the number of reads in a given genomic feature) need to be specified

Assemblies: for human, DCIC is using hg38 (GRCh38) with duplicates removed and is waiting on the SC for a decision. Some labs deliberately use older, inferior assemblies to avoid the issues that come with new ones (alignment cost, annotation, comparison across different genomes, etc.).

Hi-C data sets should be reprocessed. Most data sets being generated use hg38, including those from ENCODE, Roadmap, and IHEC, and we want our data to be comparable.

Three options for analyzing cell lines like GM12878: use hg19; use hg38; or use a GM12878-specific assembly, because rearrangements may generate artifacts. The third option may also benefit other cell types with an abnormal karyotype (K562, for example).

Common cell lines need to include diploid cells and a few haploids. Even if we generate assemblies for cancer cell lines, those lines are unstable.

Mouse cell lines with high heterozygosity may be used by other research groups. mm10 together with phasing support would be better. Hybrid mice will generate horrendous biases in Hi-C if aligned to homozygous assemblies.

We need to set time aside to deal with hybrid mouse cell lines. Dr. Ren's group uses specific procedures when analyzing Hi-C data from hybrid cell lines to eliminate mapping biases, and may present them in the future.

Attendance in the analysis working group is low. The groups should join forces on a topic like this (how to align and phase the genome) to avoid duplicating work. Set a time for a joint meeting of the data analysis and omics groups. Both Dr. Ren’s lab and Dr. Lieberman-Aiden's lab can present their methods at this joint meeting.

Integration with the Imaging WG should be part of the discussion in the future.

The data handling infrastructure has been established. Tools are still being evaluated by the data analysis group and will probably be ready in about a month. The Data Analysis WG also has trainees evaluating the tools (to ensure that they can be used by external people).

The data analysis group has decided on options for normalizing and building (BWA, Cooler), on using a pairs file as the standard data file, on filtering, and so on. The Data Analysis WG compared basic algorithms and found that Cooler performed slightly better in both memory usage and reproducibility. A joint session may need to be scheduled to discuss the results and compare the pros and cons of all tools involved.

Erez: compared notes on reproducibility and pros and cons.

Discussion from the previous meeting: it was mentioned that we want to use matrix balancing or other normalization methods. Tools that do not filter the diagonal give similar results to one another, but methods that filter out the diagonal, like Juicer and Cooler, perform better.

Matrix balancing does not fully address biases in Hi-C data, so alternative approaches exist and should be highlighted again. Matrix balancing also makes assumptions that may not hold in some cases. Whole-genome normalization might not be needed, because most of the noise is concentrated in the inter-chromosomal space and reads there should therefore carry less weight.
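For readers less familiar with the technique, the following is a minimal sketch of matrix balancing by iterative correction, written in plain Python for illustration. It is not the Juicer or Cooler implementation; the function name `balance` and the parameters `ignore_diags`, `max_iter` and `tol` are assumptions made for this example. It makes the key assumption visible: every bin is forced to the same total number of contacts, which is exactly what may go wrong for regions that genuinely interact more.

```python
def balance(matrix, ignore_diags=2, max_iter=200, tol=1e-6):
    """Balance a symmetric contact matrix by iterative correction (sketch).

    Returns (balanced, bias) with balanced[i][j] = matrix[i][j] / (bias[i] * bias[j])
    for entries outside the ignored diagonals.
    """
    n = len(matrix)
    # Copy the input and zero the first `ignore_diags` diagonals (|i - j| < ignore_diags).
    m = [[0.0 if abs(i - j) < ignore_diags else float(matrix[i][j])
          for j in range(n)] for i in range(n)]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        covered = [s for s in sums if s > 0]
        if not covered:
            break
        mean = sum(covered) / len(covered)
        # Bins without any contacts keep a factor of 1 and stay empty.
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        # Stop once every covered bin is already close to the mean coverage.
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return m, bias
```

After balancing, all covered rows have (nearly) the same sum. Running this per chromosome rather than genome-wide is one way to keep noisy inter-chromosomal reads from influencing the bias estimates.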

In Dr. Ren’s Genome Research paper from last year, biases were not fully eliminated when matrix balancing was used. Enhancers in regions with higher expected interaction frequencies tend to break the assumption of matrix balancing. In effect, choosing a normalization method is like choosing which part of the genome gets thrown out.

What metrics should we use to evaluate normalization and filtering methods? One option is that Hi-C data from different restriction enzymes should show better reproducibility, or be more similar, after normalization; however, that metric can be gamed. Each restriction enzyme also creates its own biases (e.g. if 6-cutters are used).
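As one possible way to put a number on "more similar", the sketch below computes a Pearson correlation between two contact matrices separately at each genomic distance, so that the strong distance decay shared by all Hi-C maps does not dominate the score. This is an illustration only, not a metric the WG agreed on; the function names and the `min_dist` / `max_dist` parameters are assumptions made for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def stratified_correlation(m1, m2, min_dist=1, max_dist=None):
    """Correlate two square contact matrices diagonal by diagonal.

    Returns {distance_in_bins: correlation} for every diagonal that has
    at least 3 entries and non-constant values in both matrices.
    """
    n = len(m1)
    if max_dist is None:
        max_dist = n - 1
    result = {}
    for d in range(min_dist, max_dist + 1):
        x = [m1[i][i + d] for i in range(n - d)]
        y = [m2[i][i + d] for i in range(n - d)]
        if len(x) < 3:
            continue
        r = pearson(x, y)
        if r is not None:
            result[d] = r
    return result
```

A normalization method could be scored by how much it raises these per-distance correlations between maps made with different enzymes. As noted above, such a score can be gamed, for example by a method that smooths away real signal.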

Alternatively, take two cell lines related by a known perturbation (e.g. a deletion) and compare the interaction status between them.
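A minimal sketch of such a comparison, assuming both matrices use the same bins and have already been scaled to a comparable sequencing depth: compute a log2 ratio per bin pair, then average it over the contacts that span the perturbed region. The function names, the `pseudocount` parameter, and the choice to summarize by contacts spanning the region are assumptions made for this example.

```python
from math import log2

def log2_ratio(control, perturbed, pseudocount=1.0):
    """Element-wise log2((perturbed + p) / (control + p)) for two square matrices."""
    n = len(control)
    return [[log2((perturbed[i][j] + pseudocount) / (control[i][j] + pseudocount))
             for j in range(n)] for i in range(n)]

def mean_change_across(ratio, start, end):
    """Average log2 ratio over bin pairs (i, j) with i < start and j >= end,
    i.e. contacts that span the region covering bins [start, end)."""
    values = [ratio[i][j] for i in range(start) for j in range(end, len(ratio))]
    return sum(values) / len(values) if values else 0.0
```

For a deletion, contacts between the two flanks are expected to increase, so a normalization or filtering method that preserves this known change would score well on this test.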

One topic suggested by Giancarlo, DNase Hi-C, may be discussed in a future meeting. We may also need to reach out to Dr. Dekker’s lab about Micro-C.

Dr. Alver can share Hi-C files processed with different normalization methods so that people can make an informal comparison between them. Other normalization methods can be sent to Dr. Alver to be added to the pipeline as well.

4dn/phase1/working_groups/omics_data_standards/minutes-02-27-2017.1600883205.txt.gz · Last modified: 2025/04/22 16:21 (external edit)