
This is an old revision of the document!


Omics Data Standards WG - Minutes 05-22-2017

05-22-2017 4DN OMICS MINUTES

Next meeting June 12th, 2017

AGENDA:

  • Repli-Seq protocol and data standards (we have a final version of the protocol and standards from Dr. David Gilbert and will vote on it)
  • Update from Burak on mapping results from simulated datasets
  • Normalization methods for Hi-C data

Comment and vote on E/L Repli-seq protocols (experimental protocol and data standard guidelines for Repli-seq):

  • Burak suggested separating the experimental protocol from the data analysis protocol (Step 10) in the documents. The experimental protocol may be approved immediately, but the data analysis protocol may be changed to make it more modular and bring it in line with other protocols developed by DCIC.
  • Specifically, Step 10 (the data analysis protocol) is to be modified. The data analysis pipeline it describes is an older version and is not yet complete.
  • Thus, the data analysis protocol will be subject to modifications and will be finalized between the DCIC and the David Gilbert Lab.
  • In summary, the OMICS members on the call approved the E/L Repli-seq documents through Step 9, and these will be sent to the 4DN SC for final approval. An updated Step 10 will be presented jointly by DCIC and the Gilbert Lab and approved later.

Continue discussion on Hi-C mapping issues:

Figure 1| Backbone of Hi-C pipeline to be built

  • Alignment: the current proposal is to use bwa with -SP5M as the flags. -SP means single-end mode, that is, without using the paired-end rescue mode.
  • Single end vs paired end: runtime is the same for SE and PE alignments. However, SE requires an additional fixmate step to put the read mates together. In addition, 29 chimeric reads per 1M get a different MAPQ in the two modes.
  • No systematic features were detected in the 29 reads that differed. Many of them have MAPQ equal to 0 for some portion of the read, so only 2 reads out of a million would be treated differently by Juicer in paired-end mode.
  • There are two issues. The first is the chimeric reads (one read consisting of two regions). These reads are chimeric at one end or the other, and Neva observed similar features in terms of how far apart the single-end and paired-end modes map the reads.
  • The chimeric reads do not come from the same chromosome.
  • Wobbly effect: these are errors added by the sequencing instrument. Is this problem handled? To resolve it, Burak suggested splitting the pipeline into separate steps of alignment, filtering and segregation, and then removing duplicate reads, wobbly reads, or reads related to restriction enzymes. Once the alignment steps have been agreed on, he wants to open a discussion on how the filtering will be carried out.
  • In summary, single-end mapping with the SP5M option and paired-end mapping perform equally well in terms of runtime.
  • MAPQ 0 alignments can sometimes be useful if you want to get a general sense of repeats and what they are doing.
  • If you use a more stringent analysis, can you get identical results in single-end and paired-end mode? No, even with the most stringent criteria you get differences in the MAPQ.
  • Why these differences? We don’t fully understand the behaviour of bwa -SP5M and there may be other bugs in the software that can create weird results.
  • For chimeric reads, bwa may use heuristics to generate MAPQ values, and different random seeds may cause the MAPQ value to differ. It would be interesting to know whether this is a bug in the software. For example, there was one read that maps to the same place in both modes, but with MAPQ 0 in one and MAPQ 50 in the other.
  • Memory behavior: bwa uses more memory for single-end than for paired-end alignment.
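
The single-end vs paired-end comparison discussed above can be sketched in a few lines of Python. This is only an illustration, not the analysis that was presented: the function name compare_mapq, the input layout (read name mapped to a tuple of per-segment MAPQ values) and the min_mapq threshold of 1 are all assumptions made for the example. It reports which reads received different MAPQ values in the two modes, and which of those differences would change whether the read passes a MAPQ filter.

```python
def compare_mapq(se, pe, min_mapq=1):
    """Compare per-read MAPQ values from two alignment runs.

    se, pe: dict mapping read name -> tuple of MAPQ values, one per
            aligned segment (hypothetical layout for this sketch).
    min_mapq: hypothetical filter threshold; a read "passes" when all of
              its segments have MAPQ >= min_mapq.

    Returns (differing, consequential, per_million):
      differing      reads whose MAPQ tuples differ between the runs
      consequential  subset whose pass/fail status also differs
      per_million    differing reads per million shared reads
    """
    shared = se.keys() & pe.keys()
    differing = sorted(r for r in shared if se[r] != pe[r])

    def passes(mapqs):
        return min(mapqs) >= min_mapq

    consequential = [r for r in differing if passes(se[r]) != passes(pe[r])]
    per_million = len(differing) / len(shared) * 1e6 if shared else 0.0
    return differing, consequential, per_million


# Toy input: four reads, two of which differ between the modes.
se_run = {"r1": (60, 60), "r2": (0, 50), "r3": (0, 60), "r4": (30, 60)}
pe_run = {"r1": (60, 60), "r2": (50, 50), "r3": (0, 40), "r4": (30, 60)}

differing, consequential, per_million = compare_mapq(se_run, pe_run)
print(differing)       # ['r2', 'r3']
print(consequential)   # ['r2']
```

In the toy input, r3 differs between the runs but has a MAPQ 0 segment in both, so a filter would drop it either way. Only r2 changes outcome, which mirrors the observation that most of the differing reads would not be treated differently downstream.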

Hi-C analysis, especially normalization:

  • Do we want to apply balancing approaches and/or explicit correction? This will depend on the computational cost and on the benefits obtained.
  • On the balancing side, Juicer and Cooler have been compared. Their main differences are the kinds of filters applied to reads, their algorithms, file format, and reproducibility. A computational comparison was done, but it is no longer valid because Cooler now also performs KR and Juicer also performs other calculations.
  • What objectives do people have after the visualization? The criteria may be inferred from the visualization results, to find the features that may become relevant.
  • Resolutions will start from 5 kb. We can make the results available to the community and let people see what is going on.
  • It is easy to convert Hi-C files to Cooler, but not the reverse.
  • It is important to understand how explicit factors can influence normalization.
  • It is also important to compare different methodologies as a group and to evaluate the performance of different normalization tools by visualizing the results in tools such as HiGlass and Juicer. When you look at these complex patterns, what criteria will be used to reach a conclusion? These specific criteria still need to be defined.
  • It is important to use the same data sets with the different methodologies.
  • What resolution is going to be used? 5 kb, which is the highest resolution.
  • Cooler or multi-cooler stores multiple resolution files
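
As a rough illustration of what a balancing approach does, the sketch below rescales a small symmetric contact matrix so that every non-empty bin ends up with the same total signal. It is a generic, damped iterative scaling. It is not the KR algorithm, and it is not the code used by Juicer or Cooler. The function name balance, the target row sum of 1.0 and the toy matrix are assumptions made for the example.

```python
import math


def balance(matrix, max_iter=500, tol=1e-8):
    """Balance a symmetric contact matrix by iterative scaling.

    Finds one bias value per bin so that
        balanced[i][j] = matrix[i][j] / (bias[i] * bias[j])
    has row sums close to 1.0 for every non-empty bin.
    Empty bins (row sum 0) keep a bias of 1.0 and stay all-zero.
    """
    n = len(matrix)
    bias = [1.0] * n
    for _ in range(max_iter):
        # Row sums of the matrix under the current bias estimate.
        sums = [
            sum(matrix[i][j] / (bias[i] * bias[j]) for j in range(n))
            for i in range(n)
        ]
        if all(abs(s - 1.0) < tol for s in sums if s > 0):
            break
        # Damped update: the square root avoids the over-correction that
        # occurs when the full row sum is divided out on both sides.
        bias = [b * math.sqrt(s) if s > 0 else b for b, s in zip(bias, sums)]
    balanced = [
        [matrix[i][j] / (bias[i] * bias[j]) for j in range(n)]
        for i in range(n)
    ]
    return balanced, bias


# Toy 4-bin matrix; the last bin has no contacts at all.
raw = [
    [10, 4, 2, 0],
    [4, 8, 2, 0],
    [2, 2, 6, 0],
    [0, 0, 0, 0],
]
balanced, bias = balance(raw)
for row in balanced:
    print([round(x, 3) for x in row], "sum =", round(sum(row), 6))
```

Because the same bias vector is applied to rows and columns, the balanced matrix stays symmetric. How empty or low-coverage bins are filtered before balancing is one of the places where tools can differ, so a comparison between methods should use the same data sets and the same filtering.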

Timeline of the evaluation

Soo Lee is leading the evaluation effort, and the timeline should be a number of weeks. Burak will confirm in ten days whether it is premature to present the results.

4dn/phase1/working_groups/omics_data_standards/minutes-05-22-2017.1615215961.txt.gz · Last modified: 2025/04/22 16:21 (external edit)