Omics Data Standards WG - Minutes 04-09-2018

Agenda

  1. Discussion and approval of Single Cell Hi-C protocol (attached), presented by Dr. Peter Fraser a month ago.
  2. Update from sub-working group - PLAC-Seq/HiChIP (Miao Yu). Different library preps are not expected to affect data quality.
  3. Update from sub-working group - single cell Hi-C (Burak Alver).
  4. Update from sub-working group - allele specific Hi-C (Burak Alver).
  5. Other business for future OMICS sessions?

===== Vote for motion to approve the single cell protocol =====

The protocol has been approved verbally within the working group and will be submitted to the SC.

===== PLAC-Seq/HiChIP sub-working group update (Miao Yu) =====

The sub-working group is working on two things:

==== Differences between PLAC-Seq and HiChIP ====

The majority of the differences appear to be in library prep. No side-by-side comparison has been conducted before. Therefore, the sub-working group is planning to perform both on the same cell line to compare the QC metrics for each and to see whether there are any differences. The different library preps are not expected to affect data quality and/or biological insight, but experiments need to be done to confirm this. The current protocol only includes the PLAC-Seq library prep method (normal end-repair ligation method) and will include the HiChIP method (Tn5-based) in the near future.

==== Data processing ====

The sub-working group is working with Ming Hu’s lab on peak-calling methods. Currently the sub-WG is working with several mouse samples (F123 mouse ES cells and mouse tissue). The data look promising and the sub-WG is compiling the full pipeline.

==== Questions and responses ====

Comparison between PLAC-Seq/HiChIP and ChIA-PET. From the data analysis standpoint, should ChIA-PET and PLAC-Seq be treated as different experiments that generate the same type of data?

ChIA-PET mainly focuses on interactions between two protein binding sites, and PLAC-Seq can borrow the ChIA-PET peak-calling method when its focus is on two protein binding sites as well.
However, if PLAC-Seq is used for other types of interactions, such as promoter-enhancer interactions, where one end may not carry the H3K4me3 mark, new analysis methods will be needed to handle them.

From the data processing standpoint, should ChIA-PET and PLAC-Seq be treated as different experiments that generate the same type of data?

Dr. Ren explained his view that PLAC-Seq/HiChIP and ChIA-PET are fundamentally different technologies, in that the order of the ligation step and the ChIP step is switched. Therefore, they need distinct workflows. For example, in ChIA-PET, random collisions between different particles are removed by using a control between two different species, which is not needed in PLAC-Seq/HiChIP. Therefore, different pipelines and QC procedures will be needed. That said, much can be shared between the data analysis approaches, as both are fundamentally a reduced representation of Hi-C and have a lot in common. We don’t want to completely mix them, but there is also a lot of interchangeability. For now we will probably want to treat them as separate data sets.

How much of what has been learned from ChIA-PET can be applied to PLAC-Seq/HiChIP? For example, can MANGO be used to process PLAC-Seq/HiChIP data?

MANGO is generally useful for ChIA-PET data and can be adapted for PLAC-Seq/HiChIP, but the key distinction is that some of the assumptions of ChIA-PET do not apply to HiChIP data. For example, ChIA-PET assumes that both ends of a resulting interaction carry the same protein mark because of the experimental protocol (pull-down first); therefore, MANGO incorporates a filter that removes interactions whose two ends are not both bound by the protein. However, HiChIP/PLAC-Seq can discover interactions based on spatial proximity without both ends bearing the protein mark, for example promoter-centered interactions, which would be filtered out by MANGO (the enhancer side would not be bound by the mark).
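
The filtering concern above can be illustrated with a minimal Python sketch. This is not MANGO's actual implementation; the function names (`overlaps`, `bound_anchor_count`, `keep_both_bound`), the interval representation, and the coordinates are hypothetical and chosen only for illustration. It assumes a loop is a pair of anchors and that "bound" means an anchor overlaps at least one called peak.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]
Loop = Tuple[Interval, Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals are on the same chromosome and overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def bound_anchor_count(loop: Loop, peaks: List[Interval]) -> int:
    """Count how many of the loop's two anchors overlap at least one peak."""
    return sum(any(overlaps(anchor, peak) for peak in peaks) for anchor in loop)


def keep_both_bound(loops: List[Loop], peaks: List[Interval]) -> List[Loop]:
    """Keep only loops whose two anchors both overlap a peak."""
    return [loop for loop in loops if bound_anchor_count(loop, peaks) == 2]


# Made-up example data, not taken from any experiment.
peaks = [("chr1", 1000, 2000), ("chr1", 50000, 51000)]
loops = [
    (("chr1", 1200, 1700), ("chr1", 50200, 50700)),  # both anchors on peaks
    (("chr1", 1200, 1700), ("chr1", 90000, 90500)),  # only one anchor on a peak
]

kept = keep_both_bound(loops, peaks)
```

Under these assumptions the second loop, which stands in for a promoter-centered interaction with an unmarked enhancer end, is dropped by the both-anchors filter even though one of its anchors is bound.
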
Thus, a distinct workflow is needed for HiChIP/PLAC-Seq. Some known approaches for ChIA-PET may be modified to address these concerns. Dr. Ren’s and Dr. Hu’s labs are currently working on using two statistical models to address the different types of pairs.

==== About MANGO and data processing pipelines ====

Dan Capruso from Yijun Ruan’s lab commented that they are trying to use the ChIA-PET pipeline on both ChIA-PET and HiChIP datasets. The results looked very comparable, with the main difference being the efficiency (number of cells needed). The ChIA-PET pipeline does not remove loops but marks each loop by how many of its anchors (two, one, or none) overlap peaks. There would be differences in the binding-coverage bedGraph between the different technologies.

Dr. Ren commented that the two cases (one anchor with the mark or two anchors with the mark) will need different statistical models, because in their experience read counts tend to be higher in the two-anchor case.

Dr. Ruan’s lab’s current protocol separates the loop-calling process from the peak callers. Each loop is called independently. They are working on hosting the pipeline on the 4DN Portal and the ENCODE Portal, and will post it on GitHub sometime this year. They would like to share any processed data outputs from the pipeline.

Dr. Ren will be in touch with Dr. Ruan’s lab to compare their pipelines. A meeting will be scheduled for this.

===== Single cell sub-working group update (Burak Alver) =====

The single cell sub-working group is on hold while the WG tries to get different labs to upload their data to the Data Portal. At the moment there is nothing to discuss. After getting the data, DCIC will try running existing Hi-C pipelines on them.

==== Questions and responses ====

DCIC is collecting data from Peter Fraser’s and Jay Shendure’s labs as a starting point. They have data published in the public domain. Can that be used right now?
It would be ideal if DCIC worked with both labs to get the metadata in shape. DCIC is currently working with Jay’s lab on sci-Hi-C data, but the data are a little tricky to process and they are working on the processing now. They have also contacted Leonid’s group for a single-cell data set, and that is underway. Metadata are really needed at the current stage.

What is the timeline for submitting new single-cell data?

Sci-Hi-C data for the Joint Analysis WG have been submitted, and no other single-cell data are needed by any 4DN joint efforts. Therefore, the urgency is lower, and there is no major push.

If data are being generated with sci-Hi-C, is it reasonable for us to consider hewing relatively closely to the current metadata standard?

Absolutely. For sci-Hi-C metadata, the expectation is that there should not be challenges in the process, and there is no sense of urgency because the data have been published.

===== Allele sub-working group update (Burak Alver) =====

The allele assignment/phasing sub-working group is letting different groups use different protocols for allele assignment/phasing so that the group can compare them. These groups are submitting their results to DCIC and will present on the differences. Although this will be difficult without a gold standard for allele assignment, the differences will guide the working group through the different choices for the pipeline. One of the more important points for completing this assessment is how important indels are in these assignments. Considering only non-indel variants may be computationally easier, but for some mouse strains that are evolutionarily distant, indels might be important.

==== Questions and responses ====

What is currently the closest thing to a gold standard for the allele sub-WG?

There does not appear to be a current “gold standard”. For now most groups are trying to see where they stand, but in the future, when trying to determine a “best” method, some kind of gold standard will need to be established.

Other business for future OMICS sessions

We still need to cover microC and additional technologies, such as GRO-Seq. GAM data may also need to be scheduled, and the WG can reach out to different investigators to schedule their technologies. In the short term we can schedule the comparison between PLAC-Seq/HiChIP and ChIA-PET.

The microC group previously requested more time, and we can reach out to Job to talk about this. We can also look at the Joint Analysis Project to see what kinds of data are being used and give them priority.

Burak showed the most up-to-date information about what kinds of data are being submitted to the Joint Analysis Project. The page is at https://data.4dnucleome.org/joint-analysis. Currently there are no other data types already submitted to the Joint Analysis Project that the OMICS group will need to prioritize.

4dn/phase1/working_groups/omics_data_standards/minutes-04-09-2018.txt · Last modified: 2025/04/22 16:21 (external edit)