Omics Data Standards WG - Minutes 02-26-2018

Agenda: Sub working group updates

Allele-specific sub working group.

There are three problems that have been discussed within the sub working group:

  1. Phasing, and which reference assembly should be used.
  2. How reads should be assigned to different alleles.
  3. Data standards for the processes mentioned above.

Allele assignment is of great interest to many members of the group because many methods have been developed. This Wednesday (Feb. 28th, 2018) Bill is leading a comparison of different tools. The participants are close to agreeing on the exact datasets to be used and the exact output format to be provided. However, there isn’t a clear-cut definition of which method is better, so we’ll first need to discuss what the differences are. All lab groups are welcome to join this discussion to finalize the details.

There are two different Hi-C datasets: one is from a hybrid mouse line from Bill’s lab (Giancarlo knows the details of this dataset better), and the other is from human samples. For the human samples, the group has been trying to decide between GM12878 and H1 for the 4DN joint analysis project, or whether to use both of them, because one has the parental allele information and the other has readily available files on the GRCh38 assembly, which might make it easy to get started.

There is no clear gold standard yet to determine which methods are better, and the sub working group will start by focusing on allele assignment.

In practice, there is a concern that a conservative standard would be seen as having lower sensitivity rather than higher specificity. For the hybrid mouse cell line, however, some standard will be needed in the future.
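To make the sensitivity versus specificity trade-off concrete, here is a minimal sketch of one possible read-assignment rule. It is not a method agreed on by the sub working group: the function name `assign_allele`, the labels `allele1`/`allele2`, and the parameters `min_snps` and `min_fraction` are all hypothetical, and the rule (a vote over the phased SNPs a read overlaps) is only one of many approaches.

```python
from collections import Counter

def assign_allele(snp_calls, min_snps=1, min_fraction=1.0):
    """Assign one read to an allele from its phased-SNP observations.

    snp_calls: list of "allele1"/"allele2" labels, one per phased SNP
    that the read overlaps (hypothetical input format).
    Returns "allele1", "allele2", "conflict" or "unassigned".
    """
    counts = Counter(snp_calls)
    n1, n2 = counts["allele1"], counts["allele2"]
    total = n1 + n2
    # Too few informative SNPs: the read stays unassigned.
    if total == 0 or total < min_snps:
        return "unassigned"
    # Equal evidence for both alleles can never be resolved.
    if n1 == n2:
        return "conflict"
    best, support = ("allele1", n1) if n1 > n2 else ("allele2", n2)
    # Require a minimum fraction of SNPs to agree with the winning allele.
    return best if support / total >= min_fraction else "conflict"
```

Raising `min_snps` or `min_fraction` makes the rule more conservative: fewer reads are assigned, but the assigned ones are supported by more evidence. For example, `assign_allele(["allele1"], min_snps=2)` leaves the read unassigned, while the default settings assign it to `allele1`.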

Bill and Burak will iterate on the dataset and input/output format to be used and will finalize them by Wednesday. Anyone who’s interested is welcome to attend the discussion.

Erez’s group published some Hi-C data several years ago with high read depth that may be used in this sub working group.

Single-cell Hi-C sub working group.

Meeting frequency reduced to monthly from bi-weekly.

There were two presentations during the last conference: one on how to use bulk Hi-C data as a reference point, and one comparing different single-cell datasets processed with the same pipeline.

The second presentation concluded that different datasets require different filtering schemes because of the way the data are generated. Currently, DCIC is collecting single-cell datasets from different groups and working with each lab to put together a processing pipeline, with the goal of keeping the pipelines for the different datasets as close as is reasonable while allowing some variation. For example, in some single-cell datasets reads are over-read so many times that it might be optimal to filter the unique reads (this was counter-intuitive but appeared to clear up the signal a lot, and DCIC does not yet know why it works).
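As an illustration of what a dataset-specific filtering step might look like, the sketch below counts how many read pairs support each contact in one cell and keeps contacts above a threshold. This is not the DCIC pipeline: the function `filter_contacts`, the tuple input format, and the `min_support` parameter are assumptions made for the example. In particular, the minutes do not say exactly how the unique reads were filtered, so the threshold is left adjustable to cover both plain duplicate collapsing and dropping contacts seen only once.

```python
from collections import Counter

def filter_contacts(pairs, min_support=1):
    """Collapse duplicate read pairs into contacts and filter by support.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples for one cell
    (hypothetical input format).
    Returns a sorted list of unique contacts, each as a pair of
    (chrom, pos) ends, seen at least min_support times.
    """
    support = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        end1, end2 = (chrom1, pos1), (chrom2, pos2)
        # Order the two ends so that A-B and B-A count as the same contact.
        key = (end1, end2) if end1 <= end2 else (end2, end1)
        support[key] += 1
    return sorted(key for key, n in support.items() if n >= min_support)
```

With `min_support=1` this is ordinary duplicate removal. With `min_support=2` it discards contacts supported by a single read pair, which is one possible reading of the filtering described above for heavily over-sequenced libraries.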

Is there any consensus emerging on the optimal way to generate single-cell datasets that could ideally lead to a consensus pipeline?

In terms of experiments, the experimental protocols from all the groups are evolving quickly, and it would be difficult to come up with THE protocol for single-cell Hi-C. The combinatorial indexing scheme is very different from the original Fraser protocol, and it generates very different dataset profiles in terms of coverage and cell count. Therefore, there isn’t a movement towards a unified experimental protocol yet.

Other protocols

What is the best approach to deal with all the other protocols, such as PRO-seq and DamID?

Erez will resend the list of protocols to schedule discussions about these protocols.

4dn/phase1/working_groups/omics_data_standards/minutes-02-26-2018.1553551597.txt.gz · Last modified: 2025/04/22 16:21 (external edit)