==== Omics Data Standards WG - Minutes 10-23-2017 ====

|~~TABLE_CELL_WRAP_START~~ DCIC would like to prioritize finalizing a few standards, including single cell Hi-C, allele specific Hi-C and other technologies.
  * Single cell Hi-C
    * We need the people who work on these technologies to agree on a standard so that it can be followed when data are submitted.
    * There are a variety of flavors of single cell Hi-C, so a standard for shared metadata, QC metrics and similar items would be very helpful.
    * Forming a separate sub-workgroup for those technologies?
      * People working on single cell Hi-C and data analysis group members can form a separate working group and report back to OMICS WG. Calls can be arranged more frequently and in parallel with the other sub-groups. This works well for PLAC-Seq for now.
    * Single cell Hi-C may not converge on a single protocol; however, certain expectations and metadata may be determined.
    * A sub-working group will be formed for single cell Hi-C to discuss the format and metadata.
  * Allele specific data format
    * People have discussed the amount of data that needs to be reported in allele specific Hi-C data sets.
    * However, some technical details have not been decided (maternal/paternal).
      * Each single read can be assigned maternal/paternal/ambiguous.
      * How pairs that are ambiguous on only one end will be treated has not been determined.
    * The protocol needs to be specified in several terms:
      * Mapping tool used
      * Mapping summary of the allelic reads
    * It’s important to distinguish between the OMICS standard and how people report the results.
    * We may need to know what kind of question we would like to address with the technology before jumping to a specific protocol from DCIC.
    * There are already standard pipelines (Juicer for example) that can handle some of the problems.
    * The standard would also apply to hybrid mouse / cell line data sets other than Hi-C, so the necessary metadata and allelic information would be needed.
    * We can have presentations about how the data analysis can be performed, the parental information analyzed and the diploid information extracted.
    * There are lots of inaccuracies in such data sets that people are not yet aware of.
      * There is not even a defined standard of “inbreeding”, so different “inbred” mice may have different levels of homozygosity.
    * How many data sets would be sequenced at a sufficient depth to address allele specific questions?
      * For example, Bing’s lab discovered that the error rate for haplotypes derived from phasing old Hi-C data sets in H1 is about 2%, but can be further reduced to less than 0.2% if better technologies (10x data, longer reads, etc.) are adopted. Do we use the newer data sets and discard the old ones?
      * For other cell lines, there may not be enough data sets to perform such allelic analysis.
    * It’s going to be difficult to advance the standard of phasing. Even with hybrid crosses, there are issues such as how inbred the parents are.
    * 10x data are becoming less expensive to generate and would benefit the community if they can be generated on the F1 cells.
    * For GM12878 cells, there are issues coming from using the “platinum” reference genomes when phasing the data set.
    * There are also cases where there is no “gold standard” for loci.
    * Allelic analysis would be primarily a data analysis and presentation problem. Therefore, the data analysis working group may be in charge while OMICS WG can play a supporting role.
    * The processing of allele specific information will be done at DCIC.
    * The data standard needs to be worked out quickly so that DCIC can give recommendations (format, mappers and procedures). Benchmarking against some ground truth data sets will be needed to provide a basis for those recommendations, though.
    * Data analysis WG ought to lead this effort, using some data sets (from GM12878, for example) as ground truth for benchmarking, and make recommendations based on the results.
    * A sub-working group will also be formed to discuss data formats and mapping tools.
    * Burak can initiate the discussion to kick off the allelic analysis sub-working group and get people who are interested to steer the discussion.
    * Bing and Bill will initiate the discussion for the sub-working group.
    * Members from Bill’s group, Peter Fraser’s group and Amos Tanay’s group need to be involved in the single cell Hi-C sub-working group.
    * There are published solutions to the problems being discussed, and in the short term those solutions can be considered. ~~TABLE_CELL_WRAP_STOP~~|