Omics Data Standards WG - Minutes 06-12-2017

AGENDA

  • Discussion of hg38 vs. hg19: which version should be used by 4DN? Pros and cons.
  • Concepts of chromatin loops and domains: collection of opinions.

hg38 vs. hg19 and other issues regarding references

Differences between the two references, and the pros and cons of using either.

Advantages for hg38

  • Most of the improvements in hg38 are in variation contigs and haplotypes that 4DN might not use;
  • However, ENCODE will shift to hg38 and stop supporting hg19, and other consortia will do the same;
  • The gap regions in hg38 are more accurate than those in hg19.

Why do we even consider hg19?

  • Probably because of legacy data. However, that data is being remapped, so this should not be an issue.

Different versions of hg38.

  • There are different versions of hg38 (GRCh38 vs. hg38 vs. other minor versions).
  • DCIC suggested picking the one ENCODE uses.
  • The coordinates are the same across all those versions, but the haplotypes included in each variant are not.
  • It appears that ENCODE is using a different version from the rest of IHEC. The reason is not yet clear.
  • ENCODE uses the main chromosomes plus the mitochondrial genome only, so there would be no difference between any of the versions.
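
The last point above can be illustrated with a minimal sketch: if only the main chromosomes plus the mitochondrial genome are kept, two hg38 variants that differ in their extra contigs reduce to the same sequence set. The sketch assumes UCSC-style names (chr1 to chr22, chrX, chrY, chrM); the two contig lists and the function name `primary_contigs` are illustrative placeholders, not the actual contents of any ENCODE or IHEC reference.

```python
import re

# Assumption: UCSC-style contig names (chr1..chr22, chrX, chrY, chrM).
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def primary_contigs(names):
    """Keep only main chromosomes plus chrM, preserving input order."""
    return [n for n in names if PRIMARY.fullmatch(n)]


# Illustrative (truncated) contig lists for two hypothetical hg38 variants.
# They share the primary chromosomes but carry different extra contigs.
variant_a = ["chr1", "chr2", "chrX", "chrY", "chrM",
             "chr1_KI270706v1_random", "chrUn_KI270302v1"]
variant_b = ["chr1", "chr2", "chrX", "chrY", "chrM",
             "chr1_KI270762v1_alt", "chrEBV"]

# After filtering, the two variants are indistinguishable by contig name.
same_after_filtering = primary_contigs(variant_a) == primary_contigs(variant_b)
```

Comparing names only checks which contigs are present; it does not verify that the underlying sequences are identical, which would need a checksum comparison of the actual FASTA records.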

Legacy support

Legacy data has to be converted from hg19 to hg38. Will DCIC provide support to facilitate this conversion?

  • DCIC would suggest reprocessing all data from the start of the pipeline. This can be done on published datasets as well.
  • For 4DN data, DCIC will recommend running uniform pipelines so they will all be in hg38.
  • For published data, DCIC will talk with the rest of 4DN to set conversion priorities. The ENCODE datasets would be the important ones, but ENCODE is doing the conversion itself (albeit with delays).
    • ENCODE currently still has more data in hg19 than in hg38, but the gap is closing.
  • Other IHEC databases and Ensembl data?
  • There are two types of legacy data:
    • Data from 4DN members (Hi-C, for example)
    • Data that existed before 4DN

Supporting both references.

The argument for continuing to support hg19 is cross-referencing, but there will be an impact on resources.

  • This depends on the data volume. In principle it is doable to support both, but the work will be slowed down.
  • Many other components would also be involved (the data portal, ways to make sure people understand which reference they are talking about). Burak suggested that this would not be worthwhile.
  • DCIC will process only hg38, plus a limited amount of important data currently in hg19, which will be converted to hg38.
    • DCIC wants to keep this “limited” amount small, namely 10 publications. So far 3 publications have been done, and it might be OK to add a couple more.
    • Curating a publication is the bottom line of the work because of the shared processing procedures.
    • People should recommend to the SC the publications for DCIC to convert from hg19 to hg38 (these can be ranked by citation numbers).
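
As a rough illustration of the ranking idea above, the following sketch orders recommended publications by citation count and keeps only as many as the remaining budget allows. The cap of 10 and the 3 already converted come from the minutes. The `Candidate` type, the `shortlist` function, the labels, and the citation counts are hypothetical placeholders, and the minutes do not state that the cap is strict.

```python
from dataclasses import dataclass

MAX_PUBLICATIONS = 10  # cap mentioned in the minutes
ALREADY_CONVERTED = 3  # publications reported as done so far


@dataclass
class Candidate:
    label: str      # placeholder identifier, not a real publication
    citations: int  # placeholder citation count


def shortlist(candidates, cap=MAX_PUBLICATIONS, done=ALREADY_CONVERTED):
    """Rank candidates by citations (highest first) and trim to the open slots."""
    open_slots = max(cap - done, 0)
    ranked = sorted(candidates, key=lambda c: c.citations, reverse=True)
    return ranked[:open_slots]


# Hypothetical recommendations submitted for conversion from hg19 to hg38.
recommended = [
    Candidate("paper_A", 120),
    Candidate("paper_B", 450),
    Candidate("paper_C", 75),
]
picked = [c.label for c in shortlist(recommended)]
```

Citation count is only the ranking criterion suggested in the minutes; a real prioritization would likely also weigh data type and relevance to 4DN.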

Male and female references.

  • ENCODE is using separate references for male and female samples.
  • DCIC recommends not using different reference assemblies, because doing so would pose difficulties in visualization.

Hi-C Normalization, Domains, Loops, etc.

This discussion is postponed to the next OMICS meeting.

4dn/phase1/working_groups/omics_data_standards/minutes-06-12-2017.1550267391.txt.gz · Last modified: 2025/04/22 16:21 (external edit)