AGENDA
hg38 vs. hg19 and other issues regarding references
Differences between the two references and the pros and cons of using either.
Advantages of hg38
Most of the improvements in hg38 are in the variation contigs and haplotypes, which 4DN might not use;
However, ENCODE will shift to hg38 and stop supporting hg19, and other consortia will do the same;
The gap regions in hg38 are more accurate than those in hg19.
Why do we even consider hg19?
Different versions of hg38.
There are different versions of hg38 (GRCh38 vs. hg38 vs. other minor versions).
DCIC suggested picking the one ENCODE uses.
The coordinates are the same across all of those versions, but the haplotypes included in each version are not.
It appears that ENCODE is using a different version from the rest of IHEC. The reason is not yet clear.
ENCODE uses only the main chromosomes plus the mitochondrial genome, so there would be no difference among any of the versions.
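As an illustration of the point above, restricting a reference to the main chromosomes plus mitochondria can be sketched as a filter on contig names. This is only a sketch under stated assumptions: it assumes UCSC-style names (chr1 to chr22, chrX, chrY, chrM), and the function name primary_contigs and the sample list are hypothetical, not part of any ENCODE or DCIC pipeline.

```python
import re

# Assumption: UCSC-style contig names ("chr1" ... "chr22", "chrX", "chrY", "chrM").
# Anything else (alt haplotypes, unplaced or random contigs, decoys) is dropped.
PRIMARY_NAME = re.compile(r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|M)")


def primary_contigs(names):
    """Return only main-chromosome and mitochondrial contig names, in input order."""
    return [name for name in names if PRIMARY_NAME.fullmatch(name)]


# Illustrative input only: a handful of names in the style found in hg38 FASTA headers.
example_names = [
    "chr1",
    "chr22",
    "chrX",
    "chrY",
    "chrM",
    "chr1_KI270706v1_random",
    "chr6_GL000250v2_alt",
    "chrUn_KI270302v1",
]

kept = primary_contigs(example_names)
```

With a filter like this, two hg38 versions that differ only in their alternate haplotypes or unplaced contigs would yield the same set of sequences, which is why the choice of version would not matter for this use.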
Legacy support
Legacy data has to be converted from hg19 to hg38. Will DCIC provide support to facilitate this conversion?
DCIC would suggest reprocessing all data from the start of the pipeline. This can be done on published datasets as well.
For 4DN data, DCIC will recommend running uniform pipelines so they will all be in hg38.
For published data, DCIC will talk with the rest of 4DN to set conversion priorities. The ENCODE datasets would be the important ones, but ENCODE is doing the conversion itself (albeit with delays).
Other IHEC databases and Ensembl data?
There are two types of legacy data:
Supporting both references.
The argument for continuing to support hg19 is cross-referencing, but there will be an impact on resources.
This depends on the data volume. In principle, supporting both is doable, but the work will be slowed down.
Many other components would also be involved (the data portal, ways to make sure people understand which reference they are talking about). Burak suggested that this is not worthwhile.
DCIC would process only hg38, and will reprocess a limited amount of important data currently in hg19 to convert it to hg38.
DCIC will want to keep this “limited” amount small, namely 10 publications. So far 3 publications have been done, and it might be OK to add a couple more.
Curating a publication is the bottom line of the work because of the shared processing procedures.
People should recommend to the SC the publications for DCIC to convert from hg19 to hg38 (these can be ranked by citation count).
Male and female references.
Hi-C Normalization, Domains, Loops, etc.
This would be postponed to the next OMICS meeting.