Comparison of normalization methods
(Slides by Burak et al. available on wiki to explore different normalization schemes)
DCIC has run the workflows end to end and can now present normalization results in HiGlass and Juicer (links in the slides)
As a next step, DCIC will work on 6 samples and will work with the Ren lab on HiNorm procedures
Metrics used to compare normalization procedures
There are no accepted metrics. The four used by ENCODE may not be applicable to HiC data, and the current best approach seems to be to visualize the results first
Biases may be a good metric: results near unmappable regions may differ between normalization methods
Currently using published datasets (first 6), and DCIC is planning to include the 4DN datasets (submitted by Job, for example)
The 6 examples were chosen because, for a single cell type, data from different publications can be compared
One big dataset from Job on HFF cells and some from Ren’s lab are pending. DCIC is planning to process 10 additional datasets
The HFF data has 2 replicates with ~2B unmapped reads each, and the same holds for ES (~8B in total)
DCIC needs to agree with the labs producing the data on how the data should be processed
The plan is to look at the normalizations on those data
The current dataset is the Rao et al. IMR90 dataset; the entire dataset is available, but only a smaller sample of it is used.
We can compare this (diluted HiC) to in situ HiC data.
We need to make sure to make the in situ HiC data available to people before letting them view the data
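The point about biases as a metric can be illustrated with a small sketch. It is only an illustration under stated assumptions: the 5-bin matrix is made up (bin 2 is given low coverage to stand in for a bin next to an unmappable region), and the two schemes (a one-pass square-root coverage correction and a damped, ICE-style iterative balancing) are generic stand-ins, not necessarily the schemes compared in the slides. All function names are hypothetical.

```python
import math

# Toy symmetric contact matrix (hypothetical counts, 5 bins).
# Bin 2 has low coverage, standing in for a bin next to an unmappable region.
TOY = [
    [100, 40, 4, 10, 5],
    [40, 100, 6, 20, 10],
    [4, 6, 10, 5, 2],
    [10, 20, 5, 100, 40],
    [5, 10, 2, 40, 100],
]


def row_sums(matrix):
    """Per-bin coverage: the sum of each row."""
    return [sum(row) for row in matrix]


def apply_bias(matrix, bias):
    """Divide each entry m[i][j] by bias[i] * bias[j]."""
    n = len(matrix)
    return [[matrix[i][j] / (bias[i] * bias[j]) for j in range(n)]
            for i in range(n)]


def sqrt_coverage_bias(matrix):
    """One-pass bias: sqrt of each bin's coverage relative to the mean.

    Assumes every bin has nonzero coverage.
    """
    cov = row_sums(matrix)
    mean = sum(cov) / len(cov)
    return [math.sqrt(c / mean) for c in cov]


def iterative_bias(matrix, n_iter=200):
    """Damped iterative balancing (ICE-style stand-in).

    Repeats the one-pass correction on the already-corrected matrix and
    accumulates the bias, so that corrected row sums become nearly equal.
    """
    bias = [1.0] * len(matrix)
    for _ in range(n_iter):
        step = sqrt_coverage_bias(apply_bias(matrix, bias))
        bias = [b * s for b, s in zip(bias, step)]
    return bias


def bias_ratio(matrix):
    """Per-bin ratio of the iterative bias to the one-pass bias."""
    one_pass = sqrt_coverage_bias(matrix)
    balanced = iterative_bias(matrix)
    return [b / o for b, o in zip(balanced, one_pass)]


if __name__ == "__main__":
    for i, r in enumerate(bias_ratio(TOY)):
        print(f"bin {i}: iterative / one-pass bias = {r:.3f}")
```

A ratio far from 1 for a bin means the two schemes disagree there, which is the kind of per-bin difference that could be inspected alongside the visual comparison.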
Optical mapping
Feng’s lab has performed optical mapping and has a manuscript with Job’s lab on bioRxiv. This will be discussed at the next meeting.
It might be helpful to have the agenda in the email beforehand.
Discussion of genome support
What does “GRCh38 is more ‘future proof’ than hg19” mean?
GRCh38 will be used more and more and hg19 less and less, so in the future (for example, in three years) GRCh38 will be prevalent.
One concern is that building the infrastructure to support a new reference takes a lot of time, while the genome assembly may continue to evolve (we may have GRCh39 in the future).
This policy passed unanimously at this OMICS meeting and will be sent to the Steering Committee.
This resolution will also be highlighted in the Joint Analysis Working Group meeting to make sure people know that this reference has become the standard for 4DN analysis.
HiGlass and domain calls
HiGlass and Juicebox both support the domain calls from Forcato et al.; a document needs to be generated by DCIC.
Domain calling, algorithms involved, normalization
The domain calls are based on various subsets of HiC data, and it is not yet shown which data are contained in each subpanel. This subsampling may be a disadvantage for comparison.
However, it may also be used to show the details of different TAD calls based on different datasets.
Erez’s lab is currently creating multiple Juicebox instances that present replicates with any of the algorithms, so that the called domains, and hence the algorithms, can be compared.
Arrowhead results are included for completeness. However, anyone who wants to use Arrowhead will need to provide parameters manually; otherwise the algorithm treats the data as incomplete (too “shallow”) and refuses to run. It may be extremely hard to call domains correctly from such shallow datasets.
In the Juicebox results, the domain calls from the Nature Methods paper are superimposed on the original data.
Some of the methods appear to pick unmappable regions and/or compartment domains.
This shows that the definition of a domain is not yet clear, and many of the algorithms call features indiscriminately, which may produce spurious results.
The boundaries should be represented as much wider than the zones shown; this uncertainty differs between algorithms and has not been implemented in Juicebox yet.
Also, these are the deepest datasets available right now; other datasets will be shallower and will generate more artifacts.
For other datasets, DCIC may generate HiC files from them to facilitate the comparison.
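One way to take boundary uncertainty into account when comparing domain calls is to count two boundaries as shared if they fall within a tolerance window of each other. The sketch below is a minimal illustration, not something presented at the meeting: the boundary positions, the tolerance values, and the function name `shared_boundaries` are all hypothetical.

```python
from bisect import bisect_left


def shared_boundaries(calls_a, calls_b, tolerance):
    """Count boundaries in calls_a that have at least one boundary in
    calls_b within +/- tolerance (same units as the positions, e.g. bp)."""
    b_sorted = sorted(calls_b)
    shared = 0
    for pos in calls_a:
        # First boundary in calls_b at or after the left edge of the window.
        i = bisect_left(b_sorted, pos - tolerance)
        if i < len(b_sorted) and b_sorted[i] <= pos + tolerance:
            shared += 1
    return shared


# Hypothetical boundary positions (bp) from two domain callers on one region.
CALLER_A = [200_000, 850_000, 1_400_000, 2_100_000]
CALLER_B = [240_000, 800_000, 1_650_000, 2_100_000]

if __name__ == "__main__":
    for tol in (0, 50_000, 250_000):
        n = shared_boundaries(CALLER_A, CALLER_B, tol)
        print(f"tolerance {tol:>7} bp: {n} of {len(CALLER_A)} boundaries shared")
```

Because the uncertainty differs between algorithms, a single tolerance is a simplification; reporting the agreement at several tolerances, as in the loop above, avoids committing to one value.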
Standard protocols for other technologies
There are some technologies (for example, ATAC-seq) that may need protocols finalized by 4DN so that DCIC can start accepting those data. These protocols have been pretty much standardized and probably don’t need much discussion
People will be asked to submit such protocols to 4DN to proceed.