QC metrics
Recommended antibodies
Crosslinking conditions
Hi-C - digestion/ligation efficiency, noise level
IP - enrichment performance
Library preparation - complexity
All three aspects have both a qualitative detection method and a quantitative detection method (with shallow sequencing).
Hi-C - DNA fragment size before digestion (should show no smear), after digestion (should be a smear in the lower size range; note that the observed size is expected to be larger than the theoretical value because of chromatin structure and accessibility issues) and after ligation (should be a similarly shaped smear in a higher size range)
IP - DNA fragments after sonication should be around 100~600bp; incomplete sonication results in larger fragments and degrades IP performance. IP yield (IPed DNA / input DNA) is also a good metric: in general, H3K4me3 / H3K27ac IPs give <1~3% IP yield, and CTCF / PolII IPs give <0.1%. However, a good IP yield is necessary but not sufficient for a good IP.
Library preparation - Libraries with good complexity require at least 10~20ng of IPed DNA, amplified with 11~13 PCR cycles, and show a 20~40% duplication rate at ~250M reads. Libraries with worse complexity need more input.
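The IP yield and input guidelines above can be turned into a quick back-of-envelope check. This is a minimal sketch, not part of the protocol: the function names are hypothetical, and `input_needed_ng` assumes the IP yield stays constant as input is scaled up, which is an idealization.

```python
def ip_yield(ip_dna_ng: float, input_dna_ng: float) -> float:
    """IP yield as a fraction: IPed DNA / input DNA."""
    if input_dna_ng <= 0:
        raise ValueError("input DNA amount must be positive")
    return ip_dna_ng / input_dna_ng


def input_needed_ng(target_ip_ng: float, expected_yield: float) -> float:
    """Input DNA (ng) needed to recover target_ip_ng at a given IP yield.

    Hypothetical helper; assumes the yield does not change with input amount.
    """
    if not 0 < expected_yield <= 1:
        raise ValueError("expected_yield must be a fraction in (0, 1]")
    return target_ip_ng / expected_yield


if __name__ == "__main__":
    # Hypothetical numbers: 15 ng recovered from 1,000 ng input -> 1.5% yield.
    print(f"IP yield: {ip_yield(15, 1000):.1%}")
    # Input needed for 10 ng of IPed DNA at a 0.1% (CTCF-like) yield.
    print(f"input needed: {input_needed_ng(10, 0.001):,.0f} ng")
```

Under these assumptions, reaching the 10 ng guideline at a ~0.1% yield would take on the order of 10 µg of input DNA, which illustrates why low-yield targets such as CTCF / PolII are harder to build complex libraries from.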
Glossary:
A - sequenced read pairs
B - valid read pairs
C - valid read pairs after PCR duplicates removal
D - inter-chromosomal read pairs
E - intra-chromosomal read pairs
F - short-range (≤1kb) of E
G - long-range (>1kb) of E
H - F that overlap with ChIP peaks
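A minimal sketch of how valid, deduplicated read pairs (category C) could be split into the categories D through H above. Categories A and B come from upstream alignment and filtering and are not modeled here. The `ReadPair` class, the peak representation (a dict of half-open `[start, end)` intervals per chromosome) and the rule that a pair "overlaps" a peak when either end falls inside one are all illustrative assumptions, not the definitions used by any specific pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """One valid, deduplicated read pair (i.e. a pair counted in C)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def in_peak(chrom: str, pos: int, peaks: dict) -> bool:
    """True if pos falls inside any half-open [start, end) peak on chrom.

    A linear scan keeps the sketch simple; a real pipeline would use an
    interval index or an interval-overlap tool instead.
    """
    return any(start <= pos < end for start, end in peaks.get(chrom, []))


def classify(pairs, peaks: dict, short_range_max: int = 1000) -> dict:
    """Count read pairs into glossary categories C, D, E, F, G and H."""
    counts = {key: 0 for key in "CDEFGH"}
    for p in pairs:
        counts["C"] += 1
        if p.chrom1 != p.chrom2:
            counts["D"] += 1  # inter-chromosomal (trans)
            continue
        counts["E"] += 1  # intra-chromosomal (cis)
        if abs(p.pos2 - p.pos1) <= short_range_max:
            counts["F"] += 1  # short-range cis (<= 1 kb)
            # Assumption: a pair overlaps a peak if either end lies in one.
            if in_peak(p.chrom1, p.pos1, peaks) or in_peak(p.chrom2, p.pos2, peaks):
                counts["H"] += 1
        else:
            counts["G"] += 1  # long-range cis (> 1 kb)
    return counts
```

The 1 kb cutoff is exposed as `short_range_max` so it can be changed without touching the classification logic; by construction D + E = C and F + G = E.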
Hi-C - trans ratio (D/C), which reflects the noise level (reference: < 20~40%); long-range cis ratio (G/E) (reference: > 50~70%)
IP - on-target rate (H/F) (reference: > 20% for histone marks, > 5~10% for TFs)
Library preparation - PCR duplication rate (1 - C/B) (reference: < 3%)
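The quantitative metrics above can be computed directly from the glossary counts. This is a minimal sketch: the function names are hypothetical, the duplication rate is taken as 1 - C/B (the fraction of valid pairs removed as PCR duplicates), and the pass/fail cutoffs use the lenient end of each reference range, which is an assumption rather than an agreed standard.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else math.nan


def qc_ratios(c: dict) -> dict:
    """Compute the QC ratios from glossary counts B..H."""
    return {
        "trans_ratio": _ratio(c["D"], c["C"]),
        "long_range_cis_ratio": _ratio(c["G"], c["E"]),
        "on_target_rate": _ratio(c["H"], c["F"]),
        # Fraction of valid pairs removed as PCR duplicates.
        "duplication_rate": 1 - _ratio(c["C"], c["B"]),
    }


def qc_pass(r: dict, histone_mark: bool) -> dict:
    """Compare ratios to the reference ranges, using the lenient end of each.

    The exact cutoffs are hypothetical choices; NaN ratios always fail.
    """
    on_target_min = 0.20 if histone_mark else 0.05
    return {
        "trans_ratio": r["trans_ratio"] < 0.40,
        "long_range_cis_ratio": r["long_range_cis_ratio"] > 0.50,
        "on_target_rate": r["on_target_rate"] > on_target_min,
        "duplication_rate": r["duplication_rate"] < 0.03,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not real data).
    counts = {"B": 1_000_000, "C": 980_000, "D": 196_000, "E": 784_000,
              "F": 300_000, "G": 484_000, "H": 90_000}
    ratios = qc_ratios(counts)
    for name, ok in qc_pass(ratios, histone_mark=True).items():
        print(f"{name}: {ratios[name]:.1%} {'pass' if ok else 'check'}")
```

Returning NaN instead of raising on an empty category means a library with, for example, no short-range pairs simply fails the corresponding check rather than crashing the report.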
High specificity - high on-target rate
High affinity - large IP yield
Highly robust - fewer batch effects (monoclonal Abs are better than polyclonal)
Currently recommended tested antibodies (all monoclonal):
CTCF: Cell Signaling, 3418T
H3K4me3: Millipore, 04-745
H3K27ac: Diagenode, C15200184-50; Active Motif, 91193
Bill Noble: Will ENCODE develop QC metrics for Hi-C data? Should we establish a data quality measurement procedure? There are several software tools that can evaluate Hi-C datasets, such as HiCRep or the other tools described in https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1658-7.
Miao Yu: The current QC metrics are applied before deep sequencing; further evaluation can be done after data generation.
PLAC-seq / HiChIP has lower IP efficiency than ChIP-seq
Hi-C may disrupt protein complexes
Biotin enrichment after IP may enrich DNA fragments without protein binding
Different crosslinking conditions may affect the on-target rate. Results are preliminary: higher temperature did not improve on-target rates. One DSG + HCHO test gave a high on-target rate but needs further verification.
Burak: What is the intended dissemination method for all these results?
Bing: We are currently preparing a protocol that will be circulated within 4DN and submitted to Nature Protocols, but the manuscript is still in progress.