1. PCR: looking at pairs vs. max-bp difference
2. Dangling ends / self circles
- evidence in read pair orientation discrepancy at short distances.
- Dangling ends: ~+- pairs out to 1kb
- Self circles: ~-+ pairs out to 10kb in HindIII; not so evident in MboI since molecules don't circle back as much.
- Conclusion: We cannot trust contacts within ~10kb for 6-cutters or within ~1kb for 4-cutters.
- Yunjiang: The error structure is to some extent library specific. Anton/Max: All HindIII data we've looked at has similar trends with roughly 10kb.
- Max: restriction efficiency can also have an effect in determining features. So we should look at ?2-3 restriction fragment scale
- Burak: Did you look at restriction fragments instead of bp distance? Max: Yes, in a previous paper.
- Anton/Max: propose keeping these pairs because they can be useful, but flagging them and excluding them from corrections (normalization).
- Bing: need to make sure these are removed in the matrix.
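The orientation check and the "flag, don't drop" proposal in item 2 could be sketched roughly as follows. This is a minimal illustration, not the pipeline discussed in the meeting: the tuple layout of a pair, the function names, and the `MIN_TRUSTED_BP` table are all assumptions, with the cutoffs taken from the conclusion above.

```python
import bisect
from collections import Counter

# Assumed cutoffs (bp), taken from the conclusion in these notes.
MIN_TRUSTED_BP = {"6-cutter": 10_000, "4-cutter": 1_000}

def orientation(pos1, strand1, pos2, strand2):
    """Strand pair ordered by genomic position: '+-' is inward, '-+' is outward."""
    return strand1 + strand2 if pos1 <= pos2 else strand2 + strand1

def is_short_range(chrom1, pos1, chrom2, pos2, enzyme_class):
    """True for cis pairs closer than the assumed cutoff (to be flagged, not dropped)."""
    return chrom1 == chrom2 and abs(pos2 - pos1) < MIN_TRUSTED_BP[enzyme_class]

def orientation_by_distance(pairs, bin_edges):
    """Count cis pairs per (distance bin, orientation).

    pairs: iterable of (chrom1, pos1, strand1, chrom2, pos2, strand2).
    bin_edges: sorted distances in bp; bin 0 holds distances below bin_edges[0],
    and the last bin holds distances at or above bin_edges[-1].
    """
    counts = Counter()
    for chrom1, pos1, strand1, chrom2, pos2, strand2 in pairs:
        if chrom1 != chrom2:
            continue  # orientation bias is only looked at for cis pairs
        bin_index = bisect.bisect_right(bin_edges, abs(pos2 - pos1))
        counts[(bin_index, orientation(pos1, strand1, pos2, strand2))] += 1
    return counts
```

If the four orientations are roughly balanced in a distance bin, that bin shows no obvious dangling-end or self-circle excess; an excess of `+-` or `-+` at short distances is the discrepancy described above.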
3. MAPQ
- cis/total ratio does have a MAPQ dependence out to MAPQ=60.
- Try to estimate sensitivity/specificity vs. MAPQ threshold.
- MAPQ>0: 17% of filtered reads come from mismappers; MAPQ>=60: 3-5% of filtered reads come from mismappers
- MAPQ>=60: 20% of reads are removed
- the effect of MAPQ>0 vs MAPQ>59 does not appear to be big in terms of cis contact probability distributions
- Conclusion: MAPQ>0 or MAPQ>10 might be the optimal cutoff.
- Followup
- Another solution:
- Hoachen: Similar studies in WGS from the same cell type may be informative
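A rough sketch of the MAPQ comparison in item 3: for each cutoff, report the fraction of pairs kept and the cis/total ratio among them. The pair layout, the function name `mapq_sweep`, and the choice to score a pair by the lower MAPQ of its two mates are assumptions made for illustration, not something stated in the notes.

```python
def mapq_sweep(pairs, cutoffs=(0, 10, 59)):
    """Summarize pairs surviving each 'MAPQ > cutoff' filter.

    pairs: list of (chrom1, chrom2, mapq1, mapq2).
    A pair passes only if both mates exceed the cutoff (assumed rule).
    """
    total = len(pairs)
    rows = []
    for cutoff in cutoffs:
        kept = [p for p in pairs if min(p[2], p[3]) > cutoff]
        n_cis = sum(1 for p in kept if p[0] == p[1])
        rows.append({
            "cutoff": cutoff,
            "fraction_kept": len(kept) / total if total else 0.0,
            "cis_ratio": n_cis / len(kept) if kept else float("nan"),
        })
    return rows
```

The default cutoffs mirror the MAPQ>0, MAPQ>10 and MAPQ>59 thresholds mentioned above. Estimating the mismapper fraction itself needs a separate truth set, which this sketch does not attempt.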
4. Distance to restriction site
- very close to rest-site (1-few bp): dangling ends
- very far from rest-site (: random ligation
- Also note the low rate within 30bp, because of the bwa-mem requirement.
very close:
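The distance-to-restriction-site measurement in item 4 could be sketched as below. This is only an illustration under simplifying assumptions: sites are located by the start of the recognition motif (AAGCTT for HindIII, GATC for MboI) rather than the exact cut position, and the helper names `find_sites` and `distance_to_nearest_site` are hypothetical.

```python
import bisect

def find_sites(sequence, motif):
    """Return sorted 0-based start positions of every motif occurrence."""
    sites = []
    start = sequence.find(motif)
    while start != -1:
        sites.append(start)
        start = sequence.find(motif, start + 1)
    return sites

def distance_to_nearest_site(pos, sites):
    """Distance in bp from pos to the closest entry of the sorted list sites."""
    if not sites:
        raise ValueError("no restriction sites on this chromosome")
    i = bisect.bisect_left(sites, pos)
    candidates = []
    if i < len(sites):
        candidates.append(sites[i] - pos)      # nearest site at or after pos
    if i > 0:
        candidates.append(pos - sites[i - 1])  # nearest site before pos
    return min(candidates)
```

A histogram of these distances over all read ends would show the features listed above: an excess at 1 to a few bp, and a tail far from any site.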