==== DAWG Meeting Notes 20170803 ====
|~~TABLE_CELL_WRAP_START~~
==== Comparison of Hi-C processing tools ====
Burak Alver\\
Reminder:\\
We are planning to run:\\
- fastq -> pairs (juicer)\\
- pairs -> hic (juicer)\\
- pairs -> cool (cooler)\\
- export juicer normvectors and import them into the cools.\\
- export cooler normvectors and provide them in juicebox format.\\
4 of the 6 data sets have completed running.\\
The resulting files and juicebox.js links are **[[https://docs.google.com/spreadsheets/d/1mbOaU5C35XBTCFQ9V1QizVG0YPQrJivTN1p8yapn5Ik/edit#gid=894703464|here]]**.\\
Next steps:\\
- finish running the largest 2 samples.\\
- start implementing hicrep.\\
==== Visualizing TAD calls on HiGlass ====
Peter Kerpedjiev\\
HiGlass all calls, all reps link: [[http://www.google.com/url?q=http://higlass.io/app/?config=JALHH-HzQGeJCaJaU9EwTA&sa=D&sntz=1&usg=AFQjCNGtWAn_P5L2CtIROZ96xTVauK8mdQ|http://higlass.io/app/?config=JALHH-HzQGeJCaJaU9EwTA]]\\
HiGlass RepH calls link: [[http://www.google.com/url?q=http://higlass.io/app/?config=IPCHmdOQR4CDY2sqj5VJHQ&sa=D&sntz=1&usg=AFQjCNHI1A4N-aCeq7sgmlMpP-Cu3OWHSg|http://higlass.io/app/?config=IPCHmdOQR4CDY2sqj5VJHQ]]\\
The RepH calls link above corresponds to Figure 3 from [[http://www.google.com/url?q=http://www.nature.com/nmeth/journal/v14/n7/fig_tab/nmeth.4325_F3.html&sa=D&sntz=1&usg=AFQjCNEhoCSPC6c_F8bQlbUCWPSmaETN_A|Forcato et al]].\\
Easy to remember link: [[http://www.google.com/url?q=http://higlass.io/examples&sa=D&sntz=1&usg=AFQjCNH1eLJSIERtxmbA5fjw6y8fXFvDKA|http://higlass.io/examples]]\\
\\
- Showcasing 8 linked views and overlaid TAD calls.\\
\\
- Erez had mentioned that the TAD calls in Forcato et al are for one replicate, constituting a shallow data set.\\
- The RepH view corresponds to the actual matrix that Forcato et al used.\\
- Erez reiterated that using a data set this shallow is not ideal.\\
- A caveat on the view: the matrices are in hg38. The TAD calls were lifted over from hg19.
At most 5% of the TADs were lost in the liftover across 7 sets.\\
- Erez: a z-scale (color scale) zoom-in/out feature would be useful, and should be easy to add.\\
- - Pete: It is easy, but it is worth optimizing the UX.\\
==== Domain calling with Arrowhead ====
Neva Durand\\
(See [[https://docs.google.com/viewer?a=v&pid=sites&srcid=NGRudWNsZW9tZS5vcmd8NGQtbnVjbGVvbWUtd2lraXxneDo2ZDgyYzEwYTdiNjEwNGUx|slides]] for details.)\\
\\
Background: features at different scales are resolved with different sequencing depths:\\
- compartments: checkerboard pattern, extracted with eigenvectors, Lieberman-Aiden et al 2009 (~Mbs)\\
- TADs: seen in Dixon 2012, directionality index (~1Mb)\\
- loop domains: seen in Rao et al 2014, peak+square motif (as small as 100kb)\\
- exclusion domains: Sanborn et al 2015; even without CTCF, loop-domain-like structures are present.\\
- cohesin degradation eliminates all loop domains but not all loops, and does not eliminate compartments.\\
Overall: Different contact domains have different biologies; we need to define the biology we are after.\\
\\
Arrowhead:\\
- similar to the directionality index in principle.\\
- But the matrix transformation makes the sought-after feature much more clearly defined.\\
Juicebox.js:\\
- linked views feature is now also available in juicebox.js\\
- showing Forcato et al + Arrowhead calls on the complete GM12878 data\\
- Also see IMR90 arrowhead vs.
directionality index results.\\
[[http://www.google.com/url?q=http://www.aidenlab.org/juicebox/HIC003_GM12878_MboI.html&sa=D&sntz=1&usg=AFQjCNFpsVysSp9BLUgTSvX1fneO8AHOeg|http://www.aidenlab.org/juicebox/HIC003_GM12878_MboI.html]]\\
[[http://www.google.com/url?q=http://www.aidenlab.org/juicebox/GM12878_combined.html&sa=D&sntz=1&usg=AFQjCNEunIAWghnbZcLSflkOPxwEInRw4Q|http://www.aidenlab.org/juicebox/GM12878_combined.html]]\\
[[http://www.google.com/url?q=http://www.aidenlab.org/juicebox/IMR90_TADS.html&sa=D&sntz=1&usg=AFQjCNEWtgWTBR73frhLiHOHwKwyZi9JaQ|http://www.aidenlab.org/juicebox/IMR90_TADS.html]]\\
==== Discussion ====
There are three inter-related topics:\\
1. Defining different types of domains with a biological basis.\\
2. Resolvability of features vs. sequencing depth.\\
3. Assessment of different callers:\\
- Are they making accurate, robust calls?\\
- Are they making good use of the available data?\\
To partially separate the three points, we will proceed with a presentation of the cohesin and CTCF depletion work by Erez and Leonid at the next possible call.
~~TABLE_CELL_WRAP_STOP~~|