==== DAWG Meeting Notes 20170615 ==== |~~TABLE_CELL_WRAP_START~~ **A. DCIC status and plans for reaching consensus** - Picked 6 datasets (IMR90 and GM12878) to use for tool evaluation - The current pipeline uses juicer for alignment and filtering, and both juicer and cooler for normalization - Building and testing a reproducible analysis infrastructure connecting the data portal with Seven Bridges Genomics - - Containerizing the tools and making these containers available - - https://github.com/4dn-dcic/docker-4dn-hic - Once cooler- and juicer-normalized matrices are available: - - We are in contact with the hicnorm developers to add hicnorm to the comparison - - We are in contact with the hicrep developers to run hicrep - Note that reaching scientific consensus is more important than standardizing on one implementation; different implementations can be used. Comments: - Bill Noble has reached out to data production groups for input on which tools should be considered (Job->Leonid, Bing, Erez). - - We should hold a bake-off of tools; spreading the work across tool-developer labs should make it easy. Tools should take DCIC standard files as input. - Gurkan, the Noble lab, Anshul Kundaje, et al. have developed tools to run four different QC metrics side by side. - Peter Park: It would be great if people could agree on evaluation metrics before running them. - We should include data from different production groups. Burak will ask Bing and Job whether we can include the newly submitted data. **B. Read filtering:** - Based on last week's presentation, Anton / the Mirny lab propose using only two filters, removing duplicates and short-distance pairs (while keeping low-MAPQ reads and ignoring restriction sites). - Follow-up from last time: do looser filters introduce any local biases in the contact matrices? Not yet investigated. - Other parties were not on the call. We will write this up as a proposal. If there are objections, we can work with the tool developers to run the alternatives side by side. ~~TABLE_CELL_WRAP_STOP~~|
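The two-filter proposal under B (drop duplicates and short-distance pairs, keep low-MAPQ reads, ignore restriction sites) can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `Pair` record (chromosome, position and strand for each end, plus MAPQ) rather than any DCIC file format. It treats two pairs as duplicates only when both ends match exactly, in either orientation, and it uses an illustrative `min_dist` cutoff of 1000 bp that was not specified in the meeting. It is not the Mirny lab's or the DCIC pipeline's implementation.

```python
from typing import Iterable, List, NamedTuple


class Pair(NamedTuple):
    """One Hi-C read pair (hypothetical minimal record, not a DCIC format)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str
    mapq: int  # carried along but deliberately NOT used for filtering


def filter_pairs(pairs: Iterable[Pair], min_dist: int = 1000) -> List[Pair]:
    """Keep pairs that are neither duplicates nor short-distance cis pairs.

    Low-MAPQ pairs are kept and restriction sites are ignored, mirroring the
    two-filter proposal. `min_dist` is an illustrative, assumed cutoff in bp.
    """
    seen = set()
    kept = []
    for p in pairs:
        # Duplicate filter: exact match of (chrom, pos, strand) on both ends.
        # Sorting the two ends makes the key independent of which end was read first.
        end_a = (p.chrom1, p.pos1, p.strand1)
        end_b = (p.chrom2, p.pos2, p.strand2)
        key = tuple(sorted([end_a, end_b]))
        if key in seen:
            continue
        seen.add(key)
        # Short-distance filter: drop cis pairs whose ends are closer than min_dist.
        if p.chrom1 == p.chrom2 and abs(p.pos2 - p.pos1) < min_dist:
            continue
        kept.append(p)
    return kept


if __name__ == "__main__":
    example = [
        Pair("chr1", 100, "+", "chr1", 50_000, "-", 60),
        Pair("chr1", 100, "+", "chr1", 50_000, "-", 60),  # exact duplicate
        Pair("chr1", 200, "+", "chr1", 600, "-", 60),     # short-distance cis pair
        Pair("chr1", 300, "+", "chr2", 400, "+", 0),      # trans pair with MAPQ 0
    ]
    print(len(filter_pairs(example)))  # prints 2
```

MAPQ is carried through but never consulted, and no restriction-fragment assignment is made; that is what makes this filter set looser than schemes that also filter on mapping quality or restriction fragments.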