AGENDA:
Hi-C DATA STANDARDS:
All of this information will be included when the metadata is submitted, and the metadata can be submitted without a data file. The questions here are: What are we going to do to get these data files? Which quality control metrics should we use? What thresholds should we set on those metrics?
PRESENTATION:
First slide: Hi-C experimental structure
Second and third slides: Biosample
Fourth slide: Biosample control information
* These information entries were put forward by the Cell working group.
* A reference to the SOP document is also needed.
* Which entries are required and which are optional?
* There will be essential checks (the standard is to be jointly finalized by the Cell WG and the OMICS WG) that a submission must pass before the dataset can be released.
* Karyotype image: the Cell working group mentioned that we might use Hi-C data to determine whether the karyotype is normal. Erez and David Gilbert are working together to create a systematic pipeline that is sensitive to karyotypic abnormalities. It is planned to be used as a verification system.
* Karyotype information is only required for unstable cell lines that have been passaged more than 10 times. If any entry is required, it is important to tell the consortium ASAP so that people can prepare before generating any data. This will be brought to the steering committee call.
* There is a Google Doc by DCIC about these metadata entries; Burak will share it with OMICS.
NUMBER OF READS:
* The number of reads was previously decided not on the basis of the features to be called but by convenience: ~400-500 million per sample.
* Below a certain sequencing depth it is very difficult to distinguish features on these maps, so it is very important to provide some guidance to the consortium. The features that we are trying to capture depend on higher sequencing depths. A single cell will have ~1B contacts, corresponding to 2B read-pairs.
* For this, we need a power analysis of a loop caller to see how good the results are and how many loops it will call.
* We can have two standards: 500M required for TAD calling but much deeper (for example, 2B) for loop calling. However, the current domain-calling tools all need deeper sequencing; otherwise the resulting domains will not be reliable.
* DAWG is currently discussing domain/loop-calling methods, and OMICS may join that discussion. DAWG has its agenda posted on the wiki and is currently working on topics such as alignment filtering, matrix balancing, and replicable analysis; it will come to those methods afterwards.
* But there appears to be a scheduling problem. The DAWG meeting was once per month but has been moved to once every other week, so we might be able to move more quickly.
* The power analysis is crucial; one on TAD callers was just published and can be used as an example.
ALIGNMENT, NORMALIZATION, AND OTHER ASPECTS OF THE SOP:
* The groups can present their own in-house normalization procedures so that people can compare them. The Ren lab has already prepared such a presentation, which can be scheduled at two weeks' notice. The metrics for method evaluation can be determined after the presentations.
* The same applies to domain callers and loop callers. We can make a Google Doc listing all these topics and schedule when each will be discussed. We will follow up by email first about when to discuss them.
* The next DAWG meeting (Thu. Apr. 20th, 8:30am PDT) will be focused on pipeline discussion (alignment first; the rest of the meeting can be devoted to normalization). Two sessions may be needed for the three talks scheduled.
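ILLUSTRATION (two-tier read depth): As a rough sketch of how the two-tier depth standard discussed under NUMBER OF READS might be checked at submission time, the snippet below compares a sample's read count against the two figures mentioned in the call (500M for TAD calling, 2B as the example for loop calling). The function name, constant names, and labels are hypothetical, and the thresholds are only the values floated in discussion, not a finalized standard.

```python
from typing import List

# Assumed thresholds, taken from the figures floated in the call.
# They are placeholders until a power analysis settles the numbers.
TAD_MIN_READS = 500_000_000       # proposed minimum for TAD calling
LOOP_MIN_READS = 2_000_000_000    # example depth mentioned for loop calling


def supported_analyses(read_count: int) -> List[str]:
    """Return the analyses a sample's depth would qualify for (hypothetical helper)."""
    analyses = []
    if read_count >= TAD_MIN_READS:
        analyses.append("TAD calling")
    if read_count >= LOOP_MIN_READS:
        analyses.append("loop calling")
    return analyses


if __name__ == "__main__":
    # Compare a convenience-depth sample with a deep one.
    for reads in (450_000_000, 500_000_000, 2_000_000_000):
        print(reads, supported_analyses(reads))
```

Under these assumed numbers, a sample at the earlier convenience depth of ~400-500 million reads may fall short of even the lower tier, which is one reason the guidance needs to be communicated to the consortium before data generation.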