Agenda

Presentation on the DamID protocol and data standard by Tom van Schaik (Bas van Steensel lab).

DamID Protocol and Data Standard
DamID works by expressing a fusion protein between the gene of interest (lamina-associated proteins in the recent case, used to study nuclear speckle structure) and Dam, which methylates adenines in GATC motifs in the chromatin near where the fusion protein binds. The DNA is then digested with DpnI (which cuts only methylated GATC sites) and used to build sequencing libraries. A destabilization domain is built into the lentiviral vector; it destabilizes the fusion protein and causes it to degrade (hence low expression) unless it is protected by Shield1. The DamID protocol works without using Shield1. The amplification step in library preparation is called methyl-PCR because only fragments flanked by methylated DpnI sites are amplified. The data workflow involves removal of adapters (Illumina and DamID adapters), mapping to the genome, counting reads per GATC fragment (between DpnI sites), normalization over a Dam-only control, and then calling associated domains with an HMM.
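As a rough illustration of the normalization step, the snippet below computes a log2 ratio of Dam-fusion over Dam-only counts per GATC fragment after library-size scaling. This is a minimal sketch only; the column names, pseudocount, and reads-per-million scaling are assumptions for illustration, not details taken from the presented pipeline.

```python
import numpy as np
import pandas as pd

def damid_log2_ratio(fusion_counts, dam_only_counts, pseudocount=1.0):
    """Normalize Dam-fusion GATC-fragment counts over a Dam-only control.

    Both inputs are read counts per GATC fragment, in the same fragment
    order. Counts are scaled to reads-per-million to remove library-size
    differences, then a log2 ratio is taken with a pseudocount to avoid
    division by zero. The pseudocount value is an assumption, not part of
    the presented protocol.
    """
    fusion = np.asarray(fusion_counts, dtype=float)
    dam = np.asarray(dam_only_counts, dtype=float)
    fusion_rpm = fusion / fusion.sum() * 1e6
    dam_rpm = dam / dam.sum() * 1e6
    return np.log2((fusion_rpm + pseudocount) / (dam_rpm + pseudocount))

# Example with a hypothetical per-fragment count table:
# fragments = pd.read_csv("gatc_fragment_counts.tsv", sep="\t")
# fragments["log2_ratio"] = damid_log2_ratio(fragments["LMNB1_Dam"],
#                                            fragments["Dam_only"])
```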
Quality Controls

The 5 kb bins across the genome that are associated with the lamina in HFF overlap greatly between replicates. (Actual domains, which may encompass multiple bins, could be used instead to improve the representation.)

Data Availability

Two replicates of H1 and HFF are available on the 4DN portal. A third replicate has been planned.

Nucleolus DamID

The data in HFF look good with 20 kb bins, but in other cell lines (HCT116 and H1) the data are much noisier. (Observations can still be made, so the group is determining whether the current data are already good enough.)

Questions and Comments

How do we define whether the data are “good enough”? One way to determine whether the correlation values are good enough: once enough data sets have been generated, there will be a distribution of correlation values, so a threshold can be established above which within-replicate correlations should fall and below which outliers can be flagged. This may be established with the DCIC once a large number of data sets have been submitted and some statistics can be applied. However, it is also possible that this is a biological phenomenon: in some cell types the interaction might be weak, making the nucleolus signal noisier. This could be confirmed with imaging data (FISH, for example) or with replication timing. It may also be affected by the capture/fusion of the Dam domain: with LMNB1 the capture/fusion may work well, while for 4xAP3 that might not be the case.

Do you think this may be a protein-dependent phenomenon? For the lamina the method has been quite robust; 4 or 5 lamina-associated proteins were tested and they all showed very similar results. For the nucleolus several proteins were tried as well, and AP3 appears to be the best so far. There might be a “magical” protein for the nucleolus that performs better. There is also the possibility of liquid phase separation in the nucleolus that could affect the results; in that case the tethering peptide AP3 would be inside the liquid droplet while Dam remains outside the droplet. Different configurations may be tried, but at some point we would have to say this is the best we can get.

Have you tried to compare the data with the other sequencing data sets? Not yet, because only raw reads were submitted and we would like to know how they were normalized. This is something that obviously needs to be done and will be done.

Last year the DCIC decided on a set of resolutions to be used in Juicer for all data sets; could DamID provide data at those resolutions? Sure.

How critical are the gel-imaging and cell-imaging QC pictures to the data quality? Should they be included in all data submissions? Gel images and cell images are vital to final data quality, and we have already submitted all of our images to the DCIC. We will also provide guidelines for such images along with the experimental procedures.

What is the target sequencing depth? Can such targets be included in the guidelines? It depends on the resolution. We are aiming for 30M reads; 10M reads would give similar information, but we would not recommend going below 10M. Those recommendations will be included.

Comments about the quality control guidelines: the autocorrelation function used to evaluate data quality is quite helpful and should be included in the guidelines as well. However, one caveat should be noted: it assumes that the domain configuration is similar across different cells within the same line.
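As a rough illustration of the replicate-correlation and autocorrelation checks discussed above, the snippet below operates on arrays of binned log2(Dam-fusion/Dam-only) scores over consecutive genomic bins. It is a minimal sketch only; the function names, the choice of Pearson correlation, the lag range, and the handling of missing bins are assumptions for illustration, not part of the presented QC guidelines.

```python
import numpy as np

def replicate_correlation(scores_rep1, scores_rep2):
    """Pearson correlation between two replicates' binned DamID scores.

    Bins with missing values in either replicate are dropped first.
    """
    x = np.asarray(scores_rep1, dtype=float)
    y = np.asarray(scores_rep2, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)
    return np.corrcoef(x[keep], y[keep])[0, 1]

def autocorrelation(scores, max_lag=50):
    """Autocorrelation of a binned DamID profile for lags 1..max_lag."""
    x = np.asarray(scores, dtype=float)
    x = x[np.isfinite(x)]
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-lag] * x[lag:]) / denom
                     for lag in range(1, max_lag + 1)])

# Example with hypothetical replicate score arrays:
# r = replicate_correlation(rep1_log2, rep2_log2)
# acf = autocorrelation(rep1_log2)
```

For a domain-dominated signal such as LADs, the autocorrelation would be expected to decay slowly over many consecutive bins, whereas a noise-dominated profile decays almost immediately; the caveat above about assuming a similar domain configuration across cells within a line still applies.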