==== Omics Data Standards WG - Minutes 09-12-2016 ====

|~~TABLE_CELL_WRAP_START~~
Usability study for different file formats
  * Members (grad students and postdocs) of labs within 4DN were recruited and surveyed for their opinions on using both formats.
  * Reports can be seen here on Google Docs:
    * [[https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnw0ZG5kYXdpa2l8Z3g6Mjc0NjQzOWIwYTEyOTI0Ng|https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnw0ZG5kYXdpa2l8Z3g6Mjc0NjQzOWIwYTEyOTI0Ng]]
  * Main opinion: if the analysis is already implemented in the HiC ecosystem, then it makes more sense to use HiC; if novel analyses are needed (new algorithms, etc.), then the Python APIs provided by Cooler would be better.
    * There are Python APIs in HiC as well (albeit not public).

About the suitability of the survey audience
  * Survey participants should be representative of the actual users once the data are published.
  * Participants should be able to evaluate all the functionalities.

About the structure of survey questionnaires
  * Subjective answers may introduce bias; the questionnaire should include an explicit choice of which format respondents prefer.
  * Simple multiple-choice questions on other aspects will help filter responses.

About supporting multiple formats
  * In the short term, supporting both is fine.
  * Files can be optimized during the process; prepare for a second round, with a time window for both data formats to be optimized, APIs polished, etc.
  * Since more software will be developed for 4DN, ease of use will become more important.
  * However, as a long-term solution, a single format may be preferable to a two-format solution.
  * There would be more confusion if more file formats are supported.
  * If all the conversions are perfect, then multiple formats would be OK, but there would be more potential points of error across the components.

About the analysis pipeline
  * Keep the data in lossless BAM files, adding information for downstream filtering, then discard the original FASTQ files (they can be regenerated from the BAM files if needed).
  * A pairs file can be generated from the BAM file by applying filters.
  * Add information about every pair (mapping quality, pair validity, etc.) to the BAM file.
  * Both the BAM files and the pairs files are needed for this information and therefore need to be kept.
~~TABLE_CELL_WRAP_STOP~~|
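
The filtering step discussed under the analysis pipeline could look roughly like the sketch below. It is illustrative only: plain Python objects stand in for annotated BAM records, and the record type ''AnnotatedPair'', its field names, the ''min_mapq'' threshold of 30, and the output column order are all assumptions made for the example, not decisions recorded in these minutes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class AnnotatedPair:
    """Hypothetical stand-in for one read pair as annotated in the BAM file."""
    read_id: str
    chrom1: str
    pos1: int
    strand1: str
    mapq1: int
    chrom2: str
    pos2: int
    strand2: str
    mapq2: int
    is_valid: bool  # assumed per-pair validity annotation


def filter_pairs(pairs: Iterable[AnnotatedPair], min_mapq: int = 30) -> Iterator[AnnotatedPair]:
    """Keep pairs flagged valid whose two mates both reach min_mapq."""
    for p in pairs:
        if p.is_valid and min(p.mapq1, p.mapq2) >= min_mapq:
            yield p


def to_pairs_lines(pairs: Iterable[AnnotatedPair]) -> List[str]:
    """Render pairs as tab-separated lines (column order is an assumption)."""
    return [
        "\t".join([p.read_id, p.chrom1, str(p.pos1),
                   p.chrom2, str(p.pos2), p.strand1, p.strand2])
        for p in pairs
    ]


# Toy records; in a real pipeline these would be read from the BAM file.
records = [
    AnnotatedPair("r1", "chr1", 100, "+", 60, "chr1", 5000, "-", 42, True),
    AnnotatedPair("r2", "chr1", 200, "+", 5, "chr2", 300, "+", 60, True),   # one mate has low MAPQ
    AnnotatedPair("r3", "chr3", 10, "-", 60, "chr3", 20, "+", 60, False),   # flagged invalid
]

lines = to_pairs_lines(filter_pairs(records))
print("\n".join(lines))
```

Because the unfiltered records keep their annotations, the same input can be re-filtered later with a different threshold, which is the reason given above for keeping the BAM file alongside the pairs file.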