Main opinion: if the analysis is already implemented in the HiC ecosystem, it makes more sense to use HiC; if novel analyses are needed (new algorithms, etc.), the Python APIs provided by Cooler would be better
There are Python APIs in HiC as well (albeit not public).
About the suitability of the user audience
Users should be representative of the actual users once the data are published
Users should be able to evaluate all the functionalities
About the structure of survey questionnaires
Subjective answers may introduce bias; there should be a direct choice of which format respondents prefer
Simple multiple-choice questions on other aspects will help filter responses
About supporting multiple formats
In the short term, supporting both is fine
Files can be optimized during the process; prepare for a second round, with a time window for both data formats to be optimized, APIs polished, etc.
Since more software will be developed for 4DN, ease of use will be more important
However, as a long-lasting solution, one format is probably preferable to a two-format solution
There would be more confusion if more file formats are supported
If all the conversions were perfect then multiple formats would be OK, but each additional component is another potential point of error.
About the analysis pipeline
Keep data in lossless BAM files, adding information for downstream filtering, then discard the original FASTQ files (they can be regenerated from the BAM files if needed).
The pairs file can be generated from the BAM file by applying filters
Add information about every pair (mapping quality, pair validity, etc.) to the BAM file
Both BAM files and pairs files are needed for the information they carry, and therefore both need to be kept
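The idea of deriving a pairs file from annotated records by filtering could be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline agreed on in the meeting: the `AnnotatedPair` record, its field names, the `is_valid` flag, the `min_mapq=30` threshold and the output column order are all hypothetical, and plain Python objects stand in for records that would actually be read from the BAM file.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class AnnotatedPair:
    """One read pair with per-pair annotations (hypothetical fields)."""
    read_id: str
    chrom1: str
    pos1: int
    strand1: str
    mapq1: int
    chrom2: str
    pos2: int
    strand2: str
    mapq2: int
    is_valid: bool  # hypothetical validity annotation stored with the pair


def filter_pairs(pairs: Iterable[AnnotatedPair],
                 min_mapq: int = 30) -> Iterator[AnnotatedPair]:
    """Yield pairs flagged valid whose two mates both reach min_mapq.

    The threshold is an assumed example value, not one from the minutes.
    """
    for pair in pairs:
        if pair.is_valid and min(pair.mapq1, pair.mapq2) >= min_mapq:
            yield pair


def to_pairs_line(pair: AnnotatedPair) -> str:
    """Format one pair as a tab-separated line (assumed column order)."""
    fields = [pair.read_id, pair.chrom1, str(pair.pos1),
              pair.chrom2, str(pair.pos2), pair.strand1, pair.strand2]
    return "\t".join(fields)


# Three made-up records: one good, one with a low-MAPQ mate, one invalid.
records = [
    AnnotatedPair("r1", "chr1", 100, "+", 60, "chr2", 500, "-", 42, True),
    AnnotatedPair("r2", "chr1", 200, "+", 60, "chr1", 900, "-", 5, True),
    AnnotatedPair("r3", "chr3", 300, "-", 60, "chr3", 700, "+", 60, False),
]

kept = list(filter_pairs(records))
pairs_lines = [to_pairs_line(p) for p in kept]
```

Because the filter only reads the annotations and never modifies the records, different thresholds can be applied later to the same lossless source, which is the reason given above for keeping the annotated BAM file alongside the pairs file.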
4dn/phase1/working_groups/omics_data_standards/minutes-09-12-2016.1550260509.txt.gz · Last modified: 2025/04/22 16:21 (external edit)