


Omics Data Standards WG - Minutes 09-12-2016

Usability study for different file formats.

About the suitability of the user audience

  • Users should be representative of the actual users once the data are published
  • Users should be able to evaluate all the functionalities

About the structure of survey questionnaires

  • Subjective answers may introduce bias; there should be an explicit choice of which format respondents prefer
  • Simple multiple-choice questions on other aspects will help filter responses

About supporting multiple formats

  • In the short term, supporting both formats is fine
  • Files can be optimized during the process; prepare for a second round, with a time window for both data formats to be optimized, APIs polished, etc.
  • Since more software will be developed for 4DN, ease of use will become more important
  • However, as a long-term solution, a single format may be preferable to a two-format solution
  • Supporting more file formats would cause more confusion
  • If all the conversions were perfect, multiple formats would be OK, but every conversion component adds a potential point of error.

About the analysis pipeline

  • Keep the data in lossless BAM files, adding information for downstream filtering, then discard the original FASTQ files (they can be regenerated from the BAM files if needed).
  • The pairs file can be generated from the BAM file by applying filters
  • Add information about every pair (mapping quality, pair validity, etc.) to the BAM file
  • Both the BAM files and the pairs files are needed for the information they carry and therefore need to be kept
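
The idea of deriving a pairs file from the alignments by applying filters can be illustrated with a minimal Python sketch. It is not the working group's pipeline. It reads SAM-formatted text lines (the text form of BAM), keeps only pairs in which both mates are mapped with a mapping quality at or above a threshold, and emits one record per pair. The names `Pair`, `parse_mate`, and `sam_to_pairs`, the `min_mapq` threshold of 30, and the output column order (read ID, chr1, pos1, chr2, pos2, strand1, strand2) are assumptions made for illustration. The sketch also simplifies: it skips secondary and supplementary alignments and reports the SAM `POS` field as-is rather than the 5' end of each read.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Pair(NamedTuple):
    """One contact record (hypothetical column layout for illustration)."""
    read_id: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    strand1: str
    strand2: str


# (chrom, pos, strand, mapq, unmapped)
Mate = Tuple[str, int, str, int, bool]


def parse_mate(sam_line: str) -> Tuple[str, int, Mate]:
    """Extract read name, flag, and the fields we need from one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    chrom, pos, mapq = fields[2], int(fields[3]), int(fields[4])
    strand = "-" if flag & 0x10 else "+"   # 0x10: read is reverse-complemented
    unmapped = bool(flag & 0x4)            # 0x4: read is unmapped
    return name, flag, (chrom, pos, strand, mapq, unmapped)


def sam_to_pairs(lines: Iterable[str], min_mapq: int = 30) -> Iterator[Pair]:
    """Yield filtered pairs; the two mates of a read may appear in any order."""
    pending = {}  # read name -> first mate seen
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        name, flag, mate = parse_mate(line)
        if flag & 0x900:
            continue  # simplification: ignore secondary/supplementary records
        if name not in pending:
            pending[name] = mate
            continue
        first, second = pending.pop(name), mate
        # Filter: both mates must be mapped with sufficient mapping quality.
        if any(m[4] or m[3] < min_mapq for m in (first, second)):
            continue
        # Order the two ends so the lower (chrom, pos) comes first.
        a, b = sorted((first, second), key=lambda m: (m[0], m[1]))
        yield Pair(name, a[0], a[1], b[0], b[1], a[2], b[2])
```

Because the filter is applied when the pairs are generated, the unfiltered alignments remain available in the BAM file, and the threshold can be changed later without re-mapping the reads.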
4dn/phase1/working_groups/omics_data_standards/minutes-09-12-2016.1550260509.txt.gz · Last modified: 2025/04/22 16:21 (external edit)