Instead of defining custom file formats for each data type and access pattern… • Parquet creates a compressed format for each Avro-defined data model. • Improvement over existing formats1 • 20-22% for BAM • ~95% for VCF 1compression % quoted from 1K Genomes samples
the AMPLab • Apache 2 License • Contributors from both research and commercial organizations • Core spatial primitives, variant calling • Avro and Parquet for data models and file formats
well as I do.” A single piece of a filtering stage for a somatic variant caller “11-base-pair window centered on a candidate mutation” actually turns out to be optimized for a particular file format and sort order
code 2. You should have assembled a team to build your software 3. If you choose the right license, more people will use and build on your software. 4. Making software free for commercial use shows you are not against companies. 5. You should maintain your software indefinitely 6. Your “stable URL” can exist forever 7. You should make your software “idiot proof” 8. You used the right programming language for the task. Lior Pachter https://liorpachter.wordpress.com/2015/07/10/the-myths-of-bioinformatics-software/