As a species, we're engaged in a crucial evolutionary struggle, and we're losing: pathogens are evolving resistance to drugs faster than we can make new ones. To slow down the clock and beat the bugs, we need to make sure that resistant pathogens don't get a chance to replicate unchecked in their human hosts. This means doing drug resistance tests to ensure that we only give patients drugs that their infections will respond to.
At Hyrax Biosciences, our software developers build web services that use machine learning to analyse DNA for drug resistance. While building Exatype, our drug resistance testing platform, we ran into a classic problem: how do we build a validation test with a verified result when we're already the most sensitive tool on the market? To solve this problem, we turned to simulation. This talk is about a multithreaded Python tool (Biopython, multiprocessing, numpy, pysam) that simulates every stage of the evolution of HIV drug resistance and the DNA sequencing process. Each run returns 300 unique, procedurally generated HIV samples from "patients" with different histories and drug resistance profiles. We host the tool on AWS and integrate with Slack to run validations and report results.
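To give a flavour of why simulation yields a "verified result", here is a minimal sketch of the idea in plain Python. It is not the Exatype or simulated-data code: the function names (`mutate`, `simulate_reads`), the toy 60-base sequence, the single substitution at position 30, the 20% variant frequency and the 0.1% error rate are all assumptions chosen for illustration. Because we plant the resistant variant ourselves, the true frequency is known, and whatever an analysis pipeline reports can be checked against it.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Return a copy of seq with each base substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_reads(variants, n_reads, read_len, error_rate, rng):
    """Sample (start, read) pairs from weighted variants, adding sequencing errors."""
    seqs = [seq for seq, _ in variants]
    weights = [freq for _, freq in variants]
    reads = []
    for _ in range(n_reads):
        # Pick which viral variant this read comes from, by frequency.
        template = rng.choices(seqs, weights=weights, k=1)[0]
        start = rng.randrange(len(template) - read_len + 1)
        fragment = template[start:start + read_len]
        reads.append((start, mutate(fragment, error_rate, rng)))
    return reads

rng = random.Random(42)  # fixed seed so the toy run is repeatable

# Toy "wild type" genome and a hypothetical resistant variant that
# differs from it by one substitution at position 30.
wild_type = "".join(rng.choice(BASES) for _ in range(60))
pos = 30
alt = next(b for b in BASES if b != wild_type[pos])
resistant = wild_type[:pos] + alt + wild_type[pos + 1:]

true_freq = 0.2  # ground truth that we control
variants = [(wild_type, 1 - true_freq), (resistant, true_freq)]
reads = simulate_reads(variants, n_reads=2000, read_len=25,
                       error_rate=0.001, rng=rng)

# A naive "pipeline": count the planted base among reads covering pos.
covering = [read[pos - start] for start, read in reads
            if start <= pos < start + len(read)]
observed_freq = covering.count(alt) / len(covering)
print(f"true frequency: {true_freq}, observed: {observed_freq:.3f}")
```

A real simulator would also have to model things this sketch ignores, such as selection under drug pressure, insertions and deletions, quality scores and output in standard sequencing file formats.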
The code to simulate your own HIV dataset is publicly available at https://github.com/hyraxbio/simulated-data. Support for other pathogens is coming soon.