A wealth of viral data sits untapped in publicly available metagenomic data sets, although it could be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a codeathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus. The work yielded valuable insights into both Sequence Read Archive (SRA) data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. Here, we present the applied workflows, initial results, and lessons learned from the codeathon.
Recommended citation: Connor R, Brister R, Buchmann JP, Deboutte W, Edwards R, Martí-Carreras J, et al. (2019). NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements. Genes, 10(9), 714–18. http://doi.org/10.3390/genes10090714