Mar 4, 2020

Bioinformatics Tooling

The domain of bioinformatics tooling holds great opportunity for startups and new products. First, I’m going to explain why I think that we’re at an inflection point for bioinformatics, both in terms of demand and utility. Next, I’ll explain why I think bioinformatics tooling will prove paramount to the success of genomics in consumer medicine. Finally, I will explore some classes of tooling that are ripe for innovation.

Redux on “Software is Eating the World”

In 2011, Marc Andreessen, of Andreessen Horowitz / a16z, coined the phrase “Software is Eating the World” in an article for the WSJ [1]. The article sought to:

Explain why traditional technology companies such as HP were shifting focus from hardware to software.
Explain why internet companies like Facebook and Twitter were undervalued in the market.
Rebuff the investors denouncing the nascent tech boom in 2011 as a repeat of the dot-com bubble of the late 1990s.

Andreessen argued for the intrinsic value of these new software enterprises, claiming that the critical infrastructure for software was now in place:

Bedrock technologies: The microprocessor and modern internet had matured after their inventions some forty years and twenty years prior respectively.
Demand: Smartphones stood to connect over 5bn people to the internet in the next ten years.
Incipient tooling: The availability of cloud computing resources dramatically reduced the cost and overhead of launching an internet business.

He also described the following challenges to the software boom:

Economic headwinds as a result of the financial crash [2].
A global shortage in skills required to build software.
The traditional challenges of founding and sustaining new businesses.

We’re approaching a decade since this piece was written, and Andreessen made few missteps in his predictions. The value that he attached to software tooling was a particular win. In 2018, AWS took in roughly $25.7bn in revenue [3], more than McDonald’s. The number of tools that have sprung up to support the software development ecosystem for consumer products is staggering: build, release, code management, code search, DevOps and more. As consumers expect more from technology and become more reliant upon it, we can only expect software tools to proliferate further. Note that a lot of the infrastructure built around technology exists to provide access to people who are not software engineers: tools like Jira and Phabricator provide access to product and project managers, while Buildkite, Ansible, Spinnaker and other release tools provide access to release engineers.

Perhaps the only industry that has not seen a radical overhaul is healthcare. In both the UK and the US, healthcare has seen limited innovation. We have had the tools required to provide telemedicine to westerners for decades, but we are only just seeing the adoption of software such as Babylon in the UK [4]. The appalling integration of digital healthcare systems is a testament to the power of markets over public welfare [5]. The promise of personalised healthcare has not been realised; we were promised GATTACA but got absolutely NATTACA. Standing in the way of innovation in healthcare are the national regulatory bodies, such as the MHRA in the UK and the FDA in the US. These bodies have been tardy in adapting to technological development, but they are not solely to blame for the failure to deliver on healthtech. The outlook is improving, though. Not only are the regulatory bodies finally gaining institutional knowledge on how to effectively regulate healthtech products, but there is also a plethora of health-related biotech offerings waiting in the wings or just making their way to the forefront.

Redux on “Biology is Eating the World”

Figure: Cost to sequence a human genome

Some employees at Andreessen Horowitz recently put out a piece called “Biology is Eating the World” [6]. Comparing this article to its predecessor, we can draw out some interesting parallels:

Bedrock technologies: The cost to sequence the human genome has fallen far faster than Moore’s Law would predict (see above). Combining this trend with the potential of targeted assays [7], the price point for medical interventions that require genetic information is shrinking quickly. These precision genetic panels now underpin most of the consumer genetic testing companies, such as 23andMe and Ancestry.com. Automation technologies are making sequencing at scale a reality.
Demand: CRISPR [8] and gene circuitry [9] offer more reliable methods for altering or implementing biological processes on the back of genetic information. Honing applications for these methods and implementing them in a regulatory environment requires vast quantities of genetic information for error-checking.
Incipient tooling: Bioinformatics tooling is niche, but a body of solid, widely used tools exists. The IGV genomics browser [10] is a mainstay of most bioinformatics labs and provides a highly performant, portable tool for browsing genomes. A set of well-tested tools like bcl2fastq and BWA can move researchers quickly from the raw output of a sequencer to mapped sequences of DNA.

Looking more closely at these claims, I think it’s apparent why bioinformatics is about to enter its golden era. Companies that are engineering biology as an industry, like Ginkgo Bioworks [11], are looking to sequence exponentially more DNA as they scale. The cost of sequencing is decreasing exponentially. New sequencing technologies that allow scientists to evaluate epigenetic information, like methylation patterns, are opening up entirely new fields of biological understanding. Advances in traditional computation and pattern recognition are empowering researchers to extract signal from the noise generated by six million years of human evolution, for example by predicting height to within an inch [12].

The relevance of tooling

Historically, portability and performance have dominated conversations about implementing bioinformatics tools. Look no further than the IGV browser that I mentioned earlier. This Java app is extremely portable and highly performant, but it is clear that few UI/UX considerations were made when designing its interface. The industry already has a set of file formats that are standardised between applications (.fastq, .bam, .sam), so the groundwork is laid to produce rock-solid tools that can drive bioinformatics research forward and make the results of bioinformatics pipelines accessible to stakeholders outside of the research community.
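To give a sense of how approachable these standard formats are for tool builders, here is a minimal sketch of reading .fastq data in Python. It assumes the common four-line record layout (header, sequence, separator, quality) and Phred+33 quality encoding; the names FastqRecord, parse_fastq and mean_quality are hypothetical helpers for illustration, not part of any existing library, and real-world files may need more careful handling.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FastqRecord:
    """One read from a FASTQ file (hypothetical helper type)."""
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, decoded assuming the Phred+33 offset


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text.

    Assumes one sequence line per record; stops at end of input
    or at the first blank header line.
    """
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        header = next(stripped, None)
        if header is None or header == "":
            return
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes a Phred score as ASCII code minus 33.
        yield FastqRecord(header[1:], sequence, [ord(c) - 33 for c in quality])


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read; 0.0 for an empty read."""
    if not record.qualities:
        return 0.0
    return sum(record.qualities) / len(record.qualities)


# Usage with an in-memory example; a file handle works the same way.
example = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
for record in parse_fastq(example):
    print(record.name, record.sequence, mean_quality(record))
```

Because the parser accepts any iterable of lines, the same function can sit behind a command-line tool, a web upload or a notebook, which is the kind of accessibility that a shared file format makes possible.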

Citations

[1] https://www.wsj.com/articles/SB10001424053111903480904576512250915629460

[2] https://en.wikipedia.org/wiki/Financial_crisis_of_2007%E2%80%9308

[3] https://www.sec.gov/Archives/edgar/data/1018724/000101872419000004/amzn-20181231x10k.htm

[4] https://www.gpathand.nhs.uk/switch-now?utm_source=babylon&utm_medium=header&utm_campaign=test

[5] https://www.healthcatalyst.com/insights/EHR-integration-digital-health-imperative

[6] https://a16z.com/2019/10/28/biology-eating-world-a16z-manifesto/

[7] So, A.P., Vilborg, A., Bouhlal, Y. et al. A robust targeted sequencing approach for low input and variable quality DNA from clinical samples. npj Genomic Med 3, 2 (2018). https://doi.org/10.1038/s41525-017-0041-4

[8] https://ghr.nlm.nih.gov/primer/genomicresearch/genomeediting

[9] Brophy, J., Voigt, C. Principles of genetic circuit design. Nat Methods 11, 508–520 (2014). https://doi.org/10.1038/nmeth.2926

[10] https://software.broadinstitute.org/software/igv/

[11] https://www.ginkgobioworks.com/

[12] https://www.genetics.org/content/210/2/477