Cloud Computing Outlook

Solving the Genomic Big Data Crisis with Cloud Computing

By Cloud Computing Outlook | Thursday, July 01, 2021

FREMONT, CA: Drug discovery has relied heavily on genomics over the last decade. Genomics has enabled scientists to develop more targeted therapies, boosting the chances of successful clinical trials. Studies show that 40 percent of FDA-approved drugs in 2018 had the potential to be personalized to patients, primarily based on genomics data. Over the last four years, this figure has increased significantly, and the trend is unlikely to slow down soon.

The rapid growth of genomics in the drug realm and personalized treatments can be traced back to two significant developments over the last decade: plunging sequencing costs and, consequently, an explosion of data. With sequencing technologies continually evolving and being optimized, there has been a steep reduction in genome sequencing costs. The first sequenced genome, part of the Human Genome Project, cost USD 2.6 billion and took 13 years to complete. Today, one can get their genome sequenced in less than a day for approximately USD 990.

Genomics today falls under the big data field. A single human genome sequence produces approximately 200 gigabytes of raw data. If even 100 million genomes are sequenced by 2025, we will have accumulated at least 20 billion gigabytes of raw data. Such massive volumes of data need to be stored somewhere. Furthermore, sequencing is futile unless each genome is thoroughly analyzed to achieve meaningful scientific insights. Genomic data analysis tends to generate an additional 100 gigabytes of data per genome, compounding the storage problem. Also, the extensive and advanced computing infrastructure required to make use of this data is often beyond the budget of smaller firms.
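The storage arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the per-genome figures are the ones quoted in this article, while the function and constant names are made up for illustration.

```python
# Per-genome figures quoted in the article, in gigabytes.
RAW_GB_PER_GENOME = 200
ANALYSIS_GB_PER_GENOME = 100


def storage_estimate_gb(genomes: int) -> dict:
    """Return raw, analysis, and combined storage needs in gigabytes."""
    raw = genomes * RAW_GB_PER_GENOME
    analysis = genomes * ANALYSIS_GB_PER_GENOME
    return {"raw_gb": raw, "analysis_gb": analysis, "total_gb": raw + analysis}


if __name__ == "__main__":
    estimate = storage_estimate_gb(100_000_000)
    for label, gigabytes in estimate.items():
        # 1 exabyte = 1e9 gigabytes (decimal units).
        print(f"{label}: {gigabytes:,} GB (~{gigabytes / 1e9:.0f} EB)")
```

For 100 million genomes, this gives 20 billion gigabytes of raw data and, once analysis output is included, roughly 30 billion gigabytes (about 30 exabytes) in total.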

Enter Cloud Computing!

Over the years, cloud computing has gained traction as a feasible way of processing and storing large datasets quickly, without the burden of maintaining and upgrading servers. Its pay-as-you-go model, which lets users rent computational power and storage, is pervasive across many different sectors.

The cloud offers significant flexibility, an alluring characteristic for small life science companies that are short on capital. High-performance computing (HPC) costs can make or break a company. As a result, small companies often prefer to test their products on the cloud first and gauge profitability before making large-scale HPC investments.

The inherent elasticity of cloud services allows companies to scale their computational resources in proportion to the amount of genomic data that they need to analyze. This involves less financial risk and fewer idle computational resources. Elasticity also applies to storage, as data can be downloaded directly from the cloud and removed once the analysis is complete. A growing number of security protocols and safe practices help ensure data protection. Cloud resources are allocated in virtualized slices called instances. Each instance's hardware and software is pre-configured according to the user's demand, ensuring reproducibility.

Despite these numerous advantages, widespread adoption of the cloud for genomic analysis has yet to materialize. This is mostly because organizations feel that cloud computing can, in the long run, cost more than HPC.
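The long-run cost concern can be framed as a simple break-even question: after how many months does cumulative pay-as-you-go spending overtake an upfront HPC purchase plus its running costs? The sketch below is a rough illustration only. The function name is hypothetical, and every dollar figure in the example is a placeholder, not real pricing from any provider or vendor.

```python
import math
from typing import Optional


def break_even_months(hpc_upfront: float, hpc_monthly: float,
                      cloud_monthly: float) -> Optional[int]:
    """Return the first month in which cumulative cloud spend exceeds
    cumulative on-premises HPC spend, or None if it never does."""
    if cloud_monthly <= hpc_monthly:
        # Cloud never catches up with the upfront HPC investment.
        return None
    # Cloud overtakes HPC once m * cloud > upfront + m * hpc,
    # i.e. m > upfront / (cloud - hpc).
    return math.floor(hpc_upfront / (cloud_monthly - hpc_monthly)) + 1


if __name__ == "__main__":
    # Placeholder figures chosen purely for illustration (USD).
    months = break_even_months(hpc_upfront=500_000,
                               hpc_monthly=5_000,
                               cloud_monthly=25_000)
    print(f"Cloud becomes the costlier option in month {months}")
```

With these placeholder numbers the two options cost the same after 25 months, and the cloud becomes the more expensive one from month 26. A company with a lighter or more irregular workload would see a lower monthly cloud bill and a much later break-even point, which is why the comparison depends heavily on actual usage.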
