How to Accelerate Genomics Research with More Compute Power
The opportunities for solving big health problems through genomics research are tremendous; yet, the challenges associated with storing, managing and processing the massive amounts of data involved in these jobs are daunting. Accessing more compute power can help.
The field of genomics research is exploding as researchers make groundbreaking discoveries through next-generation DNA sequencing (NGS). The demand for sequencing is increasing exponentially while the cost of sequencing has dropped tremendously.
By some estimates, genomics data is doubling roughly every seven months. As you might imagine, keeping pace with such incredible growth presents new challenges.
Genomics Research Today
Genomics research may include studying the entire human genome or large subsets of DNA sequence data. Genomic testing locates variations in DNA that affect health, disease or drug responses.
Enabling personalized medicine is just one example of what’s possible from the study of genomes. Personalized medicine equips clinicians to more accurately determine an individual’s risk of cancer, treat diagnosed patients and monitor disease recurrence.
Researchers have already begun testing a personalized cancer vaccine on patients. This particular test grew out of collaborative efforts to identify tumor-specific mutations that could trigger immune responses, in effect helping cancer patients fight their own disease.
Technical advances in DNA sequencing and computational biology are enabling and accelerating the field of genomics. When the Human Genome Project launched in 1990, it took 13 years to complete the sequencing of a single human genome, finishing in 2003. According to a 2018 IDC Infobrief, sequencing can now be completed in as little as 27 hours – and that figure continues to drop.
The cost to sequence a single human genome (one person) has dropped dramatically. What started at a cost of $3 billion in 2003 has now dropped to as low as $1,000. In their 2020 Big Ideas Report, ARK Invest reported a new target for reducing the average cost of sequencing a whole human genome – $200 by 2024.
In addition to the astounding cost and time reductions achieved in sequencing, there’s the exploding volume of data being extracted from each genome. This accumulation of massive volumes of data creates big data storage, management, and processing challenges.
The Three V’s of Genomics Research Data
As the volume of data continues to soar, storing, managing and processing this electronic information requires new approaches to keep up with the growing demand for access. Data is often siloed, and access to significant compute power is a must-have for processing it efficiently.
Volume (Data storage) – There are roughly 3 billion units, or base pairs, of DNA spread across tens of thousands of genes in a single genome. A finished sequence works out to about 750 megabytes of data, and the raw sequencing output per genome is far larger still. Imagine the storage space required as the data continues to accumulate with the completion of each genome study.
Variety (Data management) – In addition to the volume of data, the types of studies are expanding. There’s the Million Veteran Program, The Cancer Genome Atlas, the 1000 Genomes Project, and others. Because these initiatives are distinct, data is siloed and almost certainly includes different data points that must be carefully aggregated before it can be studied as a whole.
Velocity (Data processing) – Researchers are constantly facing challenges accessing the right infrastructure and compute environments. Processing enormous amounts of data requires serious compute power – and this is in addition to the demands for faster results and seamless scaling of infrastructure.
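As a rough back-of-envelope check of the volume figure above, the storage footprint of a single genome can be sketched in a few lines of Python. The constants here are illustrative assumptions: 2 bits per base for a finished sequence, and about 1 byte per sequenced base of raw read data at a typical 30x coverage.

```python
# Back-of-envelope storage estimate for one human genome.
BASE_PAIRS = 3_000_000_000  # ~3 billion base pairs per genome

# A finished sequence needs 2 bits per base (A, C, G, T).
finished_bytes = BASE_PAIRS * 2 / 8
print(f"Finished genome: ~{finished_bytes / 1e6:.0f} MB")  # ~750 MB

# Raw sequencing reads at 30x coverage, stored with quality scores
# and metadata, run closer to a byte per sequenced base.
COVERAGE = 30
raw_bytes = BASE_PAIRS * COVERAGE * 1
print(f"Raw reads (30x): ~{raw_bytes / 1e9:.0f} GB")  # ~90 GB
```

Multiply that per-genome footprint by thousands of study participants and the storage challenge becomes obvious.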
Fortunately, advancements in technology solutions and tools are making great strides in solving the challenges associated with data volume, management, and processing.
Required Skill Set: A PhD in Genomics OR Computer Science?
Yes, that’s right: a few minutes of perusing current job postings in the field of genomics shows that a PhD and hands-on software development experience are both listed as requirements. While a researcher’s domain is centered around everything related to the study of genomes, they still have to know how to write software. Specifically, they have to code the algorithms they use every day to process substantial volumes of data.
Interestingly, many researchers learn how to code out of necessity – essentially taking the hacker approach. Python has been the language of choice, as it is easier to learn than many other programming languages. Knowing Python equips researchers to build the algorithms for processing, analyzing, and extracting insights from their data.
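As an illustration of the kind of quick script a researcher might hack together, the sketch below parses FASTA-formatted sequence text and computes GC content using only the standard library. The sample sequences are invented for the example.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq) if seq else 0.0

def read_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {header: sequence}."""
    records, header, parts = {}, None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(parts)
            header, parts = line[1:].strip(), []
        else:
            parts.append(line.strip())
    if header is not None:
        records[header] = "".join(parts)
    return records

sample = ">seq1\nATGCGC\nGATTACA\n>seq2\nGGGCCC\n"
for name, seq in read_fasta(sample).items():
    print(name, round(gc_content(seq), 3))
```

Scripts like this start small, then get pointed at gigabytes of real sequencing data – which is exactly where the trouble described below begins.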
While Python is relatively easy to learn, it’s also easy to use it to write bad code. In addition, many scientists are not experts in multiprocessing, so their algorithms aren’t set up to run as efficiently as they could. When code – algorithms in particular – is not optimized, it delays processing and the production of the insights researchers need to make important discoveries. Fortunately, solutions for working with less-than-perfect code exist today.
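To make the multiprocessing point concrete, here is a minimal sketch of the kind of optimization such scripts often miss: fanning per-sequence work out across CPU cores with the standard library's multiprocessing pool. The `gc_count` function is a stand-in for a heavier real analysis step, and the sequence list is invented for the example.

```python
import multiprocessing as mp

def gc_count(seq: str) -> int:
    """Stand-in for a heavier per-sequence analysis step."""
    s = seq.upper()
    return s.count("G") + s.count("C")

def main():
    # A naive script would loop over sequences on a single core;
    # a pool spreads the same work across all available CPUs.
    sequences = ["ATGC", "GGCC", "ATAT", "CGCG"] * 1000
    with mp.Pool() as pool:
        counts = pool.map(gc_count, sequences, chunksize=256)
    print(sum(counts))  # prints 10000

if __name__ == "__main__":
    main()
```

Even this simple change only scales to the cores on one machine; scaling beyond a single box is where scheduling and orchestration infrastructure comes in.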
In reality, coding may not be the only technical skill researchers need to know. Responsibilities can also include setting up the technical environment and infrastructure to run that code. However, researchers are not likely to be cloud natives or knowledgeable about modern software development practices that include using the cloud, web technologies and tools for scheduling and orchestration.
Time-to-market is crucial when it comes to improving patient outcomes through genomics research. Now is the time to start exploring what’s available to streamline data management and make those important discoveries faster.
Access Powerful Compute Without DevOps
Imagine launching a script or algorithm – even one that’s not coded perfectly – with a single command from a command line interface. Think single-source access to scalable, parallelized computing in the cloud, with no need to worry about where and how the job runs.
Dis.co’s compute platform leverages existing infrastructure (on-prem and cloud) as a single resource. The Dis.co Smart Scheduler seamlessly distributes processing jobs across available resources for optimal resource configuration and compute models. Dis.co also eliminates the need for DevOps teams to build in-house scheduling and orchestration tools.
Available resources on the Dis.co platform can span on-prem, private cloud, public cloud, hybrid, and multicloud environments. The platform can also tap GPUs in addition to CPUs; originally designed for gaming, GPUs pack nearly 200 times more processing cores per chip than CPUs.
If you’re ready to simplify data management and accelerate running processing jobs to accomplish research goals, contact Dis.co today.