Introduction
Our goal at Relevance Lab (RL) is to make scientific research in the cloud ridiculously simple for researchers and principal investigators. The cloud is driving major advancements in both the Healthcare and Higher Education sectors. Research on the cloud is being rapidly adopted by organizations across these sectors, in both commercial and public sector segments, and is improving day-to-day lives through drug discoveries, healthcare breakthroughs, sustainable solutions, the development of smart and safe cities, and more.
Powering these innovations, the public cloud provides infrastructure with more accessible and useful research-specific products that speed time to insights. Customers get more secure and frictionless collaboration capabilities across large datasets. However, setting up and getting started with complex research workloads can be time-consuming. Researchers often look for simple and efficient ways to run their workloads.
RL addresses this issue with Research Gateway, a self-service cloud portal that allows customers to run secure and scalable research on public clouds without the heavy lifting of setup. In this blog, we will explore different use cases in which Research Gateway simplifies research workloads and accelerates outcomes. We will also elaborate on three specific use cases from the healthcare and higher education sectors for the adoption of the Research Gateway Software as a Service (SaaS) model.
Who Needs Scientific Research in the Cloud?
The entire scientific community is trying to speed up research for better human lives. While scientists want to focus on “science” and not “infrastructure”, it is not always easy to have a collaborative, secure, self-service, cost-effective, and on-demand research environment. Most customers have traditionally used on-premise infrastructure for research, where limited resources are always a key constraint on scaling up. The following are some common challenges we have heard from our customers:
- We have tremendous growth of data for research and are not able to manage with existing on-premise storage.
- Our ability to start new research programs despite securing grants is severely limited by a lack of scale with existing setups.
- We have tried the cloud but, especially with High Performance Computing (HPC) systems, are not confident enough about total spend and budget controls to adopt it.
- We have ordered additional servers, but for months, we have been waiting for the hardware to be delivered.
- We can easily try new cloud accounts but bringing together Large Datasets, Big Compute, Analytics Tools, and Orchestration workflows is a complex effort.
- We have built on-premise systems for research with Slurm, Singularity Containers, Cromwell/Nextflow, and custom pipelines, and do not have the bandwidth to migrate to the cloud with updated tools and architecture.
- We want to provide researchers the ability to have their ephemeral research tools and environments with budget controls but do not know how to leverage the cloud.
- We are scaling up online classrooms and training labs for a large set of students but do not know how to build secure and cost-effective self-service environments like on-premise training labs.
- We require a data portal for sharing research data across multiple institutions with the right governance and controls on the cloud.
- We need the ability to run Genomics Secondary Analysis for multiple domains like Bacterial research and Precision Medicine at scale, with cost-effective per-sample runs, without worrying about tools, infrastructure, software, and ongoing support.
Keeping the above common needs in perspective, Research Gateway is solving the problems for the following key customer segments:
- Education Universities
  - Primarily for Data Analytics and Research
- Healthcare Providers
  - Hospitals and Academic Medical Centers for Genomics Research
- Drug Discovery Companies
  - Especially in Genomics, Precision Medicine, and Vaccine companies
- Not-for-Profit Companies
  - Primarily across health, education, and policy research
- Public Sector Companies
  - Looking into Food Safety, National Supercomputing centers, etc.
The primary solutions these customers are seeking from Research Gateway are listed below:
- Analytics Workbench with tools like RStudio and SageMaker
- Bioinformatics Containers and Tools from the standard catalog and bring your own tools
- Genomics Secondary Analysis in Cloud with 1-Click models using open source orchestration engines like Nextflow, Cromwell and specialized tools like DRAGEN, Parabricks, and Sentieon
- Virtual Training Labs in Cloud
- High Performance Computing Infrastructure with specialized tools and large datasets
- Research and Collaboration Portal
- Training and Learning Quantum Computing
The figure below shows the customer segments and their top use cases.
How Is Research Gateway Powering Frictionless Outcomes?
Research Gateway allows researchers to conduct just-in-time research with 1-click access to research-specific products, provision pipelines in a few steps, and take control of the budget. This helps in the acceleration of discoveries and enables a modern study environment with projects and virtual classrooms.
Case Study 1: Accelerating Virtual Cloud Labs for the Bioinformatics Department of a Singapore-based University
During interaction with the university, the following needs were highlighted to the RL team by the university’s bioinformatics department:
Classroom Needs: Primary use case to enable Student Classrooms and Groups for learning Analytics, Genomics Workloads, and Docker-based tools
Research Needs: Used by a small group of researchers pursuing higher degrees in the Bioinformatics space
Addressing the Virtual Classroom and Research Needs with Research Gateway
The SaaS model of Research Gateway is used with a hub-and-spoke architecture that allows customers to configure their own AWS accounts for projects to control users, products, and budgets seamlessly.
The primary solution includes:
- Professors set up classrooms and assign students for projects based on semester needs
- Usage of basic tools like RStudio, EC2 with Docker, MySQL, and SageMaker
- A special request, port forwarding so that a local RStudio IDE could connect to shared data in the cloud, was also successfully put to use
- End-of-day automated reports to students and professors on servers that are still running, for cost optimization
- Ability to create multiple projects in a single AWS Account + Region for flexibility
- Ability to assign and enforce student-level budget controls to avoid overspending
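The budget controls and end-of-day reports listed above can be illustrated with a minimal sketch. It is not Research Gateway's implementation: the `StudentBudget` class, the `end_of_day_report` function, the 80% warning threshold, and the sample names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StudentBudget:
    """Hypothetical per-student budget record (not a Research Gateway type)."""
    student: str
    limit_usd: float
    spent_usd: float = 0.0

    def status(self, warn_ratio: float = 0.8) -> str:
        """Classify consumption as 'ok', 'warning', or 'exceeded'.

        The 80% warning threshold is an assumed default.
        """
        if self.spent_usd >= self.limit_usd:
            return "exceeded"
        if self.spent_usd >= warn_ratio * self.limit_usd:
            return "warning"
        return "ok"


def end_of_day_report(budgets, running_servers):
    """Build one report line per student.

    budgets: list of StudentBudget
    running_servers: dict mapping a student name to the names of servers
                     that are still running at the end of the day
    """
    lines = []
    for b in budgets:
        servers = running_servers.get(b.student, [])
        lines.append(
            f"{b.student}: spent ${b.spent_usd:.2f} of ${b.limit_usd:.2f} "
            f"({b.status()}), still running: {', '.join(servers) or 'none'}"
        )
    return lines
```

A professor-facing job could call `end_of_day_report` once per classroom project and send the resulting lines by email, while a student whose status is `exceeded` could have further provisioning blocked.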
Case Study 2: Driving Genomics Processing for Cancer Research of an Australian Academic Medical Center
While the existing research infrastructure is on-premise due to security and privacy needs, the team is facing serious challenges with growing data and the influx of new genomics samples to be processed at scale. A team of researchers is taking the lead in evaluating AWS Cloud to solve the issues related to scale and drive faster research in the cloud with built-in security and governance guardrails.
Addressing Genomic Research Cloud Needs with Research Gateway
RL addressed the genomics workload migration needs of the hospital with the Research Gateway SaaS model, using the hub-and-spoke architecture that allows the customer to have exclusive access to their data and research infrastructure by bringing their own AWS account. The software is deployed in the Sydney region, complying with in-country data norms as per governance standards. Users can easily configure AWS accounts for genomics workload projects. They also get 1-click access to genomic research-related products, along with seamless budget tracking and pausing.
The following primary solution patterns were delivered:
- Migration of existing HPC system using Slurm Workload Manager and Singularity Containers
- Using Cromwell for Large Scale Genomic Samples Processing
- Using complex pipelines with a mix of custom and public WDL pipelines like RNA-Seq
- Large Sample and Reference Datasets
- AWS Batch HPC leveraged for cost-effective and scalable computing
- Specific Data and Security needs met with country-level data safeguards & compliance
- Large set of custom tools and packages
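One small part of moving Slurm-based jobs to AWS Batch, as listed above, is translating each job's resource request. The sketch below is an illustration under stated assumptions, not the migration tooling used in this project: the helper names `parse_slurm_memory_mib` and `slurm_to_batch_overrides` are hypothetical, and the returned dictionary follows the `containerOverrides` shape accepted by the AWS Batch SubmitJob API (values as strings, memory in MiB).

```python
import re


def parse_slurm_memory_mib(mem: str) -> int:
    """Convert a Slurm --mem value such as '8G' or '4096M' to MiB.

    Slurm treats a bare number as megabytes; suffixes K, M, G, T are allowed.
    """
    match = re.fullmatch(r"(\d+)([KMGT]?)", mem.strip().upper())
    if match is None:
        raise ValueError(f"unsupported memory value: {mem!r}")
    amount = int(match.group(1))
    unit = match.group(2) or "M"
    factor = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}[unit]
    # Round down, but never request less than 1 MiB.
    return max(1, int(amount * factor))


def slurm_to_batch_overrides(cpus_per_task: int, mem: str, command: list) -> dict:
    """Build a containerOverrides-shaped dict for an AWS Batch job submission."""
    return {
        "command": list(command),
        "resourceRequirements": [
            {"type": "VCPU", "value": str(cpus_per_task)},
            {"type": "MEMORY", "value": str(parse_slurm_memory_mib(mem))},
        ],
    }
```

For example, a job that requested 4 CPUs and `8G` of memory under Slurm would map to 4 vCPUs and 8192 MiB in the override. The container image itself would still need to be rebuilt as a Docker image before AWS Batch can run it.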
The workload currently operates in an on-premise HPC environment, using Slurm as the orchestrator and Singularity containers. The migration involves converting Singularity containers to Docker containers so that they can be used with AWS Batch. The pipelines are based on Cromwell, one of the leading workflow orchestrators, available from the Broad Institute. The following picture shows the existing on-premise system and contrasts it with the target cloud-based system.
Case Study 3: Secure Research Environments for a US-based Academic Medical Centre
Secure Research Environments (SRE) provide researchers with timely and secure access to sensitive research data, computation systems, and common analytics tools for speeding up Scientific Research in the cloud. Researchers are given access to approved data, enabling them to collaborate, analyze data, share results within proper controls and audit trails. Research Gateway provides this secure data platform with analytical and orchestration tools to support researchers in conducting their work. Their results can then be exported safely, with proper workflows for submission reviews and approvals.
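The review-and-approval step for exporting results can be pictured as a small state machine. This is a minimal sketch under assumptions, not the workflow Research Gateway actually implements: the `EgressState` values, the allowed transitions, the rule that researchers cannot decide on their own requests, and the `EgressRequest` class are all hypothetical.

```python
from enum import Enum


class EgressState(Enum):
    SUBMITTED = "submitted"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Assumed policy: a request must be reviewed before it is approved or rejected.
ALLOWED = {
    EgressState.SUBMITTED: {EgressState.IN_REVIEW},
    EgressState.IN_REVIEW: {EgressState.APPROVED, EgressState.REJECTED},
    EgressState.APPROVED: set(),
    EgressState.REJECTED: set(),
}


class EgressRequest:
    """Hypothetical export request that records every action for auditing."""

    def __init__(self, researcher, files):
        self.researcher = researcher
        self.files = list(files)
        self.state = EgressState.SUBMITTED
        self.audit_trail = [("submitted", researcher)]

    def transition(self, new_state, actor):
        """Move to new_state if the policy allows it, and log who did it."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot move from {self.state.value} to {new_state.value}"
            )
        decision = new_state in (EgressState.APPROVED, EgressState.REJECTED)
        if decision and actor == self.researcher:
            # Assumed rule: researchers cannot decide on their own requests.
            raise PermissionError("a request cannot be decided by its submitter")
        self.state = new_state
        self.audit_trail.append((new_state.value, actor))

    def can_export(self):
        """Data may leave the environment only after approval."""
        return self.state is EgressState.APPROVED
```

Keeping the allowed transitions in a single table makes the policy easy to audit, and the `audit_trail` list gives a simple record of who submitted, reviewed, and approved each export.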
Addressing Secure Research Needs for Sensitive Data with Ingress/Egress Controls
RL addressed the SRE needs of a US-based Academic Medical Centre with HIPAA-compliant research for its Health Sciences group. The solution has the following key building blocks:
- Data Ingress/Egress
- Researcher Workflows & Collaborations with cost controls
- On-going Researcher Tools Updates
- Software Patching & Security Upgrades
- Healthcare (or other sensitive) Data Compliances
- Security Monitoring, Audit Trail, Budget Controls, User Access & Management
The figure below shows implementation of SRE solution with Research Gateway.
Conclusion
Relevance Lab, in partnership with public cloud providers, is driving frictionless outcomes by enabling secure and scalable research with Research Gateway across various use cases. By simplifying the setup and running of research workloads, which takes just 30 minutes with self-service access and cost control, the solution enables the creation of internal virtual labs, accelerates complex genomic workloads, and meets the needs of Secure Research Environments with Ingress/Egress controls.