Your address will show here +12 34 56 78
HPC Blog, 2022 Blog, Blog, Featured, Feature Blog

Modern scientific research depends heavily on processing massive amounts of data which requires elastic, scalable, easy-to-use, and cost-effective computing resources. AWS Cloud provides such resources, but researchers still find it hard to navigate the AWS console. RLCatalyst Research Gateway simplifies access to HPC clusters using a self-service portal that takes care of all the nuts and bolts to provision an elastic cluster based on AWS ParallelCluster 3.0 within minutes. Researchers can leverage this for their scientific computing.

Relevance Lab has been collaborating with AWS Partnership teams over the last one year to simplify access to High Performance Computing across different fields like Genomics Analysis, Computational Fluid Dynamics, Molecular Biology, Earth Sciences, etc.

There is a growing need from customers to adopt the High Performance Computing capabilities in the public cloud. However this throws in key challenges related to right architecture, workload migration and cost management. Working closely with AWS HPC groups we have been enabling adoption of AWS HPC solutions with early adopters in Genomics and Fluid Dynamics with Higher Education and Healthcare customers. The primary ask is for a self-service Portal for planning, deploying and managing HPC workloads with security, cost management and automation. The figure below shows the key building blocks of HPC Architecture part of our solution.


AWS ParallelCluster 3.0
AWS ParallelCluster is an open source cluster management tool written using Python and is available via the standard python package index (PyPI). Version 3.0 also provides support for APIs and Research Gateway leverages this to integrate with the AWS Cloud to set up and use the HPC cluster for complex computational tasks. AWS ParallelCluster supports two different orchestrators, AWS Batch and Slurm, which cover a vast majority of the requirements in the field. ParallelCluster brings many benefits including easy scalability, manageability of clusters, and seamless migration to the cloud from on-premise HPC workloads.

FSx for Lustre
Amazon FSx for Lustre provides fully managed shared storage with the scalability and performance of the popular Lustre file system. This storage can be accessed with very low (sub-millisecond) latencies by the worker nodes in the HPC cluster and provides very high throughput.

NICE DCV
NICE DCV is a high performance remote display protocol used to deliver remote desktops and application streaming from resources in the cloud to any device. Users can leverage this for their visualization requirements.

Research Gateway Provides a Self-Service Portal for AWS PCluster 3.0 Launch with Automatic Cost Tracking
Using RLCatalyst Research Gateway, research teams are organized into projects with their own catalog of self-service workspaces that researchers can provision easily with minimum knowledge of AWS cloud setup. The standard catalog, included with RLCatalyst Research Gateway, now has a new item called PCluster which a Principal Investigator can add to the project catalog to make it available to their team. This product is based on AWS ParallelCluster 3.0 which is a command line tool that advanced users can work with. Research Gateway has wrapped this tool with an intuitive user interface.

To see how you can set up an HPC cluster within minutes, check this video.

The figure below shows a standard catalog inside Research Gateway for users to provision PCluster and FSx for Lustre with ease.


Setting Up a Shared Cluster for Use in the Project
The PCluster product on Research Gateway offers a lot of flexibility. While researchers can set up and use their own clusters, sometimes there is a need to use a shared cluster across collaborators within the same project. Towards this goal, we have also brought in a feature that allows a user to “share” the cluster with the entire project team. The other users can then connect to the same cluster and submit jobs. For example a Principal Investigator might set up the cluster and share it with the researchers in the project to use for their computations.


Large Datasets Storage and Access to Open Datasets
AWS cloud is leveraged to deal with the needs of large datasets for storage, processing, and analytics using the following key products.

Amazon S3 for high-throughput data ingestion, cost-effective storage options, secure access, and efficient searching.

AWS Datasync for secure, online service that automates and accelerates moving data between on-premises and AWS storage services.

AWS Open Datasets program houses openly available, with 200+ open data repositories.

Cost Analysis of Jobs
Research Gateway injects cost allocation tags into the ParallelCluster so that all resources created are tagged and the cost of the scalable cluster can easily be monitored from the Research Gateway UI.


Summary
AWS Cloud provides services like AWS ParallelCluster and FSx for Lustre that can help users with High Performance Computing for their scientific computing needs. Research Gateway makes it easy to provision these services with a 1-Click, self-service model and provides cost and governance to help manage your budget.

To know more about how you can start your HPC needs in the AWS cloud in 30 minutes using our solution at https://research.rlcatalyst.com, feel free to contact marketing@relevancelab.com

References
Build Your Own Supercomputers in AWS Cloud with Ease – Research Gateway Allows Cost, Governance and Self-service with HPC and Quantum Computing
Leveraging AWS HPC for Accelerating Scientific Research on Cloud
Accelerating Genomics and High Performance Computing on AWS with Relevance Lab Research Gateway Solution



0

HPC Blog, 2022 Blog, Blog, Featured

While there are a lot of talks about Digital Innovation leveraging the cloud, another key disruption in the industry is Applied Science Innovation, led by Scientists and Engineers targeting a broad range of disciplines in Engineering and Medicine. Relevance Lab is proud to now ease the leverage of power tools like High-Performance Computing (HPC) and Quantum Computing on AWS Cloud for such pursuits with our Research Gateway product.

What is Applied Science?
Applied Science uses existing scientific knowledge to solve day-to-day problems in areas like Health Care, Space, Environment, Transportation, etc. It leverages the power of new technologies such as Big Compute and Cloud to drive faster scientific research. Innovation in Applied Science has some unique differences compared to Digital Innovation:


  • Users of Applied Science are researchers, scientists, and engineers
  • Workloads for Applied Science are driven by more specialized systems and domain-specific algorithms & orchestration needs
  • Very large domain-specific data sets and collaboration with a large ecosystem of global communities is a key enabler with a focus on open-source and knowledge sharing
  • Use of specialized hardware and software is also a key enabler

The term Big Compute is used to describe large-scale workloads that require multiple cores (with specialized CPU and GPU types) working with very high-speed network and storage architectures. Such Big Compute architectures solve the problems in image processing, fluid dynamics, financial risk modeling, oil exploration, drug design, etc.

Relevance Lab is working closely with AWS in pursuing specialized use cases for Applied Science and Scientific Research using Cloud. A number of government, public and private sector organizations are focussing large amounts of investment and scientific knowledge on driving innovation in these areas. A few specialized ones with well-known programs are listed below.


What is High Performance Computing?
Supercomputers of the past were very specialized and high-cost systems that could only be built and afforded by large and well-funded institutions. Cloud computing is driving the democratization of supercomputers by providing High Performance Computing (HPC) systems that have specialized architectures. It combines the power of on-demand computing with large & specialized CPU/GPU types, high-speed networking, fast access storage, and associated tools & utilities for workload orchestration and management. The figure below shows the key building blocks of HPC components of AWS Cloud.


What is Quantum Computing?
Quantum computing relies upon quantum theory, which deals with physical phenomena at the nano-scale. One of the most important aspects of quantum computing is the quantum bit (Qubit), a unit of quantum information that exists in two states (horizontal and vertical polarization) at the same time, thanks to the superposition principle of quantum physics.

The Amazon Braket quantum computing service helps researchers and developers use quantum computers and simulators to build quantum algorithms on AWS.


Key Use Cases:

  • Research quantum computing algorithms
  • Test different quantum hardware
  • Build quantum software faster
  • Explore industry applications

What Do Customers Want?
The availability of specialized services like HPC and Quantum Computing has made it extremely simple for customers to be able to consume these advanced technologies and build their own supercomputers. However, when it comes to the adoption cycle, customers are hesitant to adopt the same due to key concerns and asks, as summarized below:

Operational Asks:

  • The top challenge and fear on the cloud is the variable cost model, which can throw a big surprise, and customers want strong Cost Management & Tracking with auto-limits control
  • Security and data governance are also key priorities
  • Data transfer and management are the other key needs

Functional Asks:
  • Faster and easier design, provisioning, and development cycles
  • Integrated and automated tools for deployment and monitoring
  • Easy access to data and the ability to do self-service
  • Derive increased business value from Data Analytics and Machine Learning

How Does Research Gateway Solve Customer Needs?
AWS cloud offerings provide a strong platform for HPC and quantum computing requirements. However, enabling Scientific Research and Training of Researchers requires an ability to offer these with a Self-Service Portal that encapsulates the underlying complexity. On top of proper cost tracking and controlling, security, data management, and an integrated workbench are needed for a collaborative research environment.

To address the above needs, Relevance Lab has developed Research Gateway. It helps scientists accelerate their research on the AWS cloud with access to research tools, data sets, processing pipelines, and analytics workbenches in a frictionless manner. The solution also addresses the need for tight control on a budget, data security, privacy, and regulatory compliances, which it meets while significantly simplifying the process of running complex scientific research workloads.

Research Gateway meets the following key dimensions of collaborative and secure scientific research:

  • Cost and Budget Governance: The solution offers easy control over Cost Tracking of Research Cloud resources to track, analyze, control, and optimize budget spending. Principal Investigators can also pause or stop the budget if it exceeds the set threshold.
  • Research Data & Tools for Easy Collaboration: Research Gateway provides the team of researchers real-time view of research-specific product catalog, cost, and governance, reducing the complexities of running scientific research on the cloud.
  • Security and Compliance: Principal investigators have a unified view and control over security and compliance, covering Identity management, data privacy, audit trails, encryption, and access management.

Principal investigators leading the research get a quick insight into the total budget, consumed budget, and available budget, along with the available research-specific products, as shown in the image below.

With Research Gateway, researchers can provision available research-specific products for their high-performance and quantum computing needs in just 1-click, launching scientific research as quickly as 30 minutes or less.


Summary
High Performance Computing and Quantum computing are essential to the advancement of science and engineering now more than ever. Research Gateway provides fundamental building blocks for Applied Science and Scientific Research in the AWS cloud by simplifying the availability of HPC and Quantum computing for customers. The solution helps create democratized supercomputers on-demand while eliminating the pain of managing infrastructure, data, security, and costs, enabling researchers to focus on science.

To know more about how you can high-performance and quantum computing with just 1-click and launch your research in 30 minutes using our solution at https://research.rlcatalyst.com, feel free to contact marketing@relevancelab.com

References
High-performance genetic datastore on AWS S3 using Parquet and Arrow
Parallelizing Genome Variant Analysis
Leveraging AWS HPC for Accelerating Scientific Research on Cloud
Genomics Cloud on AWS with RLCatalyst Research Gateway
Enabling Frictionless Scientific Research in the Cloud with a 30 Minutes Countdown Now!
Accelerating Genomics and High Performance Computing on AWS with Relevance Lab Research Gateway Solution



0

HPC Blog, 2021 Blog, Blog, Featured

AWS provides a comprehensive, elastic, and scalable cloud infrastructure to run your HPC applications. Working with AWS in exploring HPC for driving Scientific Research, Relevance Lab leveraged their RLCatalyst Research Gateway product to provision an HPC Cluster using AWS Service Catalog with simple steps to launch a new environment for research. This blog captures the steps used to launch a simple HPC 1.0 cluster on AWS and roadmap to extend the functionality to cover more advanced use cases of HPC Parallel Cluster.

AWS delivers an integrated suite of services that provides everything needed to build and manage HPC clusters in the cloud. These clusters are deployed over various industry verticals to run the most compute-intensive workloads. AWS has a wide range of HPC applications spanning from traditional applications such as genomics, computational chemistry, financial risk modeling, computer-aided engineering, weather prediction, and seismic imaging to new applications such as machine learning, deep learning, and autonomous driving. In the US alone, multiple organizations across different specializations are choosing cloud to collaborate for scientific research.


Similar programs exist across different geographies and institutions across EU, Asia, and country-specific programs for Public Sector programs. Our focus is to work with AWS and regional scientific institutions in bringing the power of Supercomputers for day-to-day researchers in a cost-effective manner with proper governance and tracking. Also, with Self-Service models, the shift needs to happen from worrying about computation to focus on Data, workflows, and analytics that requires a new paradigm of considering prospects of serverless scientific computing that we cover in later sections.

Relevance Lab RLCatalyst Research Gateway provides a Self-Service Cloud portal to provision AWS products with a 1-Click model based on AWS Service Catalog. While dealing with more complex AWS Products like HPC there is a need to have a multi-step provisioning model and post provisioning actions that are not always possible using standard AWS APIs. In these situations requiring complex orchestration and post provisioning automation RLCatalyst BOTs provide a flexible and scalable solution to complement based Research Gateway features.

Building blocks of HPC on AWS
AWS offers various services that make it easy to set up an HPC setup.


An HPC solution in AWS uses the following components as building blocks.

  • EC2 instances are used for Master and Worker nodes. The master nodes can use On-Demand instances and the worker nodes can use a combination of On-Demand and Spot Instances.
  • The software for the manager nodes is built as an AMI and used for the creation of Master nodes.
  • The agent software for the managers to communicate with the worker nodes is built into a second AMI that is then used for provisioning the Worker nodes.
  • Data is shared between different nodes using a file-sharing mechanism like FSx Lustre.
  • Long-term storage uses AWS S3.
  • Scaling of nodes is done via Auto-scaling.
  • KMS for encrypting and decrypting the keys.
  • Directory services to create the domain name for using HPC via UI.
  • Lambda function service to create user directory.
  • Elastic Load Balancing is used to distribute incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, Lambda functions, and virtual appliances.
  • Amazon EFS is used for regional service storing data within and across multiple Availability Zones (AZs) for high availability and durability. Amazon EC2 instances can access your file system across AZs.
  • AWS VPC to launch the EC2 instances in private cloud.

Evolution of HPC on AWS
  • HPC clusters first came into existence in AWS using the CfnCluster Cloud Formation template. It creates a number of Manager and Worker nodes in the cluster based on the input parameters. This product can be made available through AWS Service Catalog and is an item that can be provisioned from the RLCatalyst Research Gateway. The cluster manager software like Slurm, Torque, or SGE is pre-installed on the manager nodes and the agent software is pre-installed on the worker nodes. Also pre-installed is software that can provide a UI (like Nice EngineFrame) for the user to submit jobs to the cluster manager.
  • AWS Parallel Cluster is a newer offering from AWS for provisioning an HPC cluster. This service provides an open-source, CLI-based option for setting up a cluster. It sets up the manager and worker nodes and also installs controlling software that can watch the job queues and trigger scaling requests on the AWS side so that the overall cluster can grow or shrink based on the size of the queue of jobs.

Steps to Launch HPC from RLCatalyst Research Gateway
A standard HPC launch involves the following steps.

  • Provide the input parameters for the cluster. This will include
    • The compute instance size for the master node (vCPUs, RAM, Disk)
    • The compute instance size for the worker nodes (vCPUs, RAM, Disk)
    • The minimum and maximum number of worker nodes.
    • Select the workload manager software (Slurm, Torque, SGE)
    • Connectivity options (SSH keys etc.)
  • Launch the product.
  • Once the product is in Active state, connect to the URL in the Output parameters on the Product Details page. This connects you to the UI from where you can submit jobs to the cluster.
  • You can SSH into the master nodes using the key pair selected in the Input form.

RLCatalyst Research Gateway uses the CfnCluster method to create an HPC cluster. This allows the HPC cluster to be created just like any other products in our Research Gateway catalog items. Though this provisioning may take upto 45 minutes to complete, it creates an URL in the outputs which we can use to submit the jobs through the URL.

Advanced Use Cases for HPC

  • Computational Fluid Dynamics
  • Risk Management & Portfolio Optimization
  • Autonomous Vehicles – Driving Simulation
  • Research and Technical Computing on AWS
  • Cromwell on AWS
  • Genomics on AWS

We have specifically looked at the use case that pertains to BioInformatics where a lot of the research uses Cromwell server to process workflows defined using the WDL language. The Cromwell server acts as a manager that controls the worker nodes, which execute the tasks in the workflow. A typical Cromwell setup in AWS can use AWS Batch as the backend to scale the cluster up and down and execute containerized tasks on EC2 instances (on-demand or spot).



Prospect of Serverless Scientific Computing and HPC
“Function As A Service” Paradigm for HPC and Workflows for Scientific Research with the advent of serverless computing and its availability on all major computing platforms, it is now possible to take the computing that would be done on a High Performance Cluster and run it as lambda functions. The obvious advantage to this model is that this virtual cluster is highly elastic, and charged only for the exact execution time of each lambda function executed.

One of the limitations of this model currently is that only a few run-times are currently supported like Node.js and Python while a lot of the scientific computing code might be using additional run-times like C, C++, Java etc. However, this is fast changing and cloud providers are introducing new run-times like Go and Rust.


Summary
Scientific computing can take advantage of cloud computing to speed up research, scale-up computing needs almost instantaneously and do all this with much better cost efficiency. Researchers no longer worry about the expertise required to set up the infrastructure in AWS as they can leave this to tools like RLCatalyst Research Gateway, thus compressing the time it takes to complete their research computing tasks.

To learn more about this solution or participate in using the same for your internal needs feel free to contact marketing@relevancelab.com

References
Getting started with HPC on AWS
HPC on AWS Whitepaper
AWS HPC Workshops
Genomics in the Cloud
Serverless Supercomputing: High Performance Function as a Service for Science
FaaSter, Better, Cheaper: The Prospect of Serverless Scientific Computing and HPC



0