Your address will show here +12 34 56 78
2021 Blog, Blog, Featured, Research Gateway

Working on non-scientific tasks such as setting up instances, installing software libraries, making model compile, and preparing input data are some of the biggest pain points for atmospheric scientists or any scientist for that matter. It’s challenging for scientists as it requires them to have strong technical skills deviating them from their core areas of analysis & research data compilation. Further adding to this, some of these tasks require high-performance computation, complicated software, and large data. Lastly, researchers need a real-time view of their actual spending as research projects are often budget-bound. Relevance Lab help researchers “focus on science and not servers” in partnership with AWS leveraging the RLCatalyst Research Gateway (RG) product.

Why RLCatalyst Research Gateway?
Speeding up scientific research using AWS cloud is a growing trend towards achieving “Research as a Service”. However, the adoption of AWS Cloud can be challenging for Researchers with surprises on costs, security, governance, and right architectures. Similarly, Principal Investigators can have a challenging time managing the research program with collaboration, tracking, and control. Research Institutions will like to provide consistent and secure environments, standard approved products, and proper governance controls. The product was created to solve these common needs of Researchers, Principal Investigator and Research Institutions.


  • Available on AWS Marketplace and can be consumed in both SaaS as well as Enterprise mode
  • Provides a Self-Service Cloud Portal with the ability to manage the provisioning lifecycle of common research assets
  • Gives a real time visibility of the spend against the defined project budgets
  • The principal investigator has the ability to pause or stop the project in case the budget is exceeded till the new grant is approved

In this blog, we explain how the product has been used to solve a common research problem of GEOS-Chem used for Earth Sciences. It covers a simple process that starts with access to large data sets on public S3 buckets, creation of an on-demand compute instance with the application loaded, copying the latest data for analysis, running the analysis, storing the output data, analyzing the same using specialized AI/ML tools and then deleting the instances. This is a common scenario faced by researchers daily, and the product demonstrates a simple Self-Service frictionless capability to achieve this with tight controls on cost and compliance.

GEOS-Chem enables simulations of atmospheric composition on local to global scales. It can be used off-line as a 3-D chemical transport model driven by assimilated meteorological observations from the Goddard Earth Observing System (GEOS) of the NASA Global Modeling Assimilation Office (GMAO). The figure below shows the basic construct on GEOS-Chem input and output analysis.



Being a common use case, there is documentation available in the public domain by researchers on how to run GEOS-Chem on AWS Cloud. The product makes the process simpler using a Self-Service Cloud portal. To know more about similar use cases and advanced computing options, refer to AWS HPC for Scientific Research.



Steps for GEOS-Chem Research Workflow on AWS Cloud
Prerequisites for researcher before starting data analysis.

  • A valid AWS account and an access to the RG portal
  • A publicly accessible S3 bucket with large Research Data sets accessible
  • Create an additional EBS volume for your ongoing operational research work. (For occasional usage, it is recommended to upload the snapshot in S3 for better cost management.)
  • A pre-provisioned SageMaker Jupyter notebook to analyze output data

Once done, below are the steps to execute this use case.

  • Login to the RG Portal and select the GEOS-Chem project
  • Launch an EC2 instance with GEOS-Chem AMI
  • Login to EC2 using SSH and configure AWS CLI
  • Connect to a public S3 bucket from AWS CLI to list NASA-NEX data
  • Run the simulation and copy the output data to a local S3 bucket
  • Link the local S3 bucket to AWS SageMaker instance and launch a Jupyter notebook for analysis of the output data
  • Once done, terminate the EC2 instance and check for the cost spent on the use case
  • All costs related to GEOS-Chem project and researcher consumption are tracked automatically

Sample Output Analysis
Once you run the output files on the Jupyter notebook, it does the compilation and provides output data in a visual format, as shown in the sample below. The researcher can then create a snapshot and upload it to S3 and terminate the EC2 instance (without deleting the additional EBS volume created along with EC2).

Output to analyze loss rate and Air mass of Hydroxide pertaining to Atmospheric Science.


Summary
Scientific computing can take advantage of cloud computing to speed up research, scale-up computing needs almost instantaneously, and do all this with much better cost-efficiency. Researchers no longer need to worry about the expertise required to set up the infrastructure in AWS as they can leave this to tools like RLCatalyst Research Gateway, thus compressing the time it takes to complete their research computing tasks.

The steps demonstrated in this blog can be easily replicated for similar other research domains. Also, it can be used to onboard new researchers with pre-built solution stacks provided in an easy to consume option. RLCatalyst Research Gateway is available in SaaS mode from AWS Marketplace and research institutions can continue to use their existing AWS account to configure and enable the solution for more effective Scientific Research governance.

To learn more about GEOS-Chem use cases, click here.

If you want to learn more about the product or book a live demo, feel free to contact marketing@relevancelab.com.

References
Enabling Immediate Access to Earth Science Models through Cloud Computing: Application to the GEOS-Chem Model
Enabling High‐Performance Cloud Computing for Earth Science Modeling on Over a Thousand Cores: Application to the GEOS‐Chem Atmospheric Chemistry Model



0

2021 Blog, Blog, Feature Blog, Featured, Research Gateway

AWS provides a comprehensive, elastic, and scalable cloud infrastructure to run your HPC applications. Working with AWS in exploring HPC for driving Scientific Research, Relevance Lab leveraged their RLCatalyst Research Gateway product to provision an HPC Cluster using AWS Service Catalog with simple steps to launch a new environment for research. This blog captures the steps used to launch a simple HPC 1.0 cluster on AWS and roadmap to extend the functionality to cover more advanced use cases of HPC Parallel Cluster.

AWS delivers an integrated suite of services that provides everything needed to build and manage HPC clusters in the cloud. These clusters are deployed over various industry verticals to run the most compute-intensive workloads. AWS has a wide range of HPC applications spanning from traditional applications such as genomics, computational chemistry, financial risk modeling, computer-aided engineering, weather prediction, and seismic imaging to new applications such as machine learning, deep learning, and autonomous driving. In the US alone, multiple organizations across different specializations are choosing cloud to collaborate for scientific research.


Similar programs exist across different geographies and institutions across EU, Asia, and country-specific programs for Public Sector programs. Our focus is to work with AWS and regional scientific institutions in bringing the power of Supercomputers for day-to-day researchers in a cost-effective manner with proper governance and tracking. Also, with Self-Service models, the shift needs to happen from worrying about computation to focus on Data, workflows, and analytics that requires a new paradigm of considering prospects of serverless scientific computing that we cover in later sections.

Relevance Lab RLCatalyst Research Gateway provides a Self-Service Cloud portal to provision AWS products with a 1-Click model based on AWS Service Catalog. While dealing with more complex AWS Products like HPC there is a need to have a multi-step provisioning model and post provisioning actions that are not always possible using standard AWS APIs. In these situations requiring complex orchestration and post provisioning automation RLCatalyst BOTs provide a flexible and scalable solution to complement based Research Gateway features.

Building blocks of HPC on AWS
AWS offers various services that make it easy to set up an HPC setup.


An HPC solution in AWS uses the following components as building blocks.

  • EC2 instances are used for Master and Worker nodes. The master nodes can use On-Demand instances and the worker nodes can use a combination of On-Demand and Spot Instances.
  • The software for the manager nodes is built as an AMI and used for the creation of Master nodes.
  • The agent software for the managers to communicate with the worker nodes is built into a second AMI that is then used for provisioning the Worker nodes.
  • Data is shared between different nodes using a file-sharing mechanism like FSx Lustre.
  • Long-term storage uses AWS S3.
  • Scaling of nodes is done via Auto-scaling.
  • KMS for encrypting and decrypting the keys.
  • Directory services to create the domain name for using HPC via UI.
  • Lambda function service to create user directory.
  • Elastic Load Balancing is used to distribute incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, Lambda functions, and virtual appliances.
  • Amazon EFS is used for regional service storing data within and across multiple Availability Zones (AZs) for high availability and durability. Amazon EC2 instances can access your file system across AZs.
  • AWS VPC to launch the EC2 instances in private cloud.

Evolution of HPC on AWS
  • HPC clusters first came into existence in AWS using the CfnCluster Cloud Formation template. It creates a number of Manager and Worker nodes in the cluster based on the input parameters. This product can be made available through AWS Service Catalog and is an item that can be provisioned from the RLCatalyst Research Gateway. The cluster manager software like Slurm, Torque, or SGE is pre-installed on the manager nodes and the agent software is pre-installed on the worker nodes. Also pre-installed is software that can provide a UI (like Nice EngineFrame) for the user to submit jobs to the cluster manager.
  • AWS Parallel Cluster is a newer offering from AWS for provisioning an HPC cluster. This service provides an open-source, CLI-based option for setting up a cluster. It sets up the manager and worker nodes and also installs controlling software that can watch the job queues and trigger scaling requests on the AWS side so that the overall cluster can grow or shrink based on the size of the queue of jobs.

Steps to Launch HPC from RLCatalyst Research Gateway
A standard HPC launch involves the following steps.

  • Provide the input parameters for the cluster. This will include
    • The compute instance size for the master node (vCPUs, RAM, Disk)
    • The compute instance size for the worker nodes (vCPUs, RAM, Disk)
    • The minimum and maximum number of worker nodes.
    • Select the workload manager software (Slurm, Torque, SGE)
    • Connectivity options (SSH keys etc.)
  • Launch the product.
  • Once the product is in Active state, connect to the URL in the Output parameters on the Product Details page. This connects you to the UI from where you can submit jobs to the cluster.
  • You can SSH into the master nodes using the key pair selected in the Input form.

RLCatalyst Research Gateway uses the CfnCluster method to create an HPC cluster. This allows the HPC cluster to be created just like any other products in our Research Gateway catalog items. Though this provisioning may take upto 45 minutes to complete, it creates an URL in the outputs which we can use to submit the jobs through the URL.

Advanced Use Cases for HPC

  • Computational Fluid Dynamics
  • Risk Management & Portfolio Optimization
  • Autonomous Vehicles – Driving Simulation
  • Research and Technical Computing on AWS
  • Cromwell on AWS
  • Genomics on AWS

We have specifically looked at the use case that pertains to BioInformatics where a lot of the research uses Cromwell server to process workflows defined using the WDL language. The Cromwell server acts as a manager that controls the worker nodes, which execute the tasks in the workflow. A typical Cromwell setup in AWS can use AWS Batch as the backend to scale the cluster up and down and execute containerized tasks on EC2 instances (on-demand or spot).



Prospect of Serverless Scientific Computing and HPC
“Function As A Service” Paradigm for HPC and Workflows for Scientific Research with the advent of serverless computing and its availability on all major computing platforms, it is now possible to take the computing that would be done on a High Performance Cluster and run it as lambda functions. The obvious advantage to this model is that this virtual cluster is highly elastic, and charged only for the exact execution time of each lambda function executed.

One of the limitations of this model currently is that only a few run-times are currently supported like Node.js and Python while a lot of the scientific computing code might be using additional run-times like C, C++, Java etc. However, this is fast changing and cloud providers are introducing new run-times like Go and Rust.


Summary
Scientific computing can take advantage of cloud computing to speed up research, scale-up computing needs almost instantaneously and do all this with much better cost efficiency. Researchers no longer worry about the expertise required to set up the infrastructure in AWS as they can leave this to tools like RLCatalyst Research Gateway, thus compressing the time it takes to complete their research computing tasks.

To learn more about this solution or participate in using the same for your internal needs feel free to contact marketing@relevancelab.com

References
Getting started with HPC on AWS
HPC on AWS Whitepaper
AWS HPC Workshops
Genomics in the Cloud
Serverless Supercomputing: High Performance Function as a Service for Science
FaaSter, Better, Cheaper: The Prospect of Serverless Scientific Computing and HPC



0

2020 Blog, AWS Platform, Blog, Featured, Research Gateway

While there is rapid momentum for every enterprise in the world in consuming more Cloud Assets and Services, there is still lack of maturity in adopting an “Automation-First” approach to establish Self-Service models for Cloud consumptions due to fear of uncontrolled costs, security & governance risks and lack of standardized Service Catalogs of pre-approved Assets & Service Requests from Central IT groups. Lack of delegation and self-service has a direct impact on speed of innovation and productivity with higher operations costs.

Working closely with AWS Partnership we have now created a flexible platform for driving faster adoption of Self-Service Cloud Portals. The primary needs for such a Self-Service Cloud Portal are the following.

  • Adherence to Enterprise IT Standards
    • Common architecture
    • Governance and Cost Management
    • Deployment and license management
    • Identity and access management
  • Common Integration Architecture with existing platforms on ITSM and Cloud
    • Support for ServiceNow, Jira, Freshservice and Standard Cloud platforms like AWS
  • Ability to add specific custom functionality in the context of Enterprise Business needs
    • The flexibility to add business specific functionality is key to unlocking the power of self-service models outside the standard interfaces already provided by ITSM and Cloud platforms

A common way of identifying the need for a Self-Service Cloud portal is based on following needs.

  • Does your enterprise already have any Self-Service Portals?
  • Do you have a large user base internally or with external users requiring access to Cloud resources?
  • Does your internal IT have the bandwidth and expertise to manage current workloads without impacting end user response time expectations?
  • Does your enterprise have a proper security governance model for Cloud management?
  • Are there significant productivity gains by empowering end users with Self-Service models?

Working with AWS partnership and with our existing customer we see a growing need for Self-Service Cloud Portals in 2020 predominantly centred around two models.

  • Enterprises with existing ITSM investments and need to leverage that for extending to Cloud Management
  • Enterprises extending needs outside enterprise users with custom Cloud Portals

The roadmap to Self-Service Cloud portals is specific to every enterprise needs and needs to leverage the existing adoption and maturity of Cloud and ITSM platforms as explained below. With Relevance Lab RLCatalyst products we help enterprises achieve the maturity in a cost effective and expedited manner.


Examples of Self-Service Cloud Portals



Standard Needs Platform Benefits
Look-n-Feel of Modern Self-Service Portals Professional and responsive UI Design with multiple themes available, customizations allowed
Standards based Architecture & Governance Tightly Built On AWS products and AWS Well Architected with pre-built Reference Architecture based Products
Pre-built Minimum Viable Product Needs 80-20 Model – Pre-built vs Customizations based on key components of core functionality
Proprietary vs Open Source? Open-source foundation with source code made available built on MEAN Stack
Access Control, Security and Governance Standard Options Pre-built, easy extensions (SAML Based). Deployed with enterprise grade security and compliances
Rich Standard Pre-Build Catalog of Assets and Services Comes pre-built with 100+ catalog items covering all standard Asset and Services needs catering to 50% of any enterprise infrastructure, applications and service delivery needs


Explained below is a sample AWS Self-Service Cloud for driving Scientific Research.



Getting started
To make is easier for enterprises for experiencing the power of Self-Service Cloud Portals we are offering two options based on enterprise needs.

  • Hosted SAAS offering of using our Multi-tenant Cloud Portal with ability to connect to your existing Cloud Accounts and Service Catalogs
  • Self-Hosted RLCatalyst Cloud Portal product with option to engage us for professional services on customizations, training, initial setup & onboarding needs

Pricing for the SAAS offering is based on user based monthly subscription while for self-hosting model an enterprise support model pricing is available for the open source solution that allows enterprises the flexibility to use this solution without proprietary lock-ins.

The typical steps to get started are very simple covering the following.

  • Setup an organization and business units or projects aligned with your Cloud Accounts for easy billing and access control tracking
  • Setup users and roles
  • Setup Budgets and controls
  • Setup standard catalog of items for users to order
  • With the above enterprises are up to speed to use Self-Service Cloud Portals in less than 1-Day with inbuilt controls for tracking and compliance

Summary
Cloud Portals for Self-Service is a growing need in 2020 and we see the momentum continuing for next year as well. Different market segments have different needs for Self-Service Cloud portals as explained in this Blog.


  • Scientific Research community is interested in a Research Gateway Solution
  • University IT looks for a University in a Box Self Service Cloud
  • Enterprises using ServiceNow want to extend the internal Self Service Portals
  • Enterprises are also developing Hybrid Cloud Orchestration Portals
  • Enterprises looking at building AIOps Portal needs monitoring, automation and service management
  • Enabling Virtual Training Labs with User and Workspace onboarding
  • Building an integrated Command Centre requires an Intelligent Monitoring portal
  • Enterprise Intelligent Automation Portal with ServiceNow Connector

We provide pre-build solutions for Self-Service Cloud Portals and a base platform that can be easily extended to add new functionality for customization and integration. A number of large enterprises and universities are leveraging our Self Service Cloud portal solutions using both existing ITSM tools (Servicenow, Jira, Freshservice) and RLCatalyst products.

To learn more about using AWS Cloud or ITSM solutions for Self-Service Cloud portals contact marketing@relevancelab.com



0