Non-scientific tasks such as setting up instances, installing software libraries, getting models to compile, and preparing input data are among the biggest pain points for atmospheric scientists, and for scientists in any field. These tasks demand strong technical skills and divert researchers from their core work of data analysis and research. Some of them also require high-performance computing, complex software, and large datasets. Finally, because research projects are often budget-bound, researchers need a real-time view of their actual spending. Relevance Lab, in partnership with AWS, helps researchers "focus on science and not servers" with the RLCatalyst Research Gateway (RG) product.
Why RLCatalyst Research Gateway?
Using the AWS cloud to speed up scientific research is a growing trend toward "Research as a Service". However, adopting AWS Cloud can be challenging for researchers, with surprises around cost, security, governance, and choosing the right architecture. Principal investigators, in turn, can struggle to manage a research program's collaboration, tracking, and control. Research institutions want to provide consistent and secure environments, standard approved products, and proper governance controls. RLCatalyst Research Gateway was created to address these common needs of researchers, principal investigators, and research institutions. Key capabilities include:
- Available on AWS Marketplace and can be consumed in both SaaS and Enterprise mode
- A self-service cloud portal for managing the provisioning lifecycle of common research assets
- Real-time visibility of spend against the defined project budgets
- The ability for the principal investigator to pause or stop the project if the budget is exceeded, until a new grant is approved
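The budget control above comes down to a threshold rule. The sketch below shows one way such a rule could be expressed; it is a hypothetical illustration, not RG's implementation, and the names `ProjectBudget` and `BudgetStatus` and the 80% warning threshold are assumptions. In line with the text, it only recommends a pause once the budget is exceeded: the decision stays with the principal investigator.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetStatus(Enum):
    """Hypothetical budget states (not RG's actual API)."""
    WITHIN_BUDGET = "within budget"
    NEAR_LIMIT = "near limit"
    RECOMMEND_PAUSE = "recommend pause"  # the PI decides whether to pause or stop


@dataclass
class ProjectBudget:
    name: str
    budget: float            # amount allocated to the project
    spent: float             # spend to date, e.g. from automatic cost tracking
    warn_ratio: float = 0.8  # hypothetical alert threshold: 80% of budget

    def status(self) -> BudgetStatus:
        if self.budget <= 0:
            raise ValueError("budget must be positive")
        if self.spent > self.budget:
            # Budget exceeded: flag for the PI until a new grant is approved.
            return BudgetStatus.RECOMMEND_PAUSE
        if self.spent >= self.warn_ratio * self.budget:
            return BudgetStatus.NEAR_LIMIT
        return BudgetStatus.WITHIN_BUDGET


if __name__ == "__main__":
    project = ProjectBudget(name="GEOS-Chem", budget=1000.0, spent=850.0)
    print(project.status().value)  # prints "near limit"
```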
In this blog, we explain how the product has been used for a common research workflow: running GEOS-Chem, a model used in the Earth sciences. The process starts with accessing large datasets in public S3 buckets, then creating an on-demand compute instance with the application loaded, copying the latest data for analysis, running the analysis, storing the output data, analyzing it with specialized AI/ML tools, and finally deleting the instances. Researchers face this scenario daily, and the product offers a simple, frictionless self-service way to carry it out with tight controls on cost and compliance.
GEOS-Chem enables simulations of atmospheric composition on local to global scales. It can be used offline as a 3-D chemical transport model driven by assimilated meteorological observations from the Goddard Earth Observing System (GEOS) of the NASA Global Modeling and Assimilation Office (GMAO). The figure below shows the basic structure of GEOS-Chem inputs and outputs.
Because this is a common use case, researchers have published documentation on how to run GEOS-Chem on AWS Cloud. The product simplifies the process through a self-service cloud portal. To learn more about similar use cases and advanced computing options, refer to AWS HPC for Scientific Research.
Steps for GEOS-Chem Research Workflow on AWS Cloud
Prerequisites for the researcher before starting data analysis:
- A valid AWS account and access to the RG portal
- A publicly accessible S3 bucket containing large research datasets
- An additional EBS volume for ongoing operational research work (for occasional use, keeping the data as an EBS snapshot, which AWS stores in Amazon S3, instead of a live volume is recommended for better cost management)
- A pre-provisioned SageMaker Jupyter notebook to analyze output data
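The snapshot recommendation is simple arithmetic: an EBS volume is billed on its provisioned size whether or not an instance is using it, while a snapshot is billed on the data it actually stores. A minimal sketch of that comparison follows. The function names are hypothetical, and the per-GB-month rates are parameters you would look up on the AWS pricing page for your region and volume type. The sketch ignores incremental-snapshot effects and the time needed to restore a volume from a snapshot.

```python
def monthly_volume_cost(provisioned_gb: float, volume_rate: float) -> float:
    """EBS volumes are billed on provisioned size (rate per GB-month)."""
    return provisioned_gb * volume_rate


def monthly_snapshot_cost(stored_gb: float, snapshot_rate: float) -> float:
    """Snapshots are billed on the data they store (rate per GB-month)."""
    return stored_gb * snapshot_rate


def snapshot_saves_money(
    provisioned_gb: float,
    stored_gb: float,
    volume_rate: float,
    snapshot_rate: float,
) -> bool:
    """True if keeping only a snapshot between uses is cheaper than keeping the volume."""
    return monthly_snapshot_cost(stored_gb, snapshot_rate) < monthly_volume_cost(
        provisioned_gb, volume_rate
    )
```

Because snapshot storage is not necessarily cheaper per GB than volume storage, the ratio of data actually stored to provisioned size drives the result as much as the rates do.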
Once these are in place, follow the steps below to execute this use case:
- Log in to the RG portal and select the GEOS-Chem project
- Launch an EC2 instance with the GEOS-Chem AMI
- Log in to the EC2 instance using SSH and configure the AWS CLI
- Use the AWS CLI to list the NASA-NEX data in a public S3 bucket
- Run the simulation and copy the output data to a local S3 bucket in your account
- Link the local S3 bucket to the Amazon SageMaker notebook instance and launch a Jupyter notebook to analyze the output data
- Once done, terminate the EC2 instance and check the cost incurred for the use case
- All costs related to the GEOS-Chem project and researcher consumption are tracked automatically
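The CLI steps above (listing the NASA-NEX data and copying the output to your own bucket) can be scripted. The sketch below wraps them in Python and only prints the commands unless `dry_run=False`. It assumes `s3://nasanex/` is the AWS open-data location of the NASA-NEX datasets; the output bucket and run-directory names are hypothetical placeholders. The simulation run itself is omitted because the executable name and run-directory layout depend on the GEOS-Chem version. `--no-sign-request` lets the CLI read a public bucket anonymously, while copying to your own bucket uses the credentials set up with `aws configure`.

```python
import shlex
import subprocess

# Assumption: NASA-NEX data is published in the AWS open-data bucket "nasanex".
PUBLIC_DATA = "s3://nasanex/"
# Hypothetical names; replace with your own bucket and GEOS-Chem output directory.
OUTPUT_BUCKET = "s3://my-geoschem-output/run-001/"
RUN_OUTPUT_DIR = "OutputDir/"


def list_public_data_cmd(prefix: str = PUBLIC_DATA) -> list[str]:
    # Public buckets can be listed without credentials.
    return ["aws", "s3", "ls", prefix, "--no-sign-request"]


def copy_output_cmd(src: str = RUN_OUTPUT_DIR, dest: str = OUTPUT_BUCKET) -> list[str]:
    # "sync" uploads only files that are new or changed since the last copy.
    return ["aws", "s3", "sync", src, dest]


def run(cmd: list[str], dry_run: bool = True) -> None:
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(list_public_data_cmd())
    run(copy_output_cmd())
```

Keeping the commands as lists (rather than shell strings) avoids quoting problems when paths contain spaces, and `shlex.join` still prints a copy-pasteable version for the dry run.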
Sample Output Analysis
When you process the output files in the Jupyter notebook, it compiles the results and presents them visually, as in the sample below. The researcher can then create a snapshot of the additional EBS volume (EBS snapshots are stored in Amazon S3) and terminate the EC2 instance without deleting the additional EBS volume created along with it.
Output for analyzing the loss rate and air mass of the hydroxyl radical (OH) in atmospheric science.
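The clean-up described above can also be scripted. A minimal sketch, again building AWS CLI commands rather than running them: it first makes sure the additional EBS volume survives termination (its `DeleteOnTermination` attribute may be true, for example if it was set in the AMI's block device mapping), then snapshots the volume and terminates the instance. The instance ID, volume ID, and device name are hypothetical; the device name must match the one EC2 reports for the attached volume.

```python
import json
import shlex


def keep_volume_cmd(instance_id: str, device_name: str) -> list[str]:
    # Set DeleteOnTermination=false so the extra volume outlives the instance.
    mapping = [{"DeviceName": device_name, "Ebs": {"DeleteOnTermination": False}}]
    return ["aws", "ec2", "modify-instance-attribute",
            "--instance-id", instance_id,
            "--block-device-mappings", json.dumps(mapping)]


def snapshot_cmd(volume_id: str, description: str) -> list[str]:
    # EBS stores the resulting snapshot in Amazon S3 on your behalf.
    return ["aws", "ec2", "create-snapshot",
            "--volume-id", volume_id, "--description", description]


def terminate_cmd(instance_id: str) -> list[str]:
    return ["aws", "ec2", "terminate-instances", "--instance-ids", instance_id]


def teardown_plan(instance_id: str, volume_id: str, device_name: str) -> list[list[str]]:
    """Ordered commands: protect the volume, snapshot it, then terminate."""
    return [
        keep_volume_cmd(instance_id, device_name),
        snapshot_cmd(volume_id, f"GEOS-Chem work volume from {instance_id}"),
        terminate_cmd(instance_id),
    ]


if __name__ == "__main__":
    # Hypothetical IDs; substitute your own.
    for cmd in teardown_plan("i-0123456789abcdef0", "vol-0123456789abcdef0", "/dev/sdf"):
        print(shlex.join(cmd))
```

The order matters: protecting the volume before termination is what keeps the working data available for the next session.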
Scientific computing can take advantage of cloud computing to speed up research, scale up computing almost instantly, and do so far more cost-efficiently. Researchers no longer need the expertise to set up AWS infrastructure themselves; they can leave this to tools like RLCatalyst Research Gateway and complete their research computing tasks faster.
The steps demonstrated in this blog can easily be replicated in other research domains. The same approach can also be used to onboard new researchers with pre-built, easy-to-consume solution stacks. RLCatalyst Research Gateway is available in SaaS mode from AWS Marketplace, and research institutions can continue to use their existing AWS accounts to configure and enable the solution for more effective governance of scientific research.
If you want to learn more about the product or book a live demo, feel free to contact email@example.com.
References
- Enabling Immediate Access to Earth Science Models through Cloud Computing: Application to the GEOS-Chem Model
- Enabling High-Performance Cloud Computing for Earth Science Modeling on Over a Thousand Cores: Application to the GEOS-Chem Atmospheric Chemistry Model