2021 Blog, Blog, Feature Blog, Featured

Relevance Lab and AWS collaborate to release new EC2-RStudio-Server solution for Service Workbench

Provide researchers access to secure RStudio instances in the AWS cloud by using Amazon issued certificates in AWS Certificate Manager (ACM) and an Application Load Balancer (ALB)

Cloud computing offers the research community access to vast amounts of computational power, storage, specialized data tools, and public data sets, collectively referred to as Research IT, with the added benefit of paying only for what is used. However, researchers may not be experts in using the AWS Console to provision these services in the right way. This is where software solutions like Service Workbench on AWS (SWB) make it possible to deliver scientific research computing resources in a secure and easily accessible manner.

RStudio is a popular software used by the Scientific Research Community and supported by Service Workbench. Researchers use RStudio very commonly in their day-to-day efforts. While RStudio is a popular product, the process of installing RStudio securely on AWS Cloud and using it in a cost-effective manner is a non-trivial task, especially for Researchers. With SWB, the goal is to make this process very simple, secure, and cost-effective for Researchers so that they can focus on “Science” and not “Servers” thereby increasing their productivity.

Relevance Lab (RL), in partnership with AWS, set out to make the experience of using RStudio with Service Workbench on AWS simple and secure.

Technical Solution Goals

  1. A researcher should be able to launch an RStudio instance in the AWS cloud from within the Service Workbench portal.
  2. The RStudio instance comes fully loaded with the latest version of RStudio and a variety of other software packages that help in scientific research computing.
  3. The user launches a URL to the RStudio from within the Service Workbench. This URL is a unique URL generated by SWB and is encoded with an authentication token that ensures that the researcher can access the RStudio instance without remembering any passwords. The URL is served over SSL so that all communications can be encrypted in transit.
  4. Maintaining the certificates used for SSL communication should be cost-effective and should not require excessive administrative efforts.
  5. The solution should provide isolation of researcher-specific instances using allowed IP lists controlled by the end-user.

Comparison of Old and New Design Principles to make Researcher Experience Frictionless
The following section summarizes the old design and the new architecture to make the entire researcher experience frictionless. Based on feedback from researchers, it was felt that the older design required a lot of setup complexity and lifecycle upgrades for security certificate management, slowing down researchers productivity. The new solution makes the lifecycle simple and frictionless along with smart and innovative features to keep ongoing costs optimized.


No. RStudio Feature Original Design Approach New Design Approach
1 User Generated Security Certificate for SSL Secure Connections to RStudio. Users have to create a certificate (like LetsEncrypt) and use it with RStudio EC2 Instance with NGINX server. This creates complexity in the Certificate lifecycle. Complex for end-users to create, maintain and renew. The RStudio AMI also needs to manage the Certificate lifecycle. Move from External certificates to AWS ACM.

Bring in a shared AWS ALB (Application Load Balancer) and use AWS ACM certificates for each Hosting Account to simplify the Certificate Management Lifecycle.
2 SSL Secure Connection. Create an SSL connection with Nginx Server on RStudio EC2. Related to custom certificate management. Replaced with ALB at an Account level and shared by all RStudio Instances in an account. User Portal to ALB connection secured by ACM. For ALB to RStudio EC2 secure connection, use unique self-signed Certificates to encrypt connection per RStudio.
3 Client Role (IAM) changes in SWB. Client role is provided necessary permissions for setup purposes. Additional role privileges added to handle ALB.
4 ALB Design. Not existing in the original design. Shared ALB design per Hosting Account to be shared between Projects. Each ALB is expected to cost about $20-50 monthly in shared mode with average use. API model used to create/delete ALB.
5 Route 53 Changes on the Main account. A CNAME record gets created with the EC2 DNS name. A CNAME record gets created with the ALB DNS name.
6 RStudio AMI. Embedded with Certificate details. Related to custom certificate management. Independent of user-provided Certificate details. Also, AMI has been enhanced to include the following: Self-signed SSL and additional packages (as commonly requested by researchers) are baked into the AMI.
7 RStudio Cloud Formation Template (CFT). Original one to be removed from SWB. Added a new output to indicate the “Need ALB” flag. Also, create a new target group to which the ALB can route requests.
8 SWB Hosting Account Configuration. Did not have to provision certificate AWS ACM. Manual process to set up a certificate in a new hosting account.
9 Provisioned RStudio per Hosting Account Active Count Tracking. None. Needed to ensure ALB is created the first time when RStudio is provisioned and deleted after the last RStudio is deleted to optimize cost overheads of ALB.
10 SWB DynamoDB Table Changes. DynamoDB used for all Tables by SWB. Modifications needed to support the new design. Added to the existing DeploymentStore table in SWB design.
11 SWB Provision Environment Workflow. Standard design. Additional Step added to check if “Workspace Type” needs ALB and if it does when checking for ALB and either create or pass the reference to existing one.
12 SWB Terminate Environment Workflow. Standard design. Additional Step added to check if last Active RStudio being deleted and if so, also delete ALB to reduce idle costs.
13 Secure “Connection” Action from SWB Portal to RStudio instance. To ensure each RStudio has a secure connection for each user a unique connection URL is generated during the user session that is valid for a limited period. The same design of the original implementation is preserved. Internally the routing is managed through ALB but the concept remains the same. This ensures users do not have to remember user id/password for RStudio and a secure connection is always made available.
14 Secure “Connection” from SWB Portal disallowing other users from accessing RStudio resources given shared ALB. NA. Using the design feature (Step-13) ensures that even post ALB the connection for a User (Researcher and PI) is still restricted to their provisioned RStudio only and they cannot access other Researchers Instances. The unique connection is system generated using User to RStudio mapping uniquely.
15 ALB Routing Rules for RStudio secure connections given shared nature. NA. Every time an RStudio is created or deleted, changes are made to ALB rules to allow a secure connection between the User session and the linked RStudio. The same rules are cleaned up during RStudio delete lifecycle. These changes to ALB routing rules are managed from SWB code under Workflow customizations. (Step-11 and 12) using APIs.
16 RStudio Configuration parameters related to CIDR. Original design allows only whitelisted IP addresses to connect to associated RStudio instances – this can be modified also from configurations. RStudio Cloud Formation Template (CFT) should take Classless Inter-Domain Routing (CIDR) as Input Parameter and pass it through as an Output Parameter for the SWB to take it and create the ALB Listener Rule.
SWB code will take CIDR from RStudio CFT output, subsequently, update the ALB Listener Rule with the respective Target Group.
17 Researcher costs tracking. The original design had RStudio costs tracked for Researchers. Custom certificate costs were not tracked if any. In the new design, RStudio costs are tagged and tracked per researcher. ALB costs are treated as shared costs for the Hosting account.
18 RStudio Packaging and Delivery for a new customer – Repository Model. Bundled with standard SWB repo and installed. New model for RL to create a separate Repo and host RStudio with associated documentation and templates for customers to use.
19 RStudio Packaging and Delivery for a new customer – AWS Marketplace model. None. RL to provide RStudio on AWS Marketplace for SWB customers to add to standard Service Catalog and import (Future Roadmap item).
20 Upgrade and Support Models for RStudio. SWB teams ownership. To be managed by RL teams.
21 UI Modification for Partner Provided Products. No partner provided products. Partner-provided products will reside in the self-hosted repo. SWB UI will provide a mechanism to show details of partner names and a link to additional information.


The diagram below explains the interplay between different design components.


Secure and Scalable Solution Architecture
Keeping in mind the above design goals, a secure and scalable architecture is implemented that solves the problem of shared groups using products like RStudio requiring secure HTTPS access without the overheads of individual certificate management. The architecture also enables sharing the same concept for all future researcher products with similar needs without any additional implementation overheads resulting in increased productivity and lower costs.


The Relevance Lab team designed a solution centered on an EC2 Linux instance with RStudio and relevant packages pre-installed and delivered as an AMI.

  1. When the instance is provisioned, it is brought up without a public IP address.
  2. All traffic to this instance is delivered via an Application Load Balancer (ALB). The ALB is shared across multiple RStudio instances within the same account to spread the cost over a larger number of users.
  3. The ALB serves over an SSL link secured with an Amazon-issued certificate which is maintained by AWS Certificate Manager.
  4. The ALB costs are further brought down by provisioning it on demand when the first RStudio instance is provisioned. Conversely, the ALB is de-provisioned when the last RStudio instance is de-provisioned.
  5. Traffic between the ALB and the RStudio instance is also secured with an SSL certificate which is self-signed but unique to each instance.
  6. The ALB listener rules enforce the IP allowed list configured by the user.

Conclusion
Both SWB and Relevance Lab RLCatalyst Research Gateway teams are committed to making scientific research frictionless for researchers. With a shared goal, this new initiative speeds up collaboration and will help provide new innovative open-source solutions leveraging Service Workbench on AWS and partner-provided solutions like this RStudio with ALB from Relevance Lab. The collaboration efforts will soon be adding more solutions covering Genomic Pipeline Orchestration with Nextflow, use of HPC Parallel Cluster, and secure research workspaces with AppStream 2.0, so stay tuned.

To get started with RStudio on SWB provided by Relevance Lab use the following link:
Relevance Lab Github Repository for SWB Templates

For more information, feel free to contact marketing@relevancelab.com.

References
Service Workbench on AWS for driving Scientific Research
Service Workbench on AWS Documentation
Service Workbench on AWS Github Repository
RStudio Secure Architecture Patterns
Relevance Lab Research Gateway