Extranet Re-architecture & Migration

Arizona Department of Health Services

Problem Statement

The Arizona Department of Health Services (ADHS) needed to improve the capacity of its systems infrastructure and website to better support its customers. Traffic to the website is highly volatile: high utilization for a few weeks of the year and low to medium utilization for the remainder. This variability has led to website/system overload during peak periods and large capital expenditures paired with high costs for underutilized IT assets and infrastructure during troughs.

ADHS required a solution that would:

  • Accommodate fluctuations in demand with high availability and scalability.
  • Improve the presentation of its public face with an attractive and reliable website for promoting high-profile events, such as the opioid epidemic and the Zika outbreak.
  • Reflect the true costs of infrastructure use (i.e. pay only for what you use).
  • Enable in-house website and systems management groups to fulfill future department requests quickly and effectively.
  • Re-engineer the code promotion process between environments.
  • Encourage more innovative development and test environments.
  • Build automation into various process flows using DevSecOps.

Proposed Solution & Architecture

ADHS recognized the opportunity for a scalable, high-performing infrastructure along with the necessity to reduce infrastructure expenses. Adopting Cloud technology was the best solution available. Zuggand designed and implemented a Cloud-based solution using Amazon Web Services (AWS) and trained ADHS staff to use it.

In addition, Zuggand discovered several organizational barriers in the existing process. For instance, neither the development team nor the delivery team understood the systems and infrastructure, so they could not code for program efficiency. Changes were being pushed to the production environment first and then promoted back to non-production environments.

After studying the existing architecture and the business requirements, Zuggand proposed a Cloud architecture that leverages the dynamic scaling capabilities of the AWS platform while also addressing availability during failovers and upgrades.

Arizona Department of Health Services architecture diagram

List of AWS Services Used

The following AWS Services were implemented as part of the solution:

  • AWS Auto Scaling: A service that helps maintain application availability and lets you scale Amazon EC2 capacity up or down automatically according to defined conditions.
  • Elastic Load Balancing: A service that automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, and IP addresses and handles the varying load of your application traffic in a single Availability Zone or across multiple Availability Zones.
  • Amazon EC2: A web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.
  • Amazon Aurora: A MySQL- and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
  • Amazon Elastic File System: Provides simple, scalable, elastic file storage for use with AWS Cloud services and on-premises resources.
  • AWS CloudFormation: A service which gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.
  • Amazon Simple Notification Service: A web service that makes it easy to set up, operate, and send notifications from the cloud. It provides developers with a highly scalable, flexible, and cost-effective capability to publish messages from an application and immediately deliver them to subscribers or other applications.
  • Amazon Simple Storage Service (S3): An object storage service built to store and retrieve any amount of data from anywhere.
  • Amazon CloudFront: A global content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to your viewers with low latency and high transfer speeds.
  • Amazon Route 53: A web service that provides highly available and scalable cloud Domain Name System (DNS).
  • Amazon CloudWatch: A monitoring service for AWS cloud resources and the applications that run on AWS.
  • AWS Systems Manager: A service that gives visibility and control of infrastructure on AWS. Systems Manager provides a unified user interface to view operational data from multiple AWS services and allows automation of operational tasks across AWS resources.
  • AWS WAF: A web application firewall that helps protect web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.
  • AWS Key Management Service (KMS): A managed service that makes it easy to create and control the encryption keys used to encrypt data.

List of Third Party Services Used

The following third-party services were implemented as part of the solution:

  • GitLab: A service that enables teams to collaborate and work from a single conversation, instead of managing multiple threads across disparate tools.
  • Jenkins: A service that is used to automate all sorts of tasks related to building, testing, and delivering or deploying software.
  • LoadImpact: An on-demand online load and performance testing service that lets you test your website, web app, mobile app, or API over the Internet.

Process

The entire process took a couple of months and involved the following steps:

Consultation and diagnosis:

  • Hold discussions with key stakeholders to determine the suitability of an AWS solution for the ADHS website.
  • Check application inter-dependencies with other on-premises applications and shared services.
  • Develop a migration strategy for databases and files.
  • Verify how the application is classified in the business. Business-critical and line-of-business (LOB) applications demand high availability.

Architecture & Delivery:

  • Re-engineer and re-architect ADHS’s internally hosted infrastructure for the AWS platform using DevSecOps automation.
  • Deliver a solution that includes high availability, multiple Availability Zones, and elastic load balancers.
  • Leverage managed services from AWS as much as possible, such as Aurora MySQL RDS, CDN, and WAF.

Training and Knowledge Transfer:

  • Ensure self-sufficiency of staff for future management of the environments.
  • Transition developers to DevOps (an amalgamation of two roles spanning development and operations).

Change management:

  • Build relationships with key ADHS stakeholders to bring them onboard with the project and encourage adoption of the AWS solution.

Results

ADHS now has a strong online presence to support its various programs and needs. The automated AWS solution has enabled efficient and cost-effective management of traffic to the website, www.azdhs.gov.

In particular, the agency can now automatically:

  • Downscale during troughs and thereby lower costs (pay only for what you use).
  • Upscale for high-profile events to eliminate ‘site busy’ responses and present a seamless, professional front-end to the public.
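
The scale-up and scale-down behavior described above can be illustrated with a simplified proportional scaling rule. This is a minimal sketch under stated assumptions, not the algorithm AWS Auto Scaling uses internally: the function name `desired_capacity`, the 50% CPU target, and the 2-to-8 instance bounds are all hypothetical values chosen for illustration.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Return an instance count that moves the average metric toward target.

    Hypothetical helper: a simplified proportional rule, clamped to the
    group's minimum and maximum size. Not AWS's actual implementation.
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    # Scale the fleet in proportion to how far the metric is from target.
    proposed = math.ceil(current * metric / target)
    # Never go below the floor or above the ceiling of the group.
    return max(min_size, min(max_size, proposed))


if __name__ == "__main__":
    # Peak period: 2 instances averaging 90% CPU against a 50% target.
    print(desired_capacity(2, 90.0, 50.0, min_size=2, max_size=8))
    # Trough: 6 instances averaging 10% CPU against the same target.
    print(desired_capacity(6, 10.0, 50.0, min_size=2, max_size=8))
```

In this sketch the peak case grows the fleet and the trough case shrinks it back to the minimum, which is the mechanism behind paying only for what is used.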

The solution has also improved agility, time to market and innovation by:

  • Leveraging Amazon’s proven infrastructure and building around available solutions, thereby reducing investment and maintenance costs.
  • Increasing the speed with which non-production environments can be created and replicated, thereby encouraging developers to experiment with changes and updates.
  • Introducing a self-service infrastructure, in which developers become DevOps with ownership of the website and greater motivation to innovate, deploy changes easily, and scale up and down when the need arises.
  • Providing a robust, creative, open environment to work in.
  • Implementing a round-the-clock monitoring mechanism that sends all necessary alerts.

Alignment to the AWS Well-Architected Framework

By optimizing the architecture around the five (5) pillars of the AWS Well-Architected Framework, ADHS was able to gain the benefits of a well-architected design in the cloud:

Cost Optimization:

  • Shrank the server footprint from 7 on-premises servers to 2 servers, lowering overall cost.
  • Match supply with demand by taking advantage of auto scaling capabilities.
  • Leveraged Reserved Instances to further reduce TCO.
  • Shrank the storage footprint from around 1 TB to 150 GB of EFS storage.
  • Use Trusted Advisor to optimize resources, such as instance types and sizes.
  • Use tiered storage where possible, such as S3, Infrequent Access (IA), and Glacier.
  • Optimize over time by taking advantage of new services or features.
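
As a quick worked check of the footprint figures above, the relative reductions can be computed directly. This is a small sketch: the helper name `percent_reduction` is hypothetical, and the storage calculation assumes 1 TB = 1000 GB, since the source figure is approximate.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    # Server count: 7 on-premises servers reduced to 2.
    servers = percent_reduction(7, 2)
    # Storage: assumes "around 1 TB" means 1000 GB (an approximation).
    storage = percent_reduction(1000, 150)
    print(f"Servers: {servers:.1f}% fewer")
    print(f"Storage: {storage:.1f}% smaller")
```

These percentages describe footprint only; they are not cost savings, which also depend on instance pricing, Reserved Instance terms, and storage class.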

Performance Efficiency:

  • Leveraged a CDN (Amazon CloudFront) for caching and to reduce load on web and application servers.
  • Improved performance efficiency by opting for burstable T2 instances.
  • Gained operational efficiencies by migrating from a standalone on-premises MySQL server to Aurora MySQL RDS.
  • Leveraged higher-level and managed services such as RDS and Lambda.
  • Leveraged AWS Marketplace offerings for a few services.
  • Review resources using benchmarking and load tests.

Operational Excellence:

  • Re-architected the stack to leverage Infrastructure as Code.
  • Automated deployments using Continuous Integration & Continuous Deployment (CI/CD).
  • Configure CloudWatch to monitor the environment and alert operations staff.
  • Tag resources for billing and operational needs.
  • Bring operational insights through dashboards and visualizations using Amazon Athena, Amazon Elasticsearch Service, and Amazon QuickSight.
  • Create runbooks and playbooks for the operations team.

Security:

  • Configure AWS WAF and Amazon Inspector rules to secure workloads.
  • Implement a centralized log management solution to better track application, system, and network activities.
  • Automate patching and systems management to keep resources and workloads secure.
  • Leverage IAM roles for instances along with Active Directory integration of users and roles, paired with 2-factor authentication.
  • Segregate resources across VPCs and subnets.
  • Secure workloads using network ACLs (NACLs) and security groups.
  • Harden the network, applications, and operating systems.

Reliability:

  • Use multiple AWS Direct Connect paths for better network performance and redundancy.
  • Leverage Application Load Balancers for reliability of services.
  • Architect applications for high availability and resiliency.
  • Use AWS CloudFormation templates to create AWS resources and provision them in an orderly and predictable fashion.
  • Leverage CloudWatch to alert on metrics, including custom metrics.
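
To make the CloudFormation and CloudWatch points above concrete, the following sketch builds a small CloudFormation template as a Python dictionary and prints it as JSON. It is illustrative only and is not the template used in this project: the function name `build_alarm_template`, the logical IDs `OpsAlertTopic` and `HighCpuAlarm`, the Auto Scaling group name, and the 80% threshold are all assumptions.

```python
import json


def build_alarm_template(asg_name: str, cpu_threshold: float = 80.0) -> dict:
    """Build an illustrative template: an SNS topic plus a CPU alarm.

    All logical IDs and default values here are hypothetical.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative CPU alarm that notifies an SNS topic",
        "Resources": {
            # Topic that operations staff would subscribe to.
            "OpsAlertTopic": {"Type": "AWS::SNS::Topic"},
            # Alarm on average CPU across one Auto Scaling group.
            "HighCpuAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [
                        {"Name": "AutoScalingGroupName", "Value": asg_name}
                    ],
                    "Statistic": "Average",
                    "Period": 300,           # seconds per datapoint
                    "EvaluationPeriods": 2,  # consecutive breaches required
                    "Threshold": cpu_threshold,
                    "ComparisonOperator": "GreaterThanThreshold",
                    # Ref on an SNS topic resolves to the topic ARN.
                    "AlarmActions": [{"Ref": "OpsAlertTopic"}],
                },
            },
        },
    }


if __name__ == "__main__":
    # "example-web-asg" is a placeholder group name.
    print(json.dumps(build_alarm_template("example-web-asg"), indent=2))
```

Keeping alarms in a template alongside the resources they watch means every environment created from that template gets the same monitoring, which supports the repeatable non-production environments described earlier.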

Lessons Learned / Outcomes

For most organizations, acquiring physical infrastructure requires capital expenses, while acquiring more Cloud resources requires operational expenses. This is a big change. Ongoing operational expenses for the new Cloud infrastructure need to be planned, because without proper management and governance they could increase unexpectedly.

Migrating to the Cloud should shift capital expenses to operational expenses, resulting in more efficient management of funds. Management can then focus more on highly skilled staff who can develop a quality product more efficiently.

This implementation in particular has also led to process re-engineering and automation, allowing management to focus on functional enhancements by reallocating resources in business analysis, coding, and testing.