Author – Ali Hussain
Checklist: Validate DevOps Architecture
Understand business needs
An organization moving to the cloud truly understands the cloud’s benefits only when it sets up good DevOps methodologies and cloud automation to meet its needs. The process is replete with tool choices at every stage, and the overall goal is to understand and meet the organization’s needs.
From our experience setting up DevOps infrastructure many times, the business needs of your organization can be summed up as follows:
Business Continuity And Disaster Recovery
Disasters are inevitable, and an organization needs to be prepared to handle them. The right disaster recovery method depends on the size of the organization and on what’s at stake:
Cost of downtime
Cost to prevent downtime
There are diminishing returns on investment in disaster recovery and availability. At the same time, the cost of an outage increases super-linearly with its duration. So even if your organization is small, there is a strong incentive to pick the low-hanging fruit and put a rudimentary disaster recovery plan in place.
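To make the trade-off between the cost of downtime and the cost to prevent it concrete, here is a small sketch. It assumes a hypothetical outage cost of the form rate × hours^exponent with an exponent above 1 (to model the super-linear growth described above) and three made-up DR options; none of the figures are measurements.

```python
from dataclasses import dataclass


def outage_cost(hours: float, rate: float = 1_000.0, exponent: float = 1.5) -> float:
    """Hypothetical super-linear outage cost: rate * hours ** exponent (exponent > 1)."""
    return rate * hours ** exponent


@dataclass
class DrOption:
    name: str
    annual_cost: float     # cost to prevent downtime, per year
    recovery_hours: float  # how long an outage lasts with this option in place


def expected_annual_cost(option: DrOption, disasters_per_year: float) -> float:
    # Cost to prevent downtime plus the expected cost of downtime itself.
    return option.annual_cost + disasters_per_year * outage_cost(option.recovery_hours)


def cheapest(options: list[DrOption], disasters_per_year: float) -> DrOption:
    return min(options, key=lambda o: expected_annual_cost(o, disasters_per_year))


# Illustrative figures only.
OPTIONS = [
    DrOption("no DR", annual_cost=0, recovery_hours=72),
    DrOption("cold DR", annual_cost=2_000, recovery_hours=4),
    DrOption("hot standby", annual_cost=50_000, recovery_hours=0.1),
]

for opt in OPTIONS:
    print(f"{opt.name:12} {expected_annual_cost(opt, disasters_per_year=0.5):>12,.0f}")
```

With these invented numbers the expected yearly totals are roughly 305,000 for no DR, 6,000 for a cold DR setup, and 50,000 for a hot standby: the first, cheap step removes most of the risk, while the expensive last step buys very little, which is the diminishing-returns pattern described above.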
Meeting Customer Demands
The goal of any service is to meet customer demand as it varies. Questions to consider:
Would there be surges in demand?
To what level does our system scale?
No system can scale indefinitely. Reaching higher levels of scale requires investing in the architecture from the ground up, and such architectures are inevitably more expensive, a cost that is wasted if their full capacity is never used.
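A quick capacity check shows why “how far does it scale?” has a hard answer. In this sketch, `rps_per_instance` and `max_instances` are hypothetical parameters; `max_instances` stands in for whatever component (for example, a single shared database) caps the architecture regardless of how many app servers are added.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, max_instances: int) -> int:
    """Instances required to serve a peak load, or an error past the architecture's ceiling."""
    needed = max(1, math.ceil(peak_rps / rps_per_instance))
    if needed > max_instances:
        # Beyond this point, adding app servers no longer helps: the architecture
        # itself (e.g. a single shared database) has to change.
        raise ValueError(
            f"{peak_rps} rps needs {needed} instances; the ceiling is {max_instances}"
        )
    return needed


print(instances_needed(1_500, rps_per_instance=200, max_instances=20))  # ceil(7.5) -> 8
```

Asking for 5,000 requests per second with the same figures raises an error instead of returning a number, which is the point at which a deeper architectural investment becomes necessary.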
Security
It is critical to protect business IP and customer data, not only for competitive advantage and customer privacy but also to meet the legal requirements that apply to various kinds of data. A DevOps architecture must ensure that the required security constraints are not compromised in the transition to a DevOps workflow, which means enforcing strict rules for access to resources. For instance, the fact that existing entities have access to a certain resource does not mean that a newly added entity should automatically be granted the same access.
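The access rule above amounts to “deny by default, grant explicitly.” The following is a minimal sketch of that principle; the entity and resource names are hypothetical. AWS IAM applies the same idea through its implicit deny, so in practice these grants would live in IAM policies rather than application code.

```python
from collections import defaultdict


class AccessPolicy:
    """Default-deny access control: every grant is explicit and per entity."""

    def __init__(self) -> None:
        self._grants: dict[str, set[str]] = defaultdict(set)

    def grant(self, entity: str, resource: str) -> None:
        self._grants[entity].add(resource)

    def revoke(self, entity: str, resource: str) -> None:
        self._grants[entity].discard(resource)

    def can_access(self, entity: str, resource: str) -> bool:
        # Entities with no grants are denied; nothing is inherited from peers.
        return resource in self._grants.get(entity, set())


policy = AccessPolicy()
policy.grant("ci-deployer", "prod-db-credentials")
print(policy.can_access("ci-deployer", "prod-db-credentials"))      # True
print(policy.can_access("new-build-agent", "prod-db-credentials"))  # False
```

Adding `new-build-agent` to the system gives it nothing until someone grants access deliberately.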
Reducing Time To Market
An organization needs to run like a well-oiled machine. This means using the right tools to enable rapid turnaround in application development, setting up a good development workflow, improving software QA, and shortening operations turnaround time.
Minimizing Cost
Minimizing cost in terms of machines or manpower is always a significant need. The cloud forces a rethink of operational versus capital costs and of how to handle cost variability during budgeting.
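One way to frame the operational-versus-capital question is a break-even calculation. The sketch below compares an owned server (purchase price amortized over its lifetime, plus fixed upkeep) with pay-per-hour cloud capacity. All prices are hypothetical, and real comparisons involve many more factors (staff, reserved pricing, data transfer).

```python
def monthly_owned_cost(purchase_price: float, lifetime_months: int, monthly_upkeep: float) -> float:
    # Capital expense spread over the hardware's lifetime, plus fixed running costs.
    return purchase_price / lifetime_months + monthly_upkeep


def monthly_cloud_cost(hourly_rate: float, hours_used: float) -> float:
    # Operational expense: varies with how many hours are actually used.
    return hourly_rate * hours_used


def break_even_hours(purchase_price: float, lifetime_months: int,
                     monthly_upkeep: float, hourly_rate: float) -> float:
    """Monthly usage above which owning the server becomes cheaper than renting."""
    return monthly_owned_cost(purchase_price, lifetime_months, monthly_upkeep) / hourly_rate


# Illustrative prices only.
print(break_even_hours(7_200, 36, 100, 0.50))  # 600.0
```

A month has roughly 730 hours, so under these figures a server busy only during business hours (about 200 hours) costs $100 a month in the cloud against $300 owned, while one running around the clock is cheaper to own. The cloud bill also moves with usage, which is exactly the variability that budgeting has to absorb.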
Several other sub-points could be added to the above list, such as latency, quality of service, and bug rate. However, these are different aspects of the points above rather than orthogonal ideas. Understanding these business needs is necessary for your DevOps strategy to make a meaningful impact on your organization.
Next Monday we will discuss how these goals translate into questions you can use to validate your DevOps architecture.
Last week we explored how business goals should inform every good DevOps strategy. This week we’ll discuss how to use those goals to validate your DevOps architecture. From our experience at Flux7, the best way to do this is to define the workflows of key users.
To ensure that an architecture will meet a client’s business goals, we ask ourselves the following questions:
What is the developer workflow and how will we enable it?
How will we handle mirroring environments for disasters?
How will we handle scaling up and down?
How will we update the environment?
How will we update the code?
How will we keep the code and environment aligned?
How will we make changes to the infrastructure?
To illustrate how these questions inform our work, we’ll walk you through them using our setup from the previous post, “The Best Way To Deploy Ruby On Rails in AWS”, which was as follows:
Chef used to deploy and bake the environment.
Capistrano used to handle code deployments.
Git repository on GitHub used to store code.
CloudFormation templates used for infrastructure deployment.
Now let’s examine how this setup addressed the seven questions above.
What was the developer workflow and how did we enable it?
Using CloudFormation templates to orchestrate infrastructure deployment, the developers selected a pre-baked AMI with the correct environment setup. Even though we deployed the code with Capistrano, we also created a Chef recipe for deployment.
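To illustrate how a template can let developers pick a pre-baked AMI, here is a minimal CloudFormation template built as a Python dict and serialized to JSON. The resource name `AppServer`, the parameter name `AmiId`, and the default instance type are hypothetical; `AWS::EC2::Image::Id` is CloudFormation’s parameter type for AMI IDs. This is a sketch, not the template used in the original project.

```python
import json


def app_server_template(instance_type: str = "t3.small") -> dict:
    """Minimal CloudFormation template in which the pre-baked AMI is a parameter."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "App server launched from a pre-baked AMI (illustrative sketch)",
        "Parameters": {
            "AmiId": {
                "Type": "AWS::EC2::Image::Id",
                "Description": "Pre-baked AMI containing the Chef-built environment",
            }
        },
        "Resources": {
            "AppServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    # The developer's AMI choice flows in through the parameter.
                    "ImageId": {"Ref": "AmiId"},
                    "InstanceType": instance_type,
                },
            }
        },
    }


template_body = json.dumps(app_server_template(), indent=2)
print(template_body)
```

The resulting body could be passed to boto3’s `cloudformation.create_stack(StackName=..., TemplateBody=template_body, Parameters=[{"ParameterKey": "AmiId", "ParameterValue": "ami-..."}])`, so choosing an environment becomes a matter of choosing an AMI ID.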
How did we handle mirroring environments for disasters?
Our Ruby on Rails deployment was a real-world project for a startup client. They could afford a cold DR setup, provided the right alerts were set up to monitor the website. It’s a good idea to make regular backups of the production AMI to S3 and to copy them to the DR region. In case of disaster, the environment can be restored by launching the CloudFormation template with the latest AMI in the DR region and then updating Route 53 to point to that region.
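The two scripted steps of that failover can be sketched as plain functions: picking the newest AMI from a DescribeImages-style list, and building the Route 53 `ChangeBatch` that repoints the site. The record name, DR endpoint, and AMI IDs are hypothetical, and the sketch assumes the site is served through a CNAME to a load balancer endpoint.

```python
def latest_ami(images: list[dict]) -> str:
    """Return the ImageId of the newest image in a DescribeImages-style list."""
    # CreationDate values share one ISO 8601 format, so string order is time order.
    newest = max(images, key=lambda img: img["CreationDate"])
    return newest["ImageId"]


def failover_change_batch(record_name: str, dr_endpoint: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that points record_name at the DR region."""
    return {
        "Comment": "Fail over to the DR region",
        "Changes": [
            {
                # UPSERT creates the record if it is missing and overwrites it otherwise.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": dr_endpoint}],
                },
            }
        ],
    }


images = [  # hypothetical AMI copies already in the DR region
    {"ImageId": "ami-0aaa", "CreationDate": "2024-05-01T02:00:00.000Z"},
    {"ImageId": "ami-0bbb", "CreationDate": "2024-05-08T02:00:00.000Z"},
]
print(latest_ami(images))  # ami-0bbb
batch = failover_change_batch("www.example.com", "app-dr.example.net")
```

In a real run, the regular AMI copies would come from boto3’s `ec2.copy_image(Name=..., SourceImageId=..., SourceRegion=...)` called on a client in the DR region, and the batch would be submitted with `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. Keeping the TTL low ahead of time shortens how long clients keep resolving to the failed region.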
How did we handle scaling up and down?
We implemented autoscaling. It’s important that a new app server comes online “hot”, that is, ready to serve traffic without anyone having to intervene manually. This may require scripting, because the same AMI needs to work in several different environments.
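The scripting mentioned above usually boils down to a boot step that works out which environment the instance belongs to. In this sketch the environment name arrives as JSON in user data (readable on EC2 from the instance metadata service); the settings table and its values are hypothetical, and tags or a configuration service would work just as well.

```python
import json

# Hypothetical per-environment settings for instances built from one shared AMI.
ENVIRONMENTS = {
    "staging": {"db_host": "staging-db.internal", "workers": 2},
    "production": {"db_host": "prod-db.internal", "workers": 8},
}


def boot_config(user_data: str) -> dict:
    """Resolve runtime settings for an instance launched from the shared AMI."""
    env = json.loads(user_data).get("environment")
    if env not in ENVIRONMENTS:
        # Fail fast: an autoscaled instance without valid config must not join the pool.
        raise ValueError(f"unknown environment: {env!r}")
    return {"environment": env, **ENVIRONMENTS[env]}


print(boot_config('{"environment": "production"}'))
```

Failing loudly on an unknown environment is a deliberate choice: a health check can then keep the instance out of the load balancer instead of letting it serve traffic with the wrong settings.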
How did we update the environment?
We edited the Chef recipe, verified that it worked correctly, and then baked the AMI. To shorten the Chef recipe debug loop, we experimented with recipes inside a Docker container, which made it possible to revert rapidly to a previous state in case of failure.
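A checkpoint-and-revert debug loop along those lines can be sketched as follows. It assumes a long-running sandbox container (hypothetically named `chef-sandbox`) with Chef Infra Client and the cookbooks already inside it, run in local mode; the recipe name is also hypothetical. The command runner is injected so the logic can be exercised without Docker.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_cmd(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd)).returncode


def try_recipe(container: str, recipe: str, run: Runner = run_cmd) -> bool:
    """Checkpoint a sandbox container, converge one recipe, roll back on failure."""
    checkpoint = f"{container}:checkpoint"
    # Save the container's current filesystem as an image before experimenting.
    if run(["docker", "commit", container, checkpoint]) != 0:
        raise RuntimeError(f"could not checkpoint {container}")
    status = run([
        "docker", "exec", container,
        "chef-client", "--local-mode", "--override-runlist", f"recipe[{recipe}]",
    ])
    if status == 0:
        return True
    # Roll back: discard the broken container and start a fresh one from the checkpoint.
    run(["docker", "rm", "-f", container])
    run(["docker", "run", "-d", "--name", container, checkpoint, "sleep", "infinity"])
    return False
```

Because reverting is just restarting from an image, each failed attempt costs seconds rather than a fresh instance build; only a recipe that converges cleanly moves on to AMI baking.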
How did we update the code?
We merged the code from the dev branch into the master branch and ran the Capistrano recipe. Capistrano connected to the GitHub repository and checked out the required code revision. Since the code was pulled at deployment time rather than baked into the AMI, there was no need to bake a new AMI for each code update. This approach is particularly suitable for hotfixes.
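Capistrano deploys each revision into its own timestamped directory under `releases/` and then points a `current` symlink at it, which is what keeps code updates independent of the AMI. The sketch below imitates only that layout and the symlink switch; it is not Capistrano, the file contents stand in for a Git checkout, and it assumes a POSIX filesystem.

```python
import os
import tempfile
from datetime import datetime, timezone


def deploy_release(deploy_to: str, files: dict[str, str]) -> str:
    """Create releases/<timestamp>, then atomically repoint the 'current' symlink."""
    # Capistrano names releases by timestamp; microseconds are added here only so
    # that two deploys in quick succession cannot collide.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")
    release_dir = os.path.join(deploy_to, "releases", stamp)
    os.makedirs(release_dir)

    # Stand-in for checking out the requested Git revision into the release.
    for name, content in files.items():
        with open(os.path.join(release_dir, name), "w") as fh:
            fh.write(content)

    # Build the new link under a temporary name, then rename it over 'current'.
    # rename() is atomic on POSIX, so 'current' never points at a half-built release.
    current = os.path.join(deploy_to, "current")
    tmp_link = current + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current)
    return release_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        deploy_release(root, {"VERSION": "1.0.0"})
        deploy_release(root, {"VERSION": "1.0.1"})
        with open(os.path.join(root, "current", "VERSION")) as fh:
            print(fh.read())  # 1.0.1
```

Older release directories stay on disk, so rolling back a bad hotfix is just repointing `current` at the previous one.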
How did we keep the code and environment aligned?
This took manual effort: we checked by hand that deployed code worked in its respective environment. Docker may come in handy here, since it versions code and environment together, but we haven’t tried this approach yet.
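Part of that manual check could be automated with a simple guard before deployment. This sketch assumes a hypothetical convention: each code release records the environment version (for example, an AMI or Chef recipe version tag) it was qualified against, and the deploy script compares it with what the instances are running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    revision: str
    requires_env: str  # hypothetical tag for the environment the code was tested against


def check_alignment(release: Release, running_env: str) -> None:
    """Refuse to deploy code onto an environment it was not qualified against."""
    if release.requires_env != running_env:
        raise RuntimeError(
            f"revision {release.revision} needs environment {release.requires_env}, "
            f"but instances run {running_env}"
        )


check_alignment(Release("a1b2c3d", "env-7"), running_env="env-7")  # passes silently
```

An exact-match rule is deliberately strict; a team could relax it to a list of compatible environment versions once that relationship is tracked.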
How did we make changes to the infrastructure?
We updated the CloudFormation template, deployed the environment and code, verified that everything functioned correctly, and qualified the template changes. We then assessed the outage the template update would cause and, depending on its severity, either updated the existing stack or created a new stack and switched Route 53 over to it once it was complete.
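The “update in place or build a new stack” decision can be written down as a small policy. This sketch orders the three “Update requires” categories from the CloudFormation resource reference (no interruption, some interruptions, replacement) and applies one possible rule; the rule and the resource names are assumptions, and a CloudFormation change set is one way to preview which resources a template update would replace.

```python
from enum import IntEnum


class UpdateBehavior(IntEnum):
    # Ordered from least to most disruptive.
    NO_INTERRUPTION = 0
    SOME_INTERRUPTIONS = 1
    REPLACEMENT = 2


def plan_update(changes: dict[str, UpdateBehavior], tolerate_interruption: bool) -> str:
    """Choose between updating the existing stack and building a new one."""
    worst = max(changes.values(), default=UpdateBehavior.NO_INTERRUPTION)
    if worst == UpdateBehavior.NO_INTERRUPTION:
        return "update existing stack"
    if worst == UpdateBehavior.SOME_INTERRUPTIONS and tolerate_interruption:
        return "update existing stack"
    # Replacement, or an interruption we cannot accept: stand up a new stack
    # and cut traffic over once it is verified.
    return "create new stack"


print(plan_update({"AppServer": UpdateBehavior.SOME_INTERRUPTIONS}, tolerate_interruption=False))
```

With `tolerate_interruption=False`, even a brief interruption sends the change down the new-stack path, trading extra effort for zero planned downtime.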
Given the wide variety of needs across organizations, there’s no right or wrong approach to developing your DevOps architecture. But it’s always best to make small, iterative, but real improvements, because a huge project that tries to accomplish everything is far more likely to fail. The key to success is not to prevent failure, but rather to keep the cost of failure low.