Monthly Archives: March 2018

Cloud Preparation – More than Technology?

Cloud is easy, right?  Someone builds a data center, runs cables, slaps some hardware and software together and *poof*, you are running a cloud.  If a company wants to join, they just sign some documents and push their applications into the deep blue and everyone is happy!

Of course, I am being sarcastic.  When we engage a company about a failing or brand-new cloud initiative, we generally hear the same thoughts: cloud is easy, and anyone can put one together.  Even the companies and teams that have failed after 2+ years with nothing to show wonder why they have not delivered.

Over the past 6 years, I have focused on deploying private cloud implementations for multiple clients.  Although most have been on engineered or converged systems, several have involved physical and / or virtual systems across many technologies.  These engagements are either “bail-outs” of failed implementations from other big-name firms or complete end-to-end deployments. For the failed engagements, we generally start with a system health check to identify the challenges.  We assumed the cloud providers understood the process for on-boarding a client – an assumption that turned out to be false.

Over these engagements, we have implemented a methodology that has been very successful in the preparation and rapid delivery of cloud implementations.  Over the next set of blog posts, I will discuss the methodology and some of the common errors that we have seen.

“We generally start everything with a health check – because everyone is ready for the cloud … right?”

Technology is fun, and cloud technology is even more fun.  The issue we find very often is that people understand cloud technology, but they don’t understand the cloud.  Whether it is a public cloud or a private cloud, most people we talk with understand the benefits of implementing a cloud.  The promises of cost savings and rapid deployments often overwhelm corporate leaders with excitement.  But when we ask them what they actually want, we generally get a blank stare.  In fact, if you read my last blog, we get a lot of blank stares at the beginning. In reality, we find the client is ill-prepared for the migration to the cloud, whether in expectations or in level of effort.  The blank stare marks the moment when reality becomes apparent.

As a good friend and colleague told me today – “We need to learn better English…”.  Perhaps that will remove the blank stare syndrome.

The blank stares will never cease, as I believe the disconnect is in the expectation, not the comprehension.  Regardless of the topic, technology is supposed to be simple.  Bottom line, cloud is simple if you are prepared.  It is just like riding a bike! But just like riding a bike in today’s world, you have to know what you want and how you are going to use the bike.  Then you have to be prepared for the hills, the elements, the virtual riding – so really, is cloud simple?

This is a picture of my bike, in my house “torture chamber”. I can guarantee that it is more complicated than just getting on and pedaling!

So, over the next few weeks we will talk about being prepared for the cloud.  Whether you are going public cloud or private cloud, the preparation is nearly the same.  If you are wondering “why do I need to prepare my environment for a cloud?”, please read on – it will change the way you deploy and save you money.

Here is what I will discuss over the next few weeks …

  • “Are those requirements? There’s no requirements in the cloud!”
  • Does a reference architecture really help?
  • Patterns, not just for sewing anymore!
  • Service Catalog and Christmas List – hope eternal
  • Automation, Supply Chain and physics eternal

Technology is fun, and the cloud is fun.  As an infrastructure engineer, database architect and application architect, I spend more time debating with myself than with other people.  They are good debates, and I generally stop the discussion before they get violent.  I hope we can have some fun as we go through this cloud journey, and I welcome comments and thoughts!


Defining Disaster – What is missing in Disaster Recovery

 

Over the past year, I have been involved in numerous Disaster Recovery (DR) engagements – including reviews and implementations.  When I start, my first question is NOT “What is your RTO/RPO?” (although that is important).

My first question is “What is your goal?”  It is amazing how many blank stares I get from this question.  The unspoken response is probably “Idiot, my goal is disaster recovery!”  But I then explain that there are many different types of disasters and many types of disaster recovery options. The most entertaining response I get when I ask this question is “Well, someone told us we needed a disaster recovery solution”.  Of course, this is normally in Florida during hurricane season …

My mind drifts off to my favorite Jimmy Buffett song – “Tryin’ To Reason With Hurricane Season”

Anyway, I thought that for a fun refresher I would throw out a discussion of WHY we do a Disaster Recovery implementation, NOT HOW we do one.  Of course, determining how to do a DR implementation is easy once we determine why we want a DR solution.

Seriously, most teams jump into the technology solution before considering the requirements or the goals of the solution.  So let’s go and have some fun!

Goals of Disaster Recovery

Most people know me as an advocate of flexibility, and a DR implementation is no different.  The company’s DR goals should be defined by the type and scope of the disaster.  Some types of disasters are obvious, including natural and man-made disasters resulting in a “smoking hole”.  However, other failures may also require a disaster declaration, or at least make use of the DR environment to recover from an incident.  The following graphic provides a focus on the goals of a disaster recovery solution.

[Graphic: goals of a disaster recovery solution]

Most DR implementations take an Enterprise focus, maintaining a secondary site for the production site.  Most of the time, the disaster recovery site is a “lights out” location that is rarely tested.  This implementation only satisfies a compliance officer or a government check box – which is important.  However, how does it support the business objectives or the customer’s experience?

The disaster recovery goals should answer two questions: a) how does this support our customer experience, and b) how does this allow us to drive business?

After all, the resources for disaster recovery are expensive, and managing the additional data centers and environments requires dedicated tools and personnel.  Why not make these environments work for the company as well?

The following graphic represents a goal-oriented view of the disaster recovery solution.  As the goals become more focused and flexible, the beneficiary transitions from internal operations to client experience and business focus.

[Graphic: goal-oriented scope of the disaster recovery solution]

Expanding the goals of the disaster recovery implementation for flexibility removes the self-serving goal of checking a box.  I have reviewed DR solutions that force customer applications to fail over to disaster recovery because of an internal business application failure.  Or worse, I have evaluated a client environment that isolates each customer into a separate environment, yet whose DR plan requires all customers to transition to DR in the event of a single customer outage.

Isolating applications into groups supporting distinct applications or customer installations is great.  Providing a “Site Switching” strategy for each application group is excellent and improves the customer’s experience and confidence! The isolation of applications, databases and incidents provides an effective solution for disaster recovery.  Moving a mountain during a disaster gets noticed; causing a customer a multi-day outage because of a deleted table gets noticed even more.
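To make the per-group idea concrete, here is a minimal sketch – every group name, site name and RTO value below is hypothetical, not from any client – of a DR plan scoped to the application group that actually failed, instead of forcing every customer to move:

```python
# Hypothetical per-group DR plans: a single customer outage switches only that
# group, instead of triggering an enterprise-wide failover.
from dataclasses import dataclass

@dataclass
class GroupPlan:
    group: str           # application or customer group
    primary_site: str
    dr_site: str
    rto_minutes: int     # recovery time objective for this group

DR_PLANS = {
    "customer-a":   GroupPlan("customer-a",   "dc-east", "dc-west", rto_minutes=60),
    "customer-b":   GroupPlan("customer-b",   "dc-east", "dc-west", rto_minutes=240),
    "internal-erp": GroupPlan("internal-erp", "dc-east", "dc-west", rto_minutes=480),
}

def plans_to_switch(affected_groups):
    """Return only the plans for the groups affected by the incident."""
    return [DR_PLANS[g] for g in affected_groups if g in DR_PLANS]

# A single-customer incident moves one group; everyone else stays on the primary site.
print(plans_to_switch(["customer-a"]))
```

The data structure is not the point – the point is that the switch decision takes the incident’s blast radius as its input.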

As a manager of operations for a workforce staffing company, I recall an incident with PeopleSoft.  A very capable person was performing an upgrade of a PeopleSoft application and mistakenly deleted the Vendor table.  Our ability to quickly rebuild this table in our DR location and place it back in production not only saved our butts, it saved the company the 2–3 days of embarrassment we would have spent rebuilding the table.
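Purely as an illustration – this is not the procedure we used, and the connection strings, credentials and table name below are invented – pulling a single table back from a DR database copy into production can be a few lines of Python with the python-oracledb driver:

```python
# Sketch: copy one table's rows from a DR database copy back into production.
# DSNs, credentials and the table name are hypothetical placeholders.
import oracledb

DR_DSN = "drhost/drservice"        # hypothetical DR database service
PROD_DSN = "prodhost/prodservice"  # hypothetical production database service

def restore_table_from_dr(table, user, password):
    with oracledb.connect(user=user, password=password, dsn=DR_DSN) as dr, \
         oracledb.connect(user=user, password=password, dsn=PROD_DSN) as prod:
        dr_cur = dr.cursor()
        dr_cur.execute(f"SELECT * FROM {table}")   # table name is trusted input here
        rows = dr_cur.fetchall()
        # Build positional binds matching the column count of the DR copy.
        binds = ", ".join(f":{i + 1}" for i in range(len(dr_cur.description)))
        prod.cursor().executemany(f"INSERT INTO {table} VALUES ({binds})", rows)
        prod.commit()
        return len(rows)
```

The code is the easy part; the real save was that a current copy of the data existed in the DR location and the team knew how to reach it under pressure.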

Disaster Recovery Testing Frequency

Testing a disaster recovery cut-over requires coordination and resources.  I am often asked, “How often should our company perform a DR test?”  The question is simple but, most of the time, misguided.

A disaster is not convenient or forgiving.  At the time of a disaster, everything in operations is on fire and everyone is yelling.  Only experience and muscle memory make the difference between minutes and days.

Testing DR is not about proving the ability to move applications from one point to another; that happens every day.  The POINT of testing DR is to build experience and process for the individuals involved.

In one of my favorite movies – The Last Samurai – Tom Cruise’s character challenges a soldier to shoot him while being attacked.  Of course the soldier panics and fails the challenge and the movie continues.  But, the point is clear – things change in times of stress!

The following graphic provides a guideline for how often to test each type of switchover to build the team’s efficiency.

[Graphic: disaster recovery testing frequency guideline]

As the testing flows down the graphic, the team’s efficiency also improves.  As we test the common Site Switching on a monthly or quarterly basis, the experience of the supporting teams increases as well.  Therefore, the Enterprise cutover test becomes a composition of the smaller, flexible site transitions that are already tested frequently.

Can your team perform a disaster recovery test?  Can your team DEFINE a disaster for your operations?  If not, you have some homework!

Better question: when was the last time you checked your spare tire?  When was the last time you changed a tire?  Your tire will not go flat when it is convenient; it will only go flat when it is raining and you are on the freeway.

My last story … when I was a director of operations, I managed a team of DBAs and system administrators.  I had a triage DBA role that rotated between application work and triage; during triage, that DBA handled the week’s issues and requests.  In the middle of this stressful period, I would write the name of a database, a date and a time on a piece of paper and put it on their desk.  They would have to recover that database to that point in time while still doing triage.  The process was to keep them sharp, not to test their abilities.
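For what it is worth, that “piece of paper” drill is easy to automate.  Here is a minimal sketch – the database names and the time window are invented for illustration – that picks a database and a past point-in-time for the on-call DBA to recover to:

```python
# Hypothetical drill generator: pick a database and a recovery target time,
# mimicking the note left on the triage DBA's desk.
import random
from datetime import datetime, timedelta

DATABASES = ["hrprod", "finprod", "whseprod"]  # invented database names

def pick_recovery_drill(now=None):
    now = now or datetime.now()
    offset = timedelta(hours=random.randint(1, 72), minutes=random.randint(0, 59))
    return {
        "database": random.choice(DATABASES),
        "recover_to": (now - offset).strftime("%Y-%m-%d %H:%M"),
    }

print(pick_recovery_drill())
```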

 

