
Cloud Transformation: Transactional vs Relational

In a previous life, an account team engaged us to support a cloud architecture and migration. Our challenge: the cloud destination and application were decided before anyone analyzed the requirements or the destination's capabilities. Because the underlying architecture included Oracle and Oracle Real Application Clusters (RAC), our team was engaged for the migration effort. In the initial discussion, the client team could not describe either the application requirements or the client's resiliency requirements.

Our initial discovery revealed that the client's legacy environment included multiple two-node Oracle Real Application Clusters (RAC) configurations, geographically connected by Oracle GoldenGate. The GoldenGate implementation supported an ACTIVE/ACTIVE configuration between the two Oracle RAC deployments. Together, the two connected Oracle RAC environments supported the primary web portal, which served a large share of the client's connections. The following diagram presents the legacy environment prior to the cloud migration.

Resilient RAC Environment

The implementation characteristics imply an emphasis on system and application availability. The ACTIVE/ACTIVE Oracle GoldenGate implementation enabled a low-latency transition in the event of an environment failure. The two-node Oracle RAC systems were deployed for resiliency, not throughput. This architecture is typical of most RAC installations on non-engineered systems that are configured for high availability. We easily deduced that the client's focus was availability, followed by throughput, which we verified during our first client discussion.

Unfortunately, the pre-selected cloud target did not provide the technical capability to support the resiliency and agility requirements of the application installation. The goal of the account team and the cloud team was to get to the cloud first, then figure out the details. While I won't get into the details of the solution architecture and migration process, I want to share my opinion on the effort. Sadly, "move to the cloud first and analyze later" is a common approach for most account teams, and it is often the reason we get called in to save a failing project.

This "cloud first approach" represents a transactional cloud strategy. Each individual transaction is a single major event, such as migration, performance tuning or resiliency deployment. The migration transaction, for example, consists only of successfully moving the data from an on-premises environment to a cloud destination.

… It is like buying a vehicle where the sales manager hands you the keys but never delivers the vehicle. …

There is more to an application than putting data in the cloud, so much more!

Additional transactions, such as resiliency and performance, are not considered. Quite often, this afterthought leads to failures caused by poor performance or system outages. The following diagram represents an actual situation in which we were engaged after a migration to a popular cloud provider.

Transactional Cloud Strategy example

In other situations, the problem is a system failure rather than poor performance. In every case, however, the impact on the client is the same: prolonged outages, increased cloud costs and lost confidence from the client's business units. In some cases, the client abandons its current and future cloud initiatives.

The unfortunate difference between a transactional cloud strategy and a relational cloud strategy is repetition. In a transactional cloud strategy, a single transaction may be repeated multiple times for the same application because other transactions were ignored. These missed requirements commonly include resiliency or performance, affecting both the application's functionality and the user experience. The relational cloud strategy focuses on the long-term impact of a solution. Therefore, the relational cloud strategy includes a review of current resource utilization, client requirements and client projections to define a strategic cloud provisioning plan.

I have learned that a strategic methodology combines years of experience with effective tools to define a strategic cloud implementation. The experience includes more than 30 years of infrastructure and Oracle deployments focused on resiliency and performance. The tools include a suite of open-source tools, developed in collaboration with many Oracle experts, for analysis, engineering and provisioning. The following diagram represents a standard cloud engagement focused on a relational cloud strategy.

Relational Cloud Strategy

The relational cloud strategy focuses on "build right and migrate once", which is a strategic implementation. The focus includes not only resiliency and performance but also agility. An architecture built on standard patterns and procedures supports the application requirements. A strategic implementation ensures system availability and stable cost models.

By eliminating the "redo transactions", the Relational Cloud Strategy keeps the focus on the business, not on redo efforts.


Defining Disaster – What is missing in Disaster Recovery

 

Over the past year, I have been involved in numerous Disaster Recovery (DR) engagements, including reviews and implementations. When I start, my first question is NOT "What is your RTO/RPO?" (although that is important).

My first question is "What is your goal?" The blank stares I get from this question are amazing. The unspoken thought is probably "Idiot, my goal is disaster recovery!" I then explain that there are many types of disasters and many types of disaster recovery options. The most entertaining response I get to this question is "Well, someone told us we needed a disaster recovery solution." Of course, this is normally in Florida during hurricane season …

My mind drifts off to my favorite Jimmy Buffett song – “Tryin’ To Reason With Hurricane Season”

Anyway, as a fun refresher, I thought I would discuss WHY we do a Disaster Recovery implementation, NOT HOW we do it. Of course, determining how to do a DR implementation is easy once we determine why we want a DR solution.

Seriously, most teams jump into the technology solution before considering the requirements or the goals of the solution.  So let’s go and have some fun!

Goals of Disaster Recovery

Most people know me as an advocate of flexibility, and a DR implementation is no different. A company's DR goals should be defined by the type and scope of the disaster. Some types of disaster are obvious, including natural and man-made disasters that leave a "smoking hole". However, other failures may also require a disaster declaration, or at least the use of the DR environment to recover from an incident. The following graphic summarizes the goals of a disaster recovery solution.


Most DR implementations take an Enterprise focus, maintaining a secondary site for the production site. Most of the time, the disaster recovery site is a "lights out" location that is rarely tested. This implementation satisfies only a compliance officer or a government compliance checkbox, which is important. However, how does it support the business objectives or the customer's experience?

The disaster recovery goal should focus on two questions: a) how does this support our customer experience, and b) how does this allow us to drive business?

After all, disaster recovery resources are expensive. Managing data centers and environments also requires focus, tools and personnel. Why not make these environments work for the company as well?

The following graphic represents a goal-oriented view of the disaster recovery solution. As the goals become more focused and flexible, the beneficiary shifts from internal operations to the client experience and the business.

scope

Expanding the goals of the disaster recovery implementation for flexibility removes the self-serving goal of checking a box. I have reviewed DR solutions that force customer applications to fail over to disaster recovery because a business application failed. Worse, I have evaluated a client environment that places customers in separate environments for isolation, yet the DR plan requires all customers to transition to DR in the event of a single customer's outage.

Isolating applications into groups supporting distinct applications or customer installations is great. Providing a "Site Switching" strategy for each application group is excellent and improves the customer's experience and confidence! Isolating applications, databases and incidents provides an effective disaster recovery solution. Moving a mountain during a disaster gets noticed, but causing a customer a multi-day outage because of a deleted table gets noticed even more.

As a manager of operations for a workforce staffing company, I recall an incident with PeopleSoft. A very capable person was upgrading a PeopleSoft application and mistakenly deleted the Vendor table. Our ability to quickly rebuild this table in our DR location and place it back in production not only saved our butts but also spared the company 2 to 3 days of embarrassment while we tried to rebuild the table.

Disaster Recovery Testing Frequency

Testing a disaster recovery cut-over requires coordination and resources. I am often asked, "How often should our company perform a DR test?" The question is simple but, most of the time, misguided.

A disaster is neither convenient nor forgiving. When disaster strikes operations, everything is on fire and everyone is yelling. Only experience and muscle memory make the difference between minutes and days.

Testing DR is not about proving the ability to move applications from one point to another; that is done daily. The POINT of testing DR is to give the individuals experience and a process.

In one of my favorite movies, The Last Samurai, Tom Cruise's character challenges a soldier to shoot him while he charges at the soldier. Of course, the soldier panics and fails the challenge, and the movie continues. But the point is clear: things change in times of stress!

The following graphic provides a guideline for how often to test each type of switchover to build the team's efficiency.

frequency

As testing moves down the graphic, the team's efficiency also improves. As we test the common Site Switching on a monthly or quarterly basis, the experience of the supporting teams increases as well. Therefore, the Enterprise cutover test builds on the smaller, flexible site transitions, which are tested frequently.

Can your team perform a disaster recovery test?  Can your team DEFINE a disaster for your operations?  If not, you have some homework!

Better question: when was the last time you checked your spare tire? When was the last time you changed a tire? Your tire will not go flat when it is convenient; it will go flat when it is raining and you are on the freeway.

My last story … when I was a director of operations, I managed a team of DBAs and system administrators. Each DBA rotated between application work and triage duty, and during triage they handled that week's issues and requests. In the middle of this stressful period, I would write the name of a database, a date and a time on a piece of paper and put it on their desk. They had to recover that database to that point in time while still handling triage. The purpose was to keep them sharp, not to test their abilities.
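The surprise recovery drill in this story is simple enough to automate. The minimal sketch below picks a random database and a random point in time for the triage DBA to recover to. The database names, the 7-day backup retention window and the function name `make_recovery_drill` are hypothetical assumptions for illustration; the story does not say how the drills were chosen, and this is not the author's tooling.

```python
import random
from datetime import datetime, timedelta


def make_recovery_drill(databases, now, retention_days, rng=None):
    """Return (database, target_time) for a surprise point-in-time recovery drill.

    Assumption (hypothetical): backups cover the last `retention_days` days,
    so the target time is drawn from inside that window.
    """
    if rng is None:
        rng = random.Random()
    # Pick the database to "lose" at random so the drill is unannounced.
    database = rng.choice(databases)
    # Choose an offset of 1 minute up to the full retention window, in whole minutes.
    max_minutes = retention_days * 24 * 60
    offset = timedelta(minutes=rng.randint(1, max_minutes))
    target_time = now - offset
    return database, target_time


if __name__ == "__main__":
    dbs = ["HRPROD", "FINPROD", "PORTAL"]  # hypothetical database names
    db, when = make_recovery_drill(
        dbs, datetime(2024, 1, 15, 9, 0), retention_days=7, rng=random.Random(42)
    )
    # The "note on the desk": which database, and the date and time to recover to.
    print(f"Recover {db} to {when:%Y-%m-%d %H:%M}")
```

Bounding the target time by the retention window keeps each drill recoverable from the assumed backups, while the random choice keeps it a surprise, which is the point of the exercise.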

 

