Stability is Bad for Your Business

Tuesday, June 19, 2012

Rafal Los


Recently I said something that sounded right in my head, but then James Urquhart sent me this article (which has nothing to do with IT, by the way), and it made me re-evaluate what I believed about stability and resiliency, and how they apply to IT.

Since then, I promised him, and many of you on Twitter, that I would do some analysis and thinking on this topic and write at least a few pieces that would try to explain why stability is bad for resiliency.

Let me be clear, this wasn't an easy mental stretch for me. For as long as I can remember, I was taught that IT should strive for stability. I can't be the only one out there who believed this and took it absolutely to heart.

Everything I've done over the years dating back to 1997 or so has served to stabilize the IT structure of an organization - whether I have been a consultant or an IT staffer. Stability has been not just a goal, but the goal of most of the organizations I've worked with and for.

Now I'm reading that stability is bad?  My brain wasn't ready to make that stretch easily.

Having given it a ton of thought, and having really re-evaluated what my whole push behind enterprise resiliency is all about, I've come to realize that this stability / resiliency tradeoff is actually quite intuitive... it's just that not many of us (certainly not I) were taught to think this way.

What we're really saying is that too much stability is bad for resiliency. Let's look at just a few reasons why, and in future posts I'll do a more in-depth analysis of the whole situation to see if I can figure out what level of chaos is good for you, and when it all turns bad.

  • Complacency - When nothing ever breaks, we settle into a mindset that nothing will go wrong. This is what we call "being lulled into a false sense of security", except that you can replace security with stability and it works just fine. You start to hear executives and users alike say things like "we don't have those types of problems"... mainly because they've not witnessed them and believe they won't happen. This is the type of trouble bridge builders run into when there are long periods without earthquakes on the west coast of the United States. We fool ourselves into believing that nothing bad will happen in the future, and that belief starts to creep into our designs.
  • Change resistance - After a long period of stability we start to develop a resistance to change. This is human nature, and can be seen in organizations that still run perfectly good working DOS applications and systems. They don't change because they don't have a reason to do so... the existing system has never failed them, so they resist moving to another one that may be prone to failures they aren't experiencing today.
  • Rigidity - Much like change resistance, rigidity creeps into organizations that don't experience any failure or chaos. It is common to hear things like "this is how we do it"... Why? Because it's always how it's been done and there hasn't been any reason for change. It doesn't even matter that there are better, more efficient, more effective methods or means. Processes and systems become rigid in order to maintain that stability at all costs, including the cost of productivity and agility.
  • Inability to recover - Organizations that don't experience regular failure don't learn how to recover. I'm not talking about the staged failures we perform in disaster recovery drills, where we know what will fail, when, and how. These aren't really failures because we're not left scrambling and trying to figure out what failed and why. It sounds nuts, but it's when you have unpredictable failures that aren't staged that you really learn to recover. What you learn is not exactly which buttons to push, but how to diagnose failures in complex systems, how to do seriously hard RCAs (root cause analyses), and how to organize as a team across silos or business units. In short, when you don't practice failing for real (I have a talk on this) you will not be prepared when things go unpredictably sideways.

I'm sure there are more reasons, and if you think of any, let me know. I will do some additional thinking about this topic, speak to some very smart customers I have meetings lined up with, and write about it again in a few more days. At least for now you have something to think about, and now you know what I'm thinking about.

The relationship between stability and resiliency is a delicate one; I have no doubt about that. There is, I'm sure, a point where a balance is struck between a usable system and one that is always in a state of expected chaos... but that's something I don't have the answers to just yet.

More soon...

Cross-posted from Following the White Rabbit
