Many of the services provided by a typical IT Operations organization require '24x7' availability.  To manage those expectations, there are a variety of planning exercises, metrics, processes and many meetings.  When a service becomes unavailable, 'All hands on deck' is the battle cry until resolution.  After resolution, there is a series of sub-processes to document the event, identify root cause, identify opportunities for improvement and coordinate those improvements, all in an effort to ensure it doesn't happen again.

With that said, when the same people are involved in these types of activities day in and day out, they will eventually feel overwhelmed by the grind and may slow down... some may even let their passion and zest burn out.  Although they may not acknowledge it, the impact is real.  For example, their focus on customer service, or on the impact to users and business functions, may slip a bit... they begin to discount the impact as minor... 'the business will survive'... 'it wasn't so bad'... where before they had a passion to improve and a drive toward operational excellence...

Compare this to team sports... players get replaced at times by a 'sub'... someone capable of filling in for them while they take a breather.  While they are on the bench, they step away from the action, get some water, take a break... sit down and watch others... observe... all very necessary so that when they go back in, they play at the top of their game.  But when there is only '1' player that everyone relies on, and that player always has to be in the game, they will not be able to carry the team ALL the time.

The same goes for an IT organization...  you have to build up various levels of support and skills so that your most senior staff don't have to resolve everything or engage on every call/outage.  That takes advance planning, training and preparation...  coordinated efforts to have people 'practice' for the big game... they can't just be expected to be ready after a few training sessions.  People take years to become familiar with certain aspects of technology...  you can't just plug someone in and expect them to be successful...  it takes daily, weekly and monthly preparation over long periods of time...

The most successful organizations have 'depth'... depth means you are able to have your '1st stringers' take a break... go on vacation... spend time with their family/friends... go fly fishing... and the business keeps on running!  Everyone needs time away from the office... to pursue hobbies, push their minds in other areas... and come back refreshed...  So... do you have depth, or someone to cover for you, so you can step away for a week?
 
 
Over the past decade we have seen numerous efforts by infrastructure staffs to learn about “ITIL” (IT Infrastructure Library) and then drive those learnings into their environments.  There have been successes and failures.  Why the difference in results?

The failures seem to be a result of the 'focus'.  When the focus is on the 'control' of others (i.e., application groups) rather than on improving the delivery of services, you tend to get off track.  When off track, planning, design and process changes are made, and procedures are updated, without the involvement of the people who use the processes.  Or stated another way, that lack of focus on the current state and needs of the business alienates parts of IT and introduces 'dictatorial' approaches.

“DevOps” seems to be evolving as a movement.  It has a variety of definitions based on how it’s being used.  My summary is that it’s all about communication and people focusing on the right thing…the business.  It is an effort to step outside the ‘bureaucracy’ that has been created and start ‘anew’ with only the ‘necessary’ controls, shedding all the unnecessary ones.  Streamlining approaches so they are more ‘agile’ in nature.  Focusing more on delivering results than on checking a box to satisfy a process requirement.  Eliminating the need for processes that manage other processes that manage the processes that really do the work.

Organizational shifts do not solve problems; communicating and rallying around visions and priorities solves problems.  So ‘shifting’ functions to another group will not make the process better…  the process owners and stewards are the ones who have the accountability to make things better… a few key thoughts on process improvement:

·        Be a good listener – understand the needs of the business

·        Focus on the ‘data flow’ and not the ‘org flow’

·        Minimize the unnecessary ‘wait states’

·        Manage by exceptions and don’t try to control things that don’t need control

·        Hold people accountable

·        Don’t implement punitive steps because ‘1’ person or group caused a problem. 

Your thoughts?
 
 
Will managing in a cloud environment require change in process and tools?

During some meetings with one of our primary vendors and our IT leadership, we were discussing managing the ‘cloud’.  During the dialogue, an interesting analogy was used to describe the transition we are making.  The example given was the ‘stealth’ bomber, which requires about 2,000 adjustments to be made every second in order to fly.  No pilot can make that many decisions individually.  The pilot’s role is different and requires them to ‘trust’ their automation and instrumentation more than ever.

We then talked about how managing the ‘cloud’ is similar, since we have lost specific control over where an application is running.  With virtualization, that app can shift anywhere within the environment.  At any given time, you may not know exactly which physical server it’s located on.  We have many people who are afraid to let go of that kind of control.  But we need to get to a point where we worry less about the underlying infrastructure as our applications become agnostic to the hardware platform.  As we cross over that physical boundary, it opens us up to more agility and flexibility.  However, the transition is sometimes difficult and requires a concerted effort and focus.

Think back to that pilot.  How do pilots make the transition from one type of plane to another?  They don’t just jump into the cockpit and adapt while heading down the runway.  Rather, they use a very critical training device called a simulator!  They spend hours and hours in that ‘safe’ environment that ‘simulates’ the ‘production’ environment.  They can practice and crash without burning!  Their reliance on the instrumentation evolves over time.  They move from ‘blind trust’ to ‘experience’ as they continue the training and the simulation.  The more hours in the simulator, the more confident they become and the better they understand the capabilities of the new environment.

So how many of us provide a safe environment for our admins to really become familiar with the new environment?  Too often, we cram the learning into a 2-5 day training class and then force them to go live.  That’s a scary situation and can cause some real anxiety.  That’s why so many resist the change.  They just don’t feel comfortable with the ‘new environment’.  We must provide ‘simulators’ and plenty of hours for folks to ramp up.

Another point that was brought up was the failure of so many organizations that attempt to run the new environment the way they run the old: same processes, same tools and same org structure.  The result is that they don’t maximize the capabilities of the new environment and technology.  Yet if you tried to take the legacy environment and change its processes/tools, you would break it.  So there needs to be an approach that keeps legacy running down its existing path while consistently looking for opportunities to improve.

The new environment, however, is going to require us to jump into ‘simulators’, start testing out the new instrumentation and get to the point where we understand the new capabilities.  We will then need to build new processes, most likely very streamlined compared to what we are doing now.  These new processes should yield quicker turnaround times for the business and more standardized infrastructure.  And to be successful, we may need to tweak our traditional siloed organizational structures.  Possibly create a separate 'Cloud Ops' group?

What are your thoughts?

 
 
(Read the full article:    Retail IT:  How important is a phone?)

Retail IT departments need to establish a strategy to take advantage of advances in telecommunications while maintaining simplicity and taking advantage of a variety of applications at the store.  As we witness rapid advances in telecommunications, internet and mobile communications, there are opportunities to enhance the customer experience, reduce costs, and improve overall communication and efficiency at the store.  However, if IT is not careful, stores can easily be overwhelmed with complexity.  The store requires IT to simplify the use, and maximize the functionality, of technology in the store.

Here’s the summary page from the article:

Although working at a store is not a prerequisite to understanding the store, lacking that background does require listening skills and a genuine passion for the store!  Spend time in stores and take the opportunity to ask questions at the right level; this will increase your awareness of the important store functions that help them better serve the customer.

The telephone system is capable of doing so much more for a store than generally assumed.  It can be used as a platform for applications that integrate with it thereby taking advantage of business intelligence regarding customers.  These applications can also be used to assist in better managing the staffing, the customer and the inventory.

Some key points: 

·         Ask questions “How? Who? When? What Benefit?”

·         Validate assumptions with the store, listen and observe!

·         Validate the ease of use in new phone systems

·         Understand what features are used most at the store

·         Integrate the phone system with other business apps

·         Use call reporting to help improve store sales

·         Integrate the use of smart phones in addition to mobility

·         When evaluating phone systems, use checklists
           o   One for the store
           o   One for IT
           Both need to be satisfied as requirements vary.

There are always choices regarding which vendor to use.  The key decision criteria should not be limited to the lowest bid or the biggest vendor.  Focus should be on demonstrated results and the ability to understand the store environment.  Retail companies are always looking for an edge.  Partner with companies that stand behind their commitment to deliver results and ease of use, and that provide that distinct ‘edge’.

A final comment:  Be the store!  Keep the vision and communication process simple and consistent.  Visit the stores regularly and help out.  Experience and understanding are gained by performing some of the store tasks.  Methodologies and Frameworks are there to help, but wisdom is required in order to correctly apply the right processes and technology in stores.

(Read the full article:    Retail IT:  How important is a phone?)

 
 
This month’s blog is all about simplicity of vision.  I zeroed in on the retail environment and wrote up a non-technical white paper that gives specific guidance on being ‘store focused’.  Management and leadership go hand in hand, but they are different.  Yet the people who follow you or report to you need to know what you stand for and what your real objectives are:  Help the company be successful?  Help yourself advance your career?  Help them succeed?  Gain experience to get the next job?

Here’s the summary page from the article:

There are so many demands, details and challenges running an IT department.  Step away from the details and see what ‘themes’ exist in your organization.  Do those ‘themes’ align with a store focused culture?  In this document we have outlined some key principles that will help you lay your foundation: 

·         Positive Culture
·         The power of terminology
·         Seeking business value
·         Simplify how the store interacts with IT
·         Teamwork in IT as you focus on the store
·         Architectural framework that is flexible to address uniqueness
·         Service Levels that align with objectives and critical elements
·         The importance of metrics to manage and to communicate
·         Follow through and reporting back

Be the store!  Keep your vision and communication process simple and consistent.  Visit the stores regularly and help out.  By performing some of the store tasks you can get firsthand experience and better relate to their difficulties and challenges.  Methodologies and Frameworks are there to help, but apply them correctly to your environment.

When you are at the store, be a good listener and avoid being defensive.  If you run into a situation where the store is extremely frustrated, take notes and seek to understand why they have such a high level of frustration.  Remember that when people on the front lines get frustrated, the frustration is with the situation, not with you.  Help relieve the pressure that is creating the frustration!

A final comment:  There is nothing more invigorating than watching staff members step up and take initiative on driving improvement in the store environment.  There are always opportunities for improvement.  To see them, you need to be standing at the right vantage point and your vision needs to be clear.  Vision clarity is maximized when you are part of a positive culture where people work together in a creative manner for the success of the stores!

Read the rest of the article:  Retail IT:  Store Focused or Enterprise Focused?

 
 
Do you really think you can ‘Run your data center from an iPhone’?

Article from InformationWeek:
http://www.informationweek.com/news/software/soa_webservices/231601915


The title implies you can run your data center from a mobile device, but what they are actually talking about is a vendor that has built a tool offering a simple, convenient mobile way to restart services or servers ‘remotely’, giving you access to the systems running in the data center.  Although it supports only Microsoft at the moment, the vendor apparently envisions going much farther into other technologies.

When using the term ‘remotely’, I’m talking about an individual not being at the location where they normally perform their work.  We have had engineers supporting data centers from remote locations for years.  However, they are generally at a desk and have access to various monitors and systems.  So when I say remotely, I’m picturing that Sr Engineer sitting with their iPad at their child’s soccer game.

Actually running a data center has a multitude of aspects that can’t be performed remotely (staffing, vendor coordination, equipment issues, etc.).  However, this is a ‘cool’ application of technology that implies you can do ‘some’ things from wherever you are, and I agree there is definitely some benefit to being able to see and execute certain things remotely.

As an aside, from my experience, I always push staff to work very hard when they are on the clock, and when it’s personal time, I like them to focus on their families and personal lives.  When there is a critical issue, yes, we need to engage staff and possibly have them leave their family barbecue.  But work-life balance needs to be part of your planning, and if it is not, your people are prone to make mistakes.

For example, your Sr Engineer is at their child’s soccer game, there is 3:48 left on the clock, the team is down by 1, and their child is driving down the field to score.  An urgent alert comes in, and because they aren’t concentrating on the task as they normally would, they immediately restart a server instead of just the particular service, making the situation worse.  This is life, and we need to build processes that enable people to be away from work.  Again, if it’s an urgent situation, talking through the situation while focusing on the task at hand is important to reaching the right resolution.

I see greater value in jumping on a conference call and talking through situations when people are remote, because they may not be fully focused until you get them on the phone.  You’d be surprised how many outages occur because someone is hurrying to get back to the game or to a family commitment, so they just ‘do it’, ‘push the button’, ‘initiate the upgrade’, etc. without taking the time to validate.

On the other hand, if the solution to a particular problem is to just restart a service or server, then build that into your automation scheme and send an informational alert that it occurred, so the engineer can follow up at their convenience: after the soccer game or when they get back to their desk.
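To make the idea concrete, here is a minimal, hypothetical sketch in Python of that kind of automation rule; it is not any vendor's product.  The alert names, the service names in the runbook and the `restart_service` callback are all assumptions standing in for whatever your monitoring and remote-execution tools actually provide.  Only alerts with an approved "restart this one service" fix are handled automatically and the engineer gets an informational note; anything else, or a service that keeps needing restarts, is escalated to a person.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical runbook: alert types whose approved fix is "restart this one service".
# Anything not listed here is escalated to an engineer instead of being automated.
AUTO_RESTART_RULES: Dict[str, str] = {
    "app_pool_hung": "W3SVC",
    "print_spooler_stopped": "Spooler",
}


@dataclass
class Alert:
    alert_type: str
    host: str


@dataclass
class Notifications:
    informational: List[str] = field(default_factory=list)  # follow up when convenient
    escalations: List[str] = field(default_factory=list)    # needs a person now


def handle_alert(
    alert: Alert,
    restart_service: Callable[[str, str], bool],
    notes: Notifications,
    restart_counts: Dict[Tuple[str, str], int],
    max_restarts: int = 2,
) -> str:
    """Restart only the specific service named in the runbook, never the whole server."""
    service = AUTO_RESTART_RULES.get(alert.alert_type)
    if service is None:
        notes.escalations.append(f"{alert.host}: '{alert.alert_type}' has no automated fix")
        return "escalated"

    key = (alert.host, service)
    if restart_counts.get(key, 0) >= max_restarts:
        # Repeated restarts suggest a deeper problem; stop automating and get a person.
        notes.escalations.append(f"{alert.host}: {service} already restarted {max_restarts} times")
        return "escalated"

    restart_counts[key] = restart_counts.get(key, 0) + 1
    if restart_service(alert.host, service):
        notes.informational.append(
            f"{alert.host}: {service} was restarted automatically; please review when convenient"
        )
        return "auto_restarted"

    notes.escalations.append(f"{alert.host}: automatic restart of {service} failed")
    return "escalated"
```

The restart counter is the design choice worth noting: automation handles the routine case, but it should stop and call a person rather than restart the same service in a loop.  A real scheme would also reset that counter after a quiet period, which is left out here to keep the sketch short.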

Your thoughts?
 
 
When you introduce a new service should you do anything different than you do when you just upgrade something?  If the new service is good for the users, but they push back, how do you make progress?

Let’s take the evolution of email over the past 25+ years as an example of the maturation of a service and of increasing expectations.  This may seem like a history lesson for those who never knew that a ‘mobile’ phone used to mean one with a 25-foot cord you could stretch into another room, that the only video game used to be an electronic ping-pong game, or that social networking meant going to a dance and watching the guys stand on one side talking about tough-guy stuff while the girls stood on the other side wondering why the guys weren’t asking them to dance.  However, history lessons are good because patterns exist and they repeat.  Learning to find the patterns from the past and applying them to the present is a talent that you should develop in either yourself or your staff.


Email started as a technology for the ‘computer people’: an application that few in the rest of the company wanted to use.  It was part of the system that you already had up and running for the ‘other stuff’ that justified the purchase of the computers.  It seemed like a cool ‘gadget’ they could do without.  If people needed to communicate, they would walk over and talk to the person or pick up the phone.  Besides that, most people hadn’t taken typing lessons, and it took longer to type a simple message than to use any other form of communication.  So where was the value?  It seems easy to understand the value now, but answering that question back then and helping people get started wasn’t easy.  Here are the specific objections we were confronted with:  Email was new.  It didn’t show hard dollar savings.  It seemed to add cost.  It added steps to the way people functioned.  It was impersonal.  It was complicated.  Do these seem familiar?

So, back then, it didn’t matter if email was down once a day for a few hours.   We could take it down whenever we (IT) wanted because no one had integrated email into how they got work done.  

So how did email take off?  Demonstrated Process Efficiency!  We took the interoffice memo process and reduced the number of steps, the time and the people it took to deliver a memo across the company.  A ‘memo’ was something created by a manager.  They would write it by hand, record it on a tape recorder, or dictate it to a secretary (the secretary came and sat down in the office and wrote down what the manager said in an old cryptic language called ‘shorthand’).  The memo was then typed up.  Then copies were made and put into interoffice envelopes.  The recipients’ names were written on the outside of the envelopes.  The envelopes were then picked up from the secretary’s desk by a mail clerk, who sorted them and delivered them to the various recipients.  (Often, someone wouldn’t receive the ‘memo’.  Yes, that was a common occurrence and not just a joke!)

So showing how you could eliminate all the copying, stuffing, sorting and delivering, while adding a new benefit (being able to validate who received the memo), was a true demonstration of value.  We took a major memo-creating department (not the biggest, but substantial) and started them on email.  Once they started talking to their peers in other departments, the demand grew rapidly for all the departments to be brought on to email.

Then email began to take off as an internal communication tool, and then it expanded to people outside of the company.  But that comparison required the “IT” person to thoroughly understand the business process and the value the existing process provided, and then to see which steps of the existing process were not adding true value: those that could be replaced by technology, so that the benefit to the business was realized through reduced cost and improved efficiency.  BOTH were important in order to introduce a new technology!

Email did have some competition: the facsimile machine!  (It was a copier-like machine that sent a copy of a document to another fax machine across a phone line.)  Yes, ‘faxing’ still seemed to be a preferred tool over email for many people, especially among the vendors that were selling all the fax machines.  It sounds crazy, but it was used as an alternative to email.  And it is still in use today by many pharmacies, doctors and attorneys.  Interesting!

It was primarily the secretaries (who did most of the manual work) who really understood that it was faster for them to email many people than to fax many people.  However, if a secretary was concerned about their job, they were slow to adapt for fear they wouldn’t have a job after the conversion.  Email utilization continued to grow, and the reliability requirements increased as people relied more and more on that ‘service’ to get their work done.  The expectation became like that of ‘the phone’, where dial tone just seemed to always be there.  Hence today’s expectation:  Email should never be down!!

And now, email as we know it could be on the downtrend of its lifecycle.  It sounds crazy, but as I write this, college kids spend more time texting and tweeting than they do on email.  They email the ‘group’ that doesn’t text or tweet: their professors, their parents and future employers.  Sounds like how ‘faxing’ is still around.

Although that transformation occurred over a number of years, today the same ‘process’ of introducing technology must go through the same steps, but the timeline can be (and often is) a matter of weeks or months.  I call this out because in a world of fast evolution, although you are pressured for speed, you must still go through the same steps; you just need to reduce the cycle time.

So let’s walk through the email history lesson and extract the steps of introducing a new service:

1.    Identification of the new technology (Email…)
2.    Proof of concept (POC) - IT validation to understand how it works, rough order of magnitude on the potential user base, the costs involved to deploy and support

a.    Understand the alternative ways to deliver the service and their value (facsimile machine)
b.    Filter through the vendor hype.  Understand the strength of support from the vendor.
c.    Compare to the industry, where is it already in use?  Check references. 
d.    SEE IT IN PRODUCTION somewhere else (unless you are a cutting-edge environment that likes to take risks and is OK being the first to use something, essentially a learning center for the vendor)
e.    Are there already ways to get the work done or receive the business benefit you have identified?  You may discover that you have another application/service that may be used in a manner you didn’t think of—getting better utilization of an existing asset.
f.     Seek the business value and understand what changes to the business will occur (positive or negative)
g.    How much customization is required?  Can you use a standard configuration?  A big warning on new technology: if you have to customize too much, you may turn a simple opportunity into a complex, high-risk project that may never get off the ground.  We call that building a ‘concrete airplane’: very strong, but it may never get off the runway.

IF POC is successful continue, otherwise back to step 1

3.    Validation of business value (hard dollar… increase of revenue, reduction of cost, increase in efficiency—the memo example)

IF Validation is successful continue, otherwise back to step 1

4.    Pilot to validate the business value assumptions and deployment plans

a.    While this is happening, the awareness of what is going on must be communicated to all appropriate future users.  Begin the user buy-in process early.
b.    Build your deployment plan, cost and timing estimates.  Work to build in a phased roll out to limit business impact and reduce potential risk.  Automate as much of the deployment as possible.  Use this to deploy the pilot.
c.    Validate vendor support structure
d.    Validate your training/adaptation assumptions.  How easy is it for users to adapt?
e.    Hold tight to the scope and benefits.  Avoid making changes that will take longer to deploy; add those changes into a phase 2 deployment.  Go after that 70-80% of the business value rather than pushing higher.
f.     Business groups/users that will use the service must be part of the analysis of the results in this phase. 
g.    Avoid being so emotionally attached to the service at this juncture that you can’t walk away. 
h.    Push for true validation of the identified benefit.  Be a businessperson who has to write checks out of their own checkbook: if the benefit isn’t there, kill it now!!  Don’t continue and reach for more soft savings that ‘may’ occur; we call that putting lipstick on a pig.  (Not that a pig isn’t a good thing; you just have to know when you want a pig and when you don’t!)

IF Pilot is successful continue, otherwise back to step 1

5.    Continue with deployment

a.    Automated deployments are always preferred.   Reduce as many manual steps as possible.
b.    A phased roll out is always preferred and must be part of the planning process.  At this stage you should not be trying to retrofit a phased roll out plan. 
c.    Monitor training and make appropriate adjustments as you learn.  Sometimes less training is required; if so, scale down.
d.    Communicate progress until deployment is completed.

6.    Validate user acceptance

7.    Validate and demonstrate business objectives met

a.    Most environments do not follow through on this step.  However, it can help you learn and improve your process.  You want to get better, so don’t be afraid of the results.  It is better to find out yourself, and sooner, than to wait for someone else to tell you later.
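The gates in the steps above ("IF ... is successful continue, otherwise back to step 1") behave like a simple stage machine.  The Python sketch below is only an illustration of that control flow under stated assumptions, not a tool from the article: the stage names mirror the numbered steps, and the `checks` functions are hypothetical placeholders for whatever POC, business-value and pilot reviews you actually hold.  Failing steps 2 through 4 sends the idea back to step 1; a failure at steps 6 or 7 is recorded for follow-up rather than restarting the cycle.

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    IDENTIFY = 1
    PROOF_OF_CONCEPT = 2
    VALIDATE_BUSINESS_VALUE = 3
    PILOT = 4
    DEPLOY = 5
    USER_ACCEPTANCE = 6
    VALIDATE_OBJECTIVES = 7


# Failing any of these gates sends the candidate service back to step 1.
GATED_STAGES = {Stage.PROOF_OF_CONCEPT, Stage.VALIDATE_BUSINESS_VALUE, Stage.PILOT}

# A check takes the candidate's name and returns True if it passed that stage.
Check = Callable[[str], bool]


def run_lifecycle(candidate: str, checks: Dict[Stage, Check]) -> List[str]:
    """Walk the stages in order; stages without a check are treated as passing."""
    history: List[str] = []
    for stage in Stage:  # Enum iteration follows definition order
        passed = checks.get(stage, lambda _candidate: True)(candidate)
        history.append(f"{stage.value}. {stage.name}: {'pass' if passed else 'fail'}")
        if not passed and stage in GATED_STAGES:
            history.append("back to step 1 (IDENTIFY)")
            break
        # Failures at later stages are recorded for follow-up, not restarted.
    return history
```

For example, `run_lifecycle("email", {Stage.PILOT: lambda c: False})` records passes for steps 1 through 3, a failure at the pilot, and then the return to step 1, which is the "kill it now" outcome described in step 4h.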

With new technology, you want to keep it as simple as possible.  Target that 70-80% of the potential benefits.  Focus on getting that ‘phase 1’ out as quickly as possible while understanding your ability to provide stability and availability.  Again, communication on expectations must coincide with the maturation of the application.
 
 
An excellent vehicle to stimulate improvement in the organization is something I call an Event Review.  The objective is to extract learnings and your own best practices from good things or bad things that have happened, and then share those learnings with everyone for the benefit of the organization.

The ‘Event’ could be anything:  a project, an outage, a deployment, etc., but most important is the environment or culture you establish.  You need to reinforce the benefits of the learnings.  When the event you are reviewing is a positive one, people like to be recognized, but do it live and as quickly as possible; don’t wait for an event review.

The ‘Review’ itself should be held as soon as possible after the event has occurred.  In the case of an outage, I recommend within 48 hours.  Timeliness is key because it lets you catch the important details while they are still fresh in people’s minds; the longer you wait, the more is lost.

You may find you don’t have enough time in the day to perform all the reviews you’d like so you’ll need to prioritize.  Focus on what is necessary for your organization at ‘this time’.  Adjust as you mature and as time permits.  Start slow and make each one meaningful. 

Many will be wary that your event review is a witch hunt, and it may take time to build up trust.  To reinforce that trust, you must separate the event review from any management issues you may need to deal with, such as people not following policy.  Getting to the benefits will take time, so respect the process and be patient.  Focus on ‘what’ went wrong, ‘what’ went right and ‘what’ can be done to improve or repeat the success.  The ‘what’ is what matters; avoid the ‘who’ during the event review, especially when you are reviewing an outage or event that had a negative impact on the business or the organization!

Participants 
Assign a facilitator to lead the sessions, generally someone from your problem management staff or management team.  Invite anyone who had direct involvement in the event.  That could include IT staff as well as non-IT staff.

The facilitator should prepare for the session by assembling as much information as possible.  This could include the timeline, decision points, start time, end time, etc.  This will help keep the discussion on topic.

There are some people who should not be invited to the event review, and that is primarily the ‘brass’ of the organization.  If you really want staff to speak up, then the ‘safe environment’ should be free of senior management.  Even the best of the senior staff seem to want to chime in or overreact, and that sends unwanted signals to the participants.  If senior management’s objective is to provide praise, they should do it elsewhere, not in the event review!

There is often extreme pressure from some senior staff members, who may even demand to attend.  If that occurs, you have an opportunity to clarify the objective of the event review and separate out any objectives the senior staff member may have.  You can provide them a ‘separate’ session to review the output of the event review; work to keep the event review for just those who were directly involved.  At the same time, respect the needs of your senior management; it’s the ‘wants’ that you need to manage.  And yes, there have been many times I have been in that situation, almost fearing for my own job by not allowing my boss to attend an event review.  And I remember a number of times being asked for a name because someone’s head had to roll.  Again, we need to separate the management issues from the event review process.

Agenda
Begin by documenting the timeline of actions/events related to the event.  What steps were taken leading up to the event?  Was the plan followed?  What validation occurred?  Literally document all activities leading up to the event, during the event, and the steps that closed the event.  In the case of outages, time is often spent during the restoral of service trying to re-engineer or fix something else, which delays the restoral of service.  Flush those activities out during the review.

Once you have the timeline documented, focus on 3 key things:  1) Do we know ‘what’ the root cause of the outage was?  2) ‘What’ can be done to prevent the outage from happening again?  3) If it does happen again, ‘what’ can be done to resolve it more quickly?

After you have documented these areas, specific action items for each area must be captured with specific timeframes and owners, and a process to follow up on the action items must be in place.  If you leave the event review without action items, you will not see the improvements.  The action items must be meaningful and have a documented benefit.  It is better to have fewer, quality action items that can be achieved.  Giving someone an unreasonable action item that can’t be completed means you’ll get a lot of bad status reports.  Follow up on and close out all your action items; it reflects poorly on your management team if action items linger without closure.
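The action-item rules above (an item for each of the three areas, a named owner, a due date, a documented benefit, and follow-up until closure) are easy to check mechanically.  The following is a minimal Python sketch under those assumptions; the field names and the area labels are hypothetical, and most teams would keep this in whatever ticketing or tracking tool they already use rather than in code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# The three questions the review answers; each should end up with at least one action item.
AREAS = ("root_cause", "prevention", "faster_resolution")


@dataclass
class ActionItem:
    area: str         # one of AREAS
    description: str
    owner: str        # named owner responsible for closing the item
    due: date         # specific timeframe
    benefit: str      # documented benefit of completing the item
    closed: bool = False


def review_problems(items: List[ActionItem]) -> List[str]:
    """Return reasons the action-item list is not ready to leave the review."""
    problems: List[str] = []
    for area in AREAS:
        if not any(item.area == area for item in items):
            problems.append(f"no action item for '{area}'")
    for item in items:
        if item.area not in AREAS:
            problems.append(f"'{item.description}': unknown area '{item.area}'")
        if not item.owner.strip():
            problems.append(f"'{item.description}': no owner")
        if not item.benefit.strip():
            problems.append(f"'{item.description}': no documented benefit")
    return problems


def overdue(items: List[ActionItem], today: date) -> List[ActionItem]:
    """Open items past their due date: the ones to chase in follow-up."""
    return [item for item in items if not item.closed and item.due < today]
```

Running `review_problems` before the session ends catches missing owners or benefits while everyone is still in the room, and a periodic `overdue` report supports the follow-up that keeps action items from lingering without closure.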

A final item to cover in the agenda is key learnings from the event. This is best done at the end of the session. Open this topic with an overall summary of the event based on the prior agenda items, then step back and open up the discussion for observations from the group. This dialogue is often the most beneficial, as it can provide great insight into the group and what they really learned from the event. Don’t interrupt! Let the participants speak and engage each other; just observe and then document the summary of the key learnings!

Always stick to the agenda and keep the discussion focused at the right level. Try to keep the session crisp, within 1 to 1½ hours. Separate meetings can be used for deeper analysis, if required, or for designing solutions to re-engineer something. That work should not be done in the event review; instead, assign it as an action item.

I can’t say enough about the importance of building the foundation of the event review around the ‘what’ rather than the ‘who’. When you use this review to zero in on ‘who done it’, you will lose the openness and candor that are so important for continual process improvement. Follow-up with individuals is necessary; just be careful not to use the event review process as the vehicle for it. Otherwise your ‘event review’ will be perceived as a visit to the principal’s office, and that means people stop talking and only tell you what they think you need to know.

Another negative effect occurs when people feel they will get in trouble for being mentioned in the event review: they spend more time pointing fingers and doing what they can to make sure the finger isn’t pointed at them. That is another unproductive activity that brings negativity into your environment, and it must be replaced with a true focus on the business!

The event review is truly a powerful tool that can help you mature your organization. To do that, you must prioritize the long-term benefits over short-term witch hunts!

 
 
If you are the person who gets the calls from the business, the CIO, the CEO or any other senior executive when something isn’t working, you are going to want a solid change management process backing you up. Although all processes are important, what differentiates a mediocre IT organization from a very efficient and effective one is the maturity and efficiency of its change management process.

I want to drill down in detail on this topic because it reflects the health and strength of your organization. This is all about respect: respect for peers, the company, the environment and the leadership. You will not always be present when people need to act, and how they act is a direct reflection on the leadership of the organization. The goal is to raise the level of respect so that when mistakes happen, they are just mistakes.

Each change faces these questions:  Why should we make the change?  Who needs to know?  Who’s impacted?  What’s the benefit?  What’s the risk?  What’s the cost?  When should we do it?  Who should do it?  What precautions should we take?  What training is required?  How do we support the change?  What if this doesn’t go as planned?   When the people in the organization maintain a high level of respect, they will ensure that those questions get answered for you.

Another key aspect beyond respect is that the process owner of change management needs to keep sight of its vision: the primary objective of change management is to address the needs of the business, helping the business respond to the market and increase revenue and profit. Changes will need to be made to update tools, update applications, implement new applications, update equipment, test some assumptions, try new things, etc. The change management process exists to help make these changes in a way that maximizes availability during the transition, gets the work done expeditiously, and ensures the knowledge needed to support the change is in place.

In simplified form, these are the basic steps of change management:

1.       Understand the business value of the change and collaborate appropriately to obtain buy-in

2.       Identify who is impacted by the change and ensure appropriate coordination occurs

3.       Schedule when the change will go in

4.       ‘Plan’ out how the change will be deployed and what the back out plans are in case the change doesn’t go right

5.       Test the change

6.       Deploy the change

7.       Validate that the change works as intended

8.       Update appropriate documentation/system configuration/users
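To make the "skip only when it adds no value" rule concrete, here is a minimal sketch of a change record that tracks the 8 steps above. It assumes a hypothetical `ChangeRecord` class and step names of my own choosing (not any particular change management tool); each step must be either completed or explicitly skipped with a written reason before the change can close. The "no time" check is a deliberately crude illustration of the principle, not a real control.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the 8 basic steps, in the order listed above.
STEPS = (
    "business value and buy-in",
    "impact and coordination",
    "schedule",
    "deployment and back-out plan",
    "test",
    "deploy",
    "validate",
    "update documentation/configuration/users",
)


@dataclass
class ChangeRecord:
    """Hypothetical change record: every step is completed or skipped with a reason."""
    summary: str
    completed: set = field(default_factory=set)
    skipped: dict = field(default_factory=dict)  # step -> why it adds no value

    def complete(self, step: str) -> None:
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.completed.add(step)

    def skip(self, step: str, reason: str) -> None:
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        # Crude guard: a step may be skipped because it adds no value,
        # never because there is "no time" to perform it.
        if not reason.strip() or "no time" in reason.lower():
            raise ValueError(f"step '{step}' may only be skipped if it adds no value")
        self.skipped[step] = reason

    def outstanding(self) -> list:
        """Steps neither completed nor explicitly skipped, in process order."""
        return [s for s in STEPS if s not in self.completed and s not in self.skipped]

    def is_closed(self) -> bool:
        return not self.outstanding()
```

For a routine swap of a redundant hard drive, the record might skip the first four steps with a reason such as "routine hot-swap of a redundant component" and complete the rest. The point of the design is that skipping is a visible, documented decision rather than a silent omission.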

Those 8 steps are simple enough and, at times, they can all be completed within a very short time. Some steps can be omitted, as when simply replacing a redundant hard drive in a rack. However, always use discretion as you decide which steps to skip and when. The only reason to skip a step is that it adds no value, not that you don’t have time to perform it.

I often see environments where changes are made in production and the only people aware of those changes are the ones making them. This becomes apparent when outages occur and you eventually discover that the change was the problem. It can happen when you have an overdesigned process, an infrastructure group that is not responsive, or ‘change management’ that has been distributed across various groups.

When you find yourself in a situation where not everyone is following your change process, stop and simplify. It’s more important to have a simple process that everyone uses, so that everyone is aware of the changes being made, than to have a perfectly documented, perfectly engineered process with numerous checkpoints that no one follows!!

 
 
Daily Ops Call

A daily conference call should occur each morning of a normal business day, generally Monday through Friday. The primary purpose of the call is to make sure your staff is informed of the current state of operations and of any major events scheduled for the next 24 hours.

The facilitator of the call needs to be carefully selected. This person needs to be a leader and be able to demonstrate control. They need to be well respected and at the right management level. They need to ensure that the call sticks to the agenda and covers what is most important. They need to make the group feel comfortable while also setting a tone of punctuality and brevity. Some people will want to talk and take over the call. Think of this call as a radio show: the last thing the host wants is to let the callers take over the show, because the result would be chaos. So the facilitator should allow dialogue for updates on each outage, but when the dialogue turns to problem solving or assigning blame, the facilitator will need to (tactfully) cut people off, note that this is not the forum, and ask them to take the discussion offline. Consistency is also important, and I’d recommend sticking with the same facilitator for a year at a time.

The duration of the call should be 15 minutes. There may be times it will go longer, generally on Monday when you’re covering the weekend’s activity, or when you are just starting out and have never held these calls before. But work aggressively to keep the call to 15 minutes. The discipline around time is important, as it keeps the dialogue to the point and sets a tone for everything in your operations. The best time to hold this call is 8 a.m. If you have staff spread across various time zones, you’ll need to decide which time zone to anchor to. Always start the meeting on time or even a minute early!!

The participants should include your management team, support staff, field staff (especially important when you have staff in various locations) and whoever else from IT would like to attend. It is also beneficial to have your technical architects and senior engineers either on the call or receiving output from the call (I explain later why this is important). But the call should not include people outside of IT. If the business needs an update, establish other communications; do not use your daily ops call for that purpose.

The agenda should include outages in progress; updates on outages that occurred and were resolved in the last 24 hours (or over the weekend, on the Monday morning call); any major changes or events that will occur in the next 24 hours; and any follow-up actions regarding the outages. It is very important that you stick to the agenda. Avoid using the call to resolve outages or to discuss what should have been done; those discussions should occur offline. This call should focus on awareness of the current state and on whether you have the right people engaged, with appropriate follow-up throughout the day.

Following the call, distribute a daily operations report to the appropriate IT staff. Depending upon how you structure the report, it may also be a good communication tool for people outside IT. The report should communicate the current state of outages in progress as well as updates from the past 24 hours. It should include the updates from the ops call, along with any other information you deem beneficial that was not covered on the call. For example, you can include major project updates, especially if you have a large deployment in progress.
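One way to keep the daily operations report consistent is to generate it from the same structure as the call agenda. The sketch below is only an illustration under my own assumptions: the `Outage` and `PlannedEvent` records, the section titles, and the 72-hour Monday lookback (so Monday's report covers back to Friday morning) are hypothetical choices, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Outage:
    """Hypothetical outage record; resolved is None while still in progress."""
    service: str
    started: datetime
    update: str
    resolved: Optional[datetime] = None
    follow_up: str = ""


@dataclass
class PlannedEvent:
    """Hypothetical major change or event scheduled in the change calendar."""
    description: str
    when: datetime


def lookback_window(now: datetime) -> timedelta:
    # Monday's report also covers the weekend, so look back to Friday morning.
    return timedelta(hours=72) if now.weekday() == 0 else timedelta(hours=24)


def _section(title: str, items: list) -> list:
    # An explicit "None" keeps empty sections visible, mirroring the fixed agenda.
    return ["", title] + ([f"  - {item}" for item in items] or ["  - None"])


def daily_ops_report(now: datetime, outages: list, events: list) -> str:
    since = now - lookback_window(now)
    in_progress = [o for o in outages if o.resolved is None]
    resolved = [o for o in outages if o.resolved is not None and o.resolved >= since]
    upcoming = sorted(
        (e for e in events if now <= e.when < now + timedelta(hours=24)),
        key=lambda e: e.when,
    )
    lines = [f"Daily Operations Report - {now:%Y-%m-%d}"]
    lines += _section("Outages in progress:",
                      [f"{o.service} (since {o.started:%m/%d %H:%M}): {o.update}" for o in in_progress])
    lines += _section("Outages resolved since last report:",
                      [f"{o.service} (resolved {o.resolved:%m/%d %H:%M}): {o.update}" for o in resolved])
    lines += _section("Major changes/events in the next 24 hours:",
                      [f"{e.when:%H:%M} {e.description}" for e in upcoming])
    lines += _section("Follow-up actions:",
                      [f"{o.service}: {o.follow_up}" for o in in_progress + resolved if o.follow_up])
    return "\n".join(lines)
```

Because the sections follow the call agenda in the same order, whoever compiles the report after the call is working from the same structure that was just discussed. Project updates or other extra information could be added as further sections.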