Business Down Driven Monitoring & Management
Written By: Matt Weaver
It’s 2011, year of the Super Hero Movies (boo-ya!). People are creating technology breakthroughs daily, enabling us to set new records for life longevity annually, think seriously about going to Mars, and enjoy air conditioning … and yet in the enterprise IT space I have never, in my life, seen a company implement a monitoring system built around the actual business process, as opposed to an IT system-centric, partially ITIL-compliant solution. In other words, folks implementing monitoring and management tools are generally thinking IT first, which makes sense if that is what they are well versed in. It is also much more convenient for IT, and cheapest for outsourcers to boot. Taxonomies are built around individual technologies. Unfortunately, most business processes in this environment traverse numerous technologies spanning multiple file systems, servers, data centers, network segments, intranets, extranets, etc., meaning that tracing one of them will require many groups, and possibly some external assistance from a customer, supplier, bank, 3PL, etc.
It is often up to the individuals owning the business area to decipher what is going on in an incident, generally by running upstream and working through the various technology teams when something funky is happening with a transaction that does not process in the end system. When folks troubleshoot this, they tend to start at one end and work through the hops roughly serially. Did a DMZ get & process this? Check. AS2 decrypt? Check. Post-scripts all run? Check. Pre-scripts run ok? Check. Transported to an EDI validator? Check. Mediation layer runs? Check. EDI translated into any formats you need? Check. EDI router passes to EAI system? Check. Content-based routing or enrichment ok? Check. Back to EAI? Check. Consumed by service on end system? Check. Pre-processing run? Check. Throughout all of the above, is the metadata snagged and sent to a DB for track-and-trace visualizations/reports? Check. Data to be archived consumed? Check. Even if you go backwards from the end system, it can take an equivalent amount of time.
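To make the serial walk concrete, here is a minimal Python sketch of it. Everything in it is an illustration rather than a real tool: the `STAGES` names are just shorthand for the hops listed above, `locate_failure` is a hypothetical helper, and `completed` stands in for whatever each team can tell you about the transaction. It also assumes that no hop after the failed one has a record of the transaction.

```python
from typing import Iterable, List, Optional, Tuple

# Shorthand names for the hops in the walk-through above, in the order given.
# These labels are illustrative, not taken from any product.
STAGES: List[str] = [
    "DMZ get & process",
    "AS2 decrypt",
    "Post-scripts",
    "Pre-scripts",
    "EDI validator",
    "Mediation layer",
    "EDI translation",
    "EDI router to EAI",
    "Content-based routing/enrichment",
    "Back to EAI",
    "End-system service",
    "Pre-processing",
]


def locate_failure(
    stages: List[str], completed: Iterable[str], backwards: bool = False
) -> Tuple[Optional[str], int]:
    """Return (first stage that did not complete, number of checks made).

    `completed` is the set of stages that have a record of the transaction.
    Each check stands for one round trip to a technology team.
    """
    done = set(completed)
    checks = 0
    if not backwards:
        # Start at the edge and ask each hop in turn until one has no record.
        for stage in stages:
            checks += 1
            if stage not in done:
                return stage, checks
        return None, checks
    # Start at the end system and walk upstream until a hop does have a
    # record; the failure is the hop just downstream of it.
    failed = None
    for stage in reversed(stages):
        checks += 1
        if stage in done:
            break
        failed = stage
    return failed, checks


if __name__ == "__main__":
    # Hypothetical incident: everything up to the mediation layer ran.
    completed = STAGES[:6]
    print(locate_failure(STAGES, completed))
    print(locate_failure(STAGES, completed, backwards=True))
```

In this made-up incident the failure sits in the middle of the chain, so both directions need the same number of checks to find it. Each of those checks is a conversation with a different team, which is where the time goes.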
When you traverse a multitude of groups, each owning a different technology, each mostly aware only of itself, and possibly each with different SLAs not optimally aligned to the business, you begin to realize how long and painful the process can be. Just the discourse and time spent in a ticketing system can really add up (enter update, status, blah, blah, blah) for one group to put it in the system and another to accept, read, and react. Rinse. Repeat. This can happen across multiple teams, even within the same organization, and it leaves the business red in the face, waiting to find out where the problem is so that service can be restored immediately. Naturally some of this happens in parallel, but it takes time to spread the word: support for one group (e.g. your Infrastructure group) may be next door, but your applications group may be on the other side of the world.
After all the effort just described, the business owner still doesn’t necessarily know what the problem is. Knowing that, and immediately associating it with the business impact and the surrounding context, helps you prioritize, identify impacted systems automatically and, in the end, reduce your downtime. So how do you start to move in this direction? Stay tuned and we’ll share some ideas. Thanks for checking us out!
