Ensuant's Blog

Ramblings about the Enterprise, Applications & Information Technology

CEP Performance Tips part 2

Written by: Matt Weaver

Last week we offered some general tips and thoughts on vendor-agnostic Complex Event Processing (CEP) performance. I’ll expand on that this week with some more thoughts, and broaden the scope a bit. CEP is NOT just a technical problem: you obviously need people and some very pragmatic, effective processes that are as unobtrusive as possible. Below are some items to consider, some technical and others procedural, which may impact the scalability of your enterprise CEP initiatives.

  1. One general philosophy applies in CEP: “less is often more”. If components are not needed to deliver the project and its business goals, whether objects in the ontology, elements in an event, or the number of concepts, they should not be used or introduced. This sounds easy enough, but the mind-set shift needed from traditional app dev to CEP can be difficult for some. We have observed, and helped remedy, several projects by simply scaling back unnecessary ‘moving parts’ and re-engineering toward what was essentially simplicity and common sense. Remember that you can build extensibility into your schemas and add more content later if needed, so no worries.
  2. CEP vendors use various threading models in their different CEP offerings. It is important that you understand the different thread types, why they exist, and how to leverage them for your project’s benefit. Ensure your threading settings are appropriate for the hardware or virtualized server(s) you are using.
  3. If you are deploying in a private/public cloud, or on virtualized operating systems, keep a close eye on the controllers. Depending upon the automation tools you are leveraging, issues could arise during peak traffic periods, specifically if multiple apps are handling peak periods at the same time while sharing the same provisioned network controllers. I have seen this before, albeit not related to a CEP system.
  4. Understand the impact on the underlying network, as touched on in the last post. Are you handling events at lower OSI layers via multicast vs. unicast? Depending upon the number of data centers (DCs), the LAN segmentation within a DC and your networking equipment, this can make a big difference. It is relatively straightforward, but it is another thing to check, as numerous underlying system advisory messages are sent across the network which you may never see in the logs.
  5. Performance correlates directly with uptime, so minimizing the mean time to restore (MTTR) for failed components and extending the mean time between failures (MTBF) for system components is key. A well-documented system, where engines, projects and deployments are captured in a ‘single source of truth’ document, is invaluable to the support team and reduces MTTR figures. Proper service management (e.g. leveraging parts of ITIL), including problem management for root cause analysis (RCA) of systemic problems, should help extend MTBF. These documents may also provide ‘Current Best Approaches’ to remedy issues without having to escalate to your final level of support resources/engineers. This is a set of living docs.
  6. In addition to the production information captured for operations and service management, it is critical that you capture and document the assets involved in your CEP initiatives. Developing the proper asset catalogues, keeping them up to date as living documents, and including them in standard change processes is important for objects, events and relationships, and ultimately for the long-term success of your CEP initiative.
  7. Ensure you understand all the caches you are using, and what you are using them for. The L1 cache sitting on a CPU is great for highly requested data, but it is small, so you can’t put much in it. The cache built into the CEP engine, located in its various JVMs depending upon vendor, provides more space than a CPU cache and is faster than a distributed in-memory cache, but obviously cannot handle as much data as a distributed in-memory cache such as Oracle Coherence.
  8. Is your test data good? Are you able to stress test with good data from the business, representative of the peaks to be seen in production? It sounds like common sense, but you should know by now that common sense is uncommon. I have seen the business off by 10,000% in one case in BAT (vs. production data), which crippled a really well designed ERP system for over a week while it was being triaged.
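
To make the MTTR/MTBF point in tip 5 a little more concrete, here is a minimal Python sketch. It assumes the common steady-state availability formula, availability = MTBF / (MTBF + MTTR), which is not spelled out in the post itself. The function names and the figures (500 hours between failures, 4 hours vs. 1 hour to restore) are hypothetical and only there for illustration.

```python
HOURS_PER_YEAR = 8760.0  # 365 days, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(mtbf_hours: float, mttr_hours: float,
                            period_hours: float = HOURS_PER_YEAR) -> float:
    """Expected hours of downtime over a period, under the same assumption."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * period_hours


if __name__ == "__main__":
    # Hypothetical component: fails every 500 hours on average.
    # Compare a 4-hour restore with a 1-hour restore (e.g. after better docs).
    for label, mttr in (("baseline", 4.0), ("faster restore", 1.0)):
        a = availability(500.0, mttr)
        d = expected_downtime_hours(500.0, mttr)
        print(f"{label}: availability={a:.4%}, downtime/year={d:.1f} h")
```

Running it with your own figures shows how much a shorter restore time, or a longer gap between failures, is worth in hours per year. That can help when arguing for the documentation and problem management effort described above.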

Written by ensuant

July 15, 2011 at 8:12 pm
