Archive for August, 2009

Coding Season

August 31st, 2009

Earlier today, when I took my bike to work, I realized that our short summer is now definitely over. It’s windy and 13 degrees with some rain, and those are not ideal conditions for a pleasant bike ride.

The nice part is that it feels more OK to sit inside and code during the fall and winter than in the summer. It’s generally unpleasant to be outside during the winter around here, so one can spend more time coding without feeling that there’s another beach that must be visited.

We do a part of the ruleCore technology stack in Java. This winter there will be a lot of Java EE coding going on. Here we use a combo of JPA, JSF, Seam and ICEfaces to produce server-side logic and user interfaces. One thing that annoys me is that everything about this kind of development is so complex. It seems that we need to start off by assembling bits and pieces before we can actually write any code. The resulting setup is both powerful and flexible, but it takes hours and hours to get everything configured. It seems that we developers are given half-finished tools which would be more suitable for a tool vendor than for programmers who are trying to write code to solve business problems.

I understand why some people code in .NET, as everything seems to be more polished in that world. But sadly there are no sane execution environments for .NET code, so we are stuck writing code in Java, where there are all kinds of nice options for deployment.

Sometimes I feel like returning to my roots and using Emacs and a C compiler as my only tools. I’m just looking for a good excuse to write some CEP code in C. Plain C with pthreads running on a large box can do wonders when it comes to performance. I would love to spend some time doing some old-time systems programming for a change.

 

 

 


Complex Event Processing

CEP Patterns – Draft 0.1

August 25th, 2009

[I have started to write a short article on patterns in CEP, here's the first draft and comments are most welcome.]

A very common and interesting problem in event processing is the detection of complex events.

As the Complex Event Processing (CEP) field is pretty young, we still have some problems with terminology. You will find different terms used for pretty much the same thing. There’s “event patterns”, “composite events”, “event correlation”, “complex events” and “situations”, all used to describe the same concept.

The main idea is the same, though: allow users to describe a specific and interesting combination of multiple events over time.

What differs is what kind of combinations the user can describe.

Existing tools and languages give the user different ways of relating events, for example:

  • logically (and, or…)
  • temporally (within 10 seconds, no later than 10 seconds after X)
  • semantically (all events must relate to the same business entity)
  • spatially (events in the same geographical region)

The goal is to provide the user with an easy way, graphically or textually, to describe complex combinations of events. Some tools prefer a graphical notation, some use XML and others use something that looks more like a programming language.

I will use the term “situation” to describe this complex combination of events. Thinking of situations is in my opinion the most user-friendly approach to this problem. A situation starts at some point, develops over time and is detected when a last triggering event occurs. The situation consists of multiple events and you might want to track the progress of the situation as it develops too. It is a neat and easy-to-understand concept, as it is similar to how we normally think about everyday situations.
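That lifecycle (not started, developing, detected) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the situation is a set of named events of which some number must occur, and the class name `SetSituation`, its fields and the state names are all invented here, not taken from any product.

```python
from dataclasses import dataclass, field


@dataclass
class SetSituation:
    """Hypothetical situation: `needed` distinct events out of `expected`.

    Assumes needed >= 1 and that events are identified by name only.
    """
    expected: frozenset
    needed: int
    seen: set = field(default_factory=set)

    @property
    def state(self) -> str:
        if not self.seen:
            return "not started"
        return "detected" if len(self.seen) >= self.needed else "developing"

    @property
    def progress(self) -> float:
        # Fraction of the required events observed so far, capped at 1.0.
        return min(len(self.seen) / self.needed, 1.0)

    def on_event(self, name: str) -> bool:
        """Feed one event; True only for the event that completes the situation."""
        if name not in self.expected or name in self.seen:
            return False  # unknown or repeated events do not advance anything
        was_detected = self.state == "detected"
        self.seen.add(name)
        return not was_detected and self.state == "detected"
```

A caller would feed every incoming event name to `on_event` and could read `progress` at any time, for example to raise an early warning once it passes some threshold.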

 

If you don’t like the term situation, then you can just replace it with one of the other terms. Many prefer other terms too. Aleri and StreamBase talk about event patterns. IBM’s Amit uses the word situation too. Esper (BEA) talks about patterns and I think Apama does that too. The term “composite event” is used more in the active database community. So products which have their roots in the Event-Condition-Action (ECA) paradigm of the active database world would probably call them composite events. I think that AptSoft has been calling it “event correlation”, and so do many systems monitoring tools.

So, what kinds of situations would you be interested in detecting, and thus in having an easy way of describing:

 

  • Any number of events in a set – This is the equivalent of the “or” in any programming language. It’s common to require the detection of a situation consisting of any one, or any n, of a set of known events. A typical example would be to detect events which all indicate an error. You might like to know when ERROR or BAD_ERROR or FATAL_ERROR has occurred. A little twist on the normal semantics of the programming-language “or” is that you might like to see n of the possible events before considering them to describe a fully developed situation. For example, you could require that A or B or C or D occurs, but add that you really want at least two of them to occur.
  • All events in a set – Continuing with the programming language metaphors, this is the equivalent of the “and” operator. A situation can consist of a number of events which all must occur before we consider the situation to be detected. The situation will obviously start with any event in the set and will be considered detected when the last event occurs. In between we can see that the situation is developing and might be interested in the degree of completion of the situation. If 2 out of 10 events have occurred, we might like to know that there is still 80% to go. But if we have seen 8 out of 10, we might be building an application which could benefit from an early warning saying that the situation will probably be detected soon.
  • Sequences – A sequence of events might be the most common situation used in different CEP applications. A sequence starts with a first initiating event and is detected when the last terminating event occurs. Sequences also develop over time. We have the first event in the sequence as the starting point and can track the progress of the sequence as it develops, in much the same way as described in the previous bullet. A situation described as a sequence is detected at the point in time when the last event occurs. Actually there are a number of different types of sequences that you might be interested in detecting, but I won’t go into the fine details of sequence semantics in this post.
  • Non-occurrence – It’s not only events that your event processor receives that you might be interested in. Missing events can be equally interesting. Thus there must exist a good way of describing situations which consist of events that did not occur as we expected them to. In order to describe an event which does not occur you must always include a deadline. It would make no sense to say that a situation is detected when event A does not occur. For how long would we then wait until we can conclude that A will never occur? What you really would like to say is that the situation is defined as “trigger when A does not occur within 10 minutes” or something similar. The deadline could be an offset from a known point in time or a fixed time such as lunch time.
  • Rate – Too high or too low. A common scenario in operations monitoring is when something happens too slowly or too quickly. Many other areas will also benefit from knowing when something happens at an unexpected rate. The tempo of orders to the sales department could for instance be used as an indirect indication of how your web shop is doing. Rates are commonly specified as X events over a period of time. For example, 1000 events during the last 10 minutes.
  • Related in time – When you need to detect multiple events which are related by timing constraints you are usually out of luck if you are trying to solve the problem using your traditional tools (like SQL databases). It’s also hard to create custom code to do near real-time tracking of thousands of events and their temporal relationships. No, don’t even think of using one thread (ok, maybe in Erlang) for each situation, as there can be thousands and thousands of these at any given point in time. Typical timing constraints on the events which make up a situation are that they all occur within a certain time frame, that one event occurs within X minutes of another event, or that events occur during a specified period of time or on a particular day.
  • Semantically related – Many typical event processing applications are fed with events describing something that happened in/to/at some kind of entity, like a room, a server, a person, a car, a business process, a web service or something else identifiable. As you can see, the entity does not have to represent something in the real world. A Unix process can be considered an entity even if it’s just an abstraction inside a CPU. Situations are typically specified to only consider events which are somehow related semantically. For example, events originating from the same room and describing something a known group of people just did.
  • Spatially related – Events originating from the same area or from entities close to each other, or satisfying other (geo)spatial conditions, such as being on the same road, in the same city or in some other area.
  • Trends – A trend describes the tendency for something to go in a certain direction. A typical situation found in systems management is when disk usage increases more than 1% in a week. Here we are interested in detecting a situation which will develop during a long period of time. When describing a trend you can have a known “normal”. Your situation detector must have a way to start from this normal value and track progress as your situation develops. A more advanced solution would help you figure out what normal behavior is. Historical events could be used to calculate your base values. You could for example conclude that a 1% increase in HTTP traffic each month is normal and that’s what your hardware upgrade plans can cope with. Here you would like to detect when the increase is faster than that in any month.
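The two time-driven items in the list, non-occurrence with a deadline and rate over a period, are the ones where a small sketch helps most. The Python below is only an illustration under stated assumptions: timestamps are plain numbers in seconds passed in by the caller, something outside the sketch calls `on_tick` periodically, and the class names `NonOccurrence` and `RateMonitor` and their parameters are hypothetical rather than taken from any CEP product.

```python
from collections import deque


class NonOccurrence:
    """Triggers when the awaited event has not arrived by the deadline."""

    def __init__(self, awaited: str, start: float, timeout: float):
        self.awaited = awaited
        self.deadline = start + timeout  # deadline as an offset from a known time
        self.done = False  # True once the event arrived or the deadline fired

    def on_event(self, name: str, now: float) -> None:
        if not self.done and name == self.awaited and now <= self.deadline:
            self.done = True  # arrived in time, nothing to report

    def on_tick(self, now: float) -> bool:
        """Call periodically; True exactly once, when the deadline has passed."""
        if not self.done and now > self.deadline:
            self.done = True
            return True
        return False


class RateMonitor:
    """Counts events in a sliding window and flags too-low or too-high rates."""

    def __init__(self, window: float, low: int, high: int):
        self.window, self.low, self.high = window, low, high
        self.times = deque()  # timestamps of events still inside the window

    def on_event(self, now: float) -> None:
        self.times.append(now)

    def check(self, now: float) -> str:
        # Drop timestamps that have fallen out of the window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
        count = len(self.times)
        if count < self.low:
            return "too low"
        if count > self.high:
            return "too high"
        return "normal"
```

“Trigger when A does not occur within 10 minutes” would then be `NonOccurrence("A", start=t0, timeout=600)`, and “more than 1000 events during the last 10 minutes” would be a `RateMonitor(window=600, low=0, high=1000)` whose `check` returns "too high".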

     

To summarize, I see that the following situations/event patterns/composite events/complex events are very common:

  • Any n events in a set
  • All events in a set
  • Sequence of events
  • Non-occurring events
  • Event rate
  • Temporally related events
  • Semantically matching events
  • Spatially related events
  • Events describing a trend
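In practice several of these are combined in one situation. As a rough illustration, the Python sketch below detects a sequence of events that must all belong to the same entity and complete within a time frame. Everything in it is an assumption made for the example: the class name `SequencePerEntity`, the use of a plain string as the entity key, and in particular the sequence semantics chosen (unrelated events are skipped, and a partial match that takes too long is simply forgotten). Other sequence semantics are equally reasonable.

```python
class SequencePerEntity:
    """Detects an ordered sequence of event names, tracked separately per entity."""

    def __init__(self, steps, within: float):
        self.steps = list(steps)   # event names in the required order
        self.within = within       # max seconds from first to last event
        self.active = {}           # entity -> (index of next step, start time)

    def on_event(self, entity: str, name: str, now: float) -> bool:
        """Return True when this event completes the sequence for `entity`."""
        pos, started = self.active.get(entity, (0, now))
        if now - started > self.within:
            # Too slow: forget the partial match and start over.
            self.active.pop(entity, None)
            pos, started = 0, now
        if name != self.steps[pos]:
            return False  # unrelated events are skipped, not treated as a reset
        pos += 1
        if pos == len(self.steps):
            self.active.pop(entity, None)
            return True
        self.active[entity] = (pos, started)
        return False
```

Note that the state is one small dictionary entry per developing situation, not one thread per situation, so thousands of entities can be tracked at the same time.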

      

We should really give these cool names, as is common for software design patterns.

I’m starting to build a catalog of commonly seen situations and these are my starting point. You are welcome to add to this list. I’ll only add situations to this list. There are many other things you can do with a complex event processor which are outside of my scope for now…

 

 


Complex Event Processing

Up to Speed

August 10th, 2009

Now it’s time to start working again after a nice summer break. I’m trying to catch up on my Google Alerts and see if there’s something interesting.

The first thing that I found is that Oracle now has support for registering queries in 11g. If I got it right, you can now drop a query into the database and get a notification when the result set of the query changes. This is really really cool. It might be one of the most interesting things related to CEP this year. Now you can basically build your SQL-based CEP stuff in Oracle’s database.

On the business front I notice that Senactive has been acquired by a company called UC4. So that’s one CEP player fewer. It also seems that Vhayu has been acquired, so that makes two fewer CEP players…

There also seems to be a new report from Forrester which many CEP companies quote favorable parts from. It’s always a good sign when these research firms think there’s money to be made writing about CEP.

Paul Vincent from Tibco has drawn a nice CEP history timeline which gives a good overview of the CEP world as of today and some perspective as to where the products come from. We started a bit earlier (in 2003 if I remember correctly) than the illustration suggests, though. Actually, the first design drafts for ruleCore are from 2001. It’s nice to look through them and see that some ideas have actually survived to today’s product.

From Opher’s blog I notice that CEP is climbing the hype cycle as seen by Gartner. If I read the graphics right, CEP is some 5 years from mainstream according to Gartner. CEP vendors do not seem to agree, if you read the comments on various blogs. No surprise there. But it is along the same lines as we think over here; we have long predicted 2012 as the year when CEP starts to take off. Until then, it’s wise not to burn too much fuel if you like to stay in business. Meanwhile, it seems that at least some argue that CEP vendors need to treat developers better.

My summer quiz seems to have attracted zero (0) interest :( So I won’t even bother posting the answer.

I read a lot about Twitter in the CEP blogs now. Maybe it’s just me, but I really can’t understand the greatness of this Twitter thing… My guess is that in two years it’s forgotten and placed somewhere among the rest of the hyped-up websites that pop up now and then. Before that the founders will rip some VC off and move to some warm cosy place. And for CEP, it’s not interesting at all from where I stand. It’s more of a parsing and text analytics problem as I see it.

Anyway, nice to be back at work and it’s good to see that the CEP community is moving forward and I hope that all of you that work with CEP will have a really nice fall doing all kinds of cool CEP stuff…

Here’s some Swedish pine for you:

 


Complex Event Processing