
So Why Not Use An In-Memory db Instead?

April 24th, 2009

The world of CEP is constantly evolving, and it will take some time before the world and its dog agree on what CEP is and, more importantly, what it is not.

Nobody (well, there are people for everything…) would put heavy computation inside a normal SQL database in 2009. I suppose you could; PL/SQL, for example, can do just about anything. But it is not natural, and most developers instinctively feel that it’s wrong.

For CEP, developers and architects have not developed these instincts yet.

This is no surprise, as the world of CEP evolves constantly and it’s still unclear what the current vendors’ event processing offerings will evolve into in 10 years or so. I’m sure new types of event processing tools are being developed as we speak, adding even more concepts usable for event processing.

So basically every developer and vendor has their own idea of what CEP is, how it should be done and, more importantly, what kind of problems should be solved using CEP.

For me, questions like "Could CEP have been used to stop the financial meltdown?" fall into the same category as asking SQL to optimize a logistics chain. In my personal view of CEP, they just sound wrong. You need financial algorithms (or whatever they are called in that world) to solve problems in that domain. They may run inside a CEP product, but they are still not CEP.

In this confusion, the question I get asked most is "Can’t we just use an in-memory database to do this?". I really do understand the question. People see all these SQL-based approaches and equate them with a special-purpose engine doing stuff in memory instead of storing data on disk. So what’s the big deal, they ask, and move along…

The short answer is: for some types of problems, go ahead and use a "normal" database, for example MySQL with an in-memory storage engine. After all, a server with 256 GB of memory does not cost that much anymore (quick check: a Dell R900 with 256 GB RAM and six cores is about $40k). So many of the performance benefits claimed by CEP vendors are rapidly decreasing in importance and, more importantly, are not worth paying for, as you can solve the problem by adding hardware. The group of customers who need more performance than a single server can deliver when doing standard in-memory SQL is constantly shrinking.

For a subset of problems, the conceptual power of SQL is just not enough. You can do lots of trickery with plain old SQL, but at some point it does not feel natural anymore and you need a PhD in SQL to understand what’s going on.

Then you need something more powerful and conceptually more suitable for event processing. Even if you have one event per second, your SQL can get totally unmaintainable. So the complexity lies in the rules you apply, not in the number of events processed.
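To make that concrete, here is a minimal sketch of one small temporal rule expressed in plain SQL. Everything in it is an assumption made for illustration: the `events` table, the event kinds, the 5-second window, and Python's built-in `sqlite3` in-memory database standing in for whatever in-memory SQL engine you prefer. The rule is "two failed logins by the same account within 5 seconds, with no successful login in between".

```python
import sqlite3

# Hypothetical event stream: (timestamp in seconds, event kind, account)
EVENTS = [
    (1, "login_failed", "alice"),
    (2, "login_ok", "alice"),
    (3, "login_failed", "alice"),
    (4, "login_failed", "bob"),
    (6, "login_failed", "bob"),
    (20, "login_failed", "bob"),
]

# One rule already needs a self-join plus a correlated NOT EXISTS subquery.
PATTERN_SQL = """
SELECT a.account, a.ts AS first_ts, b.ts AS second_ts
FROM events a
JOIN events b
  ON b.account = a.account
 AND b.kind = 'login_failed'
 AND b.ts > a.ts
 AND b.ts - a.ts <= 5
WHERE a.kind = 'login_failed'
  AND NOT EXISTS (
        SELECT 1 FROM events c
        WHERE c.account = a.account
          AND c.kind = 'login_ok'
          AND c.ts > a.ts AND c.ts < b.ts)
ORDER BY a.ts
"""


def find_matches(events):
    """Load the events into an in-memory table and run the pattern query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER, kind TEXT, account TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    rows = conn.execute(PATTERN_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(find_matches(EVENTS))
```

Only six events are involved, yet the query is already harder to read than the one-sentence rule it encodes. Adding a third step to the pattern, or a "not followed by" clause, means another join or subquery each time.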

Smart vendors and customers will soon understand this and start focusing on concepts tailored specifically for event processing, rather than old, reused concepts tweaked to the max to work in a new and fundamentally different paradigm.

I’m sure this process is happening as we speak (otherwise I predict that I can add more RIPs to my vendor list), and I’m excited to see what will come out of it in the years to come.

 



  1. daSepp
    April 24th, 2009 at 11:32 | #1

    Nice post!

    I had some thoughts about related topics regarding SQL-based CEP approaches.

    So my idea was to buy a huge and strong machine, set up a powerful database with in-memory tables and do the event processing with triggers! Not standard ANSI SQL, but we are talking about event processing, right?

    In addition how about a nice graphical user-interface to model decision trees that are then translated into the target DB trigger syntax?

    The event integration could be done through DB interfaces.

    Fast, powerful enough for a lot of event-related use cases and cheap. This can be done with out-of-the-box functionality…

  2. April 24th, 2009 at 12:06 | #2

    Truviso is IMHO converging CEP into a classical RDBMS (PostgreSQL). Other vendors happen to provide pluggable storage technologies, e.g. offering a wide range of options from full-speed pervasive heap (local in-memory RDBMS or in-memory B-tree) to disk-backed cache down to … classical RDBMS.
    For example, the StreamCruncher project (RIP, author now working at TIBCO on TIBCO BusinessEvents) was interesting in that regard.

    Ultimately, what matters is not which storage the vendor provides. It is about event processing language expressiveness, application deployment / lifecycle model, adaptors, and performance capabilities (latency, throughput and resiliency, which all depend on your use case). And that is leaving aside ad-hoc querying, integration with the third-party ecosystem and dashboards.

  3. April 24th, 2009 at 18:08 | #3

    Hello Marco and Alex, I think all of us agree now that there’s no single, perfect solution to CEP.

    I’ve always maintained that in-memory databases can be used as a base for some of the simpler ESP cases. With my prior work on StreamCruncher, as Alex rightly said, I did use DBs. And I’m very pleased to see Apache Derby coming up with a pure-memory back end in 10.5, a request I had made some 3 years ago (see http://javaforu.blogspot.com/2009/04/in-memory-derby-db.html).

    For more complex correlations, the Rete algorithm works extremely well.

    So, the language has to be expressive enough to hide the internal workings. Customers don’t care about what goes on inside as long as it works. Of course, sometimes the internal workings leak. Remember the Law of Leaky Abstractions?

    In TIBCO BusinessEvents (Disclaimer: I work on this product) we have a blend of several technologies to suit different needs.

    Ashwin.

  4. April 27th, 2009 at 08:03 | #4

    @daSepp
    The systems programmer in me sees one problem with this. Or rather, I’m not sure it’s a problem; I’m guessing here. If you take a db (or any piece of software) that was designed with a couple of gigs of RAM in mind, I’m not sure it will perform OK with 245 GB of RAM. That’s a lot, and the PC architecture we have today looks very unbalanced to me (too little memory bandwidth) with that kind of RAM. Suddenly you need to start thinking about how and when to allocate RAM, and I think you actually need to start indexing your reads and writes to RAM.

  5. April 27th, 2009 at 08:07 | #5

    @Alex
    The power of the language will indeed be what separates products in the years to come. What they were designed for in the first place will shine through, and it will be at least a couple of years before a product appears with a language specially designed for CEP with a good theoretical foundation. Today all the languages look very much like engineering products (good ones, I must say), but there’s that last rough edge left.

  6. April 27th, 2009 at 08:13 | #6

    @Ashwin Jayaprakash
    Ashwin! Nice to hear from you again! I hope life is good at TIBCO and work on BE is fun!

    With some small modifications to a normal SQL engine (like PostgreSQL), I think one could create a very good CEP platform that can solve a good subset of the available CEP problems.

    The most successful products in this category should focus on the CEP problems that are ideally suited to processing with a concept based on SQL. If someone can keep that focus, I think we could have a very neat tool in a “CEP Enabled” SQL database. I think Coral8/Aleri shows the way. Many feel that the problem there is that it’s a completely new engine, and would prefer a CEP-enabled postgres/mysql/oracle instead.

  7. daSepp
    April 27th, 2009 at 15:18 | #7

    @Marco
    That’s true, but if you take a look at current SQL-based CEP/ESP solutions, they all operate on time windows, at least if you want to preserve the promised performance.

    That means that I could do the same with in-memory tables: they represent sliding windows where you can perform joins between tables (which represent event types). Now, by placing triggers over those manageable windows you can easily create a high-performing event-driven application. You could also create conjunctions with historical entities (with certain performance tradeoffs).

    The thing I want to point out is that you could easily integrate event-driven applications within existing RDBMSs with simple trigger languages and SQL. Now, that is what some of the RDBMS vendors and BI guys are trying to achieve and sell as CEP or a real-time BI add-on.

    I think the CEP marketing guys need to set up a more distinctive communication…
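The sliding-window idea from this comment can be sketched in a few lines. This is only an illustration under stated assumptions: the `orders` and `payments` tables, the 10-second window and the `on_event` helper are all hypothetical, and Python's built-in `sqlite3` in-memory database is used as a stand-in for a server-side in-memory table engine. Each event type gets its own table, old rows are evicted on every arrival, and the correlation is a plain join.

```python
import sqlite3

WINDOW_SECONDS = 10  # assumed window length
TABLES = ("orders", "payments")  # one in-memory table per event type


def make_db():
    """Create an in-memory database with one window table per event type."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute(f"CREATE TABLE {table} (ts INTEGER, order_id TEXT)")
    return conn


def on_event(conn, table, ts, order_id):
    """Insert one event, slide both windows, return the order ids that
    currently have both an order and a payment inside the window."""
    if table not in TABLES:
        raise ValueError(f"unknown event type: {table}")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (ts, order_id))
    # Slide the windows: evict everything older than WINDOW_SECONDS.
    for t in TABLES:
        conn.execute(f"DELETE FROM {t} WHERE ts <= ?", (ts - WINDOW_SECONDS,))
    rows = conn.execute(
        "SELECT o.order_id FROM orders o "
        "JOIN payments p ON p.order_id = o.order_id "
        "ORDER BY o.order_id"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_db()
    print(on_event(db, "orders", 1, "A"))
    print(on_event(db, "payments", 5, "A"))
    print(on_event(db, "payments", 30, "A"))
```

Note that the sketch uses the arriving event's timestamp as "now" when evicting, which quietly assumes events arrive in timestamp order.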

    • April 27th, 2009 at 16:51 | #8

      It would be very interesting to see a system where we have a small special-purpose language for describing the event processing logic. Then take this language and auto-generate SQL, triggers and perhaps some supporting procedural code to run in the database. This would allow you to use a rather standard execution environment (the SQL database) and concentrate on language design and supporting tools. With careful selection of the database engine and cleverly generated code put into it, I’m sure you could build a decent CEP solution that would perform OK on a subset of CEP-related tasks.
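A very rough sketch of that idea, with the "language" reduced to a single hypothetical rule type. `FollowedBy`, `to_sql`, `run` and the `events` schema are all invented for this illustration, and Python's built-in `sqlite3` in-memory database stands in for the target engine. A real design would need a proper grammar and many more operators.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class FollowedBy:
    """Hypothetical rule: `first` is followed by `then` for the same
    subject within `within` seconds."""
    first: str
    then: str
    within: int


def to_sql(rule):
    """Translate a rule into a SQL string plus its bound parameters."""
    # Event kinds are bound as parameters; only the integer window is inlined.
    sql = (
        "SELECT a.subject, a.ts, b.ts FROM events a "
        "JOIN events b ON b.subject = a.subject AND b.ts > a.ts "
        f"AND b.ts - a.ts <= {int(rule.within)} "
        "WHERE a.kind = ? AND b.kind = ? ORDER BY a.ts"
    )
    return sql, (rule.first, rule.then)


def run(rule, events):
    """Execute the generated SQL against an in-memory events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER, kind TEXT, subject TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    sql, params = to_sql(rule)
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [(1, "order", "o1"), (3, "cancel", "o1"),
              (5, "order", "o2"), (60, "cancel", "o2")]
    print(run(FollowedBy("order", "cancel", within=10), sample))
```

The person writing rules only ever sees `FollowedBy("order", "cancel", within=10)`; the join stays hidden in the generator, which is where the language design effort would go.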

  8. Prabakaran
    May 7th, 2009 at 12:56 | #9

    Try the CSQL main-memory database cache if you are looking for a main-memory DBMS.

  9. May 7th, 2009 at 13:01 | #10

    CSQL might solve the performance problem. But that’s something rapidly vanishing with large amounts of RAM and SSD disks. So for a faster traditional SQL database CSQL and other types of caches might be great. But the major problem as I see it is the weak support for temporal queries in SQL. In CEP time is a very important dimension in processing. Without good temporal support your queries will quickly become very hard to understand.

  10. Hans
    May 11th, 2009 at 00:27 | #11

    WRT doing event processing with triggers, here is why it doesn’t work:

    Triggers don’t bunch records the way EP logic wants. Triggers usually run over a transaction, or over each record as it is created, updated or deleted. So the logic in the trigger has no way to know whether another instance of the trigger is executing over a record that should come temporally before the record being processed by this instance. The solution is to create all kinds of complex table structures and multiple levels of triggers, which is a recipe for failure.
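One way to see the ordering problem described in this comment is a per-row trigger that fires before an earlier-timestamped event has arrived. The sketch below is an assumption-laden illustration, not a statement about any particular product: the schema, the `on_cancel` trigger and the `alerts_for` helper are hypothetical, and SQLite (through Python's built-in `sqlite3`) is used only because it is easy to run.

```python
import sqlite3

# The trigger fires once per inserted row and can only see rows that
# have already been inserted at that moment.
SCHEMA = """
CREATE TABLE events (ts INTEGER, kind TEXT, order_id TEXT);
CREATE TABLE alerts (order_id TEXT);
CREATE TRIGGER on_cancel AFTER INSERT ON events
WHEN NEW.kind = 'cancel'
BEGIN
    INSERT INTO alerts
    SELECT NEW.order_id
    WHERE EXISTS (SELECT 1 FROM events e
                  WHERE e.kind = 'order'
                    AND e.order_id = NEW.order_id
                    AND e.ts < NEW.ts);
END;
"""


def alerts_for(arrivals):
    """Insert events in arrival order and return the alerts the trigger raised."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for event in arrivals:
        conn.execute("INSERT INTO events VALUES (?, ?, ?)", event)
    rows = conn.execute("SELECT order_id FROM alerts").fetchall()
    conn.close()
    return [row[0] for row in rows]


order = (1, "order", "o1")    # happened first
cancel = (2, "cancel", "o1")  # happened second

in_order = alerts_for([order, cancel])      # trigger sees the order row
out_of_order = alerts_for([cancel, order])  # trigger ran before the order arrived

if __name__ == "__main__":
    print(in_order, out_of_order)
```

The same two events produce an alert in one arrival order and none in the other, even though their timestamps are identical in both runs. Patching this means adding a second trigger for the late `order` row, and so on for every rule.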
