Distributed Entity Cache Clearing Mechanism

Why and when to use it?

The distributed cache clearing mechanism is needed when you have multiple servers in a cluster sharing a single database. When a create, update or delete operation goes through one entity engine instance, that instance clears its own caches. It can also send out a message to the other servers in the pool so that they clear their caches as well.

This feature runs through the service engine, which operates on top of the entity engine: a distributed cache clear results in service calls. In most cases you will use the Java Message Service (JMS) to send a message to a JMS server, which then distributes it to the other servers in the cluster.

How to set it up?

To keep it simple we will only set the mandatory values. There are other options which are covered by defaults.

The Entity Engine

This is the easiest part: for a given delegator you only have to set its distributed-cache-clear-enabled attribute to "true" (it is "false" by default).
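
For example, assuming the out-of-the-box "default" delegator in entityengine.xml (framework/entity/config), the definition would then look something like this; the other attributes and the Derby group mapping are just the stock values, shown here for context:

<delegator name="default" entity-model-reader="main" entity-group-reader="main"
        entity-eca-reader="main" distributed-cache-clear-enabled="true">
    <group-map group-name="org.ofbiz" datasource-name="localderby"/>
</delegator>

In a real cluster you would of course point the group-map at the shared database datasource rather than the local Derby one.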

The Service Engine

The JMS definition lives in the service-engine.xml file. By default you set a jms-service named "serviceMessenger". There you define a JMS server with its name, a JNDI name and a topic name. To make as few changes as possible we use "default" for the server name and set the values in the jndi.properties file. I could also have set a server name in jndiservers.xml, but my catch phrase is "the less changes the better". This is the service-engine.xml setting, the same on each server:

<!-- JMS Service Active MQ Topic Configuration (set as default in jndi.properties, the less changes the better) -->
<jms-service name="serviceMessenger" send-mode="all">
    <server jndi-server-name="default"
        jndi-name="topicConnectionFactory"
        topic-queue="OFBTopic"
        type="topic"
        listen="true"/>
</jms-service>

I decided to use Apache ActiveMQ as the JMS server and to simply set these properties in the jndi.properties files (commenting out the OOTB defaults):

java.naming.factory.initial=org.apache.activemq.jndi.ActiveMQInitialContextFactory
java.naming.provider.url=tcp://172.18.7.4:61616
topic.OFBTopic=OFBTopic
connectionFactoryNames=connectionFactory, queueConnectionFactory, topicConnectionFactory

ActiveMQ also provides a point-to-point model with queues. We rely on topics instead, because they use a publish/subscribe model and we need to broadcast messages to all the servers in the cluster.

At this stage you need to install an ActiveMQ server somewhere. Initially I decided to install the latest available release of ActiveMQ: 5.5.0. But it turned out that there are some known issues in this release, so I finally took the 5.4.2 release. To test, I installed it on my XP development machine and on the cluster. It could be embedded in OFBiz, but I decided to simply run it as an external broker. I don't think it's interesting to have it embedded in OFBiz: you just install it, run it and forget about it (it sets /etc/ for you). For testing I used the recommended ActiveMQ default settings. For production you will want to run it as a Unix daemon (or a Windows service).
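
As a side note, the conf/activemq.xml shipped with the 5.4.x distribution should already expose an OpenWire transport connector on the port used above, so no broker-side change is normally needed; it contains something like the following (the exact content may vary between releases):

<transportConnectors>
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
</transportConnectors>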

You also need to put the corresponding activemq-all-x.x.x.jar in the framework/base/lib OFBiz directory. Then the Distributed Cache Clearing Mechanism should be ready to use.

You can then monitor ActiveMQ using the Web Console by pointing your browser at http://localhost:8161/admin/ and opening the Topics page.

Single point of failure

The setting above is sufficient in a staging environment, but it is a single point of failure in a production environment, so we need to create a cluster of ActiveMQ brokers. Since they should not consume many resources (at most 256MB of memory and few CPU cycles), we can put each instance on the same machines as the OFBiz instances.

There is a simple way to load balance ActiveMQ queues in an ActiveMQ cluster. But, as explained above, even though we use async services for clearing distributed caches, it does not make sense for us to use queues since we need to broadcast messages. There is also the so-called virtual destinations solution, but it's a bit complicated and it seems to still use a queue underneath. After some research, I finally decided to go with the Failover Transport solution.

It's fairly simple to set up through JNDI. For this we only need to replace, in the jndi.properties files,

java.naming.provider.url=tcp://172.18.7.4:61616

by

java.naming.provider.url=failover:(tcp://172.18.7.4:61616,tcp://172.18.7.5:61616)?randomize=false&backup=true&trackMessages=true

See Transport Options for details on the last two parameters. There is also a link at the bottom of this page if you ever need to move to a smoother, more dynamic failover setting, but that would need more work in OFBiz...

Notes

If you get a "Too many open files" error, edit /etc/security/limits.conf and increase nofile for the users running ActiveMQ, like:

user - nofile 10000
root - nofile 10000

On Unix-like systems, network connections are actually backed by file descriptors... While at it, I also recommend increasing the max heap used by ActiveMQ. For instance we increased the ActiveMQ max heap from the default 256MB to 512MB and 1GB on the staging and production servers respectively; it can't hurt...

We also increased the max number of file descriptors for root, which runs ActiveMQ at startup (for now only on staging).

Some possible pitfalls

This section can be skipped but might help in case of trouble.
I first installed ActiveMQ 5.5.0 on my development machine on XP. When I ran OFBiz I got this non-blocking error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

This led to this tip:


Failed to load class org.slf4j.impl.StaticLoggerBinder

This error is reported when the org.slf4j.impl.StaticLoggerBinder class could not be loaded into memory. This happens when no appropriate SLF4J binding could be found on the class path. Placing one (and only one) of slf4j-nop.jar, slf4j-simple.jar, slf4j-log4j12.jar, slf4j-jdk14.jar or logback-classic.jar on the class path should solve the problem. As of SLF4J version 1.6, in the absence of a binding, SLF4J will default to a no-operation (NOP) logger implementation. You can download SLF4J bindings from the project download page.


I first wondered if this was due to ActiveMQ itself or to something else in OFBiz interacting with the ActiveMQ JAR. I then thought that, to work with ActiveMQ, it needed a special setting and, following the advice, I (hastily) installed the latest SLF4J version (slf4j-simple-1.6.1.jar) in the framework/base/lib OFBiz directory. I then got a message "Multiple bindings were found on the class path". But as everything was working correctly I thought it was only a warning about using one SLF4J version in ActiveMQ and another in OFBiz. Later, when I began to run the Distributed Cache Clearing Mechanism on the cluster, I ran into some issues and began to wonder what was wrong. It took me a moment to realize that "SLF4J will default to a no-operation (NOP) logger implementation" had sent me the wrong way. Actually I did not need to add any SLF4J JAR at all, phew...
