Galera cluster crashes on WAN with unstable connection

We are using Galera MariaDB 10.0.26 + Wildfly 10 on a cluster with 6 nodes. The individual nodes run on separate virtual hosts in two distinct datacenters, so they are connected over a WAN.

Sometimes one of the servers disconnects and, soon afterwards, the whole cluster crashes. The log usually contains a record about a non-existent foreign key or another inconsistency that led to one node being kicked out of the cluster.

This typically seems to happen when a node has a temporary network problem: it is either briefly unreachable or its connection is significantly slowed down. If only that node were detached from the cluster in such a situation, it would not be a problem. The problem is that the other nodes usually disconnect one by one soon after the first, and the whole cluster goes down.

As Wildfly connects to the database via datasources, it goes down right after Galera. This is a substantial problem: Galera is supposed to provide high availability, but for us it is rather a source of high unavailability.

I don't think our use case is in any way special. Could anyone help? Many thanks.


Time: 2016-07-30

