Apache ZooKeeper

Communication using the Netty framework

Explains how to configure ZooKeeper to use Netty instead of NIO for client/server communication, including Quorum TLS setup and zero-downtime migration from non-TLS clusters.

Netty is an NIO-based client/server communication framework. Compared with using NIO directly, it simplifies many of the complexities of network-level communication for Java applications. The Netty framework also has built-in support for encryption (SSL) and authentication (certificates). These are optional features that can be turned on or off individually.

In versions 3.5+, a ZooKeeper server can use Netty instead of NIO (the default) by setting the Java system property zookeeper.serverCnxnFactory to org.apache.zookeeper.server.NettyServerCnxnFactory; for the client, set zookeeper.clientCnxnSocket to org.apache.zookeeper.ClientCnxnSocketNetty.
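Both settings are read by the JVM as system properties, so one way to supply them is through the JVM flags that the launcher scripts pass along. The following is a minimal sketch, assuming the stock zkServer.sh and zkCli.sh scripts, which pick up SERVER_JVMFLAGS and CLIENT_JVMFLAGS respectively; if you start ZooKeeper another way, pass the same -D options to java directly.

```shell
# Server side: use the Netty connection factory instead of the NIO default.
# Assumption: the server is started with the stock zkServer.sh script.
SERVER_JVMFLAGS="${SERVER_JVMFLAGS:-} -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"
export SERVER_JVMFLAGS

# Client side: use the Netty client socket implementation.
# Assumption: the client is started with the stock zkCli.sh script.
CLIENT_JVMFLAGS="${CLIENT_JVMFLAGS:-} -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"
export CLIENT_JVMFLAGS
```

Appending to any existing value, rather than overwriting it, keeps flags such as heap settings intact.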

Quorum TLS

New in 3.5.5

Based on the Netty framework, ZooKeeper ensembles can be set up to use TLS encryption in their communication channels. This section describes how to set up encryption for quorum communication.

Please note that Quorum TLS secures both the leader election and the quorum communication protocols.

Create an SSL keystore (JKS) to store local credentials. One keystore should be created for each ZK instance. In this example we generate a self-signed certificate and store it together with the private key in keystore.jks. This is suitable for testing purposes, but in a production environment you will probably need certificates signed by an official certificate authority. Please note that the alias (-alias) and the distinguished name (-dname) must match the hostname of the machine the certificate is associated with, otherwise hostname verification won't work.

keytool -genkeypair -alias $(hostname -f) -keyalg RSA -keysize 2048 -dname "cn=$(hostname -f)" -keypass password -keystore keystore.jks -storepass password

Extract the signed public key (certificate) from the keystore. This step might only be necessary for self-signed certificates.

keytool -exportcert -alias $(hostname -f) -keystore keystore.jks -file $(hostname -f).cer -rfc

Create an SSL truststore (JKS) containing the certificates of all ZooKeeper instances. The same truststore (storing all accepted certs) should be shared by all participants of the ensemble. You need to use a different alias for each certificate stored in the same truststore. The alias names themselves don't matter.

keytool -importcert -alias [host1..3] -file [host1..3].cer -keystore truststore.jks -storepass password
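The [host1..3] notation above stands for one keytool invocation per instance. A minimal sketch of that loop follows, assuming the exported certificates are named after hypothetical hosts host1, host2 and host3 and sit in the current directory. The import_certs helper is illustrative, and -noprompt is added so that keytool does not stop to ask for confirmation on each certificate.

```shell
# Import one certificate per host into a shared truststore.
# Usage: import_certs <truststore> <storepass> <host>...
# Assumption: each certificate file is named <host>.cer.
import_certs() {
  ic_truststore=$1
  ic_storepass=$2
  shift 2
  for ic_host in "$@"; do
    # The host name doubles as the alias; any unique alias would do.
    keytool -importcert -noprompt \
      -alias "$ic_host" \
      -file "$ic_host.cer" \
      -keystore "$ic_truststore" \
      -storepass "$ic_storepass" || return 1
  done
}

# Example invocation with placeholder host names:
# import_certs truststore.jks password host1 host2 host3
```

The resulting truststore.jks can then be copied to every participant.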

Use NettyServerCnxnFactory as serverCnxnFactory, because SSL is not supported by NIO. Add the following configuration settings to your zoo.cfg config file:

sslQuorum=true
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.quorum.keyStore.location=/path/to/keystore.jks
ssl.quorum.keyStore.password=password
ssl.quorum.trustStore.location=/path/to/truststore.jks
ssl.quorum.trustStore.password=password

Verify in the logs that your ensemble is running on TLS:

INFO  [main:QuorumPeer@1789] - Using TLS encrypted quorum communication
INFO  [main:QuorumPeer@1797] - Port unification disabled
...
INFO  [QuorumPeerListener:QuorumCnxManager$Listener@877] - Creating TLS-only quorum server socket
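When checking several nodes, the same verification can be scripted. This is a rough sketch that only searches for the two message texts shown above; the quorum_tls_active helper is hypothetical, and the log file location depends on your installation.

```shell
# Succeed only if the log shows both TLS quorum messages.
# Usage: quorum_tls_active <path-to-server-log>
quorum_tls_active() {
  qt_log=$1
  # -F matches the strings literally; -q suppresses output.
  grep -qF 'Using TLS encrypted quorum communication' "$qt_log" &&
    grep -qF 'Creating TLS-only quorum server socket' "$qt_log"
}

# Example (the log path is a placeholder):
# quorum_tls_active /path/to/zookeeper.log && echo "quorum TLS active"
```

Matching on the message text rather than the class and line number (for example QuorumPeer@1789) avoids depending on details that can differ between releases.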

Upgrading existing non-TLS cluster with no downtime

New in 3.5.5

The following steps upgrade an already running ZooKeeper ensemble to TLS without downtime by taking advantage of the port unification functionality.

Create the necessary keystores and truststores for all ZK participants as described in the previous section.

Add the following config settings and restart the first node. Note that TLS is not yet enabled, but we turn on port unification.

sslQuorum=false
portUnification=true
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.quorum.keyStore.location=/path/to/keystore.jks
ssl.quorum.keyStore.password=password
ssl.quorum.trustStore.location=/path/to/truststore.jks
ssl.quorum.trustStore.password=password

Repeat the previous step on each of the remaining nodes. Verify that you see the following entries in the logs, and double-check after each node restart that the quorum becomes healthy again.

INFO  [main:QuorumPeer@1791] - Using insecure (non-TLS) quorum communication
INFO  [main:QuorumPeer@1797] - Port unification enabled
...
INFO  [QuorumPeerListener:QuorumCnxManager$Listener@874] - Creating TLS-enabled quorum server socket
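One way to check that a restarted node has rejoined the quorum is the srvr four-letter command, whose output includes a Mode: line (leader or follower) once the node is serving requests. The sketch below only parses that output; the server_mode helper is hypothetical, and the nc invocation, host name and client port 2181 in the usage comment are assumptions about your environment. The srvr command must also be permitted by the server's four-letter-word whitelist.

```shell
# Print the server mode (leader, follower, ...) found in srvr output
# read from standard input; print nothing if no Mode: line is present.
server_mode() {
  sed -n 's/^Mode: //p'
}

# Example (host and port are placeholders):
# echo srvr | nc host1 2181 | server_mode
```

An empty result suggests the node is not yet part of the quorum, so it is worth waiting before restarting the next one.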

Enable Quorum TLS on each node and do a rolling restart:

sslQuorum=true
portUnification=true

Once you verify that your entire ensemble is running on TLS, disable port unification and do another rolling restart:

sslQuorum=true
portUnification=false
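The phases of the migration differ only in the values of sslQuorum and portUnification, so the per-node edit can be scripted before each rolling restart. A minimal sketch, assuming a plain key=value zoo.cfg; the set_zoo_cfg helper is hypothetical and leaves the restart and health check to you.

```shell
# Set key=value in a zoo.cfg-style file, replacing any existing entry.
# Usage: set_zoo_cfg <cfg-file> <key> <value>
set_zoo_cfg() {
  sz_cfg=$1
  sz_key=$2
  sz_value=$3
  sz_tmp=$(mktemp) || return 1
  # Keep every line whose key differs, then append the new setting.
  awk -F= -v k="$sz_key" '$1 != k' "$sz_cfg" > "$sz_tmp" || return 1
  printf '%s=%s\n' "$sz_key" "$sz_value" >> "$sz_tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$sz_tmp" > "$sz_cfg"
  rm -f "$sz_tmp"
}

# Example for the final phase (the path is a placeholder):
# set_zoo_cfg /path/to/zoo.cfg sslQuorum true
# set_zoo_cfg /path/to/zoo.cfg portUnification false
```

Apply the change and restart one node at a time, confirming the quorum is healthy before moving on.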