Tools
Reference for the scripts and tools bundled with ZooKeeper — including server management scripts, snapshot and transaction log utilities, benchmark tools, and testing frameworks.
Scripts
zkServer.sh
Manage the ZooKeeper server process.
# start the server
./zkServer.sh start
# start the server in the foreground (useful for debugging)
./zkServer.sh start-foreground
# stop the server
./zkServer.sh stop
# restart the server
./zkServer.sh restart
# show the status, mode, and role of the server
./zkServer.sh status
JMX enabled by default
Using config: /data/software/zookeeper/conf/zoo.cfg
Mode: standalone
# print the startup parameters
./zkServer.sh print-cmd
# show the ZooKeeper server version
./zkServer.sh version
Apache ZooKeeper, version 3.6.0-SNAPSHOT 06/11/2019 05:39 GMT
The status command establishes a client connection to execute diagnostic commands.
When the ZooKeeper cluster is started in TLS-only mode (by omitting clientPort from
zoo.cfg), additional SSL configuration must be provided:
CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
-Dzookeeper.ssl.trustStore.location=/tmp/clienttrust.jks \
-Dzookeeper.ssl.trustStore.password=password \
-Dzookeeper.ssl.keyStore.location=/tmp/client.jks \
-Dzookeeper.ssl.keyStore.password=password \
-Dzookeeper.client.secure=true" \
./zkServer.sh status
zkCli.sh
See ZooKeeper CLI.
zkEnv.sh
Sets environment variables for the ZooKeeper server. Key variables:
ZOO_LOG_DIR — the directory where logs are stored.
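Rather than editing zkEnv.sh itself, overrides are conventionally placed in conf/zookeeper-env.sh, which zkEnv.sh sources at startup if the file exists. A minimal sketch — the path and heap setting below are illustrative examples, not defaults:

```shell
# conf/zookeeper-env.sh -- sourced by zkEnv.sh on startup if present
export ZOO_LOG_DIR=/var/log/zookeeper    # redirect server logs (example path)
export SERVER_JVMFLAGS="-Xmx2g"          # extra JVM flags for the server (example value)
```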
zkCleanup.sh
Clean up old snapshots and transaction logs.
# Usage: ./zkCleanup.sh dataLogDir [snapDir] -n count
# dataLogDir -- path to the transaction log directory
# snapDir -- path to the snapshot directory (optional)
# count -- number of recent snaps/logs to keep (must be >= 3)
# Keep the latest 5 logs and snapshots
./zkCleanup.sh -n 5
zkTxnLogToolkit.sh
Dump and recover transaction log files with broken CRC entries.
$ bin/zkTxnLogToolkit.sh
usage: TxnLogToolkit [-dhrv] txn_log_file_name
-d,--dump Dump mode. Dump all entries of the log file. (this is the default)
-h,--help Print help message
-r,--recover Recovery mode. Re-calculate CRC for broken entries.
-v,--verbose Be verbose in recovery mode: print all entries, not just fixed ones.
-y,--yes Non-interactive mode: repair all CRC errors without asking
The default behavior is safe — it dumps the entries of the given transaction log file
to the screen (same as -d,--dump):
$ bin/zkTxnLogToolkit.sh log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
4/5/18 2:15:58 PM CEST session 0x16295bafcc40000 cxid 0x0 zxid 0x100000001 createSession 30000
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
4/5/18 2:16:12 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x100000003 createSession 30000
4/5/18 2:17:34 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x200000001 closeSession null
4/5/18 2:17:34 PM CEST session 0x16295bd23720000 cxid 0x0 zxid 0x200000002 createSession 30000
4/5/18 2:17:34 PM CEST session 0x16295bd23720000 cxid 0x2 zxid 0x200000003 create '/andor,#626262,v{s{31,s{'world,'anyone}}},F,1
EOF reached after 6 txns.
In recovery mode (-r,--recover), the original file is left untouched and all transactions
are copied to a new file with a .fixed suffix. CRC values are recalculated; if the calculated
value does not match the original, the new value is used. By default the tool is interactive,
asking for confirmation on each CRC error:
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ?
- Yes — write the recalculated CRC to the new file.
- No — copy the original CRC value.
- Abort — abort the operation. The .fixed file will not be deleted and may be in a half-complete state.
$ bin/zkTxnLogToolkit.sh -r log.100000001
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
Would you like to fix it (Yes/No/Abort) ? y
EOF reached after 6 txns.
Recovery file log.100000001.fixed has been written with 1 fixed CRC error(s)
Use -v,--verbose to print all records (not just broken ones). Use -y,--yes to fix all
CRC errors automatically without prompting.
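The recovery logic boils down to recomputing each entry's checksum from its payload and substituting the recomputed value whenever the stored one disagrees. A standalone miniature of that idea — using POSIX cksum in place of the Adler32 checksum the transaction log actually stores, with made-up values:

```shell
payload="closeSession"
stored=12345                        # pretend this checksum was corrupted on disk
# recompute the checksum from the payload, as recover mode does per entry
computed=$(printf '%s' "$payload" | cksum | cut -d' ' -f1)
if [ "$computed" != "$stored" ]; then
    # recover mode writes the recalculated value to the .fixed file
    echo "CRC ERROR - replacing $stored with $computed"
fi
```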
zkSnapShotToolkit.sh
Dump a snapshot file to stdout, showing detailed information for each znode.
# show usage
./zkSnapShotToolkit.sh
USAGE: SnapshotFormatter [-d|-json] snapshot_file
-d dump the data for each znode
-json dump znode info in json format
# show each znode's metadata without data content
./zkSnapShotToolkit.sh /data/zkdata/version-2/snapshot.fa01000186d
/zk-latencies_4/session_946
cZxid = 0x00000f0003110b
ctime = Wed Sep 19 21:58:22 CST 2018
mZxid = 0x00000f0003110b
mtime = Wed Sep 19 21:58:22 CST 2018
pZxid = 0x00000f0003110b
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x00000000000000
dataLength = 100
# -d: include data content
./zkSnapShotToolkit.sh -d /data/zkdata/version-2/snapshot.fa01000186d
/zk-latencies2/session_26229
cZxid = 0x00000900007ba0
ctime = Wed Aug 15 20:13:52 CST 2018
mZxid = 0x00000900007ba0
mtime = Wed Aug 15 20:13:52 CST 2018
pZxid = 0x00000900007ba0
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x00000000000000
data = eHh4eHh4eHh4eHh4eA==
# -json: output in JSON format
./zkSnapShotToolkit.sh -json /data/zkdata/version-2/snapshot.fa01000186d
[[1,0,{"progname":"SnapshotFormatter.java","progver":"0.01","timestamp":1559788148637},[{"name":"\/","asize":0,"dsize":0,"dev":0,"ino":1001},[{"name":"zookeeper","asize":0,"dsize":0,"dev":0,"ino":1002},{"name":"config","asize":0,"dsize":0,"dev":0,"ino":1003},[{"name":"quota","asize":0,"dsize":0,"dev":0,"ino":1004},[{"name":"test","asize":0,"dsize":0,"dev":0,"ino":1005},{"name":"zookeeper_limits","asize":52,"dsize":52,"dev":0,"ino":1006},{"name":"zookeeper_stats","asize":15,"dsize":15,"dev":0,"ino":1007}]]],{"name":"test","asize":0,"dsize":0,"dev":0,"ino":1008}]]zkSnapshotRecursiveSummaryToolkit.sh
Recursively collect and display child count and data size for a selected node.
$ ./zkSnapshotRecursiveSummaryToolkit.sh
USAGE:
SnapshotRecursiveSummary <snapshot_file> <starting_node> <max_depth>
snapshot_file: path to the ZooKeeper snapshot
starting_node: the path in the ZooKeeper tree where traversal begins
max_depth: depth limit for output (0 = no limit; 1 = starting node + direct children;
2 = one more level, etc.). Only affects display, NOT the calculation.
# display stats for the root node and 2 levels of descendants
./zkSnapshotRecursiveSummaryToolkit.sh /data/zkdata/version-2/snapshot.fa01000186d / 2
/
children: 1250511
data: 1952186580
-- /zookeeper
-- children: 1
-- data: 0
-- /solr
-- children: 1773
-- data: 8419162
---- /solr/configs
---- children: 1640
---- data: 8407643
---- /solr/overseer
---- children: 6
---- data: 0
---- /solr/live_nodes
---- children: 3
---- data: 0
zkSnapshotComparer.sh
Compare two snapshots with configurable thresholds and filters, outputting the delta — which znodes were added, updated, or deleted. Useful for offline consistency checks and data trend analysis. Only permanent nodes are reported; sessions and ephemeral nodes are ignored.
Tuning parameters:
--nodes — threshold for the number of descendant nodes added/removed.
--bytes — threshold for the number of bytes added/removed.
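The two thresholds combine as an OR filter: a node appears in the report when either its byte delta exceeds --bytes or its descendant-count delta exceeds --nodes, as the tool's own banner ("difference larger than 2 bytes or node count difference larger than 1") indicates. A sketch of that per-node check, with hypothetical delta values:

```shell
byte_threshold=2  node_threshold=1    # as passed via -b 2 -n 1
delta_bytes=730   delta_nodes=3       # hypothetical deltas for one node
# report the node if EITHER threshold is exceeded
if [ "$delta_bytes" -gt "$byte_threshold" ] || [ "$delta_nodes" -gt "$node_threshold" ]; then
    echo "node printed"
else
    echo "node filtered"
fi
```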
Locating Snapshots
Snapshots are stored in the ZooKeeper data directory
configured in conf/zoo.cfg.
Supported Snapshot Formats
Uncompressed snapshots and compressed formats (snappy, gz) are all supported.
Snapshots in different formats can be compared directly without manual decompression.
Running the Tool
Running the tool with no arguments prints the help page:
usage: java -cp <classPath> org.apache.zookeeper.server.SnapshotComparer
-b,--bytes <BYTETHRESHOLD> (Required) The node data delta size threshold, in bytes, for printing the node.
-d,--debug Use debug output.
-i,--interactive Enter interactive mode.
-l,--left <LEFT> (Required) The left snapshot file.
-n,--nodes <NODETHRESHOLD> (Required) The descendant node delta size threshold, in nodes, for printing the node.
-r,--right <RIGHT> (Required) The right snapshot file.
Example command:
./bin/zkSnapshotComparer.sh -l /zookeeper-data/backup/snapshot.d.snappy -r /zookeeper-data/backup/snapshot.44 -b 2 -n 1
Example output:
...
Deserialized snapshot in snapshot.44 in 0.002741 seconds
Processed data tree in 0.000361 seconds
Node count: 10
Total size: 0
Max depth: 4
Count of nodes at depth 0: 1
Count of nodes at depth 1: 2
Count of nodes at depth 2: 4
Count of nodes at depth 3: 3
Node count: 22
Total size: 2903
Max depth: 5
Count of nodes at depth 0: 1
Count of nodes at depth 1: 2
Count of nodes at depth 2: 4
Count of nodes at depth 3: 7
Count of nodes at depth 4: 8
Printing analysis for nodes difference larger than 2 bytes or node count difference larger than 1.
Analysis for depth 0
Node found in both trees. Delta: 2903 bytes, 12 descendants
Analysis for depth 1
Node /zk_test found in both trees. Delta: 2903 bytes, 12 descendants
Analysis for depth 2
Node /zk_test/gz found in both trees. Delta: 730 bytes, 3 descendants
Node /zk_test/snappy found in both trees. Delta: 2173 bytes, 9 descendants
Analysis for depth 3
Node /zk_test/gz/12345 found in both trees. Delta: 9 bytes, 1 descendants
Node /zk_test/gz/a found only in right tree. Descendant size: 721. Descendant count: 0
Node /zk_test/snappy/anotherTest found in both trees. Delta: 1738 bytes, 2 descendants
Node /zk_test/snappy/test_1 found only in right tree. Descendant size: 344. Descendant count: 3
Node /zk_test/snappy/test_2 found only in right tree. Descendant size: 91. Descendant count: 2
Analysis for depth 4
Node /zk_test/gz/12345/abcdef found only in right tree. Descendant size: 9. Descendant count: 0
Node /zk_test/snappy/anotherTest/abc found only in right tree. Descendant size: 1738. Descendant count: 0
Node /zk_test/snappy/test_1/a found only in right tree. Descendant size: 93. Descendant count: 0
Node /zk_test/snappy/test_1/b found only in right tree. Descendant size: 251. Descendant count: 0
Node /zk_test/snappy/test_2/xyz found only in right tree. Descendant size: 33. Descendant count: 0
Node /zk_test/snappy/test_2/y found only in right tree. Descendant size: 58. Descendant count: 0
All layers compared.
Interactive Mode
Add -i / --interactive to enter interactive mode:
./bin/zkSnapshotComparer.sh -l /zookeeper-data/backup/snapshot.d.snappy -r /zookeeper-data/backup/snapshot.44 -b 2 -n 1 -i
Three navigation options are available:
- Press Enter to print the current depth layer.
- Type a number to jump to and print all nodes at that depth.
- Enter an absolute path (starting with /) to print the immediate subtree of that node.
Note: only nodes passing the bytes and nodes thresholds are shown.
Press Enter to move to the next depth layer:
Current depth is 0
Press enter to move to print current depth layer;
...
Printing analysis for nodes difference larger than 2 bytes or node count difference larger than 1.
Analysis for depth 0
Node found in both trees. Delta: 2903 bytes, 12 descendants
Type a number to jump forward or backward to a specific depth:
Current depth is 1
...
Type a number to jump to and print all nodes at a given depth;
...
3
Printing analysis for nodes difference larger than 2 bytes or node count difference larger than 1.
Analysis for depth 3
Node /zk_test/gz/12345 found in both trees. Delta: 9 bytes, 1 descendants
Node /zk_test/gz/a found only in right tree. Descendant size: 721. Descendant count: 0
Filtered node /zk_test/gz/anotherOne of left size 0, right size 0
Filtered right node /zk_test/gz/b of size 0
Node /zk_test/snappy/anotherTest found in both trees. Delta: 1738 bytes, 2 descendants
Node /zk_test/snappy/test_1 found only in right tree. Descendant size: 344. Descendant count: 3
Node /zk_test/snappy/test_2 found only in right tree. Descendant size: 91. Descendant count: 2
Current depth is 3
...
Type a number to jump to and print all nodes at a given depth;
...
0
Printing analysis for nodes difference larger than 2 bytes or node count difference larger than 1.
Analysis for depth 0
Node found in both trees. Delta: 2903 bytes, 12 descendants
Out-of-range depth is handled gracefully:
Current depth is 1
...
10
Printing analysis for nodes difference larger than 2 bytes or node count difference larger than 1.
Depth must be in range [0, 4]
Enter an absolute path to print the immediate subtree of a node:
Current depth is 3
...
Enter an ABSOLUTE path to print the immediate subtree of a node.
/zk_test
Printing analysis for nodes difference larger than 2 bytes or node count difference larger than 1.
Analysis for node /zk_test
Node /zk_test/gz found in both trees. Delta: 730 bytes, 3 descendants
Node /zk_test/snappy found in both trees. Delta: 2173 bytes, 9 descendants
Invalid path and invalid input are handled:
Enter an ABSOLUTE path to print the immediate subtree of a node.
/non-exist-path
Analysis for node /non-exist-path
Path /non-exist-path is neither found in left tree nor right tree.
12223999999999999999999999999999999999999
Input 12223999999999999999999999999999999999999 is not valid. Depth must be in range [0, 4]. Path must be an absolute path which starts with '/'.
The tool exits interactive mode automatically when all layers are compared, or press ^C to exit at any time.
Benchmark
YCSB
YCSB (Yahoo Cloud Serving Benchmark) can be used to benchmark ZooKeeper. Follow the steps below to get started.
Start ZooKeeper Server(s)
Start your ZooKeeper ensemble before running any benchmark.
Install Java and Maven
Ensure a JDK and Maven are installed on the machine running the benchmark.
Set Up YCSB
Clone and build the ZooKeeper binding:
git clone http://github.com/brianfrankcooper/YCSB.git
cd YCSB
mvn -pl site.ycsb:zookeeper-binding -am clean package -DskipTests
See the YCSB README for more details.
Configure ZooKeeper Connection Parameters
Set the following properties in your workload file or via the shell:
zookeeper.connectString — connection string (e.g. 127.0.0.1:2181/benchmark)
zookeeper.sessionTimeout — session timeout in milliseconds
zookeeper.watchFlag — enable ZooKeeper watches (true or false, default false). This measures the effect of watch overhead on read/write performance, not watch notification latency.
./bin/ycsb run zookeeper -s -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark \
-p zookeeper.watchFlag=true
Or set properties directly on the command line (create the /benchmark namespace first using create /benchmark in the CLI):
./bin/ycsb run zookeeper -s -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark \
-p zookeeper.sessionTimeout=30000
Load Data and Run Tests
Load data:
# -p recordcount: number of znodes to insert
./bin/ycsb load zookeeper -s -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark \
-p recordcount=10000 > outputLoad.txt
Run the workload (workloadb is recommended as the most representative read-heavy workload):
# test the effect of value size on performance
./bin/ycsb run zookeeper -s -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark -p fieldlength=1000
# test with multiple fields
./bin/ycsb run zookeeper -s -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark -p fieldcount=20
# HDR histogram output
./bin/ycsb run zookeeper -threads 1 -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark \
-p hdrhistogram.percentiles=10,25,50,75,90,95,99,99.9 \
-p histogram.buckets=500
# multi-client test (increase maxClientCnxns in zoo.cfg as needed)
./bin/ycsb run zookeeper -threads 10 -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark
# timeseries output
./bin/ycsb run zookeeper -threads 1 -P workloads/workloadb \
-p zookeeper.connectString=127.0.0.1:2181/benchmark \
-p measurementtype=timeseries -p timeseries.granularity=50
# cluster test
./bin/ycsb run zookeeper -P workloads/workloadb \
-p zookeeper.connectString=192.168.10.43:2181,192.168.10.45:2181,192.168.10.27:2181/benchmark
# test leader performance only
./bin/ycsb run zookeeper -P workloads/workloadb \
-p zookeeper.connectString=192.168.10.43:2181/benchmark
# large znodes (default jute.maxbuffer = 1 MB; set the same value on all ZK servers)
./bin/ycsb run zookeeper -jvm-args="-Djute.maxbuffer=4194304" -s -P workloads/workloadc \
-p zookeeper.connectString=127.0.0.1:2181/benchmark
# clean up after benchmarking, in the CLI: deleteall /benchmark
zk-smoketest
zk-smoketest provides a simple smoketest client for a ZooKeeper ensemble. Useful for verifying new, updated, or existing installations.
Testing
Fault Injection Framework
Byteman
Byteman is a tool for tracing, monitoring, and testing Java application and JDK runtime code. It injects Java code into methods without requiring recompilation, repackaging, or redeployment — and injection can be performed at JVM startup or while the application is running. See the Byteman tutorial for a quick introduction.
# Attach Byteman to 3 ZooKeeper servers at runtime
# (55001/55002/55003 = Byteman ports; 714/740/758 = ZK server PIDs)
./bminstall.sh -b -Dorg.jboss.byteman.transform.all -Dorg.jboss.byteman.verbose -p 55001 714
./bminstall.sh -b -Dorg.jboss.byteman.transform.all -Dorg.jboss.byteman.verbose -p 55002 740
./bminstall.sh -b -Dorg.jboss.byteman.transform.all -Dorg.jboss.byteman.verbose -p 55003 758
# load a fault injection script
./bmsubmit.sh -p 55002 -l my_zk_fault_injection.btm
# unload a fault injection script
./bmsubmit.sh -p 55002 -u my_zk_fault_injection.btm
Example 1: Force a leader re-election by rolling over the leader's zxid.
cat zk_leader_zxid_roll_over.btm
RULE trace zk_leader_zxid_roll_over
CLASS org.apache.zookeeper.server.quorum.Leader
METHOD propose
IF true
DO
traceln("*** Leader zxid has rolled over, forcing re-election ***");
$1.zxid = 4294967295L
ENDRULE
Example 2: Make the leader drop ping packets to a specific follower. The leader will close
the LearnerHandler for that follower, and the follower will re-enter the quorum.
cat zk_leader_drop_ping_packet.btm
RULE trace zk_leader_drop_ping_packet
CLASS org.apache.zookeeper.server.quorum.LearnerHandler
METHOD ping
AT ENTRY
IF $0.sid == 2
DO
traceln("*** Leader drops ping packet to sid: 2 ***");
return;
ENDRULE
Example 3: Make a follower drop ACK packets. This has limited effect during broadcast since the leader only needs a majority of ACKs to commit a proposal.
cat zk_follower_drop_ack_packet.btm
RULE trace zk.follower_drop_ack_packet
CLASS org.apache.zookeeper.server.quorum.SendAckRequestProcessor
METHOD processRequest
AT ENTRY
IF true
DO
traceln("*** Follower drops ACK packet ***");
return;
ENDRULE
Jepsen Test
Jepsen is a framework for distributed systems verification with fault injection. It has been used to verify eventually-consistent databases, linearizable coordination systems, and distributed task schedulers.
Running the Dockerized Jepsen is the simplest way to get started.
Installation:
git clone git@github.com:jepsen-io/jepsen.git
cd docker
# initial setup may take a while
./up.sh
# verify one control node and five DB nodes are running
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8265f1d3f89c docker_control "/bin/sh -c /init.sh" 9 hours ago Up 4 hours 0.0.0.0:32769->8080/tcp jepsen-control
8a646102da44 docker_n5 "/run.sh" 9 hours ago Up 3 hours 22/tcp jepsen-n5
385454d7e520 docker_n1 "/run.sh" 9 hours ago Up 9 hours 22/tcp jepsen-n1
a62d6a9d5f8e docker_n2 "/run.sh" 9 hours ago Up 9 hours 22/tcp jepsen-n2
1485e89d0d9a docker_n3 "/run.sh" 9 hours ago Up 9 hours 22/tcp jepsen-n3
27ae01e1a0c5 docker_node "/run.sh" 9 hours ago Up 9 hours 22/tcp jepsen-node
53c444b00ebd docker_n4 "/run.sh" 9 hours ago Up 9 hours 22/tcp jepsen-n4
Running the test:
# enter the control container
docker exec -it jepsen-control bash
# run the ZooKeeper test
cd zookeeper && lein run test --concurrency 10
# passing output looks like:
INFO [2019-04-01 11:25:23,719] jepsen worker 8 - jepsen.util 8 :ok :read 2
INFO [2019-04-01 11:25:23,722] jepsen worker 3 - jepsen.util 3 :invoke :cas [0 4]
INFO [2019-04-01 11:25:23,760] jepsen worker 3 - jepsen.util 3 :fail :cas [0 4]
INFO [2019-04-01 11:25:23,791] jepsen worker 1 - jepsen.util 1 :invoke :read nil
INFO [2019-04-01 11:25:23,794] jepsen worker 1 - jepsen.util 1 :ok :read 2
INFO [2019-04-01 11:25:24,038] jepsen worker 0 - jepsen.util 0 :invoke :write 4
INFO [2019-04-01 11:25:24,073] jepsen worker 0 - jepsen.util 0 :ok :write 4
...............................................................................
Everything looks good! ヽ('ー`)ノ
Read this blog post to learn more about the Jepsen analysis of ZooKeeper.