Confluent Kafka Helm Chart – how to choose a different Zookeeper version

According to the provider documentation for the Helm release (5.5.0, for example), a wide range of ZooKeeper versions can be used. How can I actually choose one? I can't find any information or example about it; the setting is missing from the chart and the version appears to be hardcoded in the Docker image files.
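In case it helps: as far as I can tell, the ZooKeeper version is whatever is bundled in the cp-zookeeper image for a given Confluent Platform release, so in practice you pick a CP image tag rather than an arbitrary ZooKeeper version. A minimal sketch of overriding the tag through chart values, assuming the cp-zookeeper subchart exposes image/imageTag keys like the other cp-helm-charts subcharts do (the release and repo names are placeholders; verify the keys against your chart's values.yaml):

cat > zk-image.yaml <<'EOF'
# Value keys assumed from cp-helm-charts defaults; adjust to your chart version.
cp-zookeeper:
  image: confluentinc/cp-zookeeper
  imageTag: "5.5.0"
EOF
helm upgrade my-release confluentinc/cp-helm-charts -f zk-image.yaml
# or inline, without a values file:
helm upgrade my-release confluentinc/cp-helm-charts --set cp-zookeeper.imageTag=5.5.0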

ERROR Unexpected exception causing shutdown while sock still open (org.apache.zookeeper.server.quorum.LearnerHandler) – ZooKeeper cluster failure

Often enough to hurt the business, our ZooKeeper members fail with "ERROR Unexpected exception causing shutdown while sock still open (org.apache.zookeeper.server.quorum.LearnerHandler)". I think it starts on one member and then takes down the whole ZooKeeper cluster, and the brokers after it. It looks like https://issues.apache.org/jira/browse/ZOOKEEPER-3036 is related. Why is this version in the release, what about upcoming ZooKeeper releases, and is there anything I can do about it?

Logs:

ZK-0 - [2020-07-03 03:40:28,552] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
java.net.SocketTimeoutException: Read timed out


ZK-1 - [2020-07-02 10:54:17,681] WARN Unable to read additional data from client sessionid 0x200887c07d10004, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn)
[2020-07-03 03:40:45,745] INFO Expiring session 0x200887c07d10004, timeout of 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2020-07-03 03:40:45,745] INFO Submitting global closeSession request for session 0x200887c07d10004 (org.apache.zookeeper.server.ZooKeeperServer)
[2020-07-03 03:40:45,745] INFO Expiring session 0x300887c2cee0002, timeout of 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2020-07-03 03:40:45,745] INFO Submitting global closeSession request for session 0x300887c2cee0002 (org.apache.zookeeper.server.ZooKeeperServer)
[2020-07-03 03:40:45,745] INFO Expiring session 0x300887c2cee0001, timeout of 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2020-07-03 03:40:45,745] INFO Submitting global closeSession request for session 0x300887c2cee0001 (org.apache.zookeeper.server.ZooKeeperServer)
### !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! I think here everything starts
[2020-07-03 03:40:45,752] ERROR Unexpected exception causing shutdown while sock still open (org.apache.zookeeper.server.quorum.LearnerHandler)
java.net.SocketTimeoutException: Read timed out
[2020-07-03 03:40:45,765] WARN ******* GOODBYE /10.233.106.50:54428 ******** (org.apache.zookeeper.server.quorum.LearnerHandler)
[2020-07-03 03:40:45,765] WARN Unexpected exception at LearnerHandler Socket[addr=/10.233.113.68,port=36912,localport=2888] tickOfNextAckDeadline:422901 synced?:true queuedPacketLength:2 (org.apache.zookeeper.server.quorum.LearnerHandler)
java.net.SocketException: Broken pipe (Write failed)
[2020-07-03 03:40:45,762] ERROR Unexpected exception causing shutdown while sock still open (org.apache.zookeeper.server.quorum.LearnerHandler)
java.net.SocketException: Connection reset


ZK-2 - [2020-07-03 03:41:15,203] ERROR Unexpected exception causing shutdown while sock still open (org.apache.zookeeper.server.quorum.LearnerHandler)
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)


BR-0 - [2020-07-03 03:40:37,481] ERROR Uncaught exception in scheduled task 'isr-expiration' (kafka.utils.KafkaScheduler)


BR-1 - [2020-07-03 03:40:37,481] ERROR Uncaught exception in scheduled task 'isr-expiration' (kafka.utils.KafkaScheduler)


BR-2 - [2020-07-03 03:40:43,411] ERROR Uncaught exception in scheduled task 'isr-expiration' (kafka.utils.KafkaScheduler)
kafka.zookeeper.ZooKeeperClientExpiredException: Session expired either before or while waiting for connection

Kafka and ZooKeeper image versions:

kubectl describe pods -n kafka-prod | grep -i image |  grep -i zookeeper | tail -1
  Normal  Pulled     30m   kubelet, prod-k8s-w1  Container image "confluentinc/cp-zookeeper:5.4.1" already present on machine
kubectl describe pods -n kafka-prod | grep -i image |  grep -i cp-enterprise-kafka | tail -1
  Normal  Pulled     29m   kubelet, prod-k8s-w1  Container image "confluentinc/cp-enterprise-kafka:5.4.1" already present on machine

Helm chart version:

helm list confluent-prod
NAME            REVISION        UPDATED                         STATUS          CHART               APP VERSION     NAMESPACE
confluent-prod  1               Tue May  5 16:53:20 2020        DEPLOYED        cp-helm-charts-0.4.11.0             kafka-prod

Kubernetes version:

kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
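
In the meantime (until a chart/image with a newer ZooKeeper is available), one thing that seems worth trying is giving the quorum and client sessions more headroom: the 6000 ms expiries in the logs above match the old default broker zookeeper.session.timeout.ms. A minimal sketch, assuming the cp-zookeeper subchart exposes tickTime/initLimit/syncLimit values and the cp-kafka subchart accepts configurationOverrides (key names assumed; check the chart's values.yaml):

cat > zk-tuning.yaml <<'EOF'
cp-zookeeper:
  tickTime: 3000      # session timeout bounds derive from tickTime (min 2x, max 20x)
  initLimit: 10
  syncLimit: 5
cp-kafka:
  configurationOverrides:
    "zookeeper.session.timeout.ms": "18000"   # broker-side ZK session timeout
EOF
helm upgrade confluent-prod confluentinc/cp-helm-charts -f zk-tuning.yaml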

Run kafka-exporter against Confluent Kafka Cluster

Whenever I install the helm chart and point kafkaServer at my Kubernetes broker service, the pod crashes because it fails to connect to the API. Is this some Confluent restriction, and can it be worked around somehow? I need this because I want to monitor lag, offsets, and other metrics.
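For what it's worth, there is no Confluent-side restriction I'm aware of; startup crashes like this are usually a wrong service name/port or an advertised listener the exporter can't resolve. A minimal sketch with the stable prometheus-kafka-exporter chart pointed at the default cp-helm-charts broker service (both the kafkaServer key and the service name are assumptions from those charts' defaults; verify with kubectl get svc -n kafka-prod):

# Helm 2 style install; service name assumed from cp-helm-charts defaults.
helm install stable/prometheus-kafka-exporter \
  --name kafka-exporter --namespace kafka-prod \
  --set "kafkaServer={confluent-prod-cp-kafka.kafka-prod:9092}"
# Then check the exporter logs for the actual connection error
# (label assumed from the chart's defaults):
kubectl logs -n kafka-prod -l app=prometheus-kafka-exporter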
