config.yaml is not setting up a new context

I have a kubernetes Config file such as:

apiVersion: v1
kind: Config
preferences: {}
contexts:
- context:
    cluster: test-sim-development
    namespace: test-sim
    user: developer
  name: test-sim

When I issue a command such as:

kubectl config --kubeconfig infra_k8/config.yaml set-context test-sim && kubectl config use-context test-sim

I get back the following error output:

Context "test-sim" modified.
error: no context exists with the name: "test-sim"

Why is it not finding the “test-sim” name? It is clearly referenced according to the kubernetes docs.
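
One detail worth checking in the command above: only the first kubectl config invocation passes --kubeconfig, so the second one reads the default kubeconfig ($KUBECONFIG or ~/.kube/config) rather than infra_k8/config.yaml. A minimal Python sketch, assuming kubectl is on the PATH, that builds every call with the same flag. The helper names kubectl_config and switch_context are hypothetical and exist only for illustration:

```python
import subprocess

KUBECONFIG = "infra_k8/config.yaml"  # path taken from the question


def kubectl_config(*args, kubeconfig=KUBECONFIG):
    """Build a `kubectl config` command that always targets the same file."""
    return ["kubectl", "config", "--kubeconfig", kubeconfig, *args]


def switch_context(name, run=subprocess.run):
    """Run use-context against the explicit kubeconfig file, not the default."""
    return run(kubectl_config("use-context", name), check=True)
```

Calling switch_context("test-sim") would then look for the context in the same file that set-context modified.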

Author: Nona

Kubernetes API server unexpectedly stops responding

I have a managed Kubernetes cluster in Azure (AKS). There is one pod running a simple web service that responds to REST API calls from outside and calls the Kubernetes API server. These calls list and create some jobs.

For AKS, I have the ‘advanced’ (Azure CNI) networking with a custom routing table that redirects traffic to a virtual appliance – this is my company’s setup.

I’m using the official Python client for Kubernetes. The calls look like:

k8s_batch_api_client = client.BatchV1Api()
jobs = k8s_batch_api_client.list_namespaced_job(namespace = 'default')

So nothing special.

Most of the time, everything is working fine. However, from time to time, the Kubernetes API server just doesn’t respond to the requests, so my pod’s web service gets restarted after a timeout (it runs a gunicorn-based web server).

I installed tcpdump on my pod and sniffed the TCP traffic. I’m not a networking nerd, so bear with me.

The Python client keeps a TCP connection pool (using the urllib3 library). It seems that the Kubernetes API server silently ‘loses’ a TCP connection: it stops reacting, without ever closing the connection.
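
One client-side measure sometimes tried against silently dropped idle connections is TCP keepalive, which makes the kernel probe a quiet connection and report it as dead. The sketch below is an illustration under assumptions, not a diagnosis of this case: it uses only the standard socket module, the tuning values are arbitrary, and how to hand such a socket option to the urllib3 pool used by the Kubernetes client is not shown. The helper name enable_keepalive is hypothetical.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive; the tuning values here are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform-specific (Linux names),
    # so each one is set only where the constant exists.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```
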

In Wireshark, I see this for a working request-response:

2438   09:41:50,796695     TLSv1.3   1614   Application Data
2439   09:41:50,798552   TCP       66     443 → 56480 [ACK]
2440   09:41:50,804064   TLSv1.3   2196   Application Data

Port 56480 is on my pod’s side, and port 443 is the Kubernetes API server. We see a request and a response here.

But then:

2469   09:48:48,853533      TLSv1.3   1580   Application Data
2470   09:48:48,853604      TLSv1.3   1279   Application Data
2471   09:48:48,868222      TCP       1279   [TCP Retransmission] 56480 → 443 [PSH, ACK]
2472   09:48:49,076276      TCP       1452   [TCP Retransmission] 56480 → 443 [ACK]
... lots of retransmissions...

I see no TCP FIN packet from the Kubernetes API server (which would mean the server wants to close the connection).

After restarting (2 minutes of retransmissions -> reboot), my pod can establish a connection to the API server right away – so the API server itself isn’t overloaded.

The same app runs without any issues on my local Minikube cluster (but there’s of course only one node, so not really representative).

How can I investigate the issue further? Can it be caused by the client side (by my pod or by the Python client)? Is there any special setting I must change on AKS or on my client side to avoid this? Does it look like a ‘server bug’ or a ‘network issue’?

Author: dymanoid

Kubernetes deployment – specify multiple options for image pull as a fallback?

We have had image pull issues at one time or another with all of our possible docker registries including Artifactory, AWS ECR, and GitLab. Even DockerHub occasionally has issues.

Is there a way in a kubernetes deployment to specify that a pod can get an image from multiple different repositories so it can fall back if one is down?

If not, what other solutions are there to maintain stability? I’ve seen things like Harbor and Trow, but they seem like a heavy-handed solution to a simple problem.

NOTE: Cross posted on SO just to get help faster, but it belongs here.

Author: John Humphreys – w00te

Traefik HTTPS ingress for application outside of cluster

I have an application that runs in its own VM on the same bare metal server. I also have a k8s setup which runs multiple applications behind Traefik. I want to use the k8s Traefik to reverse proxy the application running on the VM. Is that possible?

It looks like I can define a Service that points to the IP address, but that is not recommended. The guidance instead points to headless services, but this doesn’t seem like it will work.

Author: digital

Control GKE CICD from a Jenkins in a lab with private network?

For testing purposes, I need to use my local Jenkins, provisioned with Vagrant, to connect to GKE and use pods to build. Is that possible? From what I have read, K8s will need access to Jenkins as well. How can I achieve that?

Looks to be possible, but I am stuck on access rights for now:

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://xxxxxx/api/v1/namespaces/cicd/pods?labelSelector=jenkins%3Dslave. Message: pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" in the namespace "cicd". Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=pods, name=null, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" in the namespace "cicd", metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Forbidden, status=Failure, additionalProperties={}).

Author: anVzdGFub3RoZXJodW1hbg

Cannot mount CIFS storage on k8s cluster

I have to mount CIFS storage and am trying to use the fstab/cifs flexvolume driver, but I have no idea what I’m doing wrong.

Using microk8s v1.18

root@master:~/yamls# cat pod.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: cifs-secret
  namespace: default
type: fstab/cifs
data:
  username: 'xxxxxxxxxxx='
  password: 'xxxxxxxxxxxxxxxxxxxxxx=='
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: test
      mountPath: /data
  volumes:
  - name: test
    flexVolume:
      driver: "fstab/cifs"
      fsType: "cifs"
      secretRef:
        name: "cifs-secret"
      options:
        networkPath: "//srv/storage"
        mountOptions: "dir_mode=0755,file_mode=0644,noperm"


root@master:~/yamls# kubectl apply -f pod.yaml 
pod/busybox configured
The Secret "cifs-secret" is invalid: type: Invalid value: "fstab/cifs": field is immutable

After changing the type of the secret to Opaque, I get this:

  Type     Reason       Age                   From                                      Message
  ----     ------       ----                  ----                                      -------
  Normal   Scheduled    <unknown>             default-scheduler                         Successfully assigned default/busybox to
  Warning  FailedMount  17m (x23 over 48m)    kubelet, master  MountVolume.SetUp failed for volume "test" : Couldn't get secret default/cifs-secret err: Cannot get secret of type fstab/cifs

What Secret type do I have to use with the CIFS driver? Why is this so hard? Is the API changing, or is it something else? Why does the API change from version to version, when API versioning was supposedly invented to provide compatibility between versions?

And, for the future, what can you suggest for NFS mounting? Beyond that, which practices do you use to provide snapshots of mounts (or any other backup system)?

Author: Deerenaros

K8s sig-storage-local-static-provisioner hostDir against /vagrant mount?

I am trying to point sig-storage-local-static-provisioner at a /vagrant mapped folder on a Windows host. My expectation is that the local-storage class will automatically provision PVs based on PVC requests. Is that possible? I am trying to use Confluent Kafka with it, with the following config:

namespace: kube-system

- name: local-storage
  hostDir: /vagrant/kafkastorage

However, I am stuck at “waiting for first consumer” and I don’t see any PVs getting created. Any idea if that is possible at all?

Latest events are:

2m52s Normal WaitForFirstConsumer persistentvolumeclaim/datadir-0-confluent-prod-cp-kafka-0 waiting for first consumer to be created before binding
2m52s Normal WaitForFirstConsumer persistentvolumeclaim/datadir-confluent-prod-cp-zookeeper-0 waiting for first consumer to be created before binding
2m52s Normal WaitForFirstConsumer persistentvolumeclaim/datalogdir-confluent-prod-cp-zookeeper-0 waiting for first consumer to be created before binding

kind: StorageClass
metadata:
  name: local-storage
volumeBindingMode: WaitForFirstConsumer

Author: anVzdGFub3RoZXJodW1hbg

How to keep secrets out of version control with kustomize?

I’ve started using kustomize. It lets you generate secrets with something like:

secretGenerator:
  - name: mariadb-env
    envs:
      - mariadb.env

This is great because kustomize appends a hash so that every time I edit my secret, kubernetes will see it as being new and restart the server.

However, if I put kustomization.yaml under version control, then it more or less means putting mariadb.env under version control too. If I don’t, then kustomize build x will fail because of the missing file for anyone who tries to clone the repo. Even if I keep it out of VCS, it still means I have these secret files on my dev workstation.

Prior to adopting kustomize, I’d just create the secret once, send it to the kubernetes cluster, and let it live there. I could still reference it in my configs by name, but with the hash, I can’t really do that anymore. But the hash is also incredibly useful for forcing the restart.

How are people dealing with this?

Author: mpen

Can a kubernetes pod be forced to stay alive after its main command fails?

After starting a long running kubernetes job we’ve found that the final result will fail to upload to its final location. We would like to force the pod to stay open after the main process fails so we can exec in and manually process the final results. If the main process fails and the pod exits before uploading the final result we will lose a large amount of time to re-process the job.

Is there a way to ensure the pod stays alive manually?
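
To make the idea concrete, here is a minimal sketch of one possible approach, assuming the container’s entrypoint can be replaced by a small wrapper: run the main command, and if it exits with a non-zero status, sleep instead of exiting so the container stays up for a manual exec. The function name run_then_hold and the default hold time are hypothetical choices for illustration, not an established pattern from the question.

```python
import subprocess
import sys
import time


def run_then_hold(cmd, hold_seconds=86400, sleep=time.sleep):
    """Run cmd; if it exits non-zero, sleep so the container stays up."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Report the failure, then block so someone can exec into the pod.
        print(f"{cmd[0]} exited with {result.returncode}; "
              f"holding for {hold_seconds}s", file=sys.stderr)
        sleep(hold_seconds)
    return result.returncode
```

The sleep parameter is injectable only so the behaviour can be checked without actually waiting. In a container, the wrapper would be called with the job’s real command line as cmd.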

Author: David Parks