Version: 8.2411.x.x RR

Kubernetes Deployment Troubleshooting

How to Check the Log of nevisAdmin 4

To check the log of nevisAdmin 4, run the following commands:

# Find out which pod is used for nevisadmin4
kubectl get pods --all-namespaces | grep nevisadmin4-

# Update the pod name below and check the nevisAdmin4 logs
kubectl logs -n nevisadmin4 nevisadmin4-<random_id>

If there are no issues, the startup produces output similar to the following:

Using /usr/lib/jvm/jre-1.8.0/bin/java
2020-05-07 12:02:32,442 [main] INFO c.n.a.v.c.c.u.PropertyProviderImpl - reading config file at 'command line config path: /var/opt/nevisadmin4/conf/nevisadmin4.yml'

(ASCII art startup banner)

2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Starting SpringRestApplication v4.7.0.162 on nevisadmin4-0.nevisadmin4.user7.svc.cluster.local with PID 8 (/opt/nevisadmin4/bin/nevisadmin4.jar started by nevis in /opt/nevisadmin4/bin)
2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - The following profiles are active: jpa,jwthmac,mariadb,health
2020-05-07 12:02:39,024 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
2020-05-07 12:02:39,124 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Start completed.
2020-05-07 12:02:43,887 [main] INFO c.n.admin.v4.core.common.pki.CAImpl - creating a new CA - using DN 'CN=nevisAdmin 4 CA, OU=NEVIS Security'
2020-05-07 12:02:44,965 [main] INFO c.n.a.v.i.s.d.j.SpringDataJpaConfiguration - Initialize secrets service on start-up.
2020-05-07 12:02:48,360 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Started SpringRestApplication in 15.445 seconds (JVM running for 16.957)
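
To scan a long startup log for problems, the output can be filtered by log level. The following is a minimal sketch, not part of the product: the helper name `filter_log_problems` is hypothetical, and it assumes the line layout shown in the example output above, where the level follows the thread name. Continuation lines such as stack traces do not match this layout and are dropped.

```shell
# Hypothetical helper: keep only WARN and ERROR lines from a log read on stdin.
# Assumed layout: "<date> <time> [<thread>] <LEVEL> <logger> - <message>".
filter_log_problems() {
  # "|| true" keeps the exit status at 0 when no line matches.
  grep -E '^[0-9-]+ [0-9:,]+ \[[^]]*\] (WARN|ERROR) ' || true
}

# Usage (replace <random_id> with the actual pod suffix):
# kubectl logs -n nevisadmin4 nevisadmin4-<random_id> | filter_log_problems
```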

Timeout During Deployment

For large deployments, you may need to increase the timeout by setting the nevisadmin.deployment.poll.timeout property in the nevisadmin4.yml file.

A timeout mostly happens if an error occurs in the operator itself. This means that the operator does not report the status of the deployment.

Check whether the pods are running.
- If they are, the deployment was successful. In this case, extend the timeout in the nevisadmin4.yml file.
- If no pods exist or not enough pods have been created, check the operator log.

In some edge cases, errors do not reach the UI. In this case, you have to check the logs of the pods or the operator manually.
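
The check described above can be sketched as a small filter over the pod list. This is an illustration only: the helper name `pods_not_running` is hypothetical, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE). It does not detect pods that are in the Running state but not yet ready.

```shell
# Hypothetical helper: print the names of pods whose STATUS column is
# neither "Running" nor "Completed". Reads the pod list from stdin.
pods_not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage:
# kubectl get pods -n <namespace> --no-headers | pods_not_running
```

An empty result together with the expected number of pods suggests that the deployment itself succeeded and only the timeout needs to be extended.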

Issues in the Pods

ErrImagePull:
- The image could not be pulled. This mostly occurs because of a wrong registry or image version. → Normally handled in the UI.
CrashLoopBackOff:
- Error during component startup, most of the time because of a configuration issue. → Normally handled in the UI, but only minimal detail is shown.
- Almost all other errors fall into this category. → Check the log of the pods for more details.
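
For a pod in one of these states, the pod description and the log of the previous container run usually contain the details. The following is a sketch under assumptions: the helper name `pod_crash_details` is hypothetical, and for pods with several containers you may have to add `-c <container-name>` to the `kubectl logs` call.

```shell
# Hypothetical helper: show the recorded state of a pod and the log of the
# previous (crashed) run of its container.
pod_crash_details() {
  local pod="$1" namespace="$2"
  # Shows the container states, the last termination reason and recent events.
  kubectl describe pod "$pod" -n "$namespace"
  # --previous prints the log of the previous container instance, if one exists.
  kubectl logs "$pod" -n "$namespace" --previous
}

# Usage:
# pod_crash_details <pod-name> <namespace>
```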

How to Check the Operator Log

To check the operator log, run the following commands:

Operator

# find out which pod is used for nevisoperator
kubectl get pods -n <operator-namespace> | grep nevisoperator-controller-manager

# check log of the operator
kubectl logs <operator-pod-name> -c manager -n <operator-namespace>

How to Check the Log of a Pod

The operator pod and the pods of the deployed components reside in different namespaces. Run the following commands to check the log of a pod:

Operator

# get pod names
kubectl get pods -n <namespace>

# check log of pod
kubectl logs <pod-name> -n <namespace>

If the component started, the log of the pod is the log of the component.

Often you have to check another container inside the pod, for example if the error is related to the database setup or to Git.

Pod

# get the names of the main containers
kubectl get pods <pod-name> -o jsonpath='{.spec.containers[*].name}' -n <namespace>
# get names of the init containers
kubectl get pods <pod-name> -o jsonpath='{.spec.initContainers[*].name}' -n <namespace>

# check the log of a container of a pod
kubectl logs <pod-name> -c <container-name> -n <namespace>
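
When it is not clear which container holds the relevant error, the commands above can be combined into a loop over all init and main containers. This is a minimal sketch; the helper name `all_container_logs` is hypothetical.

```shell
# Hypothetical helper: print the log of every init container and main
# container of a pod, one after the other.
all_container_logs() {
  local pod="$1" namespace="$2" names name
  # Collect the init container names first, then the main container names.
  names=$(kubectl get pods "$pod" -n "$namespace" \
    -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}')
  for name in $names; do
    echo "===== container: $name ====="
    kubectl logs "$pod" -c "$name" -n "$namespace"
  done
}

# Usage:
# all_container_logs <pod-name> <namespace>
```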

How to Check the Log of the Deployed Component

If the component is running, you can get its logs by simply looking at the log of the pod. The name of the pod is based on the service name of the deployed component.

Component

# find the pod name for the deployed component
kubectl get pods -n <component-namespace> | grep <service-name>

# check log of the component
kubectl logs <pod-name> -n <component-namespace>

How to Get Events

Kubernetes events are objects that provide insight into what is happening inside a cluster. They can provide an easy answer to what went wrong, which would otherwise be harder to find in the logs.

Event

# list the events for a given namespace
kubectl get events --sort-by='.lastTimestamp' -n=<namespace>

An example output:

LAST SEEN   TYPE      REASON    KIND   MESSAGE
47m         Normal    Pulling   Pod    pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
7m24s       Warning   Failed    Pod    Error: ImagePullBackOff
2m23s       Normal    BackOff   Pod    Back-off pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
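
In a busy namespace, it can help to list only the events of type Warning, such as the Failed event in the example output. The following sketch uses the `--field-selector` option of kubectl; the helper name `warning_events` is hypothetical.

```shell
# Hypothetical helper: list only the Warning events of a namespace,
# oldest first.
warning_events() {
  local namespace="$1"
  kubectl get events -n "$namespace" \
    --field-selector type=Warning --sort-by='.lastTimestamp'
}

# Usage:
# warning_events <namespace>
```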

How to Access the Container Where the Component is Running

You can open a bash shell in the Docker container where the component is running. This is only possible for pods with running components, not for the ones that failed to start.

# open bash shell to given pod
kubectl exec -it <pod-name> -n <component-namespace> -- /bin/bash

This way you can check the file system, config files, etc. of a given component, in case of a configuration issue.

Deployment repository cleanup

Git operations during the deployment process can slow down as the repository grows. In this case, it is recommended to recreate the repository by following these steps:

1. (Optional) Enable git mirror for the projects that use the repository. This is recommended because a pod that restarts after the repository is recreated will not start, as the upstream tag no longer exists. For more information, see git-mirror in Kubernetes Infrastructure Inventory YAML file format.
2. Delete and recreate the repository.
3. Force deploy all your nevisAdmin 4 projects that use this repository.

Git mirror volume cleanup

In general, the git mirror volume can be cleaned up by checking the git tags used in the NevisComponent or in the generated child resources such as Pod and ReplicaSet, and then deleting the unused tags from the volume.

The process can be automated either with an external tool or by creating a Kubernetes CronJob.

The following example shows what a CronJob that deletes the unused tags could look like. Replace mirror-volume with the actual volume name, and kubectl:latest with a Docker image that contains both kubectl and bash.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: git-cleanup
spec:
  schedule: "0 0 * * *" # Runs daily at midnight
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
          - name: mirror-volume
            persistentVolumeClaim:
              claimName: mirror-volume
          containers:
          - name: kubectl
            image: kubectl:latest
            command:
            - /bin/bash
            - -c
            - |
              #!/bin/bash

              # Directory containing folders to check
              TARGET_DIR="/git-init/mirror"
              # Get the list of valid tags from the Kubernetes ReplicaSets
              valid_tags=$(kubectl get replicasets -l generatedBy=operator.nevis-security.ch -o jsonpath='{range .items[?(@.spec.replicas>0)]}{range .spec.template.spec.initContainers[*]}{range .env[?(@.name=="NEVIS_GIT_INIT_REPO_TAG")]}{.value}{" "}{end}{end}{end}')

              # Convert the valid tags string to an array
              IFS=' ' read -r -a valid_tags_array <<< "$valid_tags"

              # Iterate over each folder in the target directory
              for folder in "$TARGET_DIR"/*; do
                  if [[ -d "$folder" ]]; then
                      folder_name=$(basename "$folder")
                      # Check if any of the valid tags are contained within the folder name
                      tag_found=false
                      for tag in "${valid_tags_array[@]}"; do
                          if [[ "$folder_name" == *"$tag"* ]]; then
                              tag_found=true
                              break
                          fi
                      done
                      # If tag is not found, delete the folder
                      if [[ $tag_found == false ]]; then
                          echo "Deleting folder: $folder_name"
                          rm -rf "$folder"
                      fi
                  fi
              done
              echo "Cleanup completed."
            volumeMounts:
            - mountPath: /git-init/mirror
              name: mirror-volume
          restartPolicy: OnFailure

The above example assumes that a single mirror volume is used across the namespace, and that the default service account has permissions for the ReplicaSets. Adjust it as needed based on the actual setup.
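
If the default service account does not yet have these permissions, they can be granted with a Role and a RoleBinding. The following is a sketch under assumptions: the names `replicaset-reader` and `replicaset-reader-default` and the helper `grant_replicaset_read` are hypothetical, and the CronJob is assumed to run in the same namespace as the ReplicaSets it inspects.

```shell
# Hypothetical helper: allow the default service account of a namespace
# to read the ReplicaSets of that namespace.
grant_replicaset_read() {
  local namespace="$1"
  # Role with read-only access to ReplicaSets (API group "apps").
  kubectl create role replicaset-reader -n "$namespace" \
    --verb=get,list --resource=replicasets.apps
  # Bind the role to the default service account of the same namespace.
  kubectl create rolebinding replicaset-reader-default -n "$namespace" \
    --role=replicaset-reader --serviceaccount="${namespace}:default"
}

# Usage:
# grant_replicaset_read <namespace>
```

A dedicated service account for the CronJob is an alternative if you prefer not to extend the permissions of the default one.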

How to Access the MariaDB Database

By default, the MariaDB database can only be reached from inside the Kubernetes Cluster. You can change this in the Azure Portal, by selecting the mariadb resource, and adding a new firewall rule to Connection security.

After this, any MySQL client can be used to connect to the database from the given IP. You can find the host and user information in the Overview menu or in the Connection strings menu of the Azure Portal.

How to customize the Ingress resource

You can customize the generated Ingress resource through the NGINX Ingress Settings pattern. This includes manually defining the certificates that should be used for TLS in case Let's Encrypt is not desired, defining custom annotations, and more.

How to set custom time zone for the components

For Java-based components, use the relevant Generic Instance Settings pattern and set the -Duser.timezone option in JAVA_OPTS. For other components, you can attach a custom volume with the relevant zone info file:

services:
- ebanking-proxy:
    kubernetes:
      custom-volumes:
      - volumeMount:
          name: tz-istanbul
          mountPath: /etc/localtime
        volume:
          name: tz-istanbul
          hostPath:
            path: /usr/share/zoneinfo/Europe/Istanbul

Commonly Seen Errors and Solutions

nevisAdmin 4 is not reachable, but according to the logs it is running without problems
- Solution: Check whether the correct loadBalancerIP was set in the nginx.yaml file during the setup process.

Error during deployment step 3 (Preview)

webshop-proxy: failed Couldn't connect to kubernetes:
Error on custom-resource object listing for crd-config 'nevisdatabases':
Unauthorized

Solution: The Kubernetes cluster URL or token is incorrect. For more information on how to configure the Kubernetes connection in the inventory, see Configuring an Example Project and Inventory in the GUI.
```
kubernetes-cluster:
  url: <some-url>
  token: <some-secret>
```
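
To check the URL and token outside of nevisAdmin 4, you can pass them directly to kubectl. This is a sketch under assumptions: the helper name `check_cluster_access` is hypothetical, the resource name `nevisdatabases` is taken from the error message above, and depending on the cluster certificate you may additionally need the `--certificate-authority` option. Note that a token passed on the command line can be visible to other users of the same machine.

```shell
# Hypothetical helper: ask the cluster whether the given token may list
# the nevisdatabases custom resources in a namespace.
check_cluster_access() {
  local url="$1" token="$2" namespace="$3"
  kubectl --server="$url" --token="$token" \
    auth can-i list nevisdatabases -n "$namespace"
}

# Usage:
# check_cluster_access <some-url> <some-secret> <namespace>
```

An authentication error from this call suggests that the URL or token itself is wrong, whereas an answer of "no" points to missing permissions.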

Error during deployment step 3 (Preview)

webshop-proxy: failed Permission denied.
No access to the repository.

Solution: The SSH key of nevisAdmin 4 and the Git repository are not in sync. For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.
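
Whether a given private key has access to the repository can be tested independently of a deployment. The following is a minimal sketch: the helper name `check_git_access` is hypothetical, the key file path is whatever location your setup uses (it must not contain spaces in this sketch), and the test assumes that git and ssh are installed where you run it.

```shell
# Hypothetical helper: list the remote references of a repository using
# one specific private key. Fails with "Permission denied" if the key
# has no access.
check_git_access() {
  local repo_url="$1" key_file="$2"
  # IdentitiesOnly=yes prevents ssh from trying other keys of the user.
  GIT_SSH_COMMAND="ssh -i $key_file -o IdentitiesOnly=yes" \
    git ls-remote "$repo_url"
}

# Usage:
# check_git_access <repository-ssh-url> <path-to-private-key>
```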

Error during deployment step 4 (Deploy - the first deployment run)

webshop-proxy: nevisProxy Instance (NevisComponent):
failed nevis-git-init error (exit code: 1).
Can't fetch from git? : Deploy to Kubernetes

Solution: The SSH key configured in GitCredentials for NevisOperator is not in sync with the SSH key of the Git repository. For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.

Pushing docker images to the container registry fails

error: Error copying image to the remote destination: Error trying to reuse blob sha256:2f4129b6dea024391198419be227846bebc4a63d94a5c276c2920a3701f41651 at destination: unable to retrieve auth token: invalid username/password: unauthorized: authentication required

Solution: On some systems, a manual docker login may be necessary because of compatibility issues with the Azure CLI.

# login to azure
az login

# get the credentials
az acr login -n <container-registry-name> --expose-token

# login to the container registry
docker login <loginServer> -u 00000000-0000-0000-0000-000000000000 -p <accessToken>
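
The steps above can also be combined so that the access token is not copied by hand and does not appear on the command line. This is a sketch under assumptions: the helper name `acr_docker_login` is hypothetical, and it expects that `az login` has already been performed.

```shell
# Hypothetical helper: log Docker in to an Azure container registry using
# a short-lived access token obtained through the Azure CLI.
acr_docker_login() {
  local registry="$1" login_server token
  # Resolve the login server of the registry.
  login_server=$(az acr show --name "$registry" --query loginServer --output tsv)
  # Request an access token without invoking Docker from the Azure CLI.
  token=$(az acr login --name "$registry" --expose-token \
    --query accessToken --output tsv)
  # Pass the token on stdin instead of using the -p option.
  printf '%s' "$token" | docker login "$login_server" \
    --username 00000000-0000-0000-0000-000000000000 --password-stdin
}

# Usage:
# acr_docker_login <container-registry-name>
```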

Previously failing migration-job is not restarted on redeployment
If a migration job fails at the initial setup, for example because the setup was not done correctly or the database was not yet available, it is not restarted automatically. Currently, the operator restarts the migration job only if the configuration changes. These configuration changes include: Database Host, Database Name, Root Credential Name.
Solution: To restart the job, change any one of the settings above, or delete the current job:
```
# get jobs
kubectl get jobs -n <namespace>

# delete the job
kubectl delete job <job-name> -n <namespace>
```
Timeout during deployment
- Solution: A timeout mostly happens if an error occurs in the operator itself. This means that it does not report the status of the deployment. Check the logs of the nevisOperator.

Deployment fails, logs show the following

ssh: handshake failed: knownhosts: key mismatch

Solution: Make sure that the correct known host entry is present both in the knownhosts file of nevisAdmin 4 and in the GitCredentials object in the component namespace.
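
The host key currently presented by the Git server can be retrieved with ssh-keyscan and compared with the configured entries. The following is a minimal sketch; the helper name `scan_git_host_key` is hypothetical. Verify the retrieved key through a trusted channel before you add it anywhere, because ssh-keyscan itself does not authenticate the server.

```shell
# Hypothetical helper: print the public host keys of a Git server in
# known-hosts format. The port defaults to 22.
scan_git_host_key() {
  local host="$1" port="${2:-22}"
  ssh-keyscan -p "$port" "$host"
}

# Usage:
# scan_git_host_key <git-host> [<port>]
```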

nevisIDM migration fails with the following

Caused by: java.sql.SQLException: Can't create table `nevisidm`.`TIDMA_UNIT_PATH` (errno: 150 "Foreign key constraint is incorrectly formed")

Solution: Make sure that the database is configured correctly with the following values:

autocommit = 0
transaction-isolation = READ-COMMITTED
log_bin_trust_function_creators = 1
lower_case_table_names = 1
character-set-server = utf8mb4
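
To see which values are currently active on the server, you can query the corresponding system variables with any MySQL client. This is a sketch under assumptions: the helper name `show_db_settings` is hypothetical, and because the isolation level is exposed as `tx_isolation` or `transaction_isolation` depending on the server version, both names are queried.

```shell
# Hypothetical helper: print the server settings relevant for the
# nevisIDM migration. Prompts for the password of the given user.
show_db_settings() {
  local host="$1" user="$2"
  mysql -h "$host" -u "$user" -p -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('autocommit', 'tx_isolation', 'transaction_isolation', 'log_bin_trust_function_creators', 'lower_case_table_names', 'character_set_server')"
}

# Usage:
# show_db_settings <database-host> <database-user>
```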

Ingress resources are not created, or not updated, nevisOperator logs contain the following
```
no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
```
Solution: If using Kubernetes version 1.22, make sure that you use nevisOperator 4.14 or later.
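
To see which versions of the networking API group your cluster serves, you can list the API versions. The following is a minimal sketch; the helper name `ingress_api_versions` is hypothetical. If the output contains only networking.k8s.io/v1, the cluster no longer serves the v1beta1 version named in the error message.

```shell
# Hypothetical helper: print the versions of the networking.k8s.io API
# group that the cluster serves.
ingress_api_versions() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  kubectl api-versions | grep '^networking\.k8s\.io/' || true
}

# Usage:
# ingress_api_versions
```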

Migration fails with the following

Message : Access denied for user <dbuser>@<dbhost>'@'10.2.56.9

Solution: If the given username and password seem correct, make sure the username is in the <username>@<hostname> format, as required by Azure MariaDB Database. Conversely, if you do not use an Azure database, do not use this format.
Note: The Example Installation on Azure uses the Azure format by default.
