Version: 4.20.x.x Java 8 ELS

Kubernetes Deployment Troubleshooting

How to Check the Log of nevisAdmin 4

To check the log of nevisAdmin 4, run the following commands:

# Find out which pod is used for nevisadmin4
kubectl get pods --all-namespaces | grep nevisadmin4-

# Update the pod name below and check the nevisAdmin4 logs
kubectl logs -n nevisadmin4 nevisadmin4-<random_id>

If there are no issues, the startup produces output similar to the following:

Using /usr/lib/jvm/jre-1.8.0/bin/java
2020-05-07 12:02:32,442 [main] INFO c.n.a.v.c.c.u.PropertyProviderImpl - reading config file at 'command line config path: /var/opt/nevisadmin4/conf/nevisadmin4.yml'

[nevisAdmin 4 ASCII-art startup banner]

2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Starting SpringRestApplication v4.7.0.162 on nevisadmin4-0.nevisadmin4.user7.svc.cluster.local with PID 8 (/opt/nevisadmin4/bin/nevisadmin4.jar started by nevis in /opt/nevisadmin4/bin)
2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - The following profiles are active: jpa,jwthmac,mariadb,health
2020-05-07 12:02:39,024 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
2020-05-07 12:02:39,124 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Start completed.
2020-05-07 12:02:43,887 [main] INFO c.n.admin.v4.core.common.pki.CAImpl - creating a new CA - using DN 'CN=nevisAdmin 4 CA, OU=NEVIS Security'
2020-05-07 12:02:44,965 [main] INFO c.n.a.v.i.s.d.j.SpringDataJpaConfiguration - Initialize secrets service on start-up.
2020-05-07 12:02:48,360 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Started SpringRestApplication in 15.445 seconds (JVM running for 16.957)
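
To confirm a clean startup without reading the whole log, you can search for the final `Started SpringRestApplication` line shown in the sample output. The following is a minimal sketch: the helper name `na4_started` is hypothetical, and it assumes that the message text matches the sample above.

```shell
# Hypothetical helper: succeeds if the log read from stdin contains the
# startup-complete line from the sample output.
na4_started() {
  grep -q 'Started SpringRestApplication in '
}

# Usage (replace the pod name as before):
#   kubectl logs -n nevisadmin4 nevisadmin4-<random_id> | na4_started && echo "startup completed"
```

If the line is missing, the pod is either still starting or the startup failed, so read the full log in that case.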

Timeout During Deployment

For a large deployment, you may need to increase the timeout with the nevisadmin.deployment.poll.timeout property in the nevisadmin4.yml file.

A timeout mostly happens if an error occurs in the operator itself. In that case, the operator does not report the status of the deployment.

  • Check whether the pods are running.
    • If they are, the deployment was successful. In this case, extend the timeout in the file nevisadmin4.yml.
    • If no pods exist or not enough pods have been created, check the operator log.

In some edge cases, errors do not reach the UI. In these cases, the logs of the pods and the operator have to be checked manually.
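
The pod check described above can be sketched as a small filter. The helper name `pods_not_running` is hypothetical, and the sketch assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: reads `kubectl get pods -n <namespace>` output on stdin
# and prints the name and status of every pod that is neither Running nor Completed.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage:
#   kubectl get pods -n <namespace> | pods_not_running
```

Empty output means that every listed pod reports Running or Completed. The filter does not inspect the READY column, so a pod can still be unready even when it is not listed.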

Issues in the Pods

  • ErrImagePull:
    • The image could not be pulled. This mostly occurs because of a wrong registry or image version. → Normally handled in the UI.
  • CrashLoopBackOff:
    • Error during component startup, most often caused by a configuration issue. → Normally handled in the UI, but only minimal detail is shown.
    • Almost all other errors fall into this category. → Check the log of the pods for more details.
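
For both states, the pod description and the log of the previously crashed container usually contain the details that the UI omits. The following is a minimal sketch; the helper name `pod_diagnose` is hypothetical, while `kubectl describe pod` and `kubectl logs --previous` are standard kubectl commands.

```shell
# Hypothetical helper: show the description (including events) of a pod and
# the log of its previous, terminated container instance.
pod_diagnose() {
  ns=$1
  pod=$2
  # The Events section at the end of the description lists image pull
  # failures and container restarts.
  kubectl describe pod "$pod" -n "$ns"
  # --previous prints the log of the last terminated container instance,
  # which is usually the relevant one for CrashLoopBackOff.
  kubectl logs "$pod" -n "$ns" --previous
}

# Usage:
#   pod_diagnose <namespace> <pod-name>
```

For pods with several containers, kubectl may ask for a container name. In that case, add `-c <container-name>` to the `kubectl logs` call.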

How to Check the Operator Log

To check the operator log, run the following commands:

Operator

# find out which pod is used for nevisoperator
kubectl get pods -n <operator-namespace> | grep nevisoperator-controller-manager

# check log of the operator
kubectl logs <operator-pod-name> -c manager -n <operator-namespace>
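
The two steps can be combined into one helper. This is a sketch under assumptions: the function name `operator_log` is hypothetical, and the one-hour window (`--since=1h`) is an arbitrary choice to keep the output short.

```shell
# Hypothetical helper: look up the operator pod and print its recent log.
operator_log() {
  ns=$1
  # -o name prints entries such as pod/<operator-pod-name>
  pod=$(kubectl get pods -n "$ns" -o name | grep nevisoperator-controller-manager | head -n 1)
  if [ -z "$pod" ]; then
    echo "no operator pod found in namespace $ns" >&2
    return 1
  fi
  # Same call as above, limited to the last hour.
  kubectl logs "$pod" -c manager -n "$ns" --since=1h
}

# Usage:
#   operator_log <operator-namespace>
```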

How to Check the Log of a Pod

The operator pod and the pods of the deployed components reside in different namespaces. Run the following commands to check the log of a pod:

Operator

# get pod names
kubectl get pods -n <namespace>

# check log of pod
kubectl logs <pod-name> -n <namespace>

If the component started, the log of the pod is the log of the component.

Often you have to check another container inside the pod, for example if the error is related to the database setup or to Git.

Pod

# get the names of the main containers
kubectl get pods <pod-name> -o jsonpath='{.spec.containers[*].name}' -n <namespace>
# get names of the init containers
kubectl get pods <pod-name> -o jsonpath='{.spec.initContainers[*].name}' -n <namespace>

# check the log of a container of a pod
kubectl logs <pod-name> -c <container-name> -n <namespace>
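
When it is not clear which container holds the error, you can print the tail of every container log in one pass. The following sketch reuses the jsonpath expressions from above; the helper name `pod_all_logs` and the 50-line limit are assumptions made for illustration.

```shell
# Hypothetical helper: print the last lines of the log of every container in a
# pod, init containers first.
pod_all_logs() {
  ns=$1
  pod=$2
  names=$(kubectl get pods "$pod" -n "$ns" \
    -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}')
  # Container names contain no whitespace, so word splitting is safe here.
  for c in $names; do
    echo "=== $c ==="
    kubectl logs "$pod" -c "$c" -n "$ns" --tail=50
  done
}

# Usage:
#   pod_all_logs <namespace> <pod-name>
```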

How to Check the Log of the Deployed Component

If the component is running, you can get its logs by looking at the log of the pod. The name of the pod is based on the service name of the deployed component.

Component

# find the pod name for the deployed component
kubectl get pods -n <component-namespace> | grep <service-name>

# check log of the component
kubectl logs <pod-name> -n <component-namespace>

How to Get Events

Kubernetes events are objects that provide insight into what is happening inside a cluster. They can give a quick answer to what went wrong, which would otherwise be harder to find in the logs.

Event

# list the events for a given namespace
kubectl get events --sort-by='.lastTimestamp' -n=<namespace>

An example output:

LAST SEEN   TYPE      REASON    KIND   MESSAGE
47m         Normal    Pulling   Pod    pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
7m24s       Warning   Failed    Pod    Error: ImagePullBackOff
2m23s       Normal    BackOff   Pod    Back-off pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
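
In a busy namespace, the Normal events can hide the relevant ones. As a sketch, the list can be restricted to warnings with the standard `--field-selector` option; the helper name `warning_events` is hypothetical.

```shell
# Hypothetical helper: list only the Warning events of a namespace,
# sorted by time like the command above.
warning_events() {
  kubectl get events -n "$1" --field-selector type=Warning --sort-by='.lastTimestamp'
}

# Usage:
#   warning_events <namespace>
```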

How to Access the Container Where the Component is Running

You can open a bash shell in the container where the component is running. This is only possible for pods with running components, not for the ones that failed to start.

Bash

# open bash shell to given pod
kubectl exec -it <pod-name> -n <component-namespace> -- /bin/bash

This way you can check the file system, config files, etc. of a given component, in case of a configuration issue.

How to Access the MariaDB Database

By default, the MariaDB database can only be reached from inside the Kubernetes cluster. You can change this in the Azure Portal by selecting the mariadb resource and adding a new firewall rule under Connection security.

After this, any MySQL client can be used to connect to the database from the given IP. You can find the host and user information in the Overview menu or in the Connection strings menu of the Azure Portal.
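
As an illustration, a connection with the standard `mysql` command-line client could look as follows. This is a sketch, not the documented procedure: the helper name `db_connect` is hypothetical, and your server may require additional options (for example, for TLS) that are not shown here.

```shell
# Hypothetical helper: open a session with the mysql command-line client.
# --password without a value makes the client prompt for the password.
db_connect() {
  mysql --host="$1" --user="$2" --password
}

# Usage (host and user as shown in the Azure Portal):
#   db_connect <host> <user>
```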

How to Customize the Ingress Resource

You can customize the generated Ingress resource through the NGINX Ingress Settings pattern. This includes manually defining the certificates to use for TLS in case Let's Encrypt is not desired, defining custom annotations, and more.

How to Set a Custom Time Zone for the Components

For Java-based components, use the relevant Generic Instance Settings pattern and set the -Duser.timezone option in JAVA_OPTS. For other components, you can attach a custom volume with the relevant zone info file:

services:
  - ebanking-proxy:
      kubernetes:
        custom-volumes:
          - volumeMount:
              name: tz-istanbul
              mountPath: /etc/localtime
            volume:
              name: tz-istanbul
              hostPath:
                path: /usr/share/zoneinfo/Europe/Istanbul
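
To verify the volume-based setting, you can print the time zone that the container reports. The following sketch assumes that the `date` utility is available in the container image, which the source does not confirm; the helper name `pod_timezone` is hypothetical.

```shell
# Hypothetical helper: print the time zone abbreviation and UTC offset that
# the container reports. Assumes the image ships the `date` utility.
pod_timezone() {
  kubectl exec "$2" -n "$1" -- date '+%Z %z'
}

# Usage:
#   pod_timezone <component-namespace> <pod-name>
```

This check reflects /etc/localtime only. For Java-based components, the JVM time zone is determined by the -Duser.timezone setting and is not visible through `date`.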

Deployment Repository Cleanup

Git operations during the deployment process can slow down as the deployment repository grows. In this case, it is recommended to recreate the repository by following these steps:

  1. (Optional) It's recommended to enable git mirror for the projects that use the repository. For more information, see git-mirror in Kubernetes Infrastructure Inventory YAML file format.

    Without the mirror, a pod that restarts after the repository is recreated will not start, because the upstream tag no longer exists.

    Additionally, setting the kubernetes.git-init.mirror.prioritize property to true makes the git mirror take priority over the upstream state in case the pod restarts between steps 4 and 5.

  2. Force deploy all the nevisAdmin 4 projects that use the repository and note down the new release tags that were created during the deployments in the git repository.

  3. Delete and recreate the repository.

  4. In the empty repo create all the tags that were noted down in step 2. This is required as the deployment preview will fail if the tag no longer exists upstream.

  5. Force deploy all your nevisAdmin 4 projects that use this repository again.
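
Steps 2 and 4 can be sketched with plain git commands. This is one possible approach under assumptions, not the documented procedure: it assumes that you keep a local clone that still contains the noted tags, and that pushing those tags to the recreated repository is acceptable for your setup. The helper names `list_remote_tags` and `push_tags` are hypothetical.

```shell
# Step 2 (hypothetical helper): list the tags that exist in the repository,
# so that the new release tags can be noted down.
list_remote_tags() {
  git ls-remote --tags "$1"
}

# Step 4 (hypothetical helper): push the noted tags from a local clone to the
# recreated, empty repository. Arguments: clone directory, remote name, tags.
push_tags() {
  clone=$1
  remote=$2
  shift 2
  for t in "$@"; do
    git -C "$clone" push "$remote" "refs/tags/$t"
  done
}

# Usage:
#   list_remote_tags <repository-url>
#   push_tags <local-clone-dir> <remote-name> <tag-1> <tag-2>
```

Pushing a tag also uploads the commits reachable from it, so the recreated repository contains the history behind the noted tags, but no other branches or tags.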

Commonly Seen Errors and Solutions

  • nevisAdmin 4 is not reachable, but according to the logs it is running without problems

    • Solution: Check if the correct loadBalancerIP was placed in the nginx.yaml file during the setup process.
  • Error during deployment step 3 (Preview)

    webshop-proxy: failed Couldn't connect to kubernetes:
    Error on custom-resource object listing for crd-config 'nevisdatabases':
    Unauthorized
  • Solution: The Kubernetes cluster URL or token is incorrect. For more information on how to configure the Kubernetes connection in the inventory, see Configuring an Example Project and Inventory in the GUI.

    kubernetes-cluster:
      url: <some-url>
      token: <some-secret>
  • Error during deployment step 3 (Preview)

    webshop-proxy: failed Permission denied.
    No access to the repository.
  • Solution: The SSH key of nevisAdmin 4 and the Git repository are not in sync. For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.

  • Error during deployment step 4 (Deploy - the first deployment run)

    webshop-proxy: nevisProxy Instance (NevisComponent):
    failed nevis-git-init error (exit code: 1).
    Can't fetch from git? : Deploy to Kubernetes
  • Solution: The SSH key configured in GitCredentials for NevisOperator is not in sync with the SSH key of the Git repository. For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.

  • Pushing docker images to the container registry fails

    error: Error copying image to the remote destination: Error trying to reuse blob sha256:2f4129b6dea024391198419be227846bebc4a63d94a5c276c2920a3701f41651 at destination: unable to retrieve auth token: invalid username/password: unauthorized: authentication required
  • Solution: On some systems, a manual docker login may be necessary because of compatibility issues with the Azure CLI.

    # login to azure
    az login

    # get the credentials
    az acr login -n <container-registry-name> --expose-token

    # login to the container registry
    docker login <loginServer> -u 00000000-0000-0000-0000-000000000000 -p <accessToken>
  • Previously failing migration-job is not restarted on redeployment

    If a migration job fails at the initial setup, for example because the setup was not done correctly or the database was not yet available, it is not restarted automatically. Currently, the operator only restarts the migration job if the configuration changes. These configuration changes include: Database Host, Database Name, Root Credential Name.

  • Solution: To restart the job, either change any one of the above settings or delete the current job:

    # get jobs
    kubectl get jobs -n <namespace>

    # delete the failed job
    kubectl delete job <job-name> -n <namespace>
  • Timeout during deployment

    • Solution: A timeout mostly happens if an error occurs in the operator itself. This means that it does not report the status of the deployment. Check the logs of the nevisOperator.
  • Deployment fails, logs show the following

    ssh: handshake failed: knownhosts: key mismatch
  • Solution: Make sure that the correct known hosts entry is present both in the knownhosts file of nevisAdmin 4 and in the GitCredentials object in the component namespace.

  • nevisIDM migration fails with the following

    Caused by: java.sql.SQLException: Can't create table `nevisidm`.`TIDMA_UNIT_PATH` (errno: 150 "Foreign key constraint is incorrectly formed")
  • Solution: Make sure that the database is configured correctly with the following values:

    autocommit = 0
    transaction-isolation = READ-COMMITTED
    log_bin_trust_function_creators = 1
    lower_case_table_names = 1
    character-set-server = utf8mb4
  • Ingress resources are not created, or not updated, nevisOperator logs contain the following

    no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
  • Solution: If you use Kubernetes version 1.22 or later, make sure that you use nevisOperator 4.14 or later.

  • Migration fails with the following

    Message : Access denied for user <dbuser>@<dbhost>'@'10.2.56.9

    or

    Message : Access denied for user <dbuser>@<dbhost>'@'10.2.56.9
  • Solution: If the given username and password seem correct, make sure the username is in the <username>@<hostname> format, as required by Azure MariaDB Database. The opposite is also true: if you do not use an Azure database, do not use this format.

    Note: The Example Installation on Azure uses the Azure format by default.
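
For the Ingress error listed above, you can check whether the cluster still serves the API version named in the message. The following is a minimal sketch: the helper name `serves_ingress_v1beta1` is hypothetical, and it relies on `kubectl api-versions` printing one group/version per line.

```shell
# Hypothetical helper: succeeds if the list of API versions read from stdin
# contains networking.k8s.io/v1beta1, the version named in the error message.
serves_ingress_v1beta1() {
  grep -qx 'networking.k8s.io/v1beta1'
}

# Usage:
#   kubectl api-versions | serves_ingress_v1beta1 || echo "v1beta1 Ingress API is not served"
```

If the version is not served, an operator that still requests it cannot create or update Ingress resources, which matches the symptom described above.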