Kubernetes Deployment Troubleshooting
How to Check the Log of nevisAdmin 4
To check the log of nevisAdmin 4, run the following commands:
# Find out which pod is used for nevisadmin4
kubectl get pods --all-namespaces | grep nevisadmin4-
# Update the pod name below and check the nevisAdmin4 logs
kubectl logs -n nevisadmin4 nevisadmin4-<random_id>
If there are no issues, the startup produces output similar to the following:
Using /usr/lib/jvm/jre-1.8.0/bin/java
2020-05-07 12:02:32,442 [main] INFO c.n.a.v.c.c.u.PropertyProviderImpl - reading config file at 'command line config path: /var/opt/nevisadmin4/conf/nevisadmin4.yml'
[nevisAdmin 4 ASCII-art startup banner]
2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Starting SpringRestApplication v4.7.0.162 on nevisadmin4-0.nevisadmin4.user7.svc.cluster.local with PID 8 (/opt/nevisadmin4/bin/nevisadmin4.jar started by nevis in /opt/nevisadmin4/bin)
2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - The following profiles are active: jpa,jwthmac,mariadb,health
2020-05-07 12:02:39,024 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
2020-05-07 12:02:39,124 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Start completed.
2020-05-07 12:02:43,887 [main] INFO c.n.admin.v4.core.common.pki.CAImpl - creating a new CA - using DN 'CN=nevisAdmin 4 CA, OU=NEVIS Security'
2020-05-07 12:02:44,965 [main] INFO c.n.a.v.i.s.d.j.SpringDataJpaConfiguration - Initialize secrets service on start-up.
2020-05-07 12:02:48,360 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Started SpringRestApplication in 15.445 seconds (JVM running for 16.957)
Timeout During Deployment
For a large deployment, you may need to increase the timeout by setting the nevisadmin.deployment.poll.timeout property in the nevisadmin4.yml file.
A timeout mostly happens if an error occurs in the operator itself. In this case, the operator does not report the status of the deployment.
- Check whether the pods are running.
- If they are, the deployment was successful. In this case, extend the timeout in the file nevisadmin4.yml.
- If no pods exist or if not enough pods have been created, check the operator log.
In some edge cases, errors do not reach the UI. In this case, the logs of the pods and of the operator have to be checked manually.
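The first check in the list above can be scripted. The following is a minimal sketch, assuming kubectl is already configured for the cluster; the function name pods_not_ready is hypothetical. It parses the default output of kubectl get pods and prints every pod that is not fully ready or whose status is not Running.

```shell
# Print the pods of a namespace that are not fully ready.
# Usage: pods_not_ready <component-namespace>
pods_not_ready() {
  local namespace="$1"
  # Default columns with --no-headers: NAME READY STATUS RESTARTS AGE
  # READY has the form <ready-containers>/<total-containers>.
  # Note: pods of finished jobs (status Completed) are printed as well.
  kubectl get pods -n "$namespace" --no-headers |
    awk '{ split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1, $2, $3 }'
}

# Example call (replace the placeholder with the real namespace):
# pods_not_ready <component-namespace>
```

If the function prints nothing, all pods are running and ready, which points to the timeout being too short rather than to a failed deployment.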
Issues in the Pods
ErrImagePull
- The image could not be pulled. This mostly occurs because of a wrong registry or image version. → Normally handled in the UI.
CrashLoopBackOff
- An error occurred during component startup, most often because of a configuration issue. → Normally handled in the UI, but only minimal detail is shown.
- Almost all other errors fall into this category. → Check the log of the pods for more details.
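For both states, more detail is usually available from the pod itself. The following is a minimal sketch using standard kubectl subcommands; the function name inspect_failing_pod is hypothetical, and the pod is assumed to have a single main container (otherwise add -c <container-name> to the logs call).

```shell
# Show details for a pod in ErrImagePull or CrashLoopBackOff state.
# Usage: inspect_failing_pod <pod-name> <namespace>
inspect_failing_pod() {
  local pod="$1" namespace="$2"
  # The Events section at the end of the describe output lists
  # image pull failures and container restarts.
  kubectl describe pod "$pod" -n "$namespace"
  # --previous prints the log of the last terminated container instance,
  # which helps when the container keeps restarting.
  kubectl logs "$pod" -n "$namespace" --previous
}

# Example call:
# inspect_failing_pod <pod-name> <namespace>
```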
How to Check the Operator Log
To check the operator log, run the following commands:
Operator
# find out which pod is used for nevisoperator
kubectl get pods -n <operator-namespace> | grep nevisoperator-controller-manager
# check log of the operator
kubectl logs <operator-pod-name> -c manager -n <operator-namespace>
How to Check the Log of a Pod
The operator pod and the pods of the deployed components reside in different namespaces. Run the following commands to check the log of a pod:
Operator
# get pod names
kubectl get pods -n <namespace>
# check log of pod
kubectl logs <pod-name> -n <namespace>
If the component started, its log is the log of the pod.
Often you have to check another container inside the pod, for example if the error is related to the database setup or to Git.
Pod
# get the names of the main containers
kubectl get pods <pod-name> -o jsonpath='{.spec.containers[*].name}' -n <namespace>
# get names of the init containers
kubectl get pods <pod-name> -o jsonpath='{.spec.initContainers[*].name}' -n <namespace>
# check the log of a container of a pod
kubectl logs <pod-name> -c <container-name> -n <namespace>
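When it is not clear which container holds the relevant error, the three commands above can be combined. The following is a minimal sketch; the function name dump_pod_logs is hypothetical. It reads the init container and main container names with the same jsonpath expressions and prints the log of each container in turn.

```shell
# Print the logs of all init containers and main containers of a pod.
# Usage: dump_pod_logs <pod-name> <namespace>
dump_pod_logs() {
  local pod="$1" namespace="$2" names container
  # Init containers first, then the main containers, separated by spaces.
  names="$(kubectl get pods "$pod" -n "$namespace" \
    -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}')"
  # Word splitting on $names is intended here.
  for container in $names; do
    echo "===== $pod / $container ====="
    # A container that never started has no log; continue with the next one.
    kubectl logs "$pod" -c "$container" -n "$namespace" || true
  done
}

# Example call:
# dump_pod_logs <pod-name> <namespace>
```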
How to Check the Log of the Deployed Component
If the component is running, you can get its logs by looking at the log of the pod. The name of the pod is based on the service name of the deployed component.
Component
# find the pod name for the deployed component
kubectl get pods -n <component-namespace> | grep <service-name>
# check log of the component
kubectl logs <pod-name> -n <component-namespace>
How to Get Events
Kubernetes events are objects that provide insight into what is happening inside a cluster. They can provide an easy answer to what went wrong, which would otherwise be harder to find in the logs.
Event
# list the events for a given namespace
kubectl get events --sort-by='.lastTimestamp' -n=<namespace>
An example output:
LAST SEEN TYPE REASON KIND MESSAGE
47m Normal Pulling Pod pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
7m24s Warning Failed Pod Error: ImagePullBackOff
2m23s Normal BackOff Pod Back-off pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
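In a busy namespace, the Normal events can hide the relevant entries. The following is a minimal sketch that lists only the Warning events, using the standard --field-selector option of kubectl; the function name warning_events is hypothetical.

```shell
# List only the Warning events of a namespace, oldest first.
# Usage: warning_events <namespace>
warning_events() {
  local namespace="$1"
  # "type" is a supported field selector for events.
  kubectl get events -n "$namespace" \
    --field-selector type=Warning \
    --sort-by='.lastTimestamp'
}

# Example call:
# warning_events <namespace>
```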
How to Access the Container Where the Component is Running
You can open a bash shell in the container where the component is running. This is only possible for pods with running components, not for those that failed to start.
# open bash shell to given pod
kubectl exec -it <pod-name> -n <component-namespace> -- /bin/bash
This way you can check the file system, config files, etc. of a given component, in case of a configuration issue.
Deployment repository cleanup
The git operations during the deployment process could slow down due to the size of the used repository. In this case it is recommended to recreate the repository by following these steps:
(Optional) Enable git mirror for the projects that use the repository. For more information, see git-mirror in Kubernetes Infrastructure Inventory YAML file format. This is recommended because a pod that restarts after the repository is recreated will not start, as the upstream tag no longer exists.
Delete and recreate the repository.
Force deploy all your nevisAdmin 4 projects that use this repository.
Git mirror volume cleanup
In general, the git mirror volume can be cleaned up by checking the git tags used in the NevisComponent or in the generated child resources such as Pod and ReplicaSet, then deleting the unused tags from the volume.
The process can be automated either with an external tool or by creating a Kubernetes CronJob.
The following is an example of a CronJob that deletes the unused tags. Replace mirror-volume with the actual volume name, and kubectl:latest with a docker image that contains both kubectl and bash.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: git-cleanup
spec:
  schedule: "0 0 * * *" # Runs daily at midnight
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: mirror-volume
              persistentVolumeClaim:
                claimName: mirror-volume
          containers:
            - name: kubectl
              image: kubectl:latest
              command:
                - /bin/bash
                - -c
                - |
                  #!/bin/bash
                  # Directory containing folders to check
                  TARGET_DIR="/git-init/mirror"
                  # Get the list of valid tags from the Kubernetes ReplicaSets
                  valid_tags=$(kubectl get replicasets -l generatedBy=operator.nevis-security.ch -o jsonpath='{range .items[?(@.spec.replicas>0)]}{range .spec.template.spec.initContainers[*]}{range .env[?(@.name=="NEVIS_GIT_INIT_REPO_TAG")]}{.value}{" "}{end}{end}{end}')
                  # Convert the valid tags string to an array
                  IFS=' ' read -r -a valid_tags_array <<< "$valid_tags"
                  # Iterate over each folder in the target directory
                  for folder in "$TARGET_DIR"/*; do
                    if [[ -d "$folder" ]]; then
                      folder_name=$(basename "$folder")
                      # Check if any of the valid tags are contained within the folder name
                      tag_found=false
                      for tag in "${valid_tags_array[@]}"; do
                        if [[ "$folder_name" == *"$tag"* ]]; then
                          tag_found=true
                          break
                        fi
                      done
                      # If tag is not found, delete the folder
                      if [[ $tag_found == false ]]; then
                        echo "Deleting folder: $folder_name"
                        rm -rf "$folder"
                      fi
                    fi
                  done
                  echo "Cleanup completed."
              volumeMounts:
                - mountPath: /git-init/mirror
                  name: mirror-volume
          restartPolicy: OnFailure
The above example assumes that a single mirror volume is used across the namespace, and that the default service account has permissions for the ReplicaSets. Adjust it as needed based on the actual setup.
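Before scheduling a job that deletes folders, it can help to preview what would be removed. The following is a minimal dry-run sketch of the same matching logic; the function name list_unused_folders is hypothetical. It only prints folder names and deletes nothing. It also refuses to run with an empty tag list, because with no valid tags (for example, if the kubectl query returns nothing) every folder would be treated as unused.

```shell
# Print the folders in <target-dir> whose names contain none of the given tags.
# Nothing is deleted.
# Usage: list_unused_folders <target-dir> <tag>...
list_unused_folders() {
  local target_dir="$1" folder folder_name tag tag_found
  shift
  # Guard: without any valid tag, every folder would count as unused.
  if [ "$#" -eq 0 ]; then
    echo "No valid tags given, refusing to continue." >&2
    return 1
  fi
  for folder in "$target_dir"/*; do
    [ -d "$folder" ] || continue
    folder_name="$(basename "$folder")"
    tag_found=false
    for tag in "$@"; do
      # Substring match, as in the CronJob script above.
      case "$folder_name" in
        *"$tag"*) tag_found=true; break ;;
      esac
    done
    [ "$tag_found" = true ] || echo "$folder_name"
  done
}

# Example call with placeholder tag values:
# list_unused_folders /git-init/mirror <tag-1> <tag-2>
```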
How to Access the MariaDB Database
By default, the MariaDB database can only be reached from inside the Kubernetes Cluster. You can change this in the Azure Portal, by selecting the mariadb resource, and adding a new firewall rule to Connection security.
After this, any MySQL client can be used to connect to the database from the given IP. You can find the host and user information in the Overview menu or in the Connection strings menu of the Azure Portal.
How to customize the Ingress resource
You can customize the generated Ingress resource through the NGINX Ingress Settings pattern. This includes manually defining the certificates that should be used for TLS in case Let's Encrypt is not desired, defining custom annotations, and more.
How to set custom time zone for the components
For Java-based components, use the relevant Generic Instance Settings pattern and set the -Duser.timezone option in JAVA_OPTS. For other components, you can attach a custom volume with the relevant zone info file:
services:
  - ebanking-proxy:
      kubernetes:
        custom-volumes:
          - volumeMount:
              name: tz-istanbul
              mountPath: /etc/localtime
            volume:
              name: tz-istanbul
              hostPath:
                path: /usr/share/zoneinfo/Europe/Istanbul
Commonly Seen Errors and Solutions
nevisAdmin 4 is not reachable, but according to the logs it is running without problems
- Solution: Check if the correct loadBalancerIP was placed in the nginx.yaml file during the setup process.
Error during deployment step 3 (Preview)
webshop-proxy: failed Couldn't connect to kubernetes:
Error on custom-resource object listing for crd-config 'nevisdatabases':
Unauthorized
Solution: The Kubernetes cluster URL or token is incorrect. For more information on how to configure the Kubernetes connection in the inventory, see Configuring an Example Project and Inventory in the GUI.
kubernetes-cluster:
  url: <some-url>
  token: <some-secret>
Error during deployment step 3 (Preview)
webshop-proxy: failed Permission denied.
No access to the repository.
Solution: The SSH key of nevisAdmin 4 and the Git repository are not in sync. For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.
Error during deployment step 4 (Deploy - the first deployment run)
webshop-proxy: nevisProxy Instance (NevisComponent):
failed nevis-git-init error (exit code: 1).
Can't fetch from git? : Deploy to Kubernetes
Solution: The SSH key configured in GitCredentials for NevisOperator is not in sync with the SSH key of the Git repository. For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.
Pushing docker images to the container registry fails
error: Error copying image to the remote destination: Error trying to reuse blob sha256:2f4129b6dea024391198419be227846bebc4a63d94a5c276c2920a3701f41651 at destination: unable to retrieve auth token: invalid username/password: unauthorized: authentication required
Solution: On some systems, a manual docker login may be necessary because of compatibility issues with the Azure CLI.
# login to azure
az login
# get the credentials
az acr login -n <container-registry-name> --expose-token
# login to the container registry
docker login <loginServer> -u 00000000-0000-0000-0000-000000000000 -p <accessToken>
Previously failing migration-job is not restarted on redeployment
If a migration job fails at the initial setup, for example because the setup was not done correctly or the database was not yet available, it is not restarted automatically. Currently, the operator only restarts the migration job if the configuration changes. These configuration changes include: Database Host, Database Name, Root Credential Name.
Solution: To restart the job, change any one of the above settings, or delete the current job:
# get jobs
kubectl get jobs -n <namespace>
# delete the failed job
kubectl delete job <job-name> -n <namespace>
Timeout during deployment
- Solution: A timeout mostly happens if an error occurs in the operator itself. This means that it does not report the status of the deployment. Check the logs of the nevisOperator.
Deployment fails, logs show the following
ssh: handshake failed: knownhosts: key mismatch
Solution: Make sure the correct known host entry is present both in the knownhosts file of nevisAdmin 4 and in the GitCredentials object in the component namespace.
nevisIDM migration fails with the following
Caused by: java.sql.SQLException: Can't create table `nevisidm`.`TIDMA_UNIT_PATH` (errno: 150 "Foreign key constraint is incorrectly formed")
Solution: Make sure that the database is configured correctly with the following values:
autocommit=0
transaction-isolation = READ-COMMITTED
log_bin_trust_function_creators = 1
lower_case_table_names = 1
character-set-server = utf8mb4
Ingress resources are not created, or not updated, nevisOperator logs contain the following
no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
Solution: If you use Kubernetes version 1.22 or later, make sure that you use nevisOperator 4.14 or later.
Migration fails with the following
Message : Access denied for user <dbuser>@<dbhost>'@'10.2.56.9
or
Message : Access denied for user <dbuser>@<dbhost>'@'10.2.56.9
Solution: If the given username and password seem correct, make sure the username is in the <username>@<hostname> format, as this is required by Azure MariaDB Database. The opposite is also true: if you do not use an Azure Database, do not use this format.
Note: The Example Installation on Azure uses the Azure format by default.