Kubernetes Deployment Troubleshooting
How to Check the Log of nevisAdmin 4
To check the log of nevisAdmin 4, run the following commands:
# Find out which pod is used for nevisadmin4
kubectl get pods --all-namespaces | grep nevisadmin4-
# Update the pod name below and check the nevisAdmin4 logs
kubectl logs -n nevisadmin4 nevisadmin4-<random_id>
If there are no issues, the startup produces output similar to the following:
Using /usr/lib/jvm/jre-1.8.0/bin/java
2020-05-07 12:02:32,442 [main] INFO c.n.a.v.c.c.u.PropertyProviderImpl - reading config file at 'command line config path: /var/opt/nevisadmin4/conf/nevisadmin4.yml'
[nevisAdmin 4 ASCII-art startup banner]
2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Starting SpringRestApplication v4.7.0.162 on nevisadmin4-0.nevisadmin4.user7.svc.cluster.local with PID 8 (/opt/nevisadmin4/bin/nevisadmin4.jar started by nevis in /opt/nevisadmin4/bin)
2020-05-07 12:02:33,862 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - The following profiles are active: jpa,jwthmac,mariadb,health
2020-05-07 12:02:39,024 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
2020-05-07 12:02:39,124 [main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Start completed.
2020-05-07 12:02:43,887 [main] INFO c.n.admin.v4.core.common.pki.CAImpl - creating a new CA - using DN 'CN=nevisAdmin 4 CA, OU=NEVIS Security'
2020-05-07 12:02:44,965 [main] INFO c.n.a.v.i.s.d.j.SpringDataJpaConfiguration - Initialize secrets service on start-up.
2020-05-07 12:02:48,360 [main] INFO c.n.a.v.i.s.r.SpringRestApplication - Started SpringRestApplication in 15.445 seconds (JVM running for 16.957)
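To confirm a successful startup without reading the whole log, the log can be searched for the final "Started SpringRestApplication" line shown above. The following is a minimal sketch, assuming the pod runs in the nevisadmin4 namespace as in the command above. The function name nevisadmin_started is illustrative and not part of nevisAdmin 4.

```shell
# Hypothetical helper: succeeds if the log of the given nevisAdmin 4 pod
# contains the "Started SpringRestApplication" line.
# Assumption: the pod lives in the nevisadmin4 namespace.
# Argument: <pod-name>
nevisadmin_started() {
  local pod="$1"
  kubectl logs -n nevisadmin4 "$pod" | grep -q 'Started SpringRestApplication'
}
```

The function returns exit status 0 if the line is found, so it can be used in a condition, for example: nevisadmin_started <pod-name> && echo "nevisAdmin 4 is up".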
Timeout During Deployment
For large deployments, you may need to increase the timeout by setting the nevisadmin.deployment.poll.timeout property in the nevisadmin4.yml file.
A timeout usually means that an error occurred in the operator itself, so the operator does not report the status of the deployment.
- Check whether the pods are running.
In some edge cases, errors do not reach the UI. In that case, check the logs of the pods and of the operator manually.
Issues in the Pods
- ErrImagePull
- CrashLoopBackOff
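Pods in such a state can be spotted quickly by filtering the STATUS column of the pod list. The following is a minimal sketch. The function name unhealthy_pods is illustrative, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: list pods in a namespace whose STATUS column is
# neither Running nor Completed (for example ErrImagePull or CrashLoopBackOff).
# Assumption: default kubectl column order, where STATUS is the third column.
# Argument: <namespace>
unhealthy_pods() {
  kubectl get pods -n "$1" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Note that a pod with status Running can still be not ready, so the READY column is worth checking as well. For a pod reported by the helper, kubectl describe pod <pod-name> -n <namespace> shows the related events.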
How to Check the Operator Log
To check the operator log, run the following commands:
Operator
# find out which pod is used for nevisoperator
kubectl get pods -n <operator-namespace> | grep nevisoperator-controller-manager
# check log of the operator
kubectl logs <operator-pod-name> -c manager -n <operator-namespace>
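The operator log can be long, so it may help to limit it to a recent time window and to lines that mention an error. The following is a minimal sketch. The function name operator_errors is illustrative, and matching on the word "error" is an assumption about the log wording, not a documented log format.

```shell
# Hypothetical helper: print recent operator log lines that mention "error".
# Arguments: <operator-pod-name> <operator-namespace> [time-window, default 1h]
# Assumption: relevant lines contain the word "error" (case-insensitive).
operator_errors() {
  local pod="$1" ns="$2" since="${3:-1h}"
  kubectl logs "$pod" -c manager -n "$ns" --since="$since" | grep -i 'error'
}
```

If the filter prints nothing, check the unfiltered log before concluding that the operator is healthy.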
How to Check the Log of a Pod
The operator pod and the pods of the deployed components reside in different namespaces. Run the following commands to check the log of a pod:
Operator
# get pod names
kubectl get pods -n <namespace>
# check log of pod
kubectl logs <pod-name> -n <namespace>
If the component has started, the log of the pod is the log of the component.
Often you have to check another container inside the pod, for example when the error is related to the database setup or to Git.
Pod
# get the names of the main containers
kubectl get pods <pod-name> -o jsonpath='{.spec.containers[*].name}' -n <namespace>
# get names of the init containers
kubectl get pods <pod-name> -o jsonpath='{.spec.initContainers[*].name}' -n <namespace>
# check the log of a container of a pod
kubectl logs <pod-name> -c <container-name> -n <namespace>
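When it is not clear which container failed, the commands above can be combined into a loop that prints the end of every container log in a pod. The following is a minimal sketch. The function name all_container_logs and the default of 20 lines are illustrative choices.

```shell
# Hypothetical helper: print the last lines of every container log in a pod,
# init containers first.
# Arguments: <pod-name> <namespace> [line-count, default 20]
all_container_logs() {
  local pod="$1" ns="$2" lines="${3:-20}" path c
  for path in '{.spec.initContainers[*].name}' '{.spec.containers[*].name}'; do
    # The jsonpath query returns the container names separated by spaces.
    for c in $(kubectl get pods "$pod" -n "$ns" -o jsonpath="$path"); do
      echo "== $c =="
      kubectl logs "$pod" -c "$c" -n "$ns" --tail="$lines"
    done
  done
}
```

For a container that keeps restarting, kubectl logs also accepts the --previous option, which shows the log of the previous container instance.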
How to Check the Log of the Deployed Component
If the component is running, you can get its logs by looking at the log of the pod. The name of the pod is based on the service name of the deployed component.
Component
# find the pod name for the deployed component
kubectl get pods -n <component-namespace> | grep <service-name>
# check log of the component
kubectl logs <pod-name> -n <component-namespace>
How to Get Events
Kubernetes events are objects that provide insight into what is happening inside a cluster. They can provide an easy answer to what went wrong, which would otherwise be harder to find in the logs.
Event
# list the events for a given namespace
kubectl get events --sort-by='.lastTimestamp' -n=<namespace>
An example output:
Example
LAST SEEN TYPE REASON KIND MESSAGE
47m Normal Pulling Pod pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
7m24s Warning Failed Pod Error: ImagePullBackOff
2m23s Normal BackOff Pod Back-off pulling image "na4demo.azurecr.io/nevis/nevisproxy:888"
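In a busy namespace, the list can be narrowed to warnings such as the ImagePullBackOff entry above. The following is a minimal sketch that uses the --field-selector option of kubectl get. The function name warning_events is illustrative.

```shell
# Hypothetical helper: show only events of type Warning in a namespace,
# sorted by their last timestamp.
# Argument: <namespace>
warning_events() {
  kubectl get events -n "$1" --field-selector type=Warning --sort-by='.lastTimestamp'
}
```

Normal events are hidden by this filter, so switch back to the full list when the sequence of events matters.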
How to Access the Container Where the Component is Running
You can open a bash shell in the container where the component is running. This is only possible for pods whose components are running, not for pods that failed to start.
Bash
# open bash shell to given pod
kubectl exec -it <pod-name> -n <component-namespace> -- /bin/bash
This way you can check the file system, config files, etc. of a given component, in case of a configuration issue.
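For a quick check, a single command can also be run in the container without opening an interactive shell. The following is a minimal sketch. The function name pod_run is illustrative, and the command to run is whatever the container image provides.

```shell
# Hypothetical helper: run one command inside a running pod and print its output.
# Arguments: <pod-name> <component-namespace> <command> [arguments...]
pod_run() {
  local pod="$1" ns="$2"
  shift 2
  # Everything after "--" is passed to the container unchanged.
  kubectl exec "$pod" -n "$ns" -- "$@"
}
```

For example, pod_run <pod-name> <component-namespace> ls -l / lists the root directory of the container. As with the interactive shell, this only works while the container is running.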
How to Access the MariaDB Database
By default, the MariaDB database can only be reached from inside the Kubernetes cluster. You can change this in the Azure Portal by selecting the mariadb resource and adding a new firewall rule under Connection security.
After this, any MySQL client can be used to connect to the database from the given IP. You can find the host and user information in the Overview menu or in the Connection strings menu of the Azure Portal.
Commonly Seen Errors and Solutions
nevisAdmin 4 is not reachable, but according to the logs it is running without problems
Check if the correct loadBalancerIP was placed in the nginx.yaml file during the setup process.
Error during deployment step 3 (Preview)
webshop-proxy: failed Couldn't connect to kubernetes:
Error on custom-resource object listing for crd-config 'nevisdatabases':
Unauthorized
The Kubernetes cluster URL or token is incorrect. For more information on how to configure the Kubernetes connection in the inventory, see Configuring an Example Project and Inventory in the GUI.
kubernetes-cluster:
url: <some-url>
token: <some-secret>
Error during deployment step 3 (Preview)
webshop-proxy: failed Permission denied.
No access to the repository.
The SSH key of nevisAdmin 4 and the Git repository are not in sync.
For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.
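One way to test whether a given private key is accepted by the Git repository is to list the remote references with that key. The following is a minimal sketch, assuming the private key is available as a local file and the repository is addressed by its SSH URL. The function name check_git_access is illustrative, and the key path must not contain spaces.

```shell
# Hypothetical helper: succeed if the given private key can read the repository.
# Arguments: <ssh-repository-url> <private-key-file>
# IdentitiesOnly=yes makes ssh offer only this key, not keys from an agent.
check_git_access() {
  GIT_SSH_COMMAND="ssh -i $2 -o IdentitiesOnly=yes" git ls-remote "$1" >/dev/null
}
```

An exit status of 0 means that the key grants read access. Error messages from ssh and git are still printed, which helps to tell a rejected key apart from an unreachable host.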
Error during deployment step 4 (Deploy - the first deployment run)
webshop-proxy: nevisProxy Instance (NevisComponent):
failed nevis-git-init error (exit code: 1).
Can't fetch from git? : Deploy to Kubernetes
The SSH key configured in GitCredentials for NevisOperator is not in sync with the SSH key of the Git repository.
For more information on how to set up nevisAdmin 4 with Git, see Preparing the Git Deployment Repository.
Pushing docker images to the container registry fails
error: Error copying image to the remote destination: Error trying to reuse blob sha256:2f4129b6dea024391198419be227846bebc4a63d94a5c276c2920a3701f41651 at destination: unable to retrieve auth token: invalid username/password: unauthorized: authentication required
On some systems, a manual docker login may be necessary because of compatibility issues with the Azure CLI:
# login to azure
az login
# get the credentials
az acr login -n <container-registry-name> --expose-token
# login to the container registry
docker login <loginServer> -u 00000000-0000-0000-0000-000000000000 -p <accessToken>
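The last two steps can be combined so that the access token is not copied by hand and does not appear on the command line. The following is a minimal sketch, assuming az login has already been run. The function name acr_docker_login is illustrative. It reads the login server with az acr show and passes the token to docker login through --password-stdin.

```shell
# Hypothetical helper: log Docker in to an Azure container registry with an
# access token obtained from the Azure CLI.
# Argument: <container-registry-name>
# Assumption: "az login" has already been run in this shell.
acr_docker_login() {
  local name="$1" server token
  server=$(az acr show -n "$name" --query loginServer -o tsv) || return 1
  token=$(az acr login -n "$name" --expose-token --query accessToken -o tsv) || return 1
  # The fixed all-zero user name tells the registry that a token is used.
  printf '%s' "$token" |
    docker login "$server" -u 00000000-0000-0000-0000-000000000000 --password-stdin
}
```

Passing the token on standard input keeps it out of the shell history and the process list.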
Previously failing migration-job is not restarted on redeployment
If a migration job fails during the initial setup, for example because the setup was not done correctly or the database was not yet available, it is not restarted automatically. Currently, the operator restarts the migration job only if the configuration changes. The relevant settings are: Database Host, Database Name, Root Credential Name. To restart the job, either change any one of these settings or delete the current job:
# get jobs
kubectl get jobs -n <namespace>
# delete the failed job
kubectl delete job <job-name> -n <namespace>
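Before deleting a failed job, it is usually worth saving its log, because the log is no longer available once the job and its pods are gone. The following is a minimal sketch. The function name inspect_and_delete_job is illustrative.

```shell
# Hypothetical helper: print the log of a job, then delete the job.
# Arguments: <job-name> <namespace>
# The job is deleted even if the log can no longer be read.
inspect_and_delete_job() {
  kubectl logs "job/$1" -n "$2"
  kubectl delete job "$1" -n "$2"
}
```

Redirect the output to a file if the log needs to be kept for later analysis.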
Timeout during deployment
A timeout mostly happens if an error occurs in the operator itself. This means that it does not report the status of the deployment. Check the logs of the nevisOperator.
Deployment fails, logs show the following
ssh: handshake failed: knownhosts: key mismatch
Make sure that the correct known-hosts entry is present both in the knownhosts file of nevisAdmin 4 and in the GitCredentials object in the component namespace.
nevisIDM migration fails with the following
Caused by: java.sql.SQLException: Can't create table `nevisidm`.`TIDMA_UNIT_PATH` (errno: 150 "Foreign key constraint is incorrectly formed")
Make sure that the database is configured correctly with the following values:
autocommit = 0
transaction-isolation = READ-COMMITTED
log_bin_trust_function_creators = 1
lower_case_table_names = 1
character-set-server = utf8mb4
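The values that the server actually uses can be read with a SHOW VARIABLES query. The following is a minimal sketch, assuming the mysql command-line client is installed and the database is reachable. The function name show_db_settings is illustrative. Server variables use underscores instead of hyphens, and depending on the server version the isolation level is reported as tx_isolation or transaction_isolation, so the query asks for both.

```shell
# Hypothetical helper: print the current values of the settings listed above.
# Arguments: <db-host> <db-user>; the client prompts for the password.
# Assumption: the mysql command-line client is available.
show_db_settings() {
  mysql -h "$1" -u "$2" -p -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('autocommit', 'tx_isolation', 'transaction_isolation',
     'log_bin_trust_function_creators', 'lower_case_table_names',
     'character_set_server')"
}
```

The client prints ON and OFF for boolean variables, which correspond to the values 1 and 0 in the list above.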
Ingress resources are not created, or not updated, nevisOperator logs contain the following
no matches for kind "Ingress" in version "networking.k8s.io/v1beta1"
If you use Kubernetes 1.22 or later, make sure that you use nevisOperator 4.14 or later.
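To see which versions of the networking API the cluster serves, the output of kubectl api-versions can be filtered. The following is a minimal sketch. The function name ingress_api_versions is illustrative.

```shell
# Hypothetical helper: list the networking.k8s.io API versions served by the cluster.
ingress_api_versions() {
  kubectl api-versions | grep '^networking.k8s.io/'
}
```

If networking.k8s.io/v1beta1 is missing from the output, the cluster no longer serves the API version named in the error message.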
Migration fails with the following
Message : Access denied for user '<dbuser>@<dbhost>'@'10.2.56.9'
If the given username and password seem correct, make sure the username is in the <username>@<hostname> format, as required by Azure Database for MariaDB. The opposite is also true: if you do not use an Azure database, do not use this format. Note: The Example Installation on Azure uses the Azure format by default.