Troubleshooting¶
Uninstalling¶
To uninstall this solution:

- Use the `./launch.sh` script to enter the docker container that was used to install this software package, as described in the README file in the repository.
- Navigate to the `/workspaces/current` folder:

    ```shell
    cd /workspaces/current
    ```

- There are two ways to uninstall the solution:
    - Use the `./destroy-all.sh` script to uninstall all layers of the solution.
    - Navigate into the specific subdirectories of the solution and remove individual layers. Apply these steps to every layer/sub-folder in reverse order, starting with the highest-numbered layer first (a scripted version of this loop is sketched below):

        ```shell
        cd 200-openshift-gitops
        terraform init
        terraform destroy --auto-approve
        ```
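If you prefer to script the layer-by-layer teardown, a loop along the following lines can help. This is a minimal sketch that assumes the layer sub-folders use the numbered naming convention shown above (for example `200-openshift-gitops`); adjust the pattern if your solution is laid out differently.

```shell
#!/usr/bin/env bash
# Sketch: destroy all numbered layers in reverse order (highest number first).
# Assumes it is run from /workspaces/current and that each layer directory
# starts with a numeric prefix (e.g. 105-..., 200-openshift-gitops).

for layer in $(ls -d [0-9]*/ | sort -r); do
  echo "Destroying layer: ${layer}"
  (cd "${layer}" && terraform init && terraform destroy --auto-approve) || exit 1
done
```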
Variables may not be used here.¶
You may encounter an error message containing `Variables may not be used here.` during `terraform` execution, similar to the following:

```
│ Error: Variables not allowed
│
│   on terraform.tfvars line 1:
│    1: cluster_login_token=asdf
│
│ Variables may not be used here.
```

This error happens when a value in a `tfvars` file is not wrapped in quotes. In that case `terraform` interprets the value as a reference to a variable, which does not exist. To remedy this, wrap the value in your `terraform.tfvars` file in quotes.

For example:

- `cluster_login_token=ABCXYZ` is incorrect
- `cluster_login_token="ABCXYZ"` is correct
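As a concrete illustration, the snippet below writes a `terraform.tfvars` file with the string value wrapped in quotes. This is only a sketch; the token value is a placeholder, and your solution may require additional variables as described in its README.

```shell
# Sketch: create terraform.tfvars with the string value wrapped in quotes so
# terraform does not interpret the right-hand side as a variable reference.
cat > terraform.tfvars <<'EOF'
cluster_login_token = "REPLACE_WITH_YOUR_TOKEN"
EOF
```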
Intermittent network failures when using Colima¶
If you are using the `colima` container engine (a replacement for Docker Desktop), you may see random network failures when the container is put under heavy network load. This happens when the internal DNS resolver can't keep up with the container's network requests. The workaround is to switch Colima to an external DNS server instead of its own internal DNS.

Steps to fix this (the full command sequence is sketched after this list):

- Stop Colima using `colima stop`
- Create a file `~/.lima/_config/override.yaml` containing the following:

    ```yaml
    useHostResolver: false
    dns:
    - 8.8.8.8
    ```

- Restart Colima using `colima start`
- Resume your activities where you encountered the networking failures. It may be necessary to run a `terraform destroy` command to clean up invalid/bad state caused by the network failures.
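The steps above can be combined into a short shell sequence. This is a sketch that assumes the default Lima configuration directory (`~/.lima/_config`) and the public DNS server shown in the steps; adjust either to suit your environment.

```shell
# Sketch: switch Colima from its internal DNS resolver to an external DNS server.
colima stop

mkdir -p ~/.lima/_config
cat > ~/.lima/_config/override.yaml <<'EOF'
useHostResolver: false
dns:
- 8.8.8.8
EOF

colima start
```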
Resources stuck in `Terminating` state¶
When deleting resources, the namespaces used by the solution occasionally get stuck in a `Terminating` or inconsistent state. Use the following steps to recover from these conditions (a scripted sketch follows the list):

- Run `oc get namespace <namespace> -o yaml` on the CLI to get the details for the namespace. Within the yaml output, you can see whether resources are stuck in a finalizing state.
- Get the details of each remaining resource with `oc get <type> <instance> -n <namespace> -o yaml` to see which resources are stuck and have not been cleaned up. The `<type>` and `<instance>` values can be found in the output of the previous `oc get namespace <namespace> -o yaml` command.
- Patch the instances to remove the stuck finalizer: `oc patch <type> <instance> -n <namespace> -p '{"metadata": {"finalizers": []}}' --type merge`
- Delete the resource that was stuck: `oc delete <type> <instance> -n <namespace>`
- Go into the ArgoCD instance and delete the remaining Argo applications.
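If many resources are stuck, the per-resource steps above become tedious. The sketch below assumes the `oc` CLI is logged in and `NAMESPACE` is set to the stuck namespace; it enumerates the namespaced resources that remain and strips their finalizers. Removing finalizers bypasses normal cleanup logic, so review what is stuck before running anything like this.

```shell
# Sketch: clear finalizers on everything left in a stuck namespace.
NAMESPACE="<namespace>"   # replace with the stuck namespace

for kind in $(oc api-resources --verbs=list --namespaced -o name); do
  for resource in $(oc get "${kind}" -n "${NAMESPACE}" -o name 2>/dev/null); do
    echo "Clearing finalizers on ${resource}"
    oc patch "${resource}" -n "${NAMESPACE}" -p '{"metadata": {"finalizers": []}}' --type merge
    oc delete "${resource}" -n "${NAMESPACE}" --ignore-not-found
  done
done
```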
Workspace permission issues¶
Root user on Linux¶
If you are running on a Linux machine as the `root` user, the terraform directory is locked down so that only root has write permissions. When the `launch.sh` script puts you into the docker container, you are no longer root, and you will encounter `permission denied` errors when executing `setupWorkspace.sh`.

If the user on the host operating system is `root`, run `chmod g+w -R .` before running `launch.sh` to make the terraform directory group-writable (see the example below). Once you do this, the permission errors go away and you can follow the installation instructions.
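For example, running the following on the host as root applies the fix before entering the container. This is a sketch; it assumes you run it from the root of the cloned repository.

```shell
# Sketch: make the repository group-writable, then enter the container as usual.
chmod g+w -R .
./launch.sh
```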
Legacy launch.sh script¶
If you are not encountering the root user issue described above, you may still encounter permission errors if you previously executed this terraform automation using an older `launch.sh` script (prior to June 2022). The older `launch.sh` script mounted the `workspace` volume with `root` as the owner, whereas the current `launch.sh` script mounts the `workspace` volume as the `devops` user. As a result, `terraform` or `setupWorkspace.sh` commands fail with permission errors and only work if you use the `sudo` command.

If this is the case, the workaround is to remove the `workspace` volume on your system so that it can be recreated with the proper ownership.
To do this (the full command sequence is sketched after this list):

- Exit the container using the `exit` command.
- Verify that you have the `workspace` volume by executing `docker volume list`.
- Delete the `workspace` volume using `docker volume rm workspace`.
    - If this command fails, you may first have to remove containers that reference the volume. Use `docker ps` to list containers and `docker rm <container>` to remove a container. After you delete the container, re-run `docker volume rm workspace` to delete the `workspace` volume.
- Use the `launch.sh` script to re-enter the container.
- Use the `setupWorkspace.sh` script as described in the README in the repository to reconfigure your workspace and continue with the installation process.
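Put together, the recovery sequence looks roughly like the following. This is a sketch that assumes the volume is named `workspace` and that no other project depends on it.

```shell
# Sketch: recreate the workspace volume with the correct ownership.
exit                          # leave the running container first

docker volume list            # confirm the workspace volume exists
docker volume rm workspace    # remove it

# If the removal fails because a container still references the volume:
docker ps -a                  # list containers, including stopped ones
docker rm <container>         # remove the offending container, then retry
docker volume rm workspace

./launch.sh                   # re-enter the container; the volume is recreated
# then run setupWorkspace.sh as described in the README
```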
You should never use the `sudo` command to execute this automation. If you have to use `sudo`, then something is wrong with your configuration.
That didn't work, what next?¶
If you continue to experience issues with this automation, please file an issue or reach out on our public Discord server.