This article demonstrates how to use the ArgoCD GitOps controller to deploy applications to a Kubernetes cluster. The definitions/manifests of these applications (an Ingress controller, a monitoring stack, etc.) are stored in GitLab and are automatically updated by Renovate Bot, which regularly scans your GitLab project for outdated dependencies. A demo project illustrates how deployment errors (happening in ArgoCD) can be reported back to GitLab, which automatically creates incident tickets, giving your team visibility of failed deployments right in GitLab.
Introduction
The GitOps approach has many advantages over “pushing” code into production (the “CI Ops” approach), such as improved security. You can find more details in my earlier article about GitOps. However, a big disadvantage is the additional complexity. From the point of view of your development team, which is probably used to dealing with only one platform (like GitLab or GitHub) for everything, the GitOps approach introduces a completely separate, isolated system: the Kubernetes-based GitOps controller, such as ArgoCD or Flux. If something goes wrong in the GitOps controller (such as being unable to deploy an app), your dev team won’t be notified automatically, unless a cluster administrator configured a monitoring system in the cluster (such as the Prometheus stack that I present in detail in this article series).
This article presents a solution to this problem. The Setup section below explains the technical details, but from a bird’s eye perspective, this happens:
- We configure a GitOps controller (ArgoCD) to install a set of apps, including the Prometheus monitoring stack, into the cluster.
- We configure the monitoring stack to detect and send alerts back to the GitLab project, turning them into GitLab issues (incidents).
This closes the loop for your development team, who only need to look at a single system: GitLab. My proposed solution coincidentally solves another issue: the constantly evolving software ecosystem. All third-party systems, including the basic cluster components every Kubernetes cluster needs (such as the metrics-server, the monitoring stack, the GitOps controller, the Ingress controller, cert-manager, etc.), get new versions all the time. This article demonstrates how you can use Renovate Bot to automatically update these software components, while having fine-grained control over which applications and which version upgrade types (major vs. minor) should be updated fully automatically, and which ones should require manual approval. I already talked about Renovate Bot in detail in these articles here and here.
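To give a rough idea of how such fine-grained control can be expressed, here is a minimal, hypothetical renovate.json sketch (the preset and the rules are illustrative, not taken from the demo project):

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:base"],
  "packageRules": [
    {
      "description": "Merge minor and patch updates automatically (once the pipeline passes)",
      "matchUpdateTypes": ["minor", "patch"],
      "automerge": true
    },
    {
      "description": "Major updates always go through a manually-approved merge request",
      "matchUpdateTypes": ["major"],
      "automerge": false
    }
  ]
}

With rules like these, Renovate still opens a merge request for every update, but only merges minor and patch bumps on its own.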
ArgoCD primer
In my proposed solution, I’m using ArgoCD as the GitOps controller implementation. While there are many alternative implementations, such as FluxCD or Fleet, I like the comprehensive web UI of ArgoCD. The web UI really helps with diagnosing problems or observing the progress of a deployment. The only real disadvantage I’ve observed with ArgoCD is that it does not really use the Helm engine when installing Helm charts (which FluxCD does better!). Instead, ArgoCD only applies Helm templating, and converts Helm hooks to ArgoCD hooks. This causes problems with a few Helm charts, which are often resolved by ArgoCD users complaining to the Helm chart authors, who then (sometimes) adapt their chart to be ArgoCD-compatible.
For the remainder of this article, I assume that you have a basic understanding of ArgoCD. Most notably, you need to understand that ArgoCD installs a new Application CRD. In other words: ArgoCD is (mostly) stateless, and stores the applications it manages in the cluster’s etcd database, as Application CRs, such as this one:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dummy-echo-server
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://gitlab.com/MShekow/gitops-with-monitoring.git
    path: dummy-echo-server
  destination:
    namespace: default
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
GitOps & repository setup
The following figure explains the components of my solution approach:
You can find the template project on GitLab here: https://gitlab.com/MShekow/gitops-with-monitoring
On the left you see the Cluster Configuration management repository, which plays the central role. It stores the state of the applications we want installed in the cluster, in a declarative way as YAML files. Both Renovate Bot and the ArgoCD GitOps controller continuously watch this repository. Renovate Bot automatically updates versions in the YAML files, and ArgoCD automatically applies the changes it observes in the repository to the cluster.
Let’s look at the setup sequence. Someone from the operations team…
- … installs ArgoCD into the cluster, using the argocd-kustomize configuration (see A).
- … tweaks those parts of the ArgoCD configuration that are not declaratively stored in the Kustomize template, e.g. the ArgoCD login credentials, as instructed in the ArgoCD manual.
- … creates a deploy / access token in GitLab (B), and configures ArgoCD to know about the Cluster configuration repo with this token, so that ArgoCD has read-access to it.
- … instructs ArgoCD to observe the Cluster configuration repo, by applying argocd-bootstrap.yml (C) to the cluster (a sketch of such a bootstrap Application follows after this list). ArgoCD installs all the applications declared in the app-xyz.yaml files, including the (preconfigured) Prometheus stack (D).
- … sets up the GitLab Prometheus integration, which creates an Alert webhook (E) in GitLab. The operator applies a Secret to the cluster that contains the authorization token for that webhook, and creates and pushes a new commit to the configuration repo that updates the URL in the alertmanager-config.yaml file, so that it contains the correct webhook URL.
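For illustration, here is a minimal sketch of what such a bootstrap Application could look like. The repoURL is a placeholder for your fork of the Cluster configuration repo; consult the template project for the actual argocd-bootstrap.yml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: applications
  namespace: argocd
spec:
  project: default
  source:
    # Placeholder: point this at your fork of the Cluster configuration repo
    repoURL: https://gitlab.com/your-group/cluster-configuration.git
    targetRevision: main
    path: applications  # the folder containing the "parent" Helm chart
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true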
From now on, the ArgoCD controller continuously monitors the cluster configuration repo, specifically the applications folder, which contains a Helm chart. The Helm chart is not made of the “usual” K8s objects (Deployment, Service, Ingress, …) but contains ArgoCD Application Custom Resource objects. This approach is called the ArgoCD App of Apps pattern. The applications Helm chart is the “parent app”, and the YAML files in its templates folder are the “child apps”.
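To illustrate the pattern, here is a hypothetical child app that could live in the templates folder, installing a third-party Helm chart. The chart name and version are just examples, not taken from the demo project; Renovate Bot’s argocd manager can be configured to detect and bump the pinned targetRevision in Application manifests like this one:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-nginx
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://kubernetes.github.io/ingress-nginx  # Helm chart repository
    chart: ingress-nginx
    targetRevision: 4.7.1  # pinned chart version that Renovate Bot can bump
  destination:
    namespace: ingress-nginx
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true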
Data flow for a deployment
From now on, the data flow for a (failing) deployment is as follows (focus on the green arrows):
- Someone from the ops team, or Renovate Bot, creates commits that update applications, e.g. bumping their versions, or adding new apps to the applications/templates folder of the Cluster configuration repo.
- ArgoCD detects these changes and attempts to apply them to the cluster. If a deployment fails, the Prometheus metric family argocd_app_info indicates it: there will be one or more rows in the time series argocd_app_info{sync_status!="Synced"} or argocd_app_info{health_status!="Healthy"}.
- Prometheus scrapes these metrics and, based on alerting rules (a sketch follows after this list), sends alerts to the Alertmanager.
- The Alertmanager detects (after some time) that the alerts are persistent, and sends an alert notification to the GitLab webhook.
- The GitLab server notifies maintainers/owners of the Cluster configuration project via mail, and creates an issue (of type “incident”) in GitLab.
- Note: the recovery of incidents is also automatic. Suppose you fix the deployment issue by updating the YAML files in the configuration repo – or maybe the issue sorted itself out (e.g. when a Helm chart repo was just temporarily unavailable). In either case, once ArgoCD has successfully deployed all apps again, the Prometheus metrics will indicate this (that is, expressions such as argocd_app_info{sync_status!="Synced"} will no longer yield any results). Consequently, the Alertmanager will send a “resolved” notification to GitLab, and GitLab automatically closes the corresponding issue/incident for you.
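For reference, the alerts mentioned above could be defined with a PrometheusRule object along the following lines. This is a hedged sketch assuming the Prometheus Operator from the kube-prometheus-stack; the alert names, durations and labels are illustrative and may differ from the demo project:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-app-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppNotSynced
          # argocd_app_info is a gauge with value 1; one time series per managed Application
          expr: argocd_app_info{sync_status!="Synced"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD application {{ $labels.name }} has a bad sync status"
        - alert: ArgoCDAppNotHealthy
          expr: argocd_app_info{health_status!="Healthy"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD application {{ $labels.name }} has a bad health status"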
My demo project already comes with a preconfiguration of this data flow. Its README describes the steps you need to perform, as well as the necessary modifications.
Installing your own applications via GitOps
Apart from components made by third parties, you can also install your own software. Typically, you would store this software in a separate Git project/repository, referred to as the application repository. You could have many application repositories, e.g. one per application. I’ve already explained the deployment flow for such a setup here in my GitOps article, under the Multi-repo tab.
Let’s see how you can modify the setup I presented above to also deploy Docker images / Helm charts of your own application:
On the left, you see an example for one of your application repositories. I omitted the Kubernetes cluster box from the above image simply to avoid clutter – the cluster still exists in the actual setup, of course.
The basic idea is that you add a CD pipeline to your application repository, which runs on each push (step 1), after your project’s CI pipeline has successfully built, tested and pushed the Docker images of your application. In step 2, the CD job performs the following steps (a sketch of such a job follows after this list):
- Clone the Cluster configuration repo.
- Create or update an ArgoCD Application CR file (app-X.yaml in the above example). This Application CR instructs ArgoCD to pull the application’s Helm chart from the application repo’s chart directory, from a certain branch or tag.
  - The CD job primarily updates that branch/tag whenever it runs. The values.yaml file of that chart contains the updated Docker image version tags, which the developers need to bump whenever they want a new version to be deployed.
- Create a new commit (that changes/creates the Application CR file), and push it to the Cluster configuration repo.
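The following is a rough, hypothetical sketch of such a CD job in the application repo’s .gitlab-ci.yml. It uses the project-access-token alternative described in the list below; none of the variable or file names are taken from the demo project:

update-cluster-config:
  stage: deploy
  image: alpine:3.19
  before_script:
    - apk add --no-cache git
  script:
    # Clone the Cluster configuration repo; CONFIG_PROJECT_ID, CONFIG_REPO_TOKEN and
    # CONFIG_REPO_PATH are assumed CI/CD variables (see the access token notes below)
    - git clone "https://project_${CONFIG_PROJECT_ID}_bot:${CONFIG_REPO_TOKEN}@gitlab.com/${CONFIG_REPO_PATH}.git" cluster-config
    # Create or update the Application CR file for this app
    - cp deploy/app-X.yaml cluster-config/applications/templates/app-X.yaml
    - cd cluster-config
    - git config user.email "cd-bot@example.com"
    - git config user.name "CD pipeline"
    - git add applications/templates/app-X.yaml
    - git commit -m "Update app-X Application CR" || echo "Nothing to commit"
    - git push origin main
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH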
For this approach to work, there are a few more access tokens involved:
- In the application repo, you need to create a deploy token with scope read_repository (and read_registry in case you also store the images in that repo’s GitLab image registry). You need to configure ArgoCD in the cluster to know the application repo, using the deploy token as credentials (a sketch of the corresponding repository Secret follows after this list).
- To allow the application repo to commit to the Cluster configuration repo (which a simple GitLab deploy token does not permit, as deploy tokens are for read-only use cases), you have two alternatives:
  - Use (SSH-based) deploy keys (docs): this requires you to generate an SSH keypair (e.g. via ssh-keygen -t rsa -b 4096 -C "gitlab-ci") on some Linux machine. In the Repository settings of the Cluster configuration repo, create a new deploy key (with the “write” checkbox enabled), using the public key of the generated keypair. Next, create two CI/CD variables (e.g. named CONFIG_REPO_PUBLIC_KEY, CONFIG_REPO_PRIVATE_KEY) of type “file” in the application GitLab project (docs). In the CD job of the application repo, you need to configure the right tools (e.g. using ssh-agent) to be able to clone and push to the Cluster configuration repo, using the private and public key CI/CD variables. See e.g. here for details.
  - Use project access tokens (docs): create a project access token in the Cluster configuration repo, and use the URL https://project_${PROJECT_ID_OF_CLUSTER_CONFIG_REPO}_bot:${PROJECT_ACCESS_TOKEN}@gitlab.com/${CLUSTER_REPO_PATH}.git for cloning and committing to the Cluster configuration repo. This uses the Project bot user feature (docs).
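As referenced in the first bullet point, here is a hedged sketch of how the application repo can be registered with ArgoCD declaratively, using the deploy token as credentials. The Secret name, repo URL and token username are placeholders; you can achieve the same via the ArgoCD web UI or CLI:

apiVersion: v1
kind: Secret
metadata:
  name: application-repo  # placeholder name
  namespace: argocd
  labels:
    # this label makes ArgoCD treat the Secret as repository credentials
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://gitlab.com/your-group/your-application.git  # placeholder
  username: gitlab+deploy-token-123456  # the username shown when you created the deploy token
  password: "<deploy token value>"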
A note about GitHub
In this article, I’m hosting this project on GitLab.com (and it also works on self-hosted GitLab instances), simply because GitLab comes with a Prometheus integration. Other platforms, such as GitHub, do not have this feature, but you can use projects such as the alertmanager-github-receiver instead, which turns alert notifications (sent by the Alertmanager) into GitHub issues.
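For example, if you run the alertmanager-github-receiver inside the cluster, the Alertmanager posts its notifications to the receiver’s HTTP endpoint instead of a GitLab URL. The following AlertmanagerConfig sketch only illustrates the idea; the Service name, port and path are placeholders that you need to take from the receiver’s own documentation:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: argocd-alerts-github
  namespace: argocd
spec:
  route:
    receiver: 'github-receiver'
    matchers:
      - name: job
        value: argocd-metrics
  receivers:
    - name: 'github-receiver'
      webhookConfigs:
        # Placeholder URL: point this at the in-cluster Service of alertmanager-github-receiver
        - url: http://alertmanager-github-receiver.monitoring.svc:9393/v1/receiver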
Conclusion
The GitOps approach offers many advantages, such as continuous reconciliation between what we declaratively specify we want to have deployed, and what actually is deployed. The fact that CI and CD are now separated also improves security. In this project I have shown how to overcome the main disadvantage of this separation: lack of monitoring when deployments go wrong.
Although it takes a bit of time to understand my approach, or to tweak its settings, I’ve come to find it invaluable to have a completely declaratively specified cluster setup. It makes it easy to set up new clusters that use the same stack, be it for the same or a different project.
As I indicated above, you can exchange the individual components for others. You can, for instance, use GitHub instead of GitLab, or exchange ArgoCD for other GitOps controllers, such as FluxCD.
If you stick to my proposed setup, using GitLab, be aware that you need to operate your own Renovate Bot instance. See here in the Renovate Bot docs for why the makers of the tool do not have a SaaS offering. I’ve presented how to run your own Renovate Bot instance in this article.
Hi.
Thank you for the post.
In your repository file https://gitlab.com/MShekow/gitops-with-monitoring/-/blob/main/applications/templates/alertmanager-config.yaml the path is /prometheus/alerts/notify.json, but this path is not in the repository.
Hi. Indeed, there is no (physically stored) file in the repository at that path. Instead, this is the URL generated by the GitLab Prometheus integration. Please search for “In GitLab, configure the Prometheus integration as follows” in the README of the repository to learn how to set up this integration. You will see that URL in one of the Prometheus-integration dialogs.
Hi. Thank you for reply. Sorry for my inattention.
Hello. Could you help me with enabling the external Prometheus?
On “GitLab – Settings – Monitoring – Alerts – Current integration – Configure details” in line “Prometheus API base URL” I specified the address of the external prometheus “https://mydomain”
In “Alert settings” tab I enabled the checkboxes “Create an incident. Incidents are created for each alert triggered” and
“Send a single email notification to Owners and Maintainers for new alerts”
In “Alerts – Current integrations – View credentials tab” I copied “Authorization key”. My key: ae68e940b126242c6455c507994e1a82
Then I created a Secret:
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: gitlab-webhook-auth-key
data:
  authKey: ae68e940b126242c6455c507994e1a82
$ kubectl get secret
NAME TYPE DATA AGE
argocd-application-controller-token-5dvhn kubernetes.io/service-account-token 3 4d20h
argocd-applicationset-controller-token-hxz4g kubernetes.io/service-account-token 3 4d20h
argocd-dex-server-token-fp4tg kubernetes.io/service-account-token 3 4d20h
argocd-image-updater-secret Opaque 0 4d21h
argocd-image-updater-token-f6rbj kubernetes.io/service-account-token 3 4d21h
argocd-initial-admin-secret Opaque 1 4d20h
argocd-notifications-controller-token-s2ts9 kubernetes.io/service-account-token 3 4d20h
argocd-notifications-secret Opaque 0 4d20h
argocd-redis-token-xmwl2 kubernetes.io/service-account-token 3 4d20h
argocd-secret Opaque 5 4d20h
argocd-server-token-ff5j2 kubernetes.io/service-account-token 3 4d20h
default-token-4dvs7 kubernetes.io/service-account-token 3 4d21h
repo-3039993659 Opaque
As a result, I still get a 404 error when I try to go to the address https://gitlab.mydomain/devops/gitops/prometheus/alerts/notify.json
However, the alert manager works correctly and alerts “Bad sync status” and “Bad health status” react to events.
> However, the alert manager works correctly and alerts “Bad sync status” and “Bad health status” react to events.
So it looks as if everything actually works fine.
> I still get a 404 error when I try to go to the address
What do you mean with “go to the address”? If you mean: “open address in the web browser” (which would be a GET request) then this is expected behavior. I’m quite sure that the endpoint (https://gitlab.mydomain/devops/gitops/prometheus/alerts/notify.json) only expects PUT or POST requests.
Also, make sure to first encode your Authorization key with base64 before pasting it into the authKey field of the Kubernetes Secret.
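For reference, a corrected Secret could look as follows. Using stringData (instead of data) lets Kubernetes do the base64 encoding for you; the namespace is an assumption that matches your AlertmanagerConfig:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: gitlab-webhook-auth-key
  namespace: argocd
stringData:
  authKey: ae68e940b126242c6455c507994e1a82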
My alertmanager-config.yaml:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 2m
    repeatInterval: 12h
    receiver: 'gitlab-webhook'
    matchers:
      - name: job
        value: argocd-metrics
  receivers:
    - name: 'gitlab-webhook'
      webhookConfigs:
        - url: https://gitlab.mydomain/devops/gitops/prometheus/alerts/notify.json
          httpConfig:
            bearerTokenSecret:
              name: gitlab-webhook-auth-key
              key: ae68e940b126242c6455c507994e1a82
The value of “key” must be “authKey”, not “ae68e940b126242c6455c507994e1a82”.
I don’t understand what you mean by “authKey”?
Please look at the screenshot. I mean this key https://i.ibb.co/gVVb18j/1.png
I mean that the last line of your alertmanager-config.yml must literally be the following:
key: "authKey"
What you are specifying there is NOT the key itself, but the name of the field in the secret that contains the key.
Hello.
Thanks a lot for your help!