GitOps for managing cluster software using GitLab, ArgoCD and Renovate Bot

This article demonstrates how to use the ArgoCD GitOps controller to deploy applications to a Kubernetes cluster. The definitions/manifests of these applications, such as an Ingress controller, a monitoring stack, etc., are stored in GitLab. They are automatically updated by Renovate Bot, which regularly scans your GitLab project for outdated dependencies. A demo project illustrates how deployment errors that happen in ArgoCD can be reported back to GitLab, which automatically creates incident tickets, giving your team visibility of failed deployments right in GitLab.

Introduction

The GitOps approach has many advantages over “pushing” code into production (the “CI Ops” approach), such as improved security. You can find out more details in my earlier article about GitOps. However, a big disadvantage is the additional complexity. From the point of view of your development team, which is probably used to dealing with a single platform (like GitLab or GitHub) for everything, the GitOps approach introduces a completely separate, isolated system: the Kubernetes-based GitOps controller, such as ArgoCD or Flux. If something goes wrong in the GitOps controller (such as being unable to deploy an app), your dev team won’t be notified automatically, unless a cluster administrator has configured a monitoring system in the cluster (such as the Prometheus stack that I present in detail in this article series).

This article presents a solution to this problem. The Setup section below explains the technical details, but from a bird’s eye perspective, this happens:

  • We configure a GitOps controller (ArgoCD) to install a set of apps, including the Prometheus monitoring stack, into the cluster.
  • We configure the monitoring stack to detect and send alerts back to the GitLab project, turning them into GitLab issues (incidents).

This closes the loop for your development team, who only need to look at a single system: GitLab. My proposed solution also solves another issue along the way: the constantly evolving software ecosystem. All third-party systems, including the basic cluster components every Kubernetes cluster needs (such as the metrics-server, the monitoring stack, the GitOps controller, the Ingress controller, cert-manager, etc.), get new versions all the time. This article demonstrates how you can use Renovate Bot to automatically update these software components, while having fine-grained control over which applications and which update types (major vs. minor) should be handled fully automatically, and which ones should require manual approval. I already talked about Renovate Bot in detail in these articles here and here.
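
To make this concrete, here is a minimal sketch of what such fine-grained control can look like in a renovate.json; the two rules shown are illustrative assumptions, not the demo project’s actual configuration:

{
  "packageRules": [
    {
      "description": "Merge minor and patch updates automatically",
      "matchUpdateTypes": ["minor", "patch"],
      "automerge": true
    },
    {
      "description": "Major updates require a manually approved merge request",
      "matchUpdateTypes": ["major"],
      "automerge": false
    }
  ]
}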

ArgoCD primer

In my proposed solution, I’m using ArgoCD as the GitOps controller implementation. While there are many alternative implementations, such as FluxCD or Fleet, I like ArgoCD’s comprehensive web UI. The web UI really helps with diagnosing problems or observing the progress of a deployment. The only real disadvantage I’ve observed with ArgoCD is that it does not really use the Helm engine when installing Helm charts (which FluxCD does better!). Instead, ArgoCD only applies Helm templating, and converts Helm hooks to ArgoCD hooks. This causes problems with a few Helm charts, which are often resolved by ArgoCD users complaining to the Helm chart authors, who then (sometimes) adapt their chart to be ArgoCD-compatible.

For the remainder of this article, I assume that you have a basic understanding of ArgoCD. Most notably, you need to understand that ArgoCD installs a new Application CRD. In other words: ArgoCD is (mostly) stateless, and stores the applications it manages in the cluster’s etcd database, as Application CRs, such as this one:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dummy-echo-server
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://gitlab.com/MShekow/gitops-with-monitoring.git
    path: dummy-echo-server
  destination:
    namespace: default
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

GitOps & repository setup

The following figure explains the components of my solution approach:

Figure: Solution structure (GitOps repository structure)

You can find the template project on GitLab here: https://gitlab.com/MShekow/gitops-with-monitoring

On the left you see the Cluster Configuration management repository, which plays the central role. It stores the state of the applications we want installed in the cluster, in a declarative way as YAML files. Both Renovate Bot and the ArgoCD GitOps controller continuously watch this repository. Renovate Bot automatically updates versions in the YAML files, and ArgoCD automatically applies the changes it observes in the repository to the cluster.

Let’s look at the setup sequence. Someone from the operations team…

  • installs ArgoCD into the cluster, using the argocd-kustomize configuration (see A)
  • tweaks those parts of the ArgoCD configuration that are not declaratively stored in the Kustomize template, e.g. the ArgoCD login credentials, as instructed in the ArgoCD manual.
  • creates a deploy / access token in GitLab (B), and configures ArgoCD to know about the Cluster configuration repo with this token, so that ArgoCD has read-access to it (see the repository Secret sketch below this list).
  • instructs ArgoCD to observe the Cluster configuration repo, by applying argocd-bootstrap.yml (C) to the cluster. ArgoCD installs all the applications declared in the app-xyz.yaml files, including the (preconfigured) Prometheus stack (D).
  • sets up the GitLab Prometheus integration, which creates an Alert webhook (E) in GitLab. The operator applies a Secret to the cluster that contains the authorization token for that webhook, and pushes a commit to the configuration repo that updates the URL in the alertmanager-config.yaml file to the correct webhook URL.
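
For reference, one way to make ArgoCD aware of the repo declaratively is a Secret that carries the deploy token from step B. A minimal sketch, with an assumed Secret name and a token placeholder (ArgoCD discovers the Secret via its argocd.argoproj.io/secret-type label):

apiVersion: v1
kind: Secret
metadata:
  name: cluster-config-repo  # name is an arbitrary choice
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository  # marks this Secret as a repo definition
stringData:
  type: git
  url: https://gitlab.com/MShekow/gitops-with-monitoring.git
  username: <deploy-token-username>  # the deploy token's user name from step B
  password: <deploy-token-value>     # the token itself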

From now on, the ArgoCD controller continuously monitors the Cluster configuration repo, specifically the applications folder, which contains a Helm chart. This Helm chart does not consist of the “usual” K8s objects (Deployment, Service, Ingress, …) but of ArgoCD Application Custom Resource objects. This approach is called the ArgoCD App of Apps pattern. The applications Helm chart is the “parent app”, and the YAML files in its templates folder are the “child apps”.
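
A sketch of what such a parent (“app of apps”) Application could look like; the metadata name is an assumption, but the pattern matches the argocd-bootstrap.yml from step C, pointing at the applications folder of the template project:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: applications  # the "parent app"
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.com/MShekow/gitops-with-monitoring.git
    path: applications  # the Helm chart whose templates are Application CRs
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true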

Data flow for a deployment

From now on, the data flow for a (failing) deployment is as follows (focus on the green arrows):

Figure: App update flow
  1. Someone from the ops team, or Renovate Bot, creates commits that update applications, e.g. bumping their versions or adding new apps to the applications/templates folder of the Cluster configuration repo.
  2. ArgoCD detects these changes and attempts to apply them to the cluster. If a deployment fails, the Prometheus metric family argocd_app_info indicates it: there will be one or more rows in the time series argocd_app_info{sync_status!="Synced"} or argocd_app_info{health_status!="Healthy"} (see the alert-rule sketch after this list).
  3. Prometheus scrapes these metrics; once the corresponding alert rules fire, Prometheus sends alerts to the Alertmanager.
  4. The Alertmanager detects (after some time) that the alerts are persistent, and sends an alert notification to the GitLab webhook.
  5. The GitLab server notifies maintainers/owners of the Cluster configuration project via mail, and creates an issue (of type “incident”) in GitLab.
    • Note: the recovery of incidents is also automatic. Suppose you fix the deployment issue by updating the YAML files in the configuration repo – or maybe the issue sorted itself out (e.g. when a Helm chart repo was just temporarily unavailable). In either case, once ArgoCD has successfully deployed all apps again, the Prometheus metrics will indicate this (that is, expressions such as argocd_app_info{sync_status!="Synced"} will no longer yield any results). Consequently, the Alertmanager will send a “resolved”-notification to GitLab, and GitLab automatically closes the corresponding GitLab issue/incident for you.
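
To illustrate step 2, here is a sketch of Prometheus alerting rules over these expressions, written as a PrometheusRule CR of the Prometheus operator; the alert names and the for: durations are assumptions (the demo project ships its own preconfigured rules):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppBadSyncStatus
          expr: argocd_app_info{sync_status!="Synced"} == 1
          for: 15m  # alert only if the bad state persists
          annotations:
            summary: "ArgoCD app {{ $labels.name }} is not synced"
        - alert: ArgoCDAppBadHealthStatus
          expr: argocd_app_info{health_status!="Healthy"} == 1
          for: 15m
          annotations:
            summary: "ArgoCD app {{ $labels.name }} is not healthy"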

My demo project already comes with a preconfiguration of this data flow. Its README describes the steps you need to take, as well as the necessary modifications.
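
For steps 4 and E, the routing of these alerts to the GitLab webhook can be declared with an AlertmanagerConfig CR. A sketch, assuming the Secret applied in step E is named gitlab-webhook-auth-key and stores the authorization key (base64-encoded, as required under data:) at the key authKey:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  route:
    receiver: gitlab-webhook
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 2m
    repeatInterval: 12h
    matchers:
      - name: job
        value: argocd-metrics  # match only alerts coming from ArgoCD's metrics
  receivers:
    - name: gitlab-webhook
      webhookConfigs:
        - url: https://gitlab.com/<your-group>/<your-project>/prometheus/alerts/notify.json
          httpConfig:
            bearerTokenSecret:
              name: gitlab-webhook-auth-key  # Secret applied in step E
              key: authKey                   # key inside the Secret, not the token value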

Installing your own applications via GitOps

Apart from components made by third parties, you can also install your own software. Typically, you would store this software in a separate Git project/repository, referred to as the application repository. You could have many application repositories, e.g. one per application. I’ve already explained the deployment flow for such a setup here in my GitOps article, under the Multi-repo tab.

Let’s see how you can modify the setup I presented above to also deploy Docker images / Helm charts of your own application:

Figure: Flow with application repository

On the left, you see an example for one of your application repositories. I omitted the Kubernetes cluster box from the above image simply to avoid clutter – the cluster still exists in the actual setup, of course.

The basic idea is that you add a CD pipeline to your application repository, which runs on each push (step 1), after your project’s CI pipeline has successfully built, tested and pushed the Docker images of your application. In step 2, the CD job performs the following steps (a pipeline sketch follows the list):

  • Clone the Cluster configuration repo
  • Create or update an ArgoCD Application CR file (app-X.yaml in the above example). This Application CR instructs ArgoCD to pull the application’s Helm chart from the application repo’s chart directory, from a certain branch or tag.
    • Whenever it runs, the CD job primarily updates that branch/tag reference. The values.yaml file of that chart contains the updated Docker image version tags, which the developers need to bump whenever they want a new version to be deployed.
  • Create a new commit (that changes/creates the Application CR file), and push it to the Cluster configuration repo.
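
A sketch of what such a CD job in the application repo’s .gitlab-ci.yml could look like, here using the project-access-token alternative described further below; the variable names (CONFIG_REPO_PROJECT_ID, CONFIG_REPO_TOKEN, CONFIG_REPO_PATH) and the deploy/app-X.yaml source path are assumptions:

update-cluster-config:
  stage: deploy
  image: alpine:3.19
  script:
    - apk add --no-cache git
    # Clone the Cluster configuration repo (project-access-token variant, see below)
    - git clone "https://project_${CONFIG_REPO_PROJECT_ID}_bot:${CONFIG_REPO_TOKEN}@gitlab.com/${CONFIG_REPO_PATH}.git" cluster-config
    - cd cluster-config
    # Create or update the Application CR for this app
    - cp "${CI_PROJECT_DIR}/deploy/app-X.yaml" applications/templates/app-X.yaml
    - git config user.name "cd-bot"
    - git config user.email "cd-bot@example.invalid"
    - git add applications/templates/app-X.yaml
    - git commit -m "Deploy app-X at ${CI_COMMIT_SHORT_SHA}" || echo "Nothing to commit"
    - git push origin HEAD:main  # assumes the config repo's default branch is "main"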

For this approach to work, there are a few more access tokens involved:

  • In the application repo, you need to create a deploy token, with scope read_repository (and read_registry in case you also store the images in that repo’s GitLab image registry). You need to configure ArgoCD in the cluster to know the application repo, using the deploy token as credentials.
  • To allow the application repo to commit to the Cluster configuration repo (which a simple GitLab deploy token does not permit, as deploy tokens are for read-only use cases), you have two alternatives:
    1. Use (SSH-based) deploy keys (docs): this requires you to generate an SSH keypair (e.g. via ssh-keygen -t rsa -b 4096 -C "gitlab-ci") on some Linux machine. In the Repository settings of the Cluster configuration repo, create a new deploy key (with the “write” checkbox enabled), using the public key of the generated keypair. Next, create two CI/CD variables (e.g. named CONFIG_REPO_PUBLIC_KEY, CONFIG_REPO_PRIVATE_KEY) of type file in the application GitLab project (docs). In the CD job of the application repo, you need to configure the right tools (e.g. using ssh-agent) to be able to clone and push to the Cluster configuration repo, using the private and public key CI/CD variables (see the CI snippet after this list). See e.g. here for details.
    2. Use project access tokens (docs): create a project access token in the Cluster configuration repo, and use the URL https://project_${PROJECT_ID_OF_CLUSTER_CONFIG_REPO}_bot:${PROJECT_ACCESS_TOKEN}@gitlab.com/${CLUSTER_REPO_PATH}.git for cloning and committing to the Cluster configuration repo. This uses the Project bot user feature (docs).
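
For completeness, a sketch of how alternative 1 could be wired up in the CD job, assuming the file-type CI/CD variables from above:

update-cluster-config:
  stage: deploy
  image: alpine:3.19
  before_script:
    - apk add --no-cache git openssh-client
    # Load the private deploy key (file-type CI/CD variable) into an ssh-agent
    - eval $(ssh-agent -s)
    - chmod 400 "$CONFIG_REPO_PRIVATE_KEY"
    - ssh-add "$CONFIG_REPO_PRIVATE_KEY"
    # Trust GitLab's SSH host key
    - mkdir -p ~/.ssh
    - ssh-keyscan gitlab.com >> ~/.ssh/known_hosts
  script:
    - git clone git@gitlab.com:<your-group>/<cluster-config-repo>.git cluster-config
    # ... then create/update the Application CR, commit and push as in the earlier sketch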

A note about GitHub

In this article, I’m hosting this project on GitLab.com (it also works on self-hosted GitLab instances), simply because GitLab comes with a built-in Prometheus integration. Other platforms, such as GitHub, do not have this feature, but you can use projects such as the alertmanager-github-receiver instead, which turns alert notifications (sent by the Alertmanager) into GitHub issues.

Conclusion

The GitOps approach offers many advantages, such as continuous reconciliation between what we declaratively specify we want to have deployed, and what actually is deployed. The fact that CI and CD are now separated also improves security. In this project I have shown how to overcome the main disadvantage of this separation: lack of monitoring when deployments go wrong.

Although it takes a bit of time to understand my approach, or to tweak its settings, I’ve come to find it invaluable to have a completely declaratively specified cluster setup. It makes it easy to set up new clusters that use the same stack, be it for the same or a different project.

As I indicated above, you can exchange the individual components for others. You can, for instance, use GitHub instead of GitLab, or exchange ArgoCD for other GitOps controllers, such as FluxCD.

If you stick to my proposed setup, using GitLab, be aware that you need to operate your own Renovate Bot instance. See here in the Renovate Bot docs for why the makers of the tool do not have a SaaS offering. I’ve presented how to run your own Renovate Bot instance in this article.
