Cleanup Gitlab TF states created by terratest

2021-02-21

In the last post, I described how to create a separate terraform state in Gitlab for each merge request with terratest. It works nicely, but the number of state files grows over time, so things can get a little messy.

Gitlab provides an API endpoint to manage these states. Fortunately, the entire API call is spelled out in the Gitlab Managed State documentation, so it didn’t cost me any time. Just copy the curl command and create a new job in .gitlab-ci.yml to nuke the terraform state.

remove_state_file:
  stage: cleanup
  image: curlimages/curl
  when: manual
  script:
    - |
      curl \
        --header "Private-Token: $PROJECT_ACCESS_TOKEN" \
        --request DELETE \
        "$CI_SERVER_URL/api/v4/projects/$CI_PROJECT_ID/terraform/state/$CI_COMMIT_REF_SLUG"

I replaced some static parts of the curl with predefined CI variables.
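To try the call outside of CI first, the same request can be wrapped in a small shell function. This is only a sketch under my own assumptions: the function names are made up for illustration, and gitlab.example.com and the project ID 1234 are placeholders, not values from the Gitlab docs.

```shell
#!/bin/sh
# Build the state endpoint from the same three pieces the CI job uses.
# Arguments: server URL, numeric project ID, state name.
state_url() {
  printf '%s/api/v4/projects/%s/terraform/state/%s\n' "$1" "$2" "$3"
}

# Delete one state and report failure through the exit code.
# --fail makes curl exit non-zero on HTTP errors such as 401 or 404.
# Reads the token from PROJECT_ACCESS_TOKEN, like the CI job does.
delete_state() {
  curl --fail --silent --show-error \
    --header "Private-Token: $PROJECT_ACCESS_TOKEN" \
    --request DELETE \
    "$(state_url "$1" "$2" "$3")"
}

# Example with placeholder values:
# PROJECT_ACCESS_TOKEN=... delete_state https://gitlab.example.com 1234 my-branch
```

Keeping the URL assembly in its own function makes it easy to print and eyeball the endpoint before actually sending a DELETE to it.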


Simple enough. Now you should be able to delete the state via a manual job.

Or not? No permissions? … As you can see in the job above, the curl command expects a PROJECT_ACCESS_TOKEN environment variable. It’s a standard Gitlab API request, so naturally it requires a token. But how do you generate one?

Do not use personal access tokens! At the time of writing, it’s not possible to narrow the permissions of such a token, so it has the same permissions as its owner: you. Instead, use project access tokens, which are scoped to a single project.

To generate a project access token:

  1. navigate to the project you would like to create the token for (the same project where terratest runs)
  2. in the Settings menu, choose Access Tokens and create a new one with the api scope
  3. copy the token
  4. in the same project, in the Settings menu, choose CI/CD and expand Variables
  5. add a new variable named PROJECT_ACCESS_TOKEN and paste the token as its value

After the token is created and stored as a variable, the remove_state_file CI job should successfully remove the state file.
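If the job still fails, it can help to check the token from your own machine. The sketch below asks the state endpoint for its HTTP status only. It assumes the endpoint answers a plain GET with the Private-Token header, and the function names and hint texts are my own, based on general HTTP semantics rather than on anything Gitlab prints.

```shell
#!/bin/sh
# Ask the state endpoint for its HTTP status without printing the state itself.
# Arguments: server URL, numeric project ID, state name.
# Reads the token from PROJECT_ACCESS_TOKEN, like the CI job does.
state_http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --header "Private-Token: $PROJECT_ACCESS_TOKEN" \
    "$1/api/v4/projects/$2/terraform/state/$3"
}

# Translate a status code into a rough hint (hypothetical wording).
explain_status() {
  case "$1" in
    2??) echo "token accepted" ;;
    401) echo "token missing or invalid" ;;
    403) echo "token lacks permission" ;;
    404) echo "project or state not found" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Example with placeholder values:
# explain_status "$(state_http_status https://gitlab.example.com 1234 my-branch)"
```

Treat the hints as a starting point only. A 404, for example, may also be how a private project answers a token that is not allowed to see it.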

Manual jobs suck though. The state file becomes redundant as soon as the merge request is merged, so running a manual cleanup job is just an extra annoying chore. How do I automate this?

An inconvenience I hit immediately is that pure git has no clue what a merge request is. I dug around the internet for a while, but I was not able to build any reasonable rule to automate the cleanup with pure git. If you know of something, definitely shoot me an email. But back to the topic. Instead of creating crazy CI rules, I decided to leverage another Gitlab feature: environments. The key reason is that an environment can run a custom job when it is stopped. And yes, the stop action is also triggered after a merge.

As the next step, modify the terratest job and define an environment. Notice the on_stop attribute, which references the cleanup job. I also made the environment name, apart from its terraform/ prefix, match the name of the terraform state, so it’s easy to manage.

terratest:
  stage: test
  image:
    name: golang:alpine
  when: manual
  environment:
    name: terraform/$CI_COMMIT_REF_SLUG
    on_stop: remove_state_file
  variables:
    CGO_ENABLED: 0
    TF_STATE_NAME: $CI_COMMIT_REF_SLUG
  before_script:
    - apk add terraform
  script:
    - go test

Now, each terratest run in a new branch will create a new environment.

The last missing piece is to add an environment block to the cleanup job.

remove_state_file:
  stage: cleanup
  image: curlimages/curl
  when: manual
  variables:
    GIT_STRATEGY: none
  environment:
    name: terraform/$CI_COMMIT_REF_SLUG
    action: stop
  except:
    - master
  script:
    - |
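      # --fail makes curl exit non-zero on HTTP errors (e.g. 401, 404) too,
      # so the fallback message below fires for them as well
      curl --fail \
        --header "Private-Token: $PROJECT_ACCESS_TOKEN" \
        --request DELETE \
        "$CI_SERVER_URL/api/v4/projects/$CI_PROJECT_ID/terraform/state/$CI_COMMIT_REF_SLUG" \
      || echo "Unable to cleanup terraform state"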

GIT_STRATEGY is set to none so that the runner skips the checkout and the state can be removed even for branches that have already been deleted. The environment block uses the same name as in the terratest job, so the correct state gets removed. It also declares action: stop, meaning the job runs when the “stop” button is pressed or any other stop trigger fires.
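Because both the environment and the state are named after CI_COMMIT_REF_SLUG, it is handy to know what slug a given branch name turns into, for example when hunting down a leftover state by hand. The function below is my approximation of the documented rule (lowercase, everything except a-z and 0-9 replaced with -, shortened to 63 characters, no leading or trailing -). The name ref_slug is made up, and I only assume ASCII branch names.

```shell
#!/bin/sh
# Approximate CI_COMMIT_REF_SLUG for a branch name (ASCII names assumed).
# Steps: lowercase, replace anything outside a-z0-9 with "-",
# keep the first 63 characters, then strip leading and trailing "-".
ref_slug() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]/-/g' \
    | cut -c1-63 \
    | sed -e 's/^-*//' -e 's/-*$//'
}

# ref_slug 'Feature/Add_VPC'   prints feature-add-vpc
```

With this, the environment for a branch called Feature/Add_VPC would be terraform/feature-add-vpc, and the state would be named feature-add-vpc.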

That’s it. Click on merge and see what happens.