CMake and vcpkg My Way, Part 2

Question - How Do I Use Sanitizers with vcpkg?

In part 1 I gave a basic overview of using vcpkg in manifest mode. In this part I want to discuss how I integrated vcpkg into my CMake setup, in a way that I haven’t seen elsewhere but hope might be useful to others.

What led me to this particular setup was wanting to use Address Sanitizer across all my code. Address Sanitizer (henceforth abbreviated to ASan) and its siblings, Undefined Behaviour Sanitizer and Thread Sanitizer, are, bluntly, amazing. ASan will detect a whole slew of the basic memory bugs that plague C++ development, such as off-by-one errors and use-after-free. It not only detects them but gives you a report that points both to the line where the error occurred and to the line that allocated the memory in question. If, like me, you’ve been developing in projects and environments where running a debugger has been a lot of effort, ASan is a massive breath of fresh air. And enabling ASan these days is pretty straightforward - just add -fsanitize=address to both your compiler and linker flags.
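
As a minimal sketch of what that looks like for a single target (the project name asan_demo and the source file main.cpp are hypothetical, and this assumes GCC or Clang):

```cmake
cmake_minimum_required(VERSION 3.13)  # target_link_options needs 3.13 or newer
project(asan_demo CXX)

# Hypothetical executable built from a single source file.
add_executable(asan_demo main.cpp)

# ASan has to be enabled when compiling *and* when linking.
# -fno-omit-frame-pointer is optional but gives more readable stack traces.
target_compile_options(asan_demo PRIVATE -fsanitize=address -fno-omit-frame-pointer)
target_link_options(asan_demo PRIVATE -fsanitize=address)
```

This per-target form is only for illustration. The rest of this post is about moving these flags out of CMakeLists.txt altogether.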

But there’s one significant issue with using ASan in a project with dependencies - all your code must be compiled with it, otherwise you will get false positives when your code calls into uninstrumented library code. This is not a theoretical concern; I’ve hit it multiple times, particularly in ITK with container overflow errors. You can use environment variables to suppress these problems, but to me this is like compiling with warnings turned off - you’re deliberately hobbling a powerful tool.
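
For reference, the suppression route looks something like the sketch below, which sets ASAN_OPTIONS for a single CTest test. The project, test and file names are hypothetical; detect_container_overflow is the ASan runtime option that controls the container overflow checks.

```cmake
cmake_minimum_required(VERSION 3.13)
project(asan_suppress_demo CXX)
enable_testing()

# Hypothetical test executable that calls into an uninstrumented library.
add_executable(my_tests tests.cpp)
add_test(NAME my_tests COMMAND my_tests)

# Switch off the container overflow check for this test only.
# This hides the false positives, but it also hides any real container overflows.
set_tests_properties(my_tests PROPERTIES
  ENVIRONMENT "ASAN_OPTIONS=detect_container_overflow=0")
```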

This introduces a difficulty with using vcpkg, because at first glance it looks difficult to pass in extra compile flags. I initially tried to solve this with custom triplets, but thankfully there turned out to be a much more straightforward way, which also allowed me to easily standardise my build settings across projects.

Answer - Define Your Own Toolchain

To solve our problem with sanitizers we need to ensure that a consistent set of compiler flags is used to build every piece of code, including dependencies. CMake projects are historically bad at this, because CMake lets you specify both what you are building and how to build it in the same CMakeLists.txt file, and if you give people functionality they will use it. The what and the how are mostly orthogonal information, and it’s likely that the how will depend on who is compiling the code.

However, CMake does actually make this easy with the concept of a “toolchain” file. The idea is straightforward - a good CMakeLists.txt contains no compiler or linker settings, and instead these are all placed in a separate .cmake file which you then tell CMake about. I can’t take credit for this idea (I think I first picked it up from @MatRopert on Twitter). But having had to deal with one particular vendor library which badly specified a whole bunch of incorrect compiler flags in the middle of their CMakeLists.txt, it is an approach I wholeheartedly support.
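
To make the split concrete, here is a sketch of a CMakeLists.txt that only describes what to build. The project, target and file names are hypothetical; the Eigen3 package and the Eigen3::Eigen target are the names Eigen itself exports.

```cmake
cmake_minimum_required(VERSION 3.13)
project(my_project CXX)

# Dependencies are found, not configured: no flags and no hard-coded paths.
find_package(Eigen3 CONFIG REQUIRED)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE Eigen3::Eigen)

# Deliberately absent: any set(CMAKE_CXX_FLAGS ...) or -fsanitize options.
# Those arrive from outside at configure time, for example:
#   cmake -S . -B build -DCMAKE_TOOLCHAIN_FILE=path/to/toolchain.cmake
```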

The toolchain files are not complicated. Here is the basic part from one of the ones I use in Riesling:

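# Warnings applied to every build configuration.
set(MY_FLAGS "-Wall -Wpedantic -Wshadow")
# Debug builds get the sanitizers plus -O2.
set(MY_FLAGS_DEBUG "-fsanitize=address,undefined -O2")
# Set the _INIT variables rather than the FLAGS variables themselves.
set(CMAKE_C_FLAGS_INIT "${MY_FLAGS}")
set(CMAKE_CXX_FLAGS_INIT "${MY_FLAGS}")
set(CMAKE_C_FLAGS_DEBUG_INIT "${MY_FLAGS_DEBUG}")
set(CMAKE_CXX_FLAGS_DEBUG_INIT "${MY_FLAGS_DEBUG}")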

include(CheckCXXCompilerFlag)
include(CMakeToolsHelpers OPTIONAL)
include(CMakePrintHelpers)

The important thing to note here is that it sets the _INIT versions of the FLAGS variables, instead of the variables themselves. CMake only uses the _INIT values to seed the corresponding cache entries the first time a build tree is configured, which still allows the user to specify extra flags, or even remove a particular flag, if they are so inclined. Other than that you might notice that I currently define -O2 in debug mode, which is pretty non-standard. I rely on the Eigen library heavily, which is blazingly fast after optimization but runs as slow as molasses in a normal Debug build. Using optimization and address sanitizer together, to pick up on errors without growing old in the meantime, is a practical compromise.
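
One way to check what a build tree actually ended up with is to print the final variables from CMakeLists.txt. This is a minimal sketch with a hypothetical project name, using cmake_print_variables from the CMakePrintHelpers module:

```cmake
cmake_minimum_required(VERSION 3.13)
project(flag_check CXX)

include(CMakePrintHelpers)

# By this point project() has enabled C++ and seeded the cache entries
# from the toolchain's _INIT values (plus anything CMake picked up from
# the environment, such as CXXFLAGS).
cmake_print_variables(CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG)
```

Because the _INIT values are only consulted on the first configure, a change to the toolchain file needs a fresh build directory to take effect. A value passed explicitly, such as -DCMAKE_CXX_FLAGS_DEBUG=-g, takes precedence over the _INIT default.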

So that’s the first part of the solution. The second part is how to tell both CMake and vcpkg about the toolchain file. Telling CMake about it is straightforward - we run the configure step with -DCMAKE_TOOLCHAIN_FILE=ourfile.cmake. CMake will load the toolchain file first, and then process CMakeLists.txt. Simple.

Except, you may have noticed in the previous part that the way we tell CMake to use vcpkg is by passing in the argument -DCMAKE_TOOLCHAIN_FILE=vcpkg/scripts/buildsystems/vcpkg.cmake. So we apparently have a conflict. However, this vcpkg.cmake is not really a toolchain file. I had this clarified for me here. It’s just the easiest mechanism that the vcpkg team has to jack in to CMake.

vcpkg does provide a way to “chainload” toolchain files (the VCPKG_CHAINLOAD_TOOLCHAIN_FILE variable), but that sounded complicated. In the linked discussion there was the suggestion to simply include the vcpkg.cmake file from my custom toolchain file. Hence at the end of my toolchain files I have the following line:

include(${CMAKE_CURRENT_LIST_DIR}/vcpkg/scripts/buildsystems/vcpkg.cmake)

which does exactly that. This way, the FLAGS_INIT variables are set by my toolchain before vcpkg sees them, which means that vcpkg will use them and compile all my dependencies exactly as I have specified. Brilliant! Sanitizers for everyone!
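
Putting the pieces together, a complete toolchain file might look like the sketch below. The file name, the MY_VCPKG_CMAKE variable and the existence check are illustrative additions, not part of the file quoted above.

```cmake
# toolchain.cmake (hypothetical file name), sitting next to the vcpkg submodule.

set(MY_FLAGS "-Wall -Wpedantic -Wshadow")
set(MY_FLAGS_DEBUG "-fsanitize=address,undefined -O2")
set(CMAKE_C_FLAGS_INIT "${MY_FLAGS}")
set(CMAKE_CXX_FLAGS_INIT "${MY_FLAGS}")
set(CMAKE_C_FLAGS_DEBUG_INIT "${MY_FLAGS_DEBUG}")
set(CMAKE_CXX_FLAGS_DEBUG_INIT "${MY_FLAGS_DEBUG}")

# Fail early with a readable message if the submodule has not been fetched.
set(MY_VCPKG_CMAKE "${CMAKE_CURRENT_LIST_DIR}/vcpkg/scripts/buildsystems/vcpkg.cmake")
if(NOT EXISTS "${MY_VCPKG_CMAKE}")
  message(FATAL_ERROR
    "vcpkg.cmake not found at ${MY_VCPKG_CMAKE}; "
    "try: git submodule update --init --recursive")
endif()

# Hand over to vcpkg last, once all the flags are in place.
include("${MY_VCPKG_CMAKE}")
```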

There is one final thing to consider here, and that is how to incorporate my toolchain files into my repositories. I work on several different projects which are separated into different repositories (no fancy monorepo for me). However, I generally use the same core dependencies and I definitely want to use Sanitizers across all of them. I could add a toolchain file to each repo, and add vcpkg as a direct submodule. But that makes synchronizing any changes to my toolchain files a chore.

Instead, what I have done is create my own CMake repository, which contains my toolchain files and a couple of helper files (e.g. BuildType.cmake). This repo then has my fork of vcpkg as a submodule. This setup means that I can keep my slightly tweaked versions of Eigen and ITK consistent across all my projects simply by checking out a particular commit of my CMake repo. I think this is really neat. The only thing to be aware of is that I need to remember to run git submodule update --init --recursive when checking out my actual projects, but I generally wrap the commands I need in a bootstrap script.
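
A helper along the lines of BuildType.cmake could be as small as the sketch below, which applies a default build type when none was requested. This is a common idiom and an assumption about what such a file might contain, not a copy of the real one.

```cmake
# BuildType.cmake (hypothetical contents): default to Release when the user
# has not chosen a build type and the generator is single-configuration.
if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
  message(STATUS "No build type selected, defaulting to Release")
  set(CMAKE_BUILD_TYPE "Release" CACHE STRING "Build type" FORCE)
  # Offer the standard choices in cmake-gui and ccmake.
  set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS
    "Debug" "Release" "MinSizeRel" "RelWithDebInfo")
endif()
```

A project would pull this in with an include() of the file from wherever the shared repo is checked out.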

And that concludes this two-parter. I hope it’s been helpful. Find me on Twitter or Discord if you have questions or comments.