GhostTypes: Specific types for better maintainability

Status: draft

Introduction

This article describes “ghost types”: a simple principle to improve software readability and correctness.

A strong formulation of this principle is this: the only permissible use of native types is the definition of custom types.

Native types refers to the primitive and standard-library types of your programming language, such as int, std::string, std::vector<>, or std::map<>. Custom types are those you use to express your business logic.

The previous formulation is somewhat extreme; you’ll find plenty of valid exceptions. In particular, local computations that are fully constrained to a specific scope are okay. However, the strong formulation serves as a good north star.

Person class

For a Person class, don’t do this:

class Person {
  std::string first_name;
  std::string last_name;
  std::string city;
  int shoe_size;
};

Instead, define alias types —the “ghost types”— and do something like:

class Person {
  FirstName first_name;
  LastName last_name;
  CityName city;
  ShoeSize shoe_size;
};

Requirements

Depending on the programming language, there are many different approaches to defining ghost types.

The following are my requirements:

Simplicity of declaration
It should be trivial to declare a new type as an alias of an existing type. One should simply need to say something like “FirstName is just an std::string”, with minimal boilerplate.
Method correspondence
All methods of the underlying type should, by default, be applicable to the ghost type: methods (and operators) of std::string, such as empty, find, and starts_with, should be directly available on FirstName.
Static type safety
Passing a LastName or an std::string to a function that expects a FirstName is an error, even though they are all aliases of std::string. This error must be detected statically.

Implicit conversion from ghost types to their underlying type (e.g., from FirstName to std::string) is allowed (though not required).
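To make the type-safety requirements concrete, here is a minimal C++ sketch that states them as compile-time checks. The Ghost template and its tag types are hypothetical names invented for this illustration; the sketch covers one-line declaration, static type safety, and the optional implicit conversion, but deliberately ignores method correspondence.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical minimal wrapper. Tag is never defined; it only makes each
// alias a distinct type.
template <typename Tag, typename T>
class Ghost {
 public:
  // Explicit: a bare T never silently becomes a ghost type.
  explicit Ghost(T value) : value_(std::move(value)) {}

  // Implicit conversion back to the wrapped type (allowed, not required).
  operator const T&() const { return value_; }

 private:
  T value_;
};

// Simplicity of declaration: one line per new type.
using FirstName = Ghost<struct FirstNameTag, std::string>;
using LastName = Ghost<struct LastNameTag, std::string>;

// Static type safety: neither a LastName nor a std::string converts
// implicitly to a FirstName.
static_assert(!std::is_convertible_v<LastName, FirstName>);
static_assert(!std::is_convertible_v<std::string, FirstName>);

// The opposite direction, ghost type to wrapped type, is implicit here.
static_assert(std::is_convertible_v<FirstName, std::string>);
```

If any of these properties were violated, the static_assert lines would stop compilation, which is the "detected statically" part of the requirement.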

Why use ghost types?

By expressing business logic using specific types based on the nouns of your models, your code becomes significantly more readable: every expression reflects its semantics more explicitly. The additional type safety makes your software more robust.

Consider a caller of an AddArtist function. An implementation without ghost types does this:

museum.AddArtist("David", "Hockney", "Paris");

We can’t tell, without looking at the definition, if we’re passing the arguments in the right order. The ghost-types version is much safer:

museum.AddArtist(FirstName{"David"},
                 LastName{"Hockney"},
                 CityName{"Paris"});

If the code compiles, we know we got the order right.
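The claim can be checked mechanically. The following sketch assumes a hypothetical Museum class and a hypothetical tag-based Name wrapper (neither comes from a real library) and asks the compiler whether AddArtist is callable with a given sequence of argument types.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical string wrapper; Tag only distinguishes the aliases.
template <typename Tag>
class Name {
 public:
  explicit Name(std::string value) : value_(std::move(value)) {}
  const std::string& read() const { return value_; }

 private:
  std::string value_;
};

using FirstName = Name<struct FirstNameTag>;
using LastName = Name<struct LastNameTag>;
using CityName = Name<struct CityNameTag>;

// Hypothetical Museum that only counts the artists it receives.
class Museum {
 public:
  void AddArtist(const FirstName&, const LastName&, const CityName&) {
    ++artists_;
  }
  int artists() const { return artists_; }

 private:
  int artists_ = 0;
};

// True when museum.AddArtist(A, B, C) would compile.
template <typename A, typename B, typename C>
constexpr bool kCanAddArtist =
    std::is_invocable_v<decltype(&Museum::AddArtist), Museum&, A, B, C>;

static_assert(kCanAddArtist<FirstName, LastName, CityName>);
// Swapping first and last name is rejected at compile time.
static_assert(!kCanAddArtist<LastName, FirstName, CityName>);
// So are bare string literals, because the constructors are explicit.
static_assert(!kCanAddArtist<const char*, const char*, const char*>);
```

Note that the guarantee only covers the order of distinct types: two parameters of the same ghost type can still be swapped without the compiler noticing.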

An argument against ghost types is that the extra layer of indirection adds cognitive load. “What exactly is a FirstName? How can I program if I don’t know?” The objection has some merit; I can’t fully disregard it. However, in my experience, because these types are just aliases of native types, what they are tends to be obvious. Then again, looking at just the usage of FirstName, you can’t tell that it’s just a ghost type, so… your mileage may vary, I suppose.

Implementation details

In Python (with static type annotations), NewType is a great match:

from typing import NewType
FirstName = NewType("FirstName", str)

TypeScript gives you type tags:

type FirstName = string & { readonly __brand: 'FirstName' };

In C++, I don’t know of a standard solution. I have implemented a generic class (targeting C++23), which I use like this:

struct FirstName : public language::GhostType<FirstName, std::string> {
  using GhostType::GhostType;
};

This isn’t perfect —GhostType has to encode knowledge about methods from primitive types explicitly (and use template metaprogramming to enable them conditionally). However, it lets me do things like:

FirstName name{"Alejandro"};
return name.starts_with("Adriana");
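The actual GhostType class is not shown in this article. As a rough idea of how conditional method forwarding could work, here is a small sketch. GhostTypeSketch, read, HasEmpty, and ShoeSize's definition are all hypothetical names made up for this example, and the technique used (SFINAE on the return type, which also compiles as C++17) is an assumption rather than the approach of the real class.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a GhostType-like base; not the real class.
// Derived is only used to make each ghost type a distinct instantiation.
template <typename Derived, typename T>
class GhostTypeSketch {
 public:
  explicit GhostTypeSketch(T value) : value_(std::move(value)) {}

  const T& read() const { return value_; }

  // Each forwarded method is a template whose return type mentions the call
  // on T, so it drops out of overload resolution when T lacks that method.
  template <typename U = T>
  auto empty() const -> decltype(std::declval<const U&>().empty()) {
    return value_.empty();
  }

  template <typename Arg, typename U = T>
  auto find(const Arg& needle) const
      -> decltype(std::declval<const U&>().find(needle)) {
    return value_.find(needle);
  }

  // Only usable when T itself has starts_with (std::string since C++20).
  template <typename Arg, typename U = T>
  auto starts_with(const Arg& prefix) const
      -> decltype(std::declval<const U&>().starts_with(prefix)) {
    return value_.starts_with(prefix);
  }

 private:
  T value_;
};

struct FirstName : GhostTypeSketch<FirstName, std::string> {
  using GhostTypeSketch::GhostTypeSketch;
};

struct ShoeSize : GhostTypeSketch<ShoeSize, int> {
  using GhostTypeSketch::GhostTypeSketch;
};

// Detects whether a ghost type exposes empty().
template <typename G, typename = void>
struct HasEmpty : std::false_type {};
template <typename G>
struct HasEmpty<G, std::void_t<decltype(std::declval<const G&>().empty())>>
    : std::true_type {};

static_assert(HasEmpty<FirstName>::value);  // std::string has empty()
static_assert(!HasEmpty<ShoeSize>::value);  // int does not
```

The cost matches the one described above: every method worth forwarding has to be listed by hand in the base class.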

In the past I used macros, but I find the templated-class approach cleaner [1].

Extra: Validation

An additional advantage of ghost types, when done carefully, is that you can add validation to their constructors. This is useful when not all values representable in the native type are valid in the ghost type.

You can mandate that FirstName can’t be empty; or, for a container, that elements must occur in a given order; or that a probability, mapped to a double, must be in a given range.

This is a powerful safety mechanism: you validate correctness once (when constructing the ghost type) and then rest assured that all ghost type values are valid.

In my naive Bayes implementation I define Probability: a ghost type wrapping a double. Probability automatically validates that all probability values are in the expected [0, 1] range. Not only do I avoid having to add validation logic to complex statements that compute probability values, I also know that this expectation is validated every single time a new probability value is computed.
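The real Probability type is not reproduced here. A minimal sketch of the idea could look like the following, where the read accessor, the choice to throw std::out_of_range, and the two arithmetic operators are assumptions made for illustration.

```cpp
#include <stdexcept>

// Hypothetical sketch of a validated ghost type wrapping a double.
class Probability {
 public:
  // The only way to obtain a Probability is through this validating
  // constructor.
  explicit Probability(double value) : value_(Validate(value)) {}

  double read() const { return value_; }

  // Results go back through the constructor, so every computed value is
  // validated again.
  friend Probability operator*(Probability a, Probability b) {
    return Probability(a.value_ * b.value_);
  }
  friend Probability operator+(Probability a, Probability b) {
    return Probability(a.value_ + b.value_);
  }

 private:
  static double Validate(double value) {
    // The negated form also rejects NaN, for which every comparison is
    // false.
    if (!(value >= 0.0 && value <= 1.0)) {
      throw std::out_of_range("probability outside [0, 1]");
    }
    return value;
  }

  double value_;
};
```

With this shape, Probability(0.7) + Probability(0.6) throws instead of silently producing a value above 1. Throwing is only one possible policy; a project might instead prefer an assertion or an error-returning factory function.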

Conclusion

Using ghost types for business-logic values can greatly increase the readability of your code. By using custom types for different semantic values (rather than just mapping them directly to the native types), you allow static type systems to go significantly further to detect type mismatches.


  1. With templated classes I can define custom methods, which helps me make incremental progress when gradually adjusting an underlying representation.