Branded Types

Posted on Oct 21, 2023

tl;dr: A secure-by-design domain primitive that gives type and run-time safety.

Code that handles bare string primitives with important semantic and security concerns isn’t using the type system to its fullest extent.
—Brendan Eich, probably

This is the first post in a series on domain primitives and how they lay a secure foundation for software. I’ll discuss alternatives to string primitives using an Express web server written in TypeScript as an example, but the concepts introduced here can be generalized to replace any primitive, and they can be used in other languages that support type guards.

I’ll also sprinkle in some Semgrep static analysis along the way to help keep developers building on that secure foundation. But first, let’s set the stage with an example.

A stub-your-toe data type

Let’s say that we have a feature in our app that integrates with the service FooBar where a customer can register a tenant, <tenant>.foobar.net. This isn’t that far off from some OAuth flows, but you get the gist. We might have some (iffy) code that looks like this:

// foobar.ts
export async function register(tenant: string, userId: string) {
  const url = `https://${tenant}.foobar.net/api/some-path`;
  const secret = getFooBarSecret();

  /* Call to url with the secret */
  /* Save something to the database */
}

// app.ts
interface FooBarRegisterRequest extends Request {
  body: {
    tenant: string;
  };
}
app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  // Use data from authentication middleware
  if (!req.user) {
    // Return here so an unauthenticated request never reaches register()
    return res.status(401).send("Unauthorized");
  }

  // Get data and use it to register the customer to a FooBar tenant
  const { tenant } = req.body;
  await register(tenant, req.user.id);
  return res.status(200).send("Accepted");
});

Let’s focus on how the tenant field is handled. In the happy path, a customer provides a value like "goodcorp", and the server calls register(tenant, ...), which then sends some metadata and an application secret over to goodcorp.foobar.net. Great, right? Well, this is ripe for server-side request forgery (SSRF).

If some ne’er-do-well threat actor passes in "evil.com/q?=", the URL used resolves to evil.com/q?=.foobar.net/api/some-path and now they’ve stolen our app’s FooBar credentials. 😮

Barring any SSRF protections, they can even pivot to internal IP addresses (169.254.169.254) or protected servers (secure-vault.internal). 💀
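
To see why the trailing .foobar.net doesn’t save us, here is a minimal sketch that builds the URL the same way register does and asks the standard URL parser which host it points at. The helper name hostFor is hypothetical and exists only for this illustration.

```typescript
// Hypothetical helper: builds the URL like register() does and reports
// the host that an HTTP client would actually contact.
function hostFor(tenant: string): string {
  const url = new URL(`https://${tenant}.foobar.net/api/some-path`);
  return url.hostname;
}

const goodHost = hostFor("goodcorp");                // "goodcorp.foobar.net"
const evilHost = hostFor("evil.com/q?=");            // "evil.com"
const metadataHost = hostFor("169.254.169.254/?x="); // "169.254.169.254"
```

Everything after the injected ? is parsed as a query string, so the intended domain never makes it into the hostname.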

There are lots of great ways to defend against SSRFs in depth, but let’s take an easy path and tack on some validation to the route:

// app.ts
app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  ...
  // Get data, *validate it*, and use it to register the customer to a tenant
  const { tenant } = req.body;
  if (!/^[a-z0-9]+$/.test(tenant)) {
    // Return here so invalid input never reaches register()
    return res.status(400).send("Bad Request");
  }
  ...
});

Awesome, we’ve prevented the vulnerability… for this endpoint alone. What about other use cases and callers? Sure, you can pull this out into a function like assertValidTenant(), but how will you be sure that all endpoints, today and in the future, will use it? Maybe we can push the validation into register itself instead, but then we run into the same “are you using this right?” problem for other functions that might use plain tenant strings.
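
Here is a small sketch of that problem. It assumes assertValidTenant is a plain validation helper that returns nothing, and buildUrl is a hypothetical stand-in for any function that takes a tenant string.

```typescript
// assertValidTenant validates, but the value stays a plain string afterwards.
function assertValidTenant(tenant: string): void {
  if (!/^[a-z0-9]+$/.test(tenant)) {
    throw new Error("Invalid tenant");
  }
}

// Hypothetical stand-in for any function that accepts a tenant string.
function buildUrl(tenant: string): string {
  return `https://${tenant}.foobar.net/api/some-path`;
}

// A careful caller validates first...
const carefulInput = "goodcorp";
assertValidTenant(carefulInput);
const carefulUrl = buildUrl(carefulInput);

// ...but a forgetful caller compiles just as happily.
const forgetfulUrl = buildUrl("evil.com/q?=");
```

Nothing in the types tells the two callers apart, so the compiler has no way to flag the second one.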

We can go pretty deep into different SSRF mitigations (like forward proxies), and we might even push the validation to the register function instead. Those are totally valid strategies, but let’s take a data- and domain-centric approach and look into how tenant being a string primitive is underspecified. Our register function accepts it as a parameter, and tenant has zero opinions on how it should be used or validated. It implicitly asks callers or callees to validate the user input, and makes or accepts no guarantees, whatsoever. It’s a stub-your-toe data type. It lets you use and call it, and it happily lets you bang your pinky toe on it. Owwie zowwie.

I’m positive you have worked with codebases where security-critical areas look the same. Secure-by-design, stub-your-toe data types do not make. (Yoda, probably.)

So let’s introduce our first domain primitive to give better guarantees. Let’s actually use the type system!

Implementation

I always try to approach API design by “writing the code that I wish I had.” What we want here is a strongly typed parameter for our register function. Get outta here, string!

// foobar.ts
type Tenant = never; // Temporary placeholder
export async function register(tenant: Tenant, userId: string) { ... }

// app.ts
app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  ...
  const { tenant } = req.body; // tenant is a string type
  await register(tenant, req.user.id); // 💥 Won't compile!
  ...
});

We’ve only put in a placeholder for the Tenant type for now, but the compiler already rejects the register call while tenant is still a primitive string:

src/app.ts:62:18 - error TS2345: Argument of type 'string' is not assignable to
parameter of type 'never'.

62   await register(tenant, req.user.id);
                    ~~~~~~

That’s a feature we want. We want the type system to let us know when we’re using the wrong data. And we want our context and domain to be represented by our data types. Let’s now introduce Tenant as a “branded type”, along with a validator function:

declare const brand: unique symbol;
export type Brand<T, U extends string> = T & { [brand]: U };

type Tenant = Brand<string, "Tenant">;
function isTenant(value: string): value is Tenant {
  return /^[a-z0-9]+$/.test(value);
}

There are a couple of cool things going on here. First, the Brand type lets us tack on a brand (think of cattle) to strings and mark them under-the-hood as a tenant. We’re in “domain primitive” territory now: we’ve created a type that is specific to our domain. It’s no longer an arbitrary, can-be-anything string. And the brand doesn’t break any of the native string behavior.
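
To make that last point concrete, here is a minimal sketch that repeats the definitions above and then treats a validated Tenant like any other string. The variable names are made up for the example.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };
type Tenant = Brand<string, "Tenant">;

function isTenant(value: string): value is Tenant {
  return /^[a-z0-9]+$/.test(value);
}

const rawInput: string = "goodcorp";
let described = "rejected";

if (isTenant(rawInput)) {
  // Inside this block rawInput is a Tenant, and every string operation still works.
  const upper: string = rawInput.toUpperCase();
  const tenantHost: string = `${rawInput}.foobar.net`;
  // At run time the value is still an ordinary string; the brand lives only in the types.
  described = `${typeof rawInput}:${upper}:${tenantHost}:${rawInput.length}`;
}
```

The brand is declared but never created at run time, so it costs nothing once the code is compiled.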

Secondly, the validator function isTenant uses a type predicate, value is Tenant. This gives us an awesome, ergonomic property: the compiler treats the data as a Tenant if and only if it has passed validation:

const maybeTenant = "foobarbaz";
const notATenant = "evil.com/?lol=";

register(maybeTenant, userId); // 💥 Won't compile! Not validated!
register(notATenant, userId); // 💥 Won't compile! Not validated!

if (isTenant(maybeTenant)) {
  register(maybeTenant, userId); // ✅ Will compile! Validated!
}

if (isTenant(notATenant)) {
  register(notATenant, userId); // 👻 Will never be reached!
}

Let’s go one step further with our type system and create a type assertion:

function assertTenant(value: string): asserts value is Tenant {
  if (!isTenant(value)) {
    throw new Error("Invalid tenant");
  }
}

This acts in a very similar way, but lets you error out:

register(maybeTenant, userId); // 💥 Won't compile!

assertTenant(maybeTenant);
register(maybeTenant, userId); // ✅ Will compile!
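
For the run-time side of the assertion, here is a self-contained sketch. tenantUrl is a hypothetical stand-in for register, and tryBuildUrl is a hypothetical wrapper that shows both outcomes.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };
type Tenant = Brand<string, "Tenant">;

function isTenant(value: string): value is Tenant {
  return /^[a-z0-9]+$/.test(value);
}

function assertTenant(value: string): asserts value is Tenant {
  if (!isTenant(value)) {
    throw new Error("Invalid tenant");
  }
}

// Hypothetical stand-in for register(): it only accepts a Tenant.
function tenantUrl(tenant: Tenant): string {
  return `https://${tenant}.foobar.net/api/some-path`;
}

// Hypothetical wrapper: returns the URL, or the error message on rejection.
function tryBuildUrl(raw: string): string {
  try {
    assertTenant(raw); // raw is a Tenant below this line
    return tenantUrl(raw);
  } catch (err) {
    return err instanceof Error ? err.message : "unknown error";
  }
}

const acceptedUrl = tryBuildUrl("goodcorp");          // "https://goodcorp.foobar.net/api/some-path"
const rejectedReason = tryBuildUrl("evil.com/?lol="); // "Invalid tenant"
```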

Awesome! Now we can update “the code I wish I had” to “the code I have”:

// app.ts
app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  ...
  const { tenant } = req.body; // tenant is a string type
  assertTenant(tenant); // tenant is a Tenant type
  await register(tenant, req.user.id); // ✅ Will compile!
  ...
});
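
If you’d rather answer with a 400 than throw, the type predicate works just as well. Below is a framework-free sketch of the same flow. It assumes the request body is untrusted and typed as unknown, so the validator also rejects non-strings. isTenantInput, registerStub, HandlerResult, and handleRegister are all hypothetical names for this illustration, not part of Express or of the code above.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };
type Tenant = Brand<string, "Tenant">;

// Assumption: this variant accepts unknown so it can also reject non-strings.
function isTenantInput(value: unknown): value is Tenant {
  return typeof value === "string" && /^[a-z0-9]+$/.test(value);
}

// Hypothetical synchronous stand-in for the real register().
function registerStub(tenant: Tenant, userId: string): string {
  return `${userId}@${tenant}.foobar.net`;
}

interface HandlerResult {
  code: number;
  text: string;
}

// Hypothetical handler core: same branches as the route, minus Express.
function handleRegister(body: { tenant?: unknown }, userId?: string): HandlerResult {
  if (userId === undefined) {
    return { code: 401, text: "Unauthorized" };
  }
  const { tenant } = body;
  if (!isTenantInput(tenant)) {
    return { code: 400, text: "Bad Request" };
  }
  registerStub(tenant, userId); // tenant is a Tenant here
  return { code: 200, text: "Accepted" };
}

const okResult = handleRegister({ tenant: "goodcorp" }, "user-1");
const badResult = handleRegister({ tenant: "evil.com/q?=" }, "user-1");
const nonStringResult = handleRegister({ tenant: 42 }, "user-1");
const anonResult = handleRegister({ tenant: "goodcorp" });
```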

Excellent. We’ve used our first domain primitive, a branded type, to generate three awesome properties:

Type safety. The type assertion narrows valid strings to the Tenant type, and developers have to do this to even use the register function.
Run-time validation. We’ll reject any input that isn’t a valid tenant.
Ability to hook (easily) into static analysis. With a dedicated type, we can easily wire this up to static analysis tools.

I have to emphasize the first two. These mean that developers don’t have to think, “Gee, do I have to validate this?” They can totally evict that from their brains. A Tenant type will exist only if it’s already validated. (Well, unless the developer overrides it with a const oopsie = value as Tenant, but we’ll fix that in the next section.)

What about the last property? Well, we’ve built a secure-by-design building block, but we don’t have anything that ensures that developers actually use it for other functions. They might just as easily write an unregister function that accepts a string. There are plenty of tools to reach for, like ESLint, but let’s address this problem with Semgrep.

Semgrep

If you’re not familiar with Semgrep, it’s a pretty awesome static analysis tool (docs). I won’t go too deep into it, but for the purpose of this series, you just need to know that rules are written in YAML, with patterns that look like the TypeScript code they match. We’ll use it to detect misuse (or lack of use) of our Tenant type.

We want to target and fix code that looks like this:

function unregister(tenant: string) { /* do unregister-y type things */ }

const disable = (userId: string, tenant: string, days: number) => { /* ... */ };

I’ll gloss over the rule syntax, but the idea is to detect function parameters that have tenant: string and change them to tenant: Tenant:

rules:
  - id: missing-branded-type
    message: Use `Tenant` instead of a string primitive.
    languages: [typescript]
    severity: WARNING
    patterns:
      - pattern-either:
          - pattern: "function $FN(..., tenant: $STRING, ...) { ... }"
          - pattern: "$FN = (..., tenant: $STRING, ...) => { ... }"
      - metavariable-pattern:
          metavariable: $STRING
          pattern: string
      - focus-metavariable: $STRING
    fix: Tenant

When we run that with semgrep, we get:

src/app.ts
    rules.missing-branded-type
      Use `Tenant` instead of a string primitive.

        ▶▶┆ Autofix ▶ Tenant
        52┆ function unregister(tenant: string) { /* ... */ }
        ⋮┆----------------------------------------
        ▶▶┆ Autofix ▶ Tenant
        54┆ const disable = (userId: string, tenant: string, days: number) => { /* ... */ };

A really awesome workflow is to run this with --autofix, and let it fix all the things for you. Then, you can just let the compiler yell at you about all the places where callers aren’t correctly using a validator or assertion function. Nice!
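
Here is a rough sketch of what that looks like after the autofix. The unregister body is a stub, and handleUnregister is a hypothetical caller that used to pass a raw string straight through.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };
type Tenant = Brand<string, "Tenant">;

function isTenant(value: string): value is Tenant {
  return /^[a-z0-9]+$/.test(value);
}

function assertTenant(value: string): asserts value is Tenant {
  if (!isTenant(value)) {
    throw new Error("Invalid tenant");
  }
}

// After --autofix: the parameter is a Tenant instead of a string.
function unregister(tenant: Tenant): string {
  return `unregistered ${tenant}`; // stub body for illustration
}

// Hypothetical caller that the compiler now complains about.
function handleUnregister(rawTenant: string): string {
  // unregister(rawTenant); // would no longer compile: string is not a Tenant
  assertTenant(rawTenant); // the fix the compiler nudges us toward
  return unregister(rawTenant);
}

const unregisterMessage = handleUnregister("goodcorp");
```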

As promised earlier, we can and should prevent, or at least warn about, type casting, i.e. value as Tenant:

rules:
  - id: type-casted-tenant
    message: Don't do this!
    languages: [typescript]
    severity: WARNING
    pattern: $VALUE as Tenant
    fix: $VALUE

src/app.ts
    rules.type-casted-tenant
      Don't do this!

        ▶▶┆ Autofix ▶ tenant
        73┆ const oopsie = tenant as Tenant;

I picked tenants as an example for clarity, but there are so many other domains that you can use these for. They don’t even have to be strings! I think they work best around security and trust boundaries, where you need code around those boundaries to be correct, e.g. authentication, authorization, cryptography, and accounting, to name a few. If you’re looking for inspiration, I highly recommend reading this thread by @mattpocockuk.
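
As one illustration of a non-string brand, here is a minimal sketch for accounting. It assumes money is stored as a non-negative whole number of cents. Cents, isCents, assertCents, and formatCents are all hypothetical names.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };

// Hypothetical branded number: a non-negative whole number of cents.
type Cents = Brand<number, "Cents">;

function isCents(value: number): value is Cents {
  // Rejects NaN, Infinity, fractions, and negative amounts.
  return isFinite(value) && Math.floor(value) === value && value >= 0;
}

function assertCents(value: number): asserts value is Cents {
  if (!isCents(value)) {
    throw new Error("Invalid amount");
  }
}

// Only validated amounts can be formatted; arithmetic still works on the brand.
function formatCents(amount: Cents): string {
  return `$${(amount / 100).toFixed(2)}`;
}

const rawPrice: number = 1250;
assertCents(rawPrice);
const priceLabel = formatCents(rawPrice); // "$12.50"
```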

I like to think about branded types as a way to make things that send you data act safely. You can slap it onto a function’s parameters and it makes a contract about some domain’s data type that all code that calls your function has to abide by. Powerful stuff!

In the next post, I’ll flip the script with tainted types, and introduce a way to make things that you send data to act safely. I hope to catch you there! ✌️

Special thanks to Drew Gregory and Utsav Shah for reviewing this post!
