Branded Types
Code that handles bare string primitives with important semantic and security concerns isn’t using the type system to its fullest extent.
—Brendan Eich, probably
This is the first post of a series on domain primitives that show how they lay a secure foundation for software. I’ll discuss alternatives to string primitives using an Express web server written in TypeScript as an example, but the concepts introduced here can be generalized to replace any primitive and they can be used in other languages that support type guards.
I’ll also sprinkle in some Semgrep static analysis along the way to help keep developers building on that secure foundation. But first, let’s set the stage with an example.
A stub-your-toe data type
Let’s say that we have a feature in our app that integrates with the service FooBar where a customer can register a tenant, <tenant>.foobar.net. This isn’t that far off from some OAuth flows, but you get the gist. We might have some (iffy) code that looks like this:
// foobar.ts
export async function register(tenant: string, userId: string) {
  const url = `https://${tenant}.foobar.net/api/some-path`;
  const secret = getFooBarSecret();
  /* Call to url with the secret */
  /* Save something to the database */
}

// app.ts
interface FooBarRegisterRequest extends Request {
  body: {
    tenant: string;
  };
}

app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  // Use data from authentication middleware
  if (!req.user) {
    // Return here so the handler stops instead of falling through
    return res.status(401).send("Unauthorized");
  }
  // Get data and use it to register the customer to a FooBar tenant
  const { tenant } = req.body;
  await register(tenant, req.user.id);
  return res.status(200).send("Accepted");
});
Let’s focus on how the tenant field is handled. In the happy path, a customer provides a value like "goodcorp", and the server calls register(tenant, ...), which then sends some metadata and an application secret over to goodcorp.foobar.net. Great, right? Well, this is ripe for server-side request forgery (SSRF).
If some ne’er-do-well threat actor passes in "evil.com/q?=", the URL becomes https://evil.com/q?=.foobar.net/api/some-path, so the request goes to evil.com and now they’ve stolen our app’s FooBar credentials. 😮
Barring any SSRF protections, they can even pivot to internal IP addresses (169.254.169.254) or protected servers (secure-vault.internal). 💀
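To see why the interpolation is dangerous, here is a minimal sketch that builds the same URL and asks the standard URL parser which host it points at. The buildFooBarUrl helper is hypothetical; it only mirrors the template string in register.

```typescript
// Hypothetical helper mirroring the string interpolation in register().
function buildFooBarUrl(tenant: string): URL {
  return new URL(`https://${tenant}.foobar.net/api/some-path`);
}

const good = buildFooBarUrl("goodcorp");
const evil = buildFooBarUrl("evil.com/q?=");
const metadata = buildFooBarUrl("169.254.169.254/q?=");

console.log(good.hostname);     // goodcorp.foobar.net
console.log(evil.hostname);     // evil.com
console.log(evil.search);       // ?=.foobar.net/api/some-path
console.log(metadata.hostname); // 169.254.169.254
```

The intended suffix ends up in the query string, so it no longer has any say over where the request goes.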
There are lots of great ways to defend against SSRFs in depth, but let’s take an easy path and tack on some validation to the route:
app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  ...
  // Get data, *validate it*, and use it to register the customer to a tenant
  const { tenant } = req.body;
  if (!/^[a-z0-9]+$/.test(tenant)) {
    // Return here so an invalid tenant never reaches register()
    return res.status(400).send("Bad Request");
  }
  ...
});
Awesome, we’ve prevented the vulnerability… for this endpoint alone. What about other use cases and callers? Sure, you can pull this out into a function like assertValidTenant(), but how will you be sure that all endpoints, today and in the future, will use it? Maybe we can push the validation into the register function instead, but then we run into the same “are you using this right?” problem for other functions that might use plain tenant strings.
We can go pretty deep into different SSRF mitigations (like forward proxies), and those are totally valid strategies, but let’s take a data- and domain-centric approach and look into how tenant being a string primitive is underspecified. Our register function accepts it as a parameter, and tenant has zero opinions on how it should be used or validated. It implicitly asks callers or callees to validate the user input, and makes or accepts no guarantees whatsoever. It’s a stub-your-toe data type. It lets you use and call it, and it happily lets you bang your pinky toe on it. Owwie zowwie.
I’m positive you have worked with codebases where security-critical areas look the same. Secure-by-design, stub-your-toe data types do not make. (Yoda, probably.)
So let’s introduce our first domain primitive to give better guarantees. Let’s actually use the type system!
Implementation
I always try to approach API design by “writing the code that I wish I had.” What we want here is a strongly typed parameter for our register function. Get outta here, string!
// foobar.ts
type Tenant = never; // Temporary placeholder
export async function register(tenant: Tenant, userId: string) { ... }

// app.ts
app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  ...
  const { tenant } = req.body; // tenant is a string type
  await register(tenant, req.user.id); // 💥 Won't compile!
  ...
});
We’ve only put a placeholder for the Tenant type for now, but the compiler already rejects the register call while tenant is still a primitive string:
src/app.ts:62:18 - error TS2345: Argument of type 'string' is not assignable to
parameter of type 'never'.
62 await register(tenant, req.user.id);
~~~~~~
That’s a feature we want. We want the type system to let us know when we’re using the wrong data. And we want our context and domain to be represented by our data types. Let’s now introduce Tenant as a “branded type”, along with a validator function:
declare const brand: unique symbol;
export type Brand<T, U extends string> = T & { [brand]: U };

type Tenant = Brand<string, "Tenant">;

function isTenant(value: string): value is Tenant {
  return /^[a-z0-9]+$/.test(value);
}
There are a couple of cool things going on here. First, the Brand type lets us tack on a brand (think of cattle) to strings and mark them under the hood as a tenant. We’re in “domain primitive” territory now: we’ve created a type that is specific to our domain. It’s no longer an arbitrary, can-be-anything string. And the brand doesn’t break any of the native string behavior.
Secondly, the validator function isTenant uses a type predicate, : value is Tenant. This gives us an awesome, ergonomic property where the data is typed as a Tenant if and only if it has passed validation:
const maybeTenant = "foobarbaz";
const notATenant = "evil.com/?lol=";

register(maybeTenant, userId); // 💥 Won't compile! Not validated!
register(notATenant, userId); // 💥 Won't compile! Not validated!

if (isTenant(maybeTenant)) {
  register(maybeTenant, userId); // ✅ Will compile! Validated!
}
if (isTenant(notATenant)) {
  register(notATenant, userId); // 👻 Will never be reached!
}
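To back up the claim that the brand leaves native string behavior alone, here is a small self-contained sketch. The describeTenant helper is hypothetical and exists only to show what a narrowed value can still do; the brand itself lives purely in the type system, so at run time the value is a plain string.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };
type Tenant = Brand<string, "Tenant">;

function isTenant(value: string): value is Tenant {
  return /^[a-z0-9]+$/.test(value);
}

// Hypothetical helper, only here to show what a narrowed Tenant can do.
function describeTenant(input: string): string {
  if (!isTenant(input)) {
    return "rejected";
  }
  // input is a Tenant in this branch, yet every string operation still works.
  const host: string = `${input}.foobar.net`;
  return `${host} | ${input.length} | ${input.toUpperCase()} | ${typeof input}`;
}

console.log(describeTenant("goodcorp"));
// goodcorp.foobar.net | 8 | GOODCORP | string
console.log(describeTenant("evil.com/?lol="));
// rejected
```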
Let’s go one step further with our type system and create an assertion function:
function assertTenant(value: string): asserts value is Tenant {
  if (!isTenant(value)) {
    throw new Error("Invalid tenant");
  }
}
This acts in a very similar way, but throws on invalid input instead of returning false:
register(maybeTenant, userId); // 💥 Won't compile!
assertTenant(maybeTenant);
register(maybeTenant, userId); // ✅ Will compile!
Awesome! Now we can update “the code I wish I had” to “the code I have”:
// app.ts
app.post('/foobar/register', async (req: FooBarRegisterRequest, res) => {
  ...
  const { tenant } = req.body; // tenant is a string type
  assertTenant(tenant); // tenant is a Tenant type
  await register(tenant, req.user.id); // ✅ Will compile!
  ...
});
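One thing to keep in mind at the boundary: req.body is only a string as far as the compiler is concerned. At run time a JSON body can carry anything, and RegExp test() coerces its argument to a string, so a one-element array like ["goodcorp"] would pass the regex. Here is a minimal sketch of variants that accept unknown and turn a failed assertion into a status code. The names isTenantInput, assertTenantInput, and statusFor are hypothetical and not part of the code above.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };
type Tenant = Brand<string, "Tenant">;

// Hypothetical variant of isTenant that does not trust the compile-time type.
function isTenantInput(value: unknown): value is Tenant {
  return typeof value === "string" && /^[a-z0-9]+$/.test(value);
}

// Hypothetical variant of assertTenant built on the guard above.
function assertTenantInput(value: unknown): asserts value is Tenant {
  if (!isTenantInput(value)) {
    throw new Error("Invalid tenant");
  }
}

// Hypothetical stand-in for a route handler: maps the outcome to a status.
function statusFor(body: { tenant?: unknown }): number {
  try {
    assertTenantInput(body.tenant);
    return 200; // body.tenant is a Tenant from here on
  } catch {
    return 400; // the thrown error becomes a Bad Request
  }
}

console.log(statusFor({ tenant: "goodcorp" }));     // 200
console.log(statusFor({ tenant: "evil.com/q?=" })); // 400
console.log(statusFor({ tenant: ["goodcorp"] }));   // 400
console.log(statusFor({}));                         // 400
```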
Excellent. We’ve used our first domain primitive, a branded type, to get three awesome properties:
- Type safety. The assertion function narrows valid tenants to the Tenant type, and developers have to do this to even use the register function.
- Run-time validation. We’ll reject any input that isn’t a valid tenant.
- Ability to hook (easily) into static analysis. With a dedicated type, we can easily wire this up to static analysis tools.
I have to emphasize the first two. These mean that developers don’t have to think, “Gee, do I have to validate this?” They can totally evict that from their brains. A Tenant value will exist only if it’s already validated. (Well, unless the developer overrides it with a const oopsie = value as Tenant, but we’ll fix that in the next section.)
What about the last property? Well, we’ve built a secure-by-design building block, but we don’t have anything that ensures that developers actually use it for other functions. They might just as easily write an unregister function that accepts a string. There are plenty of tools to reach for, like ESLint, but let’s address this problem with Semgrep.
Semgrep
If you’re not familiar with Semgrep, it’s a pretty awesome static analysis tool (docs). I won’t go too deep into it, but for the purpose of this series, you just need to know that you write rules in YAML, with patterns that look like the TypeScript they match. We’ll use it to detect misuse (or lack of use) of our Tenant type.
We want to target and fix code that looks like this:
function unregister(tenant: string) { /* do unregister-y type things */ }
const disable = (userId: string, tenant: string, days: number) => { /* ... */ };
I’ll gloss over the rule syntax, but the idea is to detect function parameters that have tenant: string and change them to tenant: Tenant:
rules:
  - id: missing-branded-type
    message: Use `Tenant` instead of a string primitive.
    languages: [typescript]
    severity: WARNING
    patterns:
      - pattern-either:
          - pattern: "function $FN(..., tenant: $STRING, ...) { ... }"
          - pattern: "$FN = (..., tenant: $STRING, ...) => { ... }"
      - metavariable-pattern:
          metavariable: $STRING
          pattern: string
      - focus-metavariable: $STRING
    fix: Tenant
When we run that with semgrep, we get:
src/app.ts
rules.missing-branded-type
Use `Tenant` instead of a string primitive.
▶▶┆ Autofix ▶ Tenant
52┆ function unregister(tenant: string) { /* ... */ }
⋮┆----------------------------------------
▶▶┆ Autofix ▶ Tenant
54┆ const disable = (userId: string, tenant: string, days: number) => { /* ... */ };
A really awesome workflow is to run this with --autofix, and let it fix all the things for you. Then, you can just let the compiler yell at you about all the places where callers aren’t correctly using a validator or assertion function. Nice!
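If you want to keep the rule itself honest, Semgrep can check rules against annotated fixtures when run with semgrep --test: a ruleid comment marks a line the rule should flag, and an ok comment marks one it should leave alone. Here is a sketch of what such a fixture might look like for the missing-branded-type rule. The file name and the function bodies are hypothetical, and I’m assuming the fixture sits next to the rule file under a matching name.

```typescript
// missing-branded-type.ts (hypothetical fixture for the rule above)
declare const brand: unique symbol;
type Tenant = string & { [brand]: "Tenant" };

// ruleid: missing-branded-type
function unregister(tenant: string) {
  return `unregistered ${tenant}`;
}

// ruleid: missing-branded-type
const disable = (userId: string, tenant: string, days: number) => {
  return `${userId} disabled on ${tenant} for ${days} days`;
};

// ok: missing-branded-type
function reregister(tenant: Tenant) {
  return `reregistered ${tenant}`;
}
```

The annotations are ordinary comments, so the fixture still compiles as regular TypeScript.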
Earlier I promised a fix for the cast escape hatch. We can and should prevent, or at least warn about, type casting, i.e. value as Tenant:
rules:
  - id: type-casted-tenant
    message: Don't do this!
    languages: [typescript]
    severity: WARNING
    pattern: $VALUE as Tenant
    fix: $VALUE
src/app.ts
rules.type-casted-tenant
Don't do this!
▶▶┆ Autofix ▶ tenant
73┆ const oopsie = tenant as Tenant;
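In case it isn’t obvious why the cast deserves its own rule, here is a minimal sketch of what it lets through. The raw value and the castIsValid name are made up for illustration.

```typescript
declare const brand: unique symbol;
type Tenant = string & { [brand]: "Tenant" };

function isTenant(value: string): value is Tenant {
  return /^[a-z0-9]+$/.test(value);
}

const raw: string = "evil.com/q?=";

// The cast compiles, but no validation ever runs on the value.
const oopsie = raw as Tenant;

// Checking after the fact shows the "Tenant" was never valid.
const castIsValid: boolean = isTenant(oopsie);
console.log(castIsValid); // false
```

With the autofix removing the cast, the compiler goes back to rejecting the unvalidated string, which pushes the developer toward isTenant or assertTenant.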
Next
I picked tenants as an example for clarity, but there are so many other domains that you can use these for. They don’t even have to be strings! I think they work best around security and trust boundaries, where you need code around those boundaries to be correct, e.g. authentication, authorization, cryptography, and accounting, to name a few. If you’re looking for inspiration, I highly recommend reading this thread by @mattpocockuk.
I like to think about branded types as a way to make things that send you data act safely. You can slap it onto a function’s parameters and it makes a contract about some domain’s data type that all code that calls your function has to abide by. Powerful stuff!
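As a sketch of a non-string brand, here is the same pattern applied to a number for an accounting-style amount. The Cents type, its validation rule (a non-negative safe integer), and the charge function are all assumptions picked for illustration.

```typescript
declare const brand: unique symbol;
type Brand<T, U extends string> = T & { [brand]: U };

// Hypothetical domain primitive: a whole, non-negative number of cents.
type Cents = Brand<number, "Cents">;

function isCents(value: number): value is Cents {
  return Number.isSafeInteger(value) && value >= 0;
}

function assertCents(value: number): asserts value is Cents {
  if (!isCents(value)) {
    throw new Error("Invalid amount");
  }
}

// Hypothetical consumer: callers must hand over a validated Cents.
function charge(amount: Cents): string {
  return `charged ${amount} cents`;
}

const input: number = 1999;
assertCents(input);
console.log(charge(input)); // charged 1999 cents

console.log(isCents(19.99)); // false
console.log(isCents(-1));    // false
```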
In the next post, I’ll flip the script with tainted types, and introduce a way to make things that you send data to act safely. I hope to catch you there! ✌️
Special thanks to Drew Gregory and Utsav Shah for reviewing this post!