Tainted Types
Welcome! In the previous post, I covered branded types as the first of a domain primitives series.
Where branded types helped us ensure upstream code is correct, we can flip the script and use another type of domain object (specifically, a value object) to ensure that downstream code is correct. As the adage goes, “always validate user input!” We can “taint” user input as it enters our system and force everything that handles it to either validate or sanitize it.
Let’s get into it!
Implementation
The basic concept is to encapsulate and taint dangerous data inside a value object: a simple, immutable store of data. The most basic implementation might look like this:
class UserInput<T = string> {
  constructor(readonly value: T) {}

  // Optional, but leave out if you're paranoid!
  toString(): string {
    // String() also copes with values that have no toString of their own.
    return String(this.value);
  }
}
…and that’s it! Simple, right?
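To make the "paranoid" comment concrete, here is a minimal sketch of what the optional toString changes. The ParanoidUserInput class and the sample strings are made up for illustration; the point is only that implicit string coercion (template literals, concatenation) bypasses the type check when toString is defined.

```typescript
// Same shape as the class above, repeated so the snippet is self-contained.
class UserInput<T = string> {
  constructor(readonly value: T) {}

  toString(): string {
    return String(this.value);
  }
}

// Hypothetical "paranoid" variant: identical, but with no toString override.
class ParanoidUserInput<T = string> {
  constructor(readonly value: T) {}
}

const bio = new UserInput("<script>alert(1)</script>");
const paranoidBio = new ParanoidUserInput("<script>alert(1)</script>");

// Template literals coerce to string implicitly, and the compiler allows it.
// With toString defined, the raw (still tainted) value leaks into the string.
const leaked = `Bio: ${bio}`;

// Without it, the default Object.prototype.toString runs instead.
const masked = `Bio: ${paranoidBio}`;
```

In the paranoid variant the only way to reach the raw data is an explicit .value access, which is easier to spot in review and to search for.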
To actually use and benefit from it, you need to choose the ingress or integration point where you taint user input and store it in these objects. Ideally, it’s as close to the source as possible, but that can be quite aggressive. You need to weigh the costs, benefits, and trade-offs.
Because this is all fake code and I’m insane, I’ll insert a top-level middleware in our toy Express/API app that takes all JSON input from the world and converts it there:
app.use(express.json({
  reviver: (_key, value) => {
    if (typeof value === "string") {
      return new UserInput(value);
    }
    return value;
  }
}));
This reviver function takes all strings (your threat model might be different and you may need to handle other data types) and then stuffs them into UserInput objects. This will break many endpoints that don’t have updated request interfaces and types, so you’ll want great unit test coverage before changing this wholesale; otherwise, incremental adoption might be more apt.
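Since express.json passes its reviver option straight through to JSON.parse, the behaviour can be tried without a server. The sketch below is illustrative only: the taintStrings name and the sample payload are assumptions, not part of the toy app.

```typescript
class UserInput<T = string> {
  constructor(readonly value: T) {}
}

// Same logic as the reviver above, pulled out so it can be called directly.
const taintStrings = (_key: string, value: unknown): unknown =>
  typeof value === "string" ? new UserInput(value) : value;

// JSON.parse calls the reviver for every value, including nested ones.
const body = JSON.parse(
  '{"bio": "<b>hi</b>", "age": 42, "links": ["https://example.com"]}',
  taintStrings
);

const bioIsTainted = body.bio instanceof UserInput;
const nestedIsTainted = body.links[0] instanceof UserInput;
const ageUntouched = body.age === 42;
```

Strings inside arrays and nested objects are wrapped too, while numbers and property names are left alone. That reach is exactly why flipping this on globally touches so many request types at once.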
It’s important to note that user input doesn’t come only via your API. You may well be fetching data from another service that contains user input; it was just entered on some other website. Generally, I prefer to validate inputs and only sanitize on output, when I know which medium I’m rendering to (and thus, how to sanitize for it). Another totally valid option is to serialize untrusted fields in your database into this data type. It’s up to you!
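For contrast with sanitizing, a validator rejects bad input instead of rewriting it. The following is a rough sketch under assumed rules: validateDisplayName, the allowed character set, and the 50-character limit are all invented for illustration.

```typescript
class UserInput<T = string> {
  constructor(readonly value: T) {}
}

// Hypothetical validator: unwraps the tainted value only if it passes the check.
function validateDisplayName(input: UserInput): string {
  const candidate = input.value.trim();
  // Assumed policy: 1 to 50 ASCII letters, digits, spaces, underscores, dots, or hyphens.
  if (!/^[A-Za-z0-9 _.-]{1,50}$/.test(candidate)) {
    throw new Error("Invalid display name");
  }
  return candidate;
}

const displayName = validateDisplayName(new UserInput("  Ada Lovelace "));

let rejected = false;
try {
  validateDisplayName(new UserInput("<b>Ada</b>"));
} catch {
  rejected = true;
}
```

Either way, the only route from a UserInput to a plain string runs through a function whose name says what was checked.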
For simplicity, let’s assume we made the decision to sanitize input and remove HTML tags for an endpoint that allows a user to update their bio in Markdown:
// user.ts
async function updateBio(userId: string, bio: string) {
  /* Save something to the database */
}

// app.ts
interface UserRequest extends Request {
  body: {
    bio: UserInput;
  }
}

app.post("/user", async (req: UserRequest, res) => {
  const { bio } = req.body;
  await updateBio(req.user.id, bio); // 💥 Won't compile!
  return res.status(200).send("Accepted");
});
We’ll be greeted with an error, because we didn’t unpack and handle user input:
src/app.ts:110:32 - error TS2345: Argument of type 'UserInput' is not assignable to parameter of type 'string'.

110   await updateBio(req.user.id, bio);
                                   ~~~
This is great! We’re using our type system to help us find correctness issues: a clear win! A simple sanitizer can fix it:
// sanitizers.ts
type Sanitizer<T = string> = (input: UserInput) => T;

export const sanitizeBio: Sanitizer = (input) => stripHTML(input.value);

// app.ts
app.post("/user", async (req: UserRequest, res) => {
  const { bio } = req.body;
  const sanitizedBio = sanitizeBio(bio);
  await updateBio(req.user.id, sanitizedBio); // ✅ Will compile!
  return res.status(200).send("Accepted");
});
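The stripHTML helper isn’t defined in the snippet above. Purely as a placeholder, here is a deliberately naive version so the sanitizer can be run end to end. It is an assumption, not the real implementation, and a regex like this is not a robust defence; a vetted sanitization library is the safer choice in real code.

```typescript
class UserInput<T = string> {
  constructor(readonly value: T) {}
}

type Sanitizer<T = string> = (input: UserInput) => T;

// Hypothetical and naive: drops anything that looks like a tag, nothing more.
function stripHTML(html: string): string {
  return html.replace(/<[^>]*>/g, "");
}

const sanitizeBio: Sanitizer = (input) => stripHTML(input.value);

// Markdown survives; the tags are removed, but text between them is kept.
const clean = sanitizeBio(
  new UserInput("<b>Hello</b> *world*<script>x()</script>")
);
```

Note that the text inside the script element is still there once its tags are gone, which is one reason the naive approach only works as a stand-in.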
Awesome! We tainted user input, used the compiler to flush out where we’re not handling it correctly, and then introduced a sanitizing function to shore things up. Let’s see how we can use Semgrep to improve the roll-out of this tooling.
Semgrep
I mentioned earlier the danger of turning this on wholesale, especially at a top-level middleware. If you’re looking for a more incremental approach, you can also use taint tracking. We can write a rule to taint request data and make sure that it’s properly asserted, sanitized, or validated:
rules:
  - id: sanitize-user-input
    message: Sanitize and handle user input
    languages: [typescript]
    severity: WARNING
    mode: taint
    pattern-sources:
      - patterns:
          - pattern-inside: app.$METHOD($ROUTE, ...);
          - pattern: req.body
    pattern-sanitizers:
      - by-side-effect: true
        patterns:
          - pattern: $SANITIZER($X)
          - pattern: $X
          - metavariable-regex:
              metavariable: $SANITIZER
              regex: (assert|sanitize|validate).*
    pattern-sinks:
      - patterns:
          - pattern: await $FN(...)
Request data is tainted, and it is flagged if it flows into an awaited function call without first passing through a matching sanitizer function that removes the taint. Semgrep correctly identifies the original unsanitized code:
src/app.ts
  rules.sanitize-user-input
    Sanitize and handle user input

    108┆ await updateBio(req.user.id, bio);

    Taint comes from:
    107┆ const { bio } = req.body;

    Taint flows through these intermediate variables:
    107┆ const { bio } = req.body;

    This is how taint reaches the sink:
    108┆ await updateBio(req.user.id, bio);
You can use this approach to incrementally move each of your endpoints and resolvers to start serializing data into a tainted type. Once you’re comfortable with the coverage, you can flip the switch on and start serializing all your data.
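One possible shape for that per-endpoint step is a small helper that wraps a single request body instead of all JSON input. This is a sketch only: Tainted and taintBody are hypothetical names, and the helper is shallow, so it only wraps top-level string fields.

```typescript
class UserInput<T = string> {
  constructor(readonly value: T) {}
}

// Hypothetical mapped type: every string field becomes a UserInput.
type Tainted<T> = {
  [K in keyof T]: T[K] extends string ? UserInput : T[K];
};

// Hypothetical helper: taints the top-level string fields of one body.
function taintBody<T extends Record<string, unknown>>(body: T): Tainted<T> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(body)) {
    const value = body[key];
    out[key] = typeof value === "string" ? new UserInput(value) : value;
  }
  // The cast is needed because the compiler cannot follow the loop above.
  return out as unknown as Tainted<T>;
}

const tainted = taintBody({ bio: "<i>hello</i>", age: 30 });
// tainted.bio is typed as UserInput; tainted.age is still a number.
const rawBio: string = tainted.bio.value;
```

An endpoint that opts in gets the same compile errors as with the global reviver, while untouched endpoints keep working until it’s their turn.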
A quick note: the pattern-sanitizers directive (correctly) doesn’t flag the previous branded type example, since it passed through the assertTenant function. Sweet!
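The previous post’s code isn’t reproduced here, so the following is only a guess at the general shape of such an assertion function; the Tenant brand, the tenant_ ID format, and loadTenant are all assumptions. It shows why by-side-effect: true matters: the function’s return value is never used, and it is the argument itself that becomes trusted after the call.

```typescript
// Hypothetical branded type, in the spirit of the previous post.
type Tenant = string & { readonly __brand: "Tenant" };

// Hypothetical assertion function: throws on bad input, otherwise narrows
// the argument's type in place. Nothing is returned.
function assertTenant(value: string): asserts value is Tenant {
  if (!/^tenant_[a-z0-9]+$/.test(value)) {
    throw new Error(`Not a tenant ID: ${value}`);
  }
}

function loadTenant(tenant: Tenant): string {
  return `loading ${tenant}`;
}

const raw: string = "tenant_acme";
assertTenant(raw); // matches $SANITIZER($X), with the name starting with "assert"
const result = loadTenant(raw); // raw is narrowed to Tenant from here on
```

Because the name starts with assert, the rule’s metavariable-regex treats the call as a sanitizer for whatever was passed to it.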
Next
Tainted types are a great way to ensure the security and correctness of your system, and they complement and interoperate with branded types quite nicely.
So far, we’ve covered how:
- branded types make things that send data to you act safely
- tainted types make things that you send data to act safely
In the next post, I’ll riff on the idea of a value object and introduce a read-once object as a more sophisticated domain primitive. Follow along! ✌️