Tainted Types

Posted on Oct 22, 2023
tl;dr: A value object that lets you mark data like user input as dangerous and force all downstream code to handle it correctly.

Welcome! In the previous post, I covered branded types as the first of a domain primitives series.

Where branded types helped us ensure upstream code is correct, we can flip the script and use another type of domain object (specifically, a value object) to ensure that downstream code is correct. As the adage goes, “always validate user input!” We can “taint” user input as it enters our system and force everything that handles it to either validate or sanitize it.

Let’s get into it!

Implementation

"But first, one small caveat: the code I share here is more illustrative than production-ready. Your app and type system are certainly different and more complex, so please don’t just copy this wholesale!"

The basic concept is to encapsulate and taint dangerous data inside a value object: a simple, immutable store of data. The most basic implementation might look like this:

class UserInput<T = string> {
  constructor(readonly value: T) { }

  // Optional, but leave out if you're paranoid!
  toString() {
    // String() also copes with null and undefined values
    return String(this.value);
  }
}

…and that’s it! Simple, right?
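
To make the “paranoid” comment concrete, here is a small sketch of what toString() changes at runtime. StrictUserInput is a hypothetical variant I made up for comparison, and both classes spell out the field rather than using the constructor shorthand, purely so the snippet runs under more TypeScript runners:

```typescript
// Mirrors the class above, with toString() kept.
class UserInput<T = string> {
  readonly value: T;
  constructor(value: T) {
    this.value = value;
  }
  toString(): string {
    return String(this.value);
  }
}

// Hypothetical stricter variant: no toString() override.
class StrictUserInput<T = string> {
  readonly value: T;
  constructor(value: T) {
    this.value = value;
  }
}

const raw = "<b>hi</b>";

// With toString(), string interpolation silently unwraps the tainted value.
const lenient = `${new UserInput(raw)}`; // "<b>hi</b>"

// Without it, interpolation falls back to Object.prototype.toString.
const strict = `${new StrictUserInput(raw)}`; // "[object Object]"

console.log({ lenient, strict });
```

In other words, keeping toString() is convenient, but it gives tainted data a quiet way to slip into strings without anyone calling a sanitizer.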

To actually use and benefit from it, you need to choose the ingress or integration point where you taint user input and store it in these objects. Ideally, it’s as close to the source as possible, but that can be quite aggressive. You need to weigh the costs, benefits, and trade-offs.

Because this is all fake code and I’m insane, I’ll insert a top-level middleware in our toy Express/API app that takes all JSON input from the world and converts it there:

app.use(express.json({
  reviver: (_key, value) => {
    if (typeof value === "string") {
      return new UserInput(value);
    }
    return value;
  }
}));

This reviver function takes all strings (your threat model might be different and you may need to handle other data types) and then stuffs them into UserInput objects. This will break many endpoints that don’t have updated request interfaces and types, so you’ll want great unit test coverage before changing this wholesale; otherwise, incremental adoption might be more apt.
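
Since express.json hands its reviver option straight to JSON.parse, you can try the wrapping behavior without spinning up a server. A minimal sketch, where the payload is made up and the name taintStrings is mine:

```typescript
class UserInput<T = string> {
  readonly value: T;
  constructor(value: T) {
    this.value = value;
  }
}

// Same logic as the reviver above, pulled out into a named function.
const taintStrings = (_key: string, value: unknown): unknown =>
  typeof value === "string" ? new UserInput(value) : value;

// Made-up request body, for illustration only.
const body = JSON.parse(
  '{"bio":"<b>hi</b>","age":30,"tags":["a","b"]}',
  taintStrings
);

console.log(body.bio instanceof UserInput); // true
console.log(body.age); // 30
console.log(body.tags[0] instanceof UserInput); // true
```

Strings nested inside arrays and objects get wrapped too, while numbers and other non-string values pass through untouched.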

It’s important to note that user input doesn’t only arrive via your API. You may well be fetching data from another service that contains user input; it was simply entered on some other website. Generally, I prefer to validate inputs and only sanitize on output, when I know which medium I’m rendering to (and thus, how to sanitize it). Another totally valid option is to serialize untrusted fields in your database into this data type. It’s up to you!
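
As a rough sketch of the “validate on input, sanitize on output” split: validateBio, escapeHtml, and the 500-character limit are all illustrative assumptions of mine, not part of any real codebase.

```typescript
class UserInput<T = string> {
  readonly value: T;
  constructor(value: T) {
    this.value = value;
  }
}

// Hypothetical validator: rejects bad input instead of rewriting it.
function validateBio(input: UserInput, maxLength = 500): string {
  const bio = input.value.trim();
  if (bio.length === 0 || bio.length > maxLength) {
    throw new RangeError(`bio must be 1-${maxLength} characters`);
  }
  return bio;
}

// Output-time escaping, applied only once we know the medium is HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Validate at the edge, store the text as-is...
const stored = validateBio(new UserInput("  <b>hi</b> & bye  "));

// ...and escape at render time.
const html = `<p>${escapeHtml(stored)}</p>`;
console.log(html);
```

The stored value stays faithful to what the user typed, and a different output medium (say, plain-text email) can apply its own escaping later.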

For simplicity, let’s assume we made the decision to sanitize input and remove HTML tags for an endpoint that allows a user to update their bio in Markdown:

// user.ts
async function updateBio(userId: string, bio: string) {
  /* Save something to the database */
}

// app.ts
interface UserRequest extends Request {
  body: {
    bio: UserInput;
  }
}
app.post("/user", async (req: UserRequest, res) => {
  const { bio } = req.body;
  await updateBio(req.user.id, bio); // 💥 Won't compile!
  return res.status(200).send("Accepted");
});

We’ll be greeted with an error, because we didn’t unpack and handle user input:

src/app.ts:110:32 - error TS2345: Argument of type 'UserInput' is not assignable
to parameter of type 'string'.

110   await updateBio(req.user.id, bio);
                                   ~~~

This is great! We’re using our type system to help us find correctness issues: a clear win! A simple sanitizer can fix it:

// sanitizers.ts
type Sanitizer<T = string> = (input: UserInput) => T;

// stripHTML is assumed to be an HTML-stripping helper defined elsewhere
export const sanitizeBio: Sanitizer = (input) => stripHTML(input.value);

// app.ts
app.post("/user", async (req: UserRequest, res) => {
  const { bio } = req.body;
  const sanitizedBio = sanitizeBio(bio);
  await updateBio(req.user.id, sanitizedBio); // ✅ Will compile!
  return res.status(200).send("Accepted");
});
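
The snippet above leaves stripHTML undefined, so here is a self-contained sketch you can actually run. The regex-based stripHTML is a deliberately naive stand-in (a vetted HTML sanitizer is the safer choice in real code), and sanitizeAge is a hypothetical second sanitizer that shows what the generic parameter is for:

```typescript
class UserInput<T = string> {
  readonly value: T;
  constructor(value: T) {
    this.value = value;
  }
}

type Sanitizer<T = string> = (input: UserInput) => T;

// Naive stand-in: removes anything that looks like a tag.
// Assumption for illustration only, not a robust sanitizer.
const stripHTML = (text: string): string => text.replace(/<[^>]*>/g, "");

const sanitizeBio: Sanitizer = (input) => stripHTML(input.value);

// Hypothetical: a sanitizer may return something other than a string.
const sanitizeAge: Sanitizer<number | null> = (input) => {
  const age = Number.parseInt(input.value, 10);
  return Number.isNaN(age) ? null : age;
};

console.log(sanitizeBio(new UserInput("<b>Hello</b> <i>world</i>"))); // "Hello world"
console.log(sanitizeAge(new UserInput("42"))); // 42
console.log(sanitizeAge(new UserInput("n/a"))); // null
```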

Awesome! We tainted user input, used the compiler to flush out where we’re not handling it correctly, and then introduced a sanitizing function to shore those spots up. Let’s see how we can use Semgrep to improve the roll-out of this tooling.

Semgrep

I mentioned earlier the danger of turning this on wholesale, especially in a top-level middleware. If you’re looking for a more incremental approach, you can also use Semgrep’s taint tracking. We can write a rule to taint request data and make sure that it’s properly asserted, sanitized, or validated:

rules:
  - id: sanitize-user-input
    message: Sanitize and handle user input
    languages: [typescript]
    severity: WARNING
    mode: taint
    pattern-sources:
      - patterns:
          - pattern-inside: app.$METHOD($ROUTE, ...);
          - pattern: req.body
    pattern-sanitizers:
      - by-side-effect: true
        patterns:
          - pattern: $SANITIZER($X)
          - pattern: $X
          - metavariable-regex:
              metavariable: $SANITIZER
              regex: (assert|sanitize|validate).*
    pattern-sinks:
      - patterns:
          - pattern: await $FN(...)

Request data is tainted, and it’s flagged if it flows into an awaited function call without first passing through a matching sanitizer function that removes the taint. We’re able to correctly identify the original unsanitized code with Semgrep:

src/app.ts
    rules.sanitize-user-input
      Sanitize and handle user input

      108┆ await updateBio(req.user.id, bio);

        Taint comes from:
        107┆ const { bio } = req.body;

        Taint flows through these intermediate variables:
        107┆ const { bio } = req.body;

        This is how taint reaches the sink:
        108┆ await updateBio(req.user.id, bio);

You can use this approach to incrementally move each of your endpoints and resolvers to start serializing data into a tainted type. Once you’re comfortable with the coverage, you can flip the switch on and start serializing all your data.

A quick note: the pattern-sanitizers directive (correctly) doesn’t flag the previous branded type example, since it passed through the assertTenant function. Sweet!
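
For readers who skipped that post, an assertion-style check over a tainted value might look roughly like the sketch below. assertNoHtml, CheckedInput, and saveBio are hypothetical names of mine (this is not the assertTenant code); the only thing that matters for the rule is that the function name starts with assert, which is what its metavariable-regex looks for.

```typescript
class UserInput<T = string> {
  readonly value: T;
  constructor(value: T) {
    this.value = value;
  }
}

// Hypothetical marker type for input that has passed the check.
type CheckedInput = UserInput & { readonly __checked: true };

// Assertion function: throws on bad input, otherwise narrows the argument's type.
function assertNoHtml(input: UserInput): asserts input is CheckedInput {
  if (/[<>]/.test(input.value)) {
    throw new TypeError("bio must not contain HTML");
  }
}

// Only accepts input that went through the assertion.
function saveBio(bio: CheckedInput): string {
  return bio.value;
}

const bio = new UserInput("plain text");
assertNoHtml(bio); // from here on, bio is narrowed to CheckedInput
console.log(saveBio(bio)); // "plain text"
```

Unlike sanitizeBio, nothing is returned here: the check works by side effect on the variable, both for the compiler and for a taint rule configured that way.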

Next

Tainted types are a great way to ensure the security and correctness of your system, and they complement and interoperate with branded types quite nicely.

So far, we’ve covered how:

In the next post, I’ll riff on the idea of a value object and introduce a read-once object as a more sophisticated domain primitive. Follow along! ✌️