Always Tell When You Stop Telling

Posted on Feb 5, 2023 (updated Feb 22, 2023)
tl;dr: When your system pushes event messages, your final message should be that you have stopped pushing events.

Most services and systems that provide webhooks lack a critical feature: telling you when the webhook itself has been changed or shut off. This is a particularly fragile and dangerous setup if you rely on webhooks to learn about rare but critical events. It might seem obvious, but I see this missing everywhere, and I wish that weren’t the case!

On Telling

Let’s say you’re using Tailscale (a lovely product) to manage access to your infrastructure and you’ve set up webhooks to notify you of any changes to access control lists. You’ve set this up for ingestion into your fancy logging pipeline or SIEM and you’ve set up various detection rules or notifications so you can know when the state of your system has unexpectedly changed. What happens if or when the renowned evil attacker, Mallory (she’s everywhere!), gets access to the control plane?

  1. Mallory disables the webhook.
  2. Mallory grants herself access to your super-mega-secure vault service.
  3. Your alerting system never receives a webhook, and thus, never alerts you.

Of course, you’ll want to rely on defense-in-depth mechanisms other than this webhook, but the point stands: the service stopped sending events without even a blip of a warning.

Note: Tailscale fixed this issue shortly after the publication of this article. What an awesome team!

As the title suggests, the better pattern is to actually notify downstream systems that you’ve stopped! In an alternate universe:

  1. Mallory disables the webhook.
  2. The service sends a final webhook describing that it’s been disabled.
  3. Your alerting system takes action and you start an investigation.
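
On the sending side, the alternate universe above could look something like the following. This is a minimal sketch, not any real service's implementation: the `WebhookSubscription` class, the `webhook.disabled` event name, the payload shape, and the injected `send` callable (standing in for an HTTP POST) are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

# Hypothetical event name; not taken from any real service's API.
DISABLED_EVENT = "webhook.disabled"


@dataclass
class WebhookSubscription:
    url: str
    # Hypothetical transport: (url, body) -> None, e.g. a wrapper around an HTTP POST.
    send: Callable[[str, bytes], None]
    enabled: bool = True

    def emit(self, event_type: str, data: dict) -> bool:
        """Deliver one event. Returns False if the subscription is switched off."""
        if not self.enabled:
            return False
        body = json.dumps({
            "type": event_type,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "data": data,
        }).encode()
        self.send(self.url, body)
        return True

    def disable(self, actor: str) -> None:
        """Tell the consumer we are stopping, and only then stop."""
        if not self.enabled:
            return
        # The final message goes out while the subscription is still enabled.
        self.emit(DISABLED_EVENT, {"actor": actor})
        self.enabled = False
```

The ordering inside `disable` is the whole point: the last event is dispatched before the flag flips, so whoever turns the webhook off cannot do so silently. A real service would likely apply the same treatment to deletion and to changes of the destination URL.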

Sure, maybe Mallory still tampers with the access control, but the point is that you now know you have a blind spot! You have a fighting chance.

The key piece that’s often missed is a feedback loop. Without it, the webhook consumer is left in an indeterminate state: silence could mean that nothing has happened, or it could mean that nothing will ever be sent again, and the consumer cannot tell which. Webhooks are meant to be stateless events, but we should take some inspiration from how stateful protocols like TCP handle termination: notably, a FIN packet!
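
On the receiving side, treating a final message like a FIN might be sketched as below. The `webhook.disabled` event type, the `Consumer` class, and the `alert` callback are hypothetical names chosen for illustration; a real consumer would map whatever the service actually sends onto the same idea.

```python
from enum import Enum
from typing import Callable


class StreamState(Enum):
    OPEN = "open"      # events may still arrive
    CLOSED = "closed"  # the sender told us it has stopped


class Consumer:
    def __init__(self, alert: Callable[[str], None]) -> None:
        self.state = StreamState.OPEN
        self.alert = alert  # hypothetical hook into your alerting system

    def handle(self, event: dict) -> None:
        # "webhook.disabled" is an assumed event type, playing the role of a FIN.
        if event.get("type") == "webhook.disabled":
            self.state = StreamState.CLOSED
            actor = event.get("data", {}).get("actor", "unknown")
            self.alert(f"webhook disabled by {actor}: event stream is now blind")
            return
        # Ordinary events would go through the normal detection rules here.
```

With an explicit `CLOSED` state, silence after the final message is no longer ambiguous: the consumer knows it is blind and can say so.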

This isn’t just for security. There are plenty of things (or people) other than Mallory that can silence a webhook. The stream could exist for observability, reliability, or some mission- or business-critical process. Wouldn’t you want to know if someone accidentally misconfigured it or shut it off?

If you are designing the service that dispatches webhooks, always tell when you stop telling. Don’t leave your consumers and customers guessing: give them better guarantees and observability!

A Final Message

If something doesn’t support this, file a feature request! I did. (Edit: and they fixed it!) In the meantime, consider polling: fetching the state of your system and making some assertions about it. You might want to consider augmenting webhooks with polling anyway, especially if the service doesn’t retry webhooks that go unacknowledged, or if you need strong delivery guarantees. As a consumer, you might need to tell (discover, detect) when you stop telling as well!
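
A polling check of this kind might look like the sketch below. Everything here is an assumption for illustration: `fetch_config` stands in for whatever API call returns the webhook's current configuration (or `None` if it no longer exists), and the `enabled` and `url` keys are a made-up shape rather than any real service's response format.

```python
from typing import Callable, Optional


def audit_webhook(
    fetch_config: Callable[[], Optional[dict]],  # hypothetical API wrapper
    expected_url: str,
) -> list[str]:
    """Return a list of problems found; an empty list means the webhook looks intact."""
    config = fetch_config()
    if config is None:
        return ["webhook no longer exists"]

    problems = []
    if not config.get("enabled", False):
        problems.append("webhook is disabled")
    if config.get("url") != expected_url:
        problems.append(f"webhook points at unexpected URL: {config.get('url')!r}")
    return problems
```

Run on a schedule, with any non-empty result routed to the same alerting system, this covers the case where the service never sends a final message. It only narrows the blind spot to the polling interval, so it complements the final webhook rather than replacing it.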

(End of article, nothing follows. 😉)