A brief survey of the theory of waiting in line, as it applies to computers, software, and systems
Given: a bank with one teller, customers taking 10 minutes on average to serve and arriving at a rate of 5.8/hour...
What is the expected wait time? How about with two tellers?
Ref: What happens when you add a new teller? (Cook 2008)
Arrival rate (λ) and service rate (μ) summarize distributions — interarrival and service times are random!
"M" in M/M/1 is for Markovian (Poisson and Exponential)
Given: customers taking 10 minutes on average to serve and arriving at a rate of 5.8/hour...
Skipping a whole bunch of math, aka the "rest of the owl"...
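The skipped math can be checked with a short calculation, assuming an M/M/c model (Poisson arrivals, exponential service) and the talk's numbers: λ = 5.8/hour, μ = 6/hour (10-minute average service). The function below is the standard Erlang C formula for mean time in queue.

```python
# Back-of-the-envelope check of the teller example, assuming M/M/c.
from math import factorial

def erlang_c_wait(lam, mu, c):
    """Mean time in queue (Wq) for an M/M/c system."""
    a = lam / mu                  # offered load
    rho = a / c                   # per-server utilization (must be < 1)
    erlang_term = (a**c / factorial(c)) / (1 - rho)
    # Erlang C: probability an arriving customer has to wait at all
    p_wait = erlang_term / (sum(a**k / factorial(k) for k in range(c)) + erlang_term)
    return p_wait / (c * mu - lam)

lam, mu = 5.8, 6.0                # per hour
print(f"1 teller:  Wq ~ {erlang_c_wait(lam, mu, 1):.1f} hours")
print(f"2 tellers: Wq ~ {erlang_c_wait(lam, mu, 2) * 60:.1f} minutes")
```

With these numbers one teller gives roughly 5 hours of expected queueing and two tellers roughly 3 minutes — the dramatic drop is Cook's punchline.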
Which is better for response time? A single fast server or slower, parallel servers?
Ref: Controlling Queue Delay (Nichols 2012)
Ref: Meet Bandaid, the Dropbox service proxy (Dropbox 2018)
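The fast-vs-parallel question can be sketched under the same Markovian assumptions, holding total capacity fixed: one server at rate c·μ versus c servers at rate μ each (numbers reuse the talk's λ = 5.8/hour, μ = 6/hour, c = 2).

```python
# Sketch: mean response time W = Wq + service time for M/M/c,
# comparing one double-speed server to two normal-speed servers.
from math import factorial

def mmc_response(lam, mu, c):
    """Mean response time (queueing + service) for an M/M/c queue."""
    a = lam / mu
    rho = a / c
    term = (a**c / factorial(c)) / (1 - rho)
    p_wait = term / (sum(a**k / factorial(k) for k in range(c)) + term)
    return p_wait / (c * mu - lam) + 1 / mu

lam, mu, c = 5.8, 6.0, 2
fast = mmc_response(lam, c * mu, 1)    # one server, twice as fast
slow = mmc_response(lam, mu, c)        # two servers at normal speed
print(f"one 2x server:  W ~ {fast * 60:.1f} min")
print(f"two 1x servers: W ~ {slow * 60:.1f} min")
```

At this load the single fast server wins on mean response time; the tradeoff shifts once variability, utilization, and fault tolerance enter the picture.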
Offline, batch systems: optimized for throughput
Online, live systems: optimized for latency; queueing is critical to load-balancing
Ref: Kafka: a Distributed Messaging System for Log Processing (Kreps 2011)
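The throughput-vs-latency tradeoff can be illustrated with made-up numbers: batching amortizes a fixed per-flush overhead (syscall, RPC, disk write) across many messages, raising throughput while delaying individual messages.

```python
# Illustrative only -- OVERHEAD and PER_MSG are invented constants.
OVERHEAD = 0.005   # seconds of fixed cost per flush
PER_MSG  = 0.0001  # seconds of marginal cost per message

def stats(batch_size):
    t = OVERHEAD + PER_MSG * batch_size   # time to send one batch
    throughput = batch_size / t           # messages per second
    batch_latency = t                     # first message waits for the whole batch
    return throughput, batch_latency

for b in (1, 10, 100, 1000):
    tput, lat = stats(b)
    print(f"batch={b:4d}  throughput={tput:8.0f} msg/s  batch latency={lat * 1000:6.1f} ms")
```

Bigger batches buy throughput at the price of latency — the batch-system side of the tradeoff above.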
Queueing is central to systems performance:
Ref/Example: Don't Block the Event Loop (or the Worker Pool) (node.js docs)
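The event-loop hazard can be demonstrated in a few lines; here Python's asyncio stands in for Node's event loop (same single-threaded model). A heartbeat task ticks every 10 ms; a blocking call in a sibling coroutine stalls it, while an awaited sleep does not.

```python
# Sketch: blocking the event loop stalls every other task on it.
import asyncio
import time

async def heartbeat(ticks):
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def blocking_task():
    time.sleep(0.2)            # BAD: synchronous sleep blocks the whole loop

async def polite_task():
    await asyncio.sleep(0.2)   # GOOD: yields, so other tasks keep running

async def measure(make_task):
    ticks = []
    await asyncio.gather(heartbeat(ticks), make_task())
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return max(gaps)           # worst observed heartbeat stall

worst_blocking = asyncio.run(measure(blocking_task))
worst_polite = asyncio.run(measure(polite_task))
print(f"worst stall with blocking call: {worst_blocking:.3f} s")
print(f"worst stall with awaited sleep: {worst_polite:.3f} s")
```

The blocking version stalls the heartbeat for the full 200 ms; in Node the same mistake freezes every pending request on the loop.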
If your load test is a closed-loop load generator, it might be lying to you!
Ref: Open Versus Closed: A Cautionary Tale (Schroeder 2006)
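A toy deterministic model shows how a closed-loop generator hides overload (all parameters invented for illustration): the server can handle 5 req/s, the open-loop generator offers 10 req/s on a fixed schedule, while the closed-loop "user" only sends a request after the previous response arrives.

```python
# Toy comparison of open vs closed load generation against a
# single FIFO server with fixed 0.2 s service time.
SERVICE = 0.2          # seconds per request
N = 100                # requests per experiment

def open_loop(rate):
    """Arrivals follow a fixed schedule, regardless of responses."""
    free_at = 0.0      # time the server next becomes idle
    latencies = []
    for i in range(N):
        arrival = i / rate
        start = max(arrival, free_at)    # FIFO queueing delay
        free_at = start + SERVICE
        latencies.append(free_at - arrival)
    return latencies

def closed_loop():
    """One user, zero think time: the next request is only issued
    after the previous response, so arrivals slow with the server."""
    return [SERVICE] * N                 # never queues behind itself

open_lat = open_loop(rate=10)
closed_lat = closed_loop()
print(f"open-loop mean latency:   {sum(open_lat) / N:.2f} s")   # queue grows without bound
print(f"closed-loop mean latency: {sum(closed_lat) / N:.2f} s") # flat at the service time
```

The closed-loop generator reports a steady 0.2 s even though the system cannot sustain the intended offered load — the "lie" Schroeder's paper warns about.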
Ref: Guerilla Capacity Planning (Gunther 2007)
Ref: Task Assignment with Unknown Duration (Harchol-Balter 2002)
There's so much more. Where to learn more about queueing?