Latency & Throughput

Sean Moore

In engineering, there are two common ways to measure the efficacy of a system’s response. There is latency, the time it takes for a single action to be completed from the moment it is requested, and there is throughput, the amount of work completed over a given period of time. At first glance, the two appear to be just different ways of measuring the same thing, and at small scale, that’s not inaccurate.
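The distinction can be made concrete in a few lines of code. This sketch, using made-up request timings, computes both metrics for the same workload:

```python
from statistics import mean

# Hypothetical request log: (start_time, end_time) in seconds.
requests = [(0.0, 0.4), (0.1, 0.3), (0.2, 1.0), (0.5, 0.9), (0.6, 0.8)]

# Latency: per-request time from the moment it is made to completion.
latencies = [end - start for start, end in requests]
avg_latency = mean(latencies)

# Throughput: completed requests per unit of wall-clock time.
window = max(end for _, end in requests) - min(start for start, _ in requests)
throughput = len(requests) / window

print(f"average latency: {avg_latency:.2f}s")   # 0.40s
print(f"throughput: {throughput:.1f} req/s")    # 5.0 req/s
```

Note that the same five requests yield one latency number per request but a single throughput number for the whole window, which is why the two can move independently.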

Where the two begin to diverge in not-so-subtle ways is under heavy demand. A restaurant during a busy lunchtime rush demonstrates the phenomenon well. There may be a long wait to be served your food – that’s a high-latency system – but the restaurant is serving a large number of customers in a short period of time – that’s a high-throughput system.

An ideal system, of course, is one with both low latency and high throughput, answering each request quickly while serving as many as possible at once. But the two are often in conflict. To maximize throughput, the system needs to be running near or at full capacity. To minimize latency, though, the system has to have spare capacity available to respond, a hard thing to do if it is already running at full capacity.
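This tension shows up in a standard queueing-theory result – the M/M/1 model, an assumption I’m importing here, not something the essay itself invokes – where the mean time a request spends in the system is 1 / (service rate − arrival rate). As utilization approaches 100%, latency grows without bound:

```python
def mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("overloaded: latency grows without bound")
    return 1.0 / (service_rate - arrival_rate)

service_rate = 10.0  # requests/s the system can complete
for utilization in (0.5, 0.9, 0.99):
    arrival_rate = utilization * service_rate
    w = mean_latency(arrival_rate, service_rate)
    print(f"utilization {utilization:.0%}: mean latency {w:.2f}s")
```

At 50% utilization the mean latency is 0.20s; at 90% it is 1.00s; at 99% it is 10.00s. Running the system “full” is precisely what makes each individual response slow.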


There’s often conflict over which metric to optimize for, especially in the relationship between consumers and businesses. Consumers take the viewpoint of an individual, and so are interested in low-latency systems: they want to complete their tasks as quickly as possible. Businesses, though, optimize for throughput: they make the most money when they serve the most people in the least amount of time.

These conflicts lead to a mismatch in expectations between customers and the businesses that serve them. It makes great financial sense for a restaurant to have a long wait to get a table, because it means the wait staff, kitchen, and most importantly, the cash register, are all operating at full capacity. But it makes for a poor experience for the restaurant patron, who must wait in agony for long periods of time before being served. Enough dissatisfaction spread over enough people, and that high throughput a restaurant once enjoyed could crumble.


I’ve kept my examples to the physical world for good reason: capacity there, especially in response to short-term spikes in demand, is essentially fixed. A restaurant cannot double in size to accommodate more tables, nor can an assembly line suddenly double its output to meet unanticipated demand.

With the Internet and web technologies, however, the story can be different, even if it isn’t always. With a wealth of real-time data depicting demand for a service, it’s possible to anticipate sudden surges placed on a system. With smart caching, the most requested information can be delivered as quickly as possible. And, most interestingly, companies now exist that can scale web services in real time, increasing capacity to meet demand and downsizing it when interest wanes.
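As a small sketch of the caching idea, Python’s standard library can memoize a slow page-render function so repeat requests for hot content skip the expensive work entirely (the function name and paths here are hypothetical):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def render_page(path: str) -> str:
    # Stand-in for slow work: database queries, template rendering, etc.
    return f"<html>content for {path}</html>"

render_page("/home")  # first request does the slow work (a cache miss)
render_page("/home")  # repeat request is served from the cache (a hit)

print(render_page.cache_info())  # hits=1, misses=1
```

The most-requested paths end up answered without touching the slow path at all, which is exactly how caching lets a system raise throughput without letting latency climb.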

The web has given us a dramatic ability to align the previously conflicting goals of companies and customers, allowing the former to optimize their resource expenditures, and the latter to minimize the time they spend waiting on a request. This alignment allows businesses to meet their short-term goals of revenue and profit while maintaining the long-term health of the company, because users’ interests are satisfied.