July 30, 2022

TIL: Stopping Requests Mid-Flight

Today I learned that you can stop an HTTP request made with the requests library mid-flight. Why would you want to do that? I needed to because my server was making HTTP requests to URLs provided by users, which put those users in complete control of what the server requested. A user could supply a URL pointing at an external server that returns an extremely large response. With enough of those requests running at once, that becomes a denial-of-service (DoS) attack on my server.

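The key is to stream the response and enforce your own timeout while reading the body. The timeout argument to requests.get only limits how long it waits to connect and how long it waits between bytes, not the total download time. A server that keeps trickling data can therefore hold the connection open indefinitely.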

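import time

import requests

# Maximum number of seconds to spend on a single scrape.
SCRAPER_TIMEOUT = 15


class ForceTimeoutException(Exception):
    """Raised when downloading the response body exceeds SCRAPER_TIMEOUT."""


def safe_scrape_html(url: str) -> str:
    """
    Scrapes the HTML from a URL but cancels the request if downloading
    takes longer than SCRAPER_TIMEOUT seconds. This is used to mitigate
    DoS attacks from users providing a URL with arbitrarily large content.
    """
    # stream=True defers downloading the body so it can be read piece by piece.
    # The timeout here only bounds the connection and each individual read,
    # not the total download time.
    with requests.get(url, timeout=SCRAPER_TIMEOUT, stream=True) as resp:
        html_bytes = b""

        # time.monotonic() is unaffected by changes to the system clock.
        start_time = time.monotonic()

        for chunk in resp.iter_content(chunk_size=1024):
            html_bytes += chunk

            # Abort once the total elapsed time exceeds the limit;
            # leaving the `with` block closes the connection.
            if time.monotonic() - start_time > SCRAPER_TIMEOUT:
                raise ForceTimeoutException()

    return html_bytes.decode("utf-8")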

In this example, the body is read in chunks of up to 1024 bytes, and each chunk is appended to the html_bytes variable. After every chunk we check whether the total elapsed time has exceeded SCRAPER_TIMEOUT. If it has, we raise an exception and stop reading. If it hasn’t, we keep appending chunks to html_bytes.
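If you want to test the deadline logic without making real network requests, one option is to pull the loop into a function that accepts any iterable of byte chunks plus an injectable clock. This is a minimal sketch: `read_with_deadline` and its `clock` parameter are hypothetical names introduced here, and `ForceTimeoutException` is redefined so the snippet runs on its own.

```python
import time
from typing import Callable, Iterable


class ForceTimeoutException(Exception):
    """Raised when reading the body takes longer than the allowed time."""


# Hypothetical helper, not part of the requests library.
def read_with_deadline(
    chunks: Iterable[bytes],
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Join chunks, raising ForceTimeoutException once `timeout` seconds pass."""
    start = clock()
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        # Checked after every chunk, so a slow trickle of data cannot
        # keep the loop running past the deadline.
        if clock() - start > timeout:
            raise ForceTimeoutException()
    return bytes(buffer)
```

Inside `safe_scrape_html`, the loop could then be replaced with `html_bytes = read_with_deadline(resp.iter_content(chunk_size=1024), SCRAPER_TIMEOUT)`. Passing a fake clock in tests lets you simulate a slow server without actually waiting.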

Depending on your needs, a small adjustment to the code above lets you enforce a maximum response size in bytes instead of a timeout.
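Here is one way that adjustment might look, sketched under the assumption that you want to reject any body larger than a fixed number of bytes: keep a running size as chunks arrive and raise as soon as it passes the limit. `read_with_byte_limit`, `ResponseTooLargeException`, and `max_bytes` are hypothetical names, not part of requests.

```python
from typing import Iterable


class ResponseTooLargeException(Exception):
    """Raised when the body grows beyond the allowed number of bytes."""


# Hypothetical helper, not part of the requests library.
def read_with_byte_limit(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Join chunks, raising ResponseTooLargeException past `max_bytes`."""
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        # Stop reading as soon as the limit is crossed instead of
        # downloading the rest of the body.
        if len(buffer) > max_bytes:
            raise ResponseTooLargeException()
    return bytes(buffer)
```

With requests, you would pass `resp.iter_content(chunk_size=1024)` from a streamed response (`stream=True`) as `chunks`. Checking the Content-Length header up front can reject some responses early, but that header may be missing or wrong, so the running count is what actually enforces the limit.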