When Celery, asyncio, and Playwright Go to War — A Concurrency Deep Dive
The Problem
While building an automation platform, I hit two crashes that took me a while to understand and fix. Both came from the same root: mixing concurrency models that don't know about each other.
Before I explain the crashes, let me build the foundation properly—because without it, the error messages look like random noise.
Part 1: The Basics
What is a program actually doing?
Your CPU executes instructions one at a time. One core, one instruction stream. If you have one task, it runs from start to finish. Simple.
But real software needs to do multiple things at once: handle HTTP requests, query a database, scrape a website, send emails. The question becomes—how do you do multiple things with a CPU that can only do one thing at a time?
There are two completely different answers to that question, and mixing them is exactly what caused my crashes.
Answer 1: Threads and Processes (letting the OS handle it)
The operating system can fake parallelism by rapidly switching between tasks. This is called preemptive multitasking—the OS can interrupt any task at any moment and hand the CPU to something else, without asking permission.
Time → 0ms 5ms 10ms 15ms 20ms
CPU → [Task A][Task B][Task A][Task B][Task A]
From your perspective, both tasks seem to run at the same time. In reality the CPU is just switching fast enough that you can't feel it.
A thread is one unit of execution. Multiple threads live inside the same process and share the same memory space. Creating a thread is relatively cheap, but since threads share memory, you need careful coordination (locks, mutexes) to prevent them from corrupting each other's data.
A process is a fully isolated program—its own memory, its own file handles, its own everything. Processes don't share memory by default. Creating a process is expensive (the OS has to set up a whole new environment), but once created it's completely isolated—no coordination needed, no chance of interference.
Process
├── Memory (heap, stack, code)
├── Thread 1 ← its own stack and execution pointer
├── Thread 2 ← its own stack and execution pointer
└── Thread 3 ← its own stack and execution pointer
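The shared-memory point is easy to demonstrate with Python's standard library. Here is a minimal sketch (the function names are mine, not from any framework): four threads increment one shared counter, and the lock is the "careful coordination" that prevents lost updates.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # All threads see the same `counter` — the lock serializes updates
        # so two threads can't read-modify-write it at the same time.
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 — deterministic only because of the lock
```

Processes would not need the lock at all: each would get its own private copy of `counter`, which is exactly the isolation/cost tradeoff described above.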
What is a Thread Pool?
Creating and destroying threads for every task is expensive. The OS has to allocate memory, set up the thread's internal state, and then clean it all up when it's done. If you have a web server handling thousands of requests per second and you create a new thread for each one, you spend more time managing threads than actually doing work.
A thread pool solves this by pre-creating a fixed number of threads at startup and keeping them alive permanently. When work arrives, it goes into a queue. An idle thread picks it up, does the work, then instead of dying it goes back to the pool and waits for the next task.
Thread Pool (4 threads):
[Thread 1] — idle — picks up Task A — finishes — goes back to idle
[Thread 2] — idle — picks up Task B — finishes — goes back to idle
[Thread 3] — idle — waiting...
[Thread 4] — idle — waiting...
Incoming tasks → Queue → threads pick them up as they become free
This is a foundational pattern used in almost every server framework. Java's Spring uses thread pools by default. Python's concurrent.futures.ThreadPoolExecutor is a thread pool. Celery's --pool=prefork is essentially a process pool—the same idea but with OS processes instead of threads.
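`ThreadPoolExecutor` makes the pattern concrete in a few lines — a fixed set of worker threads pulls tasks from an internal queue, exactly as in the diagram above (the `work` function is just a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return n * n

# Four long-lived threads; tasks queue up and idle threads pick them off.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The threads are created once, reused for all eight tasks, and torn down only when the `with` block exits.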
Answer 2: The Event Loop (doing it yourself, cooperatively)
Here is a key observation: most of the time, your code isn't running—it's waiting.
Waiting for a network response. Waiting for a database query to finish. Waiting for a file to be read from disk. During all that waiting time, your CPU is sitting completely idle, doing nothing.
The event loop is built around exploiting this idle time:
Event Loop (single thread):
    while True:
        task = get_next_ready_task()
        run it UNTIL it voluntarily pauses (await)
        register what it's waiting for
        when that thing finishes → put the task back in the queue
The key word is cooperative: tasks voluntarily yield control when they hit an await. The loop never interrupts them. They say "I'm waiting for this network call—go run something else while I wait." When the network responds, the loop resumes that task.
Event Loop Thread:
Task A runs... hits await fetch(url) → suspended, waiting
Task B runs... hits await db.query() → suspended, waiting
Task C runs... hits await sleep(2) → suspended, waiting
[network responds] → Task A resumes → runs to its next await
[db responds] → Task B resumes → runs to its next await
This lets one single thread handle thousands of concurrent I/O operations with almost no overhead. No OS thread-switching cost, no memory per thread, no locks needed (because only one thing ever runs at a time). Python calls its implementation asyncio. The syntax uses async def and await.
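A small self-contained sketch shows the payoff — three tasks that each "wait" 0.2 seconds finish in about 0.2 seconds total, not 0.6, because the waits overlap on one thread:

```python
import asyncio
import time

async def fetch(name, delay):
    await asyncio.sleep(delay)   # yields to the loop; other tasks run meanwhile
    return name

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        fetch("A", 0.2), fetch("B", 0.2), fetch("C", 0.2)
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)        # ['A', 'B', 'C']
print(elapsed < 0.5)  # the three 0.2 s waits overlap instead of summing to 0.6 s
```

Replace `asyncio.sleep` with a real network call and the same shape scales to thousands of concurrent requests.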
The critical limitation:
If any task does something that truly blocks—heavy CPU computation, or launching an external process without async support—the entire loop freezes until it's done. Every other task stalls, stuck waiting. This is why you must never put heavy blocking work inside an event loop.
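The standard escape hatch (Python 3.9+) is `asyncio.to_thread`: hand the blocking call to a worker thread and `await` the result, so the loop stays free. A minimal sketch with a hypothetical blocking function:

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.2)   # truly blocks whichever thread runs it
    return "done"

async def main():
    # Calling blocking_io() directly here would freeze every task on the loop.
    # to_thread runs it on a worker thread while the loop keeps scheduling.
    return await asyncio.to_thread(blocking_io)

result = asyncio.run(main())
print(result)  # done
```

Note that this "move blocking work to a real thread" idea is exactly what goes wrong later in this story when gevent is in the picture.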
What are Greenlets?
Greenlets are a third model—a hybrid that tries to get the benefits of the event loop without changing how you write your code.
- Like the event loop: cooperative (tasks yield voluntarily, the OS never interrupts)
- Like threads: the code looks completely normal and synchronous—no async/await syntax
Gevent (the library I was using) takes this further by monkey-patching Python's standard library. Monkey-patching means replacing the real functions with fake versions at runtime, before your code even runs:
# gevent's monkey.patch_all() does this silently at startup:
import socket
socket.socket = gevent_socket_class   # replaces the real socket class
import time
time.sleep = gevent_version_of_sleep  # replaces the real sleep
So when your normal-looking code calls time.sleep(1), it's actually running gevent's version—which yields to other greenlets instead of truly sleeping. From your code's perspective, nothing changed. It still looks synchronous. But under the hood, gevent is switching between greenlets exactly like an event loop would.
One OS Thread (the gevent hub):
greenlet A: runs → time.sleep(1) → [actually yields] → greenlet B runs
greenlet B: runs → socket.connect() → [actually yields] → greenlet A resumes
All of this happens inside one OS thread. The OS sees a single thread. Gevent manages all the switching internally.
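The mechanism itself is nothing deeper than attribute assignment. A stdlib-only sketch — the replacement function here is a hypothetical stand-in, not gevent's real implementation:

```python
import time

yielded = []
_real_sleep = time.sleep

def patched_sleep(seconds):
    # gevent's version would switch to the hub here instead of blocking;
    # this stand-in just records the call and returns immediately.
    yielded.append(seconds)

time.sleep = patched_sleep   # every caller of time.sleep now gets the patch
time.sleep(1)                # returns instantly — no real sleep happened
time.sleep = _real_sleep     # undo the patch

print(yielded)  # [1]
```

The swap is invisible to callers — which is precisely gevent's trick, applied to sockets, sleeps, subprocesses, and more.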
This sounds elegant. And it is—for the right use cases. The problem appears the moment you try to mix it with a system that also wants to own the same low-level resources.
Part 2: The Crashes
My Setup
In my platform, I use Celery to run background jobs—scraping deals, generating content, sending data to the pipeline. The scraping jobs use Playwright (a browser automation library) to open a real Chromium browser, navigate to Jumia, and extract product data. The pipeline logic is written with Python's asyncio (async/await).
Celery worker (--pool=gevent)
└── runs tasks as greenlets inside 1 OS thread
└── each task runs async Python code (asyncio)
└── async code launches Playwright (real Chromium browser)
Three systems. Three different concurrency models. And they do not get along.
Crash 1: "Cannot run the event loop while another loop is running"
My Celery tasks are regular synchronous functions (Celery requires this). But my pipeline logic is all async. To bridge the two, I called asyncio.run() inside each task—which creates an event loop, runs the async code inside it, then destroys the loop when done.
def run_deal_pipeline():      # Celery task — must be sync
    result = asyncio.run(     # bridge to async world
        _run_pipeline_async()
    )
This works fine when only one task runs at a time. The problem is gevent runs multiple tasks as greenlets inside the same OS thread.
Here is exactly what happens:
- Thread-locals are variables that each OS thread has its own private copy of. asyncio uses a thread-local to store "which event loop is currently running in this thread."
- With gevent, there is only one OS thread. So there is only one thread-local. So there is only one event loop slot.
- Task A calls asyncio.run() → loop starts → stored in the thread-local.
- Gevent switches to task B (same thread, same thread-local).
- Task B calls asyncio.run() → checks thread-local → sees a loop already running → CRASH.
RuntimeError: Cannot run the event loop while another loop is running
From Python's perspective this makes complete sense. You cannot run an event loop inside another running event loop—it's like trying to start a movie inside a movie that's already playing. The rule is one loop per thread, one thread per loop. Gevent broke that rule silently by making two tasks share a thread.
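You don't need gevent to reproduce the exact error — any attempt to start a second loop in a thread that already has one running hits the same guard inside asyncio:

```python
import asyncio

async def second_loop_attempt():
    # This coroutine is already running on asyncio.run()'s loop.
    # Starting another loop in the same thread trips asyncio's guard.
    inner = asyncio.new_event_loop()
    try:
        inner.run_forever()   # raises before it ever runs anything
    except RuntimeError as exc:
        return str(exc)
    finally:
        inner.close()

message = asyncio.run(second_loop_attempt())
print(message)  # Cannot run the event loop while another loop is running
```

Gevent manufactured this exact situation by interleaving two tasks—each innocently calling `asyncio.run()`—inside one OS thread.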
My first attempt to fix this was to move the async work into a real OS thread via gevent's own threadpool. Real threads have isolated thread-locals, so each would get its own event loop slot. Task A's loop in thread 1, task B's loop in thread 2—no conflict.
It worked. Until Playwright got involved.
Crash 2: "child watchers are only available on the default loop"
Playwright needs to launch Chromium as a subprocess—a completely separate OS process running the browser. To do that, Python uses asyncio.create_subprocess_exec().
When you spawn a subprocess, you need a way to know when it exits. On Linux, the OS notifies the parent process via a signal called SIGCHLD—sent whenever a child process dies. To listen for this signal, someone has to register a signal handler. And on Linux, signal handlers can only be registered on the main thread. This is not a Python rule—it is a hard Unix constraint baked into the operating system.
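Python enforces this constraint directly — try to install a signal handler from any thread other than the main one and you get a `ValueError`. A minimal reproduction (Linux/macOS, since `SIGCHLD` doesn't exist on Windows):

```python
import signal
import threading

errors = []

def register_from_worker():
    try:
        # Installing a SIGCHLD handler off the main thread is forbidden.
        signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    except ValueError as exc:
        errors.append(str(exc))

t = threading.Thread(target=register_from_worker)
t.start()
t.join()

print(errors[0])  # mentions "main thread"
```

This is the wall my "move the async work into a threadpool thread" fix ran straight into.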
asyncio's child process watcher respects this: "I only work on the default event loop, which must be on the main thread."
Gevent's monkey-patching also has its own way of tracking child processes via its internal hub.
When I moved the async code into a gevent threadpool thread to fix crash 1, I created crash 2:
- The threadpool thread is not the main thread.
- asyncio's child watcher requires the main thread → refuses to work.
- Gevent's patched subprocess also breaks because the threadpool thread is not gevent's hub thread.
- Playwright tries to launch Chromium, both systems try to own the subprocess tracking, and both fail.
TypeError: child watchers are only available on the default loop
The pieces were fundamentally incompatible:
| System | Wants to own |
|---|---|
| Gevent | subprocess management via its hub |
| asyncio | subprocess management via the default loop's watcher |
| Playwright | asyncio's subprocess support specifically |
All three in the same process, none of them agreeing on who's in charge.
The Real Fix: Prefork
The solution was to stop using gevent entirely and switch Celery to --pool=prefork.
prefork means Celery spawns N real OS processes—not threads, not greenlets. Each process is a completely clean Python runtime:
# Before (broken):
celery worker --pool=gevent --concurrency=4
# After (correct):
celery worker --pool=prefork --concurrency=2
OS Process 1:
main thread → asyncio.run() → Playwright → Chromium subprocess → works
OS Process 2:
main thread → asyncio.run() → Playwright → Chromium subprocess → works
Each process has its own main thread. Each main thread can register signal handlers. Each process has its own clean asyncio state. No monkey-patching, no shared event loops, no conflicts. The asyncio.run() bridge became a clean one-liner again.
The tradeoff:
Processes are heavier than greenlets, so I reduced concurrency from 4 to 2. For tasks that each spawn a full browser, that's the right call anyway—4 simultaneous Chromium instances on a small server would kill it.
Part 3: How Other Ecosystems Handle This
Python + asyncio (prefork — the fix)
After switching to prefork, each Celery worker is an independent OS process with its own clean Python runtime. This is how the architecture looks now:
Task Queue (Redis)
├── OS Process 1: Main Thread → asyncio Event Loop → Playwright → Chromium
└── OS Process 2: Main Thread → asyncio Event Loop → Playwright → Chromium
Each process is fully isolated. No shared event loop, no signal handler conflicts. Clean.
Node.js
Node.js was built around a single event loop from day one. There is no "start a loop" concept—the loop is the entire runtime. From the moment the process starts, the event loop is running. You just write await and everything works inside it automatically.
Crash 1 would be impossible. You never call something equivalent to asyncio.run() in Node because there is no such thing. The loop is always running. There is no loop-per-thread concept to violate.
Crash 2 would not happen either. Node uses libuv—a C++ I/O library that owns all I/O, including subprocess management, at the lowest level. When Playwright launches Chromium in Node, libuv handles the subprocess natively, fully integrated with the event loop. No signal handler confusion, no competing systems.
Node.js Process (libuv):
├── Event loop phases (repeating):
│     Timers (setTimeout / setInterval)
│     → I/O Callbacks (network, disk)
│     → Check (setImmediate)
│     → back to Timers
├── Work Queue (async tasks waiting)
└── Callback Queue (resolved promises → back into the loop)

libuv ←→ Node.js Process
(libuv owns: sockets, fs, subprocesses, signals)
For true parallelism in Node.js, you use worker_threads (real OS threads) or child_process (real OS processes). For my exact use case—browser scraping—you would run multiple Node processes, each with its own Playwright and event loop. Essentially what prefork gives me in Python, but you never hit the crashes because Node never had the gap between synchronous and asynchronous that Python is still closing.
Spring MVC (Java / Kotlin)
Traditional Spring MVC uses one OS thread per request. The thread blocks while waiting for I/O. When you have 1000 concurrent requests, you need 1000 threads. Threads are heavyweight (~1MB stack each), so this becomes expensive fast—but it is simple, predictable, and it works. Crash 1 and Crash 2 cannot happen here because there is no event loop at all. Every thread is fully independent.
Each thread owns its request from start to finish. Simple. Expensive at scale.
Spring WebFlux (Java / Kotlin)
Spring WebFlux is Java's reactive framework—it uses Netty under the hood, which runs a small pool of event loop threads (one per CPU core). The programming model uses reactive streams (Mono, Flux). If you call blocking code inside a reactive chain, you hit the same class of problem I hit—you block the event loop thread. Spring's solution is to offload blocking work to a dedicated thread pool using Schedulers.boundedElastic():
fun scrapeWithPlaywright(): Mono<List<Deal>> {
return Mono.fromCallable {
// blocking Playwright code here
}.subscribeOn(Schedulers.boundedElastic()) // runs on a separate thread pool
}
This is conceptually the same as what I tried with gevent's threadpool—move blocking work off the event loop. The difference is Spring's separation between the reactive thread pool and the blocking thread pool is explicit and well-documented, while gevent's monkey-patching made the boundary invisible until it exploded.
Netty Event Loop Threads            boundedElastic Pool (for blocking work)
[Event Loop 1] [Event Loop 2]       [Blocking Thread 1] [Blocking Thread 2] [Blocking Thread 3]

Incoming requests → event loop threads
Non-blocking I/O (DB, HTTP)      → stays on the event loop thread
Blocking task (Playwright, file) → handed to a blocking thread → done → callback on the event loop
Java 21 Virtual Threads are the most elegant solution in any ecosystem. Virtual threads look and behave like normal OS threads (blocking code, no async/await), but the JVM parks them automatically when they block on I/O instead of wasting a real OS thread. Millions of them can exist at once with almost no overhead.
// Spring Boot 3.2+ with virtual threads — you write normal blocking code:
fun scrapeDeals(): List<Deal> {
val response = restClient.get().uri("/flash-sales/").retrieve().body(String::class.java)
return parseDeals(response)
}
The JVM handles the suspension and resumption invisibly. No event loop programming model, no monkey-patching, no crashes. This is the Java answer to what gevent was trying to do—but done properly at the runtime level.
Summary
| | Python asyncio | Python gevent | Node.js | Spring MVC | Spring WebFlux | Java 21 vThreads |
|---|---|---|---|---|---|---|
| Concurrency model | Event loop | Cooperative greenlets | Event loop (libuv) | OS thread pool | Event loop (Netty) | Virtual threads (JVM) |
| "Start a loop" concept | Yes — asyncio.run() | No | No | No | No | No |
| Subprocess from async | Works (with care) | Breaks with asyncio | Native, always works | Always works | Needs thread pool | Always works |
| Monkey-patches stdlib | No | Yes | No | No | No | No |
| Fix for my use case | prefork processes | Abandoned | Not needed | Not needed | boundedElastic() | Not needed |
The Root Cause
The root cause of both crashes is a history problem. Python had synchronous code for decades, then added asyncio, and gevent was an attempt to bridge the old and new worlds via monkey-patching—making blocking code behave cooperatively without rewriting it. That bridge works well for simple I/O. It breaks completely the moment you introduce a system (Playwright) that needs to own subprocess management at the OS level.
Node.js never had this gap. Java's virtual threads closed it at the JVM level. Python's answer is still evolving.
The fix was simple once I understood the problem: stop fighting the layers and give each task its own clean process.