During exam windows, the submission endpoint was buckling. Under sustained concurrent load, it would slow down to 8 seconds per submission — which in an exam context is a real problem. Students would retry, making it worse.
I profiled the request path under load. The issue was clear — every submission was doing a synchronous write to PostgreSQL, holding the connection open until the write confirmed. Under high concurrency this created a queue that compounded quickly. I reworked the flow to use batched async inserts — submissions would acknowledge immediately, and the actual DB write happened asynchronously in controlled batches. I also added a lightweight queue to buffer the burst.
Submission response time dropped from 8 seconds to under 1 second. No more retries, no more cascading slowdowns. Exams ran cleanly from that point on.