API Integration Framework — Write Error Handling Once

Integration time: multi-day → hours

Every time the team needed to integrate a new external API, someone wrote custom code from scratch. No shared patterns, no consistent error handling, no retry logic. Integrations worked fine in happy-path testing and broke unpredictably in production.

Python · Flask · API · Backend

What Was Broken

How It Was Built

Designed failure scenarios first, not success cases.

Unified adapter layer

A Flask-based integration layer acts as a standardized adapter between internal systems and external APIs. A unified request handler manages authentication, timeout configuration, and logging consistently across all integrations.

adapter.py
python
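import requests

# APIConfig, AuthModule and RetryModule come from the framework's other modules.
# ResponseNormalizer is an assumed name: the original snippet used
# self.normalizer without showing where it was created.

class APIAdapter:
    def __init__(self, config: APIConfig):
        self.base_url = config.base_url.rstrip('/')
        self.auth = AuthModule(config.auth_type)
        self.timeout = config.timeout_seconds
        self.normalizer = ResponseNormalizer()
        self.retry = RetryModule(
            max_attempts=3,
            backoff='exponential'
        )

    def request(self, method, endpoint, **kwargs):
        # requests.request() expects a full URL, so join base_url and endpoint here.
        return self.retry.execute(
            fn=self._make_request,
            method=method,
            url=f"{self.base_url}/{endpoint.lstrip('/')}",
            headers=self.auth.get_headers(),
            timeout=self.timeout,
            **kwargs
        )

    def _make_request(self, **kwargs):
        response = requests.request(**kwargs)
        # Turn 4xx/5xx responses into HTTPError so the retry layer can see them.
        response.raise_for_status()
        return self.normalizer.normalize(response)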
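
The adapter reads three fields from its config object: `base_url`, `auth_type`, and `timeout_seconds`. The source does not show `APIConfig` itself, so the following is only a minimal sketch of what it might look like. The validation rules, the default timeout, and the example URL are assumptions, not part of the original framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class APIConfig:
    """Per-integration settings; field names are taken from the adapter snippet."""
    base_url: str
    auth_type: str                 # e.g. "api_key", "oauth", "bearer" (assumed values)
    timeout_seconds: float = 10.0  # assumed default, not from the original

    def __post_init__(self):
        # Fail at configuration time rather than on the first request.
        if not self.base_url.startswith(("http://", "https://")):
            raise ValueError(f"base_url must be an http(s) URL: {self.base_url!r}")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")


# Hypothetical integration config; the URL is a placeholder.
billing_config = APIConfig(base_url="https://api.example.com/v1", auth_type="bearer")
```

Validating in `__post_init__` fits the failure-first approach: a bad URL or a zero timeout surfaces when the integration is declared, not when the first production call is made.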

Response normalizer — JSON and XML

Handles both JSON and XML response formats and converts them to a standard internal schema. Integration logic itself never has to worry about response format parsing.
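
The normalizer's code is not shown in the source, so the following is a rough sketch of the idea using only the standard library. The function signature, the `{"status", "data"}` output shape, and the `MalformedResponseError` name are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


class MalformedResponseError(Exception):
    """Raised when a body cannot be parsed as its declared format."""


def _xml_to_dict(element):
    """Convert an XML element to nested dicts; leaf elements become their text."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = _xml_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags collapse into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def normalize(status_code, content_type, body):
    """Return the same dict shape whether the body is JSON or XML."""
    try:
        if "json" in content_type:
            data = json.loads(body)
        elif "xml" in content_type:
            data = _xml_to_dict(ET.fromstring(body))
        else:
            raise MalformedResponseError(f"unsupported content type: {content_type}")
    except (json.JSONDecodeError, ET.ParseError) as exc:
        raise MalformedResponseError(str(exc)) from exc
    return {"status": status_code, "data": data}
```

Note that in this sketch XML values stay strings (`"7"`, not `7`), because XML carries no type information. A real schema mapping would need per-field coercion on top of this.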

Failure-first design

Every integration answers three questions: what happens when the external API is slow? When it returns a 500? When it returns a malformed response? Those failure paths were designed upfront, not discovered in production.

retry.py
python
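import time

from requests.exceptions import HTTPError, Timeout


class APITimeoutError(Exception):
    """Raised when every attempt timed out."""


class RetryModule:
    RETRYABLE_STATUS = (429, 503)

    def __init__(self, max_attempts=3, backoff='exponential'):
        self.max_attempts = max_attempts
        self.backoff = backoff  # only 'exponential' is implemented here

    def execute(self, fn, **kwargs):
        for attempt in range(self.max_attempts):
            last_attempt = attempt == self.max_attempts - 1
            try:
                return fn(**kwargs)
            except Timeout as e:
                if last_attempt:
                    raise APITimeoutError() from e
            except HTTPError as e:
                # requests puts the status on e.response, not on the exception.
                status = e.response.status_code if e.response is not None else None
                if status not in self.RETRYABLE_STATUS or last_attempt:
                    raise  # Non-retryable or out of attempts, fail fast
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...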

What Changed

New API integrations went from multi-day efforts to hours. They are reliable from day one because they inherit all error handling from the framework.

Integration Time: multi-day → hours (10x faster)
Production Reliability: failures discovered in prod → reliable by default
"Write error handling once. Every integration inherits it. That is what a framework is for."

Common Questions

Flask gave us a lightweight, flexible foundation without too much magic. Since this was an internal integration layer, not a user-facing API, we did not need the overhead of a heavier framework. Flask let us define exactly what we needed and nothing more.
Each integration has an auth module — API key, OAuth, bearer token — configured separately and injected into the request handler. The framework abstracts auth so the integration logic itself does not have to worry about it. Credentials are stored in environment variables, never hardcoded.
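
The auth module's internals are not shown in the source, so this is only a sketch of how environment-backed credentials could be injected. The `env_var` and `environ` parameters and the `X-API-Key` header name are assumptions. OAuth is left out because it would also need token fetching and refresh.

```python
import os


class AuthModule:
    """Builds request headers for one auth type, reading the secret from the environment."""

    def __init__(self, auth_type, env_var, environ=None):
        if auth_type not in ("api_key", "bearer"):
            raise ValueError(f"unsupported auth type: {auth_type}")
        self.auth_type = auth_type
        self.env_var = env_var
        # Allow a plain dict to be injected in tests; default to the real environment.
        self._environ = os.environ if environ is None else environ

    def get_headers(self):
        # Read the secret at call time so rotated credentials are picked up.
        secret = self._environ.get(self.env_var)
        if not secret:
            raise RuntimeError(f"missing credential: set {self.env_var}")
        if self.auth_type == "api_key":
            return {"X-API-Key": secret}  # header name is an assumption
        return {"Authorization": f"Bearer {secret}"}
```

Integration code only ever calls `get_headers()`, so switching an integration from an API key to a bearer token would be a configuration change rather than a code change.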
The next addition would be a proper circuit breaker. The current retry logic is good, but a circuit breaker would stop calling a failing service once failures pass a threshold, letting the system degrade gracefully instead of hammering a dead endpoint. I would also add centralized tracing so a request can be followed across multiple integrations.
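
The circuit breaker is described as future work, so nothing like the following exists in the framework as presented. As a minimal sketch of the pattern, the class, parameter names, and defaults below are all hypothetical:

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected without contacting the remote service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self._clock = clock             # injectable so tests can control time
        self._failures = 0
        self._opened_at = None          # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

One plausible arrangement is to wrap the retrying call itself, for example `breaker.call(adapter.request, "GET", "/status")`, so that a fully exhausted retry cycle counts as a single failure toward the threshold.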