Documentation
Overview
This sample demonstrates composing the Retry and Fallback middlewares in a single Generate call to build a resilient model pipeline.
The composition order passed to ai.WithUse is outer-to-inner: the first middleware wraps the second, which wraps the actual model call. Here:
ai.WithUse(&middleware.Retry{...}, &middleware.Fallback{...})
expands to Retry { Fallback { model } } at call time:
- The primary model is invoked first.
- If it fails with a fallback-eligible status (UNAVAILABLE, NOT_FOUND, DEADLINE_EXCEEDED, INTERNAL, ...), Fallback forwards the request to the next model in its list and keeps trying until one succeeds or the list is exhausted.
- If the whole Fallback cascade still returns a retryable error, Retry sleeps with exponential backoff and runs the cascade again, up to MaxRetries times.
To make the fallback path visibly fire on a fresh run, the primary model is deliberately set to a non-existent model ID: Google AI returns NOT_FOUND, Fallback catches it, and the real model answers. If you switch the primary to a valid model, the sample still works; Fallback simply never triggers.
To run:
go run .
In another terminal, trigger the flow (the response will be produced by the fallback model since the primary is intentionally invalid):
curl -X POST http://localhost:8080/resilientFlow \
  -H "Content-Type: application/json" \
  -d '{"data": "quantum computing"}'