CircuitBreaker v1.1
Per-resource fault isolation for Roblox. Wraps any external call and stops hammering a failing service, giving it time to recover.
- Three-State Automaton: closed, open, and half-open with automatic transitions
- Two Failure Modes: consecutive threshold and sliding-window failure rate
- Probe Recovery: configurable probe count and success rate for safe recovery
- Call Timeouts: race any call against a deadline, count slow calls as failures
- Fallbacks: optional fallback function called on failure or rejection
- EventBus Integration: emit state change events to any EventBus instance
Why CircuitBreaker?
In a Roblox game, external calls fail. DataStores hit rate limits, MessagingService goes down, HTTP requests time out. Without circuit breaking, a failing dependency gets called on every single request, adding latency and burning retries that have no chance of succeeding.
CircuitBreaker wraps any function call and tracks its health. After enough failures it stops calling the function entirely for a cooldown period, then lets a small number of test calls through to check if the service recovered. Every call returns a result table so you always know what happened.
Getting Started
Installation
Drop CircuitBreaker.lua in ReplicatedStorage.
Basic Usage
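local ReplicatedStorage = game:GetService("ReplicatedStorage")
local DataStoreService = game:GetService("DataStoreService")

local CircuitBreaker = require(ReplicatedStorage.CircuitBreaker)

-- Placeholder store name and key; substitute your own.
local DataStore = DataStoreService:GetDataStore("PlayerData")
local key = "player_123"

local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 5, -- open after 5 consecutive failures
        ResetTimeout = 30, -- try recovery after 30s
    },
})

local result = cb:Execute("datastore", function()
    return DataStore:GetAsync(key)
end)

if result.Ok then
    print("got value:", result.Value)
elseif result.Rejected then
    print("circuit is open, skipping call")
else
    print("call failed:", result.Err)
end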
With Fallback
local result = cb:Execute("datastore",
    function()
        return DataStore:GetAsync(key)
    end,
    function(reason)
        -- called on failure OR when circuit is open
        warn("DataStore unavailable:", reason)
        return defaultValue
    end
)

-- result.Value is the fallback return when the call failed or was rejected
print(result.Value)
Circuit States
Every circuit key has one of three states. Transitions happen automatically based on call results and timeouts.
Closed
Normal operation. All calls go through. Failures are tracked. The circuit opens when the failure threshold is crossed.
Open
The circuit is tripped. All calls are rejected immediately without executing the function. After ResetTimeout seconds, the circuit moves to half-open to test recovery.
Half-Open
A limited number of probe calls (ProbeCount) are allowed through. If enough of them succeed (>= ProbeSuccessRate), the circuit closes. If any probe fails, it trips back to open immediately.
Calls that arrive after all ProbeCount slots have been claimed are rejected while the existing probes are still running. Once the last probe slot completes, recovery is evaluated and the circuit transitions to closed or open.
print(cb:GetState("datastore")) -- "closed" | "open" | "half_open"
print(cb:IsAvailable("datastore")) -- false when open, or when half-open with all probe slots taken
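The transitions described in this section can be sketched as a small state machine. This is an illustrative model only, not the module's implementation: `newCircuit`, `allowCall`, and `record` are hypothetical names, time is passed in explicitly as `now`, and half-open is simplified to a single probe (as if ProbeCount were 1).

```lua
-- Hypothetical minimal model of the three states; not the module's actual code.
local function newCircuit(failureThreshold, resetTimeout)
    return {
        state = "closed",
        consecutiveFails = 0,
        openedAt = nil,
        failureThreshold = failureThreshold,
        resetTimeout = resetTimeout,
    }
end

-- Decide whether a call may run at time `now` (seconds).
-- An open circuit moves to half_open once resetTimeout has elapsed.
local function allowCall(circuit, now)
    if circuit.state == "open" then
        if now - circuit.openedAt < circuit.resetTimeout then
            return false
        end
        circuit.state = "half_open"
    end
    return true
end

-- Record the outcome of a call that was allowed to run.
local function record(circuit, ok, now)
    if ok then
        circuit.consecutiveFails = 0
        if circuit.state == "half_open" then
            circuit.state = "closed" -- single successful probe closes the circuit
        end
        return
    end
    circuit.consecutiveFails = circuit.consecutiveFails + 1
    if circuit.state == "half_open"
        or circuit.consecutiveFails >= circuit.failureThreshold then
        circuit.state = "open"
        circuit.openedAt = now
    end
end

local c = newCircuit(2, 30)
record(c, false, 0)
record(c, false, 1)      -- second consecutive failure opens the circuit
print(c.state)           -- open
print(allowCall(c, 10))  -- false: still inside the 30s reset timeout
print(allowCall(c, 31))  -- true: the circuit is now half_open
record(c, true, 31)
print(c.state)           -- closed
```

Passing the clock in as a parameter keeps the sketch deterministic; a real implementation would read the time itself.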
Failure Modes
Two independent mechanisms can open the circuit. Either one alone is sufficient.
Consecutive Threshold
The simplest mode. After N failures in a row with no successes between them, the circuit opens. This is always active regardless of whether sliding window is configured.
local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 3, -- open after 3 consecutive failures
        ResetTimeout = 15,
    },
})
A single success resets the consecutive counter back to zero. If you have a dependency that flaps - failing then succeeding then failing - use the sliding window instead.
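To make the flapping case concrete, here is a small sketch of a consecutive-failure counter. `firstTrip` is a hypothetical helper written for this illustration; it only mirrors the rule described above (a success resets the count) and is not taken from the module.

```lua
-- Returns the index of the call at which the consecutive threshold
-- would trip, or nil if it never trips. `outcomes` is a list of
-- booleans: true for a success, false for a failure.
local function firstTrip(outcomes, threshold)
    local consecutive = 0
    for i, ok in ipairs(outcomes) do
        if ok then
            consecutive = 0 -- any success resets the counter
        else
            consecutive = consecutive + 1
            if consecutive >= threshold then
                return i
            end
        end
    end
    return nil
end

-- Three failures in a row trip a threshold of 3 on the third call.
print(firstTrip({ false, false, false }, 3)) -- 3

-- A flapping dependency fails half the time but never trips.
print(firstTrip({ false, true, false, true, false, true }, 3)) -- nil
```

The second call shows why a 50% failure rate can go unnoticed by the consecutive threshold alone.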
Sliding Window
Tracks a rolling window of the last N calls. When the window is full and the failure rate exceeds FailureRate, the circuit opens. Stale entries can be pruned by TTL with WindowTTLSeconds.
local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 999, -- high enough that the consecutive threshold effectively never trips
        WindowSize = 20, -- track last 20 calls
        FailureRate = 0.5, -- open at 50% failure rate
        WindowTTLSeconds = 60, -- discard calls older than 60s
        ResetTimeout = 30,
    },
})
You can combine FailureThreshold with WindowSize/FailureRate, and either condition opens the circuit. With FailureThreshold = 3, for example, a burst of 3 consecutive failures trips it before the window fills, while a sustained 50% failure rate trips it once the window is full. Whichever comes first wins.
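The sliding-window check can be sketched as follows. This is a simplified model under stated assumptions, not the module's code: `newWindow` and `recordOutcome` are hypothetical names, time is passed in as `now`, and the comparison uses a strict `>` because the text says the rate must exceed FailureRate (the real module may compare with `>=`).

```lua
-- Hypothetical sliding window: keeps at most `size` recent outcomes and
-- optionally drops entries older than `ttlSeconds`.
local function newWindow(size, failureRate, ttlSeconds)
    return { size = size, failureRate = failureRate, ttl = ttlSeconds, entries = {} }
end

-- Records one outcome and returns true if the window says "open the circuit".
local function recordOutcome(window, ok, now)
    local entries = window.entries
    entries[#entries + 1] = { ok = ok, at = now }

    -- Keep only the most recent `size` entries.
    while #entries > window.size do
        table.remove(entries, 1)
    end

    -- Prune stale entries when a TTL is configured.
    if window.ttl then
        while #entries > 0 and now - entries[1].at > window.ttl do
            table.remove(entries, 1)
        end
    end

    -- The rate is only evaluated once the window is full.
    if #entries < window.size then
        return false
    end

    local failures = 0
    for _, entry in ipairs(entries) do
        if not entry.ok then
            failures = failures + 1
        end
    end
    return failures / #entries > window.failureRate -- assumption: strict comparison
end

local w = newWindow(4, 0.5, 60)
print(recordOutcome(w, false, 1)) -- false: window not full yet
print(recordOutcome(w, true, 2))  -- false
print(recordOutcome(w, false, 3)) -- false
print(recordOutcome(w, false, 4)) -- true: 3 of 4 failed, above 0.5
```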
Recovery / Half-Open
After ResetTimeout seconds in the open state, the circuit moves to half-open and starts accepting probe calls.
local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 3,
        ResetTimeout = 30, -- wait 30s before probing
        ProbeCount = 3, -- allow 3 test calls through
        ProbeSuccessRate = 0.66, -- need 2 out of 3 to succeed (2/3 is about 0.667)
    },
})
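Because the probe success rate is compared with `>=`, small changes to ProbeSuccessRate can change how many probes must succeed. The helper below is a hypothetical illustration (it is not part of the module) that finds the smallest number of successes whose ratio meets a given rate.

```lua
-- Smallest n such that n / probeCount >= probeSuccessRate.
-- Hypothetical helper for reasoning about settings; not a module API.
local function requiredSuccesses(probeCount, probeSuccessRate)
    for n = 0, probeCount do
        if n / probeCount >= probeSuccessRate then
            return n
        end
    end
    return nil -- only reached if probeSuccessRate is above 1
end

print(requiredSuccesses(3, 0.6))  -- 2 (the documented defaults)
print(requiredSuccesses(3, 0.66)) -- 2
print(requiredSuccesses(3, 0.67)) -- 3, because 2/3 is slightly below 0.67
print(requiredSuccesses(1, 1.0))  -- 1
```

When you want "2 out of 3", pick a rate at or below 2/3, such as 0.66, rather than rounding up to 0.67.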
How Probe Evaluation Works
Probe slots are claimed atomically before the call executes, so concurrent calls from multiple task.spawn threads cannot all sneak through the ProbeCount gate at the same time.
- Slot claimed before the function runs
- After the last slot completes, success rate is evaluated
- If rate meets threshold: transition to closed
- If rate misses threshold: transition back to open, reset timer
- Any failure during half-open trips back to open immediately
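The slot-claiming and last-slot evaluation steps above can be sketched with plain coroutines standing in for task.spawn threads. This is an illustrative model with hypothetical names (`newProbeGate`, `tryClaim`, `finish`), not the module's code, and it deliberately leaves out the rule that a failing probe trips the circuit immediately. The claim is effectively atomic because coroutines are cooperative: nothing else runs between the check and the increment in `tryClaim`.

```lua
local function newProbeGate(probeCount, successRate)
    return {
        probeCount = probeCount,
        successRate = successRate,
        claimed = 0,
        finished = 0,
        successes = 0,
    }
end

-- Claim a slot BEFORE running the call. No yield happens between the
-- check and the increment, so two threads cannot claim the same slot.
local function tryClaim(gate)
    if gate.claimed >= gate.probeCount then
        return false
    end
    gate.claimed = gate.claimed + 1
    return true
end

-- Report a finished probe. Returns nil while probes are outstanding,
-- then "closed" or "open" once the last slot completes.
local function finish(gate, ok)
    gate.finished = gate.finished + 1
    if ok then
        gate.successes = gate.successes + 1
    end
    if gate.finished < gate.probeCount then
        return nil
    end
    if gate.successes / gate.probeCount >= gate.successRate then
        return "closed"
    end
    return "open"
end

-- Five concurrent callers race for three probe slots.
local gate = newProbeGate(3, 0.6)
local admitted, rejected = 0, 0
for _ = 1, 5 do
    local co = coroutine.create(function()
        if not tryClaim(gate) then
            rejected = rejected + 1
            return
        end
        coroutine.yield() -- stands in for the yielding external call
    end)
    coroutine.resume(co)
    if coroutine.status(co) == "suspended" then
        admitted = admitted + 1
    end
end
print(admitted, rejected) -- 3 admitted, 2 rejected

print(finish(gate, true))  -- nil: probes still outstanding
print(finish(gate, false)) -- nil
print(finish(gate, true))  -- closed: 2 of 3 meets the 0.6 rate
```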
Forcing State
For testing or manual recovery:
cb:ForceState("datastore", "half_open") -- start probing immediately
cb:ForceState("datastore", "closed") -- manually recover
cb:ForceState("datastore", "open") -- manually trip
Call Timeouts
CallTimeout races the function against a deadline. If the function doesn't return in time, the call is counted as a failure and the result has TimedOut = true.
local cb = CircuitBreaker.new({
    Defaults = {
        CallTimeout = 5, -- 5 second deadline per call
        FailureThreshold = 3,
        ResetTimeout = 30,
    },
})

local result = cb:Execute("slow_service", function()
    task.wait(10) -- longer than timeout
    return "done"
end)

print(result.TimedOut) -- true
print(result.Err) -- "timeout"
print(result.Ok) -- false
Fallbacks
Pass a fallback function as the third argument to Execute. It is called with the failure reason whenever:
- The function throws an error
- The function times out
- The circuit is open and the call is rejected
local result = cb:Execute("datastore",
    function()
        return DataStore:GetAsync(key)
    end,
    function(reason)
        -- reason is "circuit open" on rejection,
        -- or the actual error string on failure
        if reason == "circuit open" then
            return cachedValue
        end
        return defaultValue
    end
)

-- result.Value is the fallback return value
-- result.Rejected is true if the circuit was open
If no fallback is provided, result.Value is nil on failure and rejection.
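For intuition, the way a wrapper can turn errors and rejections into a result table with an optional fallback might look like the sketch below. It is a simplified stand-in, not the module's Execute: the `execute` function and its `isOpen` parameter are hypothetical, timeouts are not modelled, and whether the real module sets Err on rejection is not stated in this document, so the sketch leaves it unset.

```lua
-- Hypothetical pcall-based wrapper producing a CallResult-like table.
local function execute(isOpen, fn, fallback)
    local result = { Ok = false, Value = nil, Err = nil, Rejected = false }
    local reason

    if isOpen then
        -- Rejected: fn is never called.
        result.Rejected = true
        reason = "circuit open"
    else
        local ok, valueOrErr = pcall(fn)
        if ok then
            result.Ok = true
            result.Value = valueOrErr
            return result
        end
        reason = tostring(valueOrErr)
        result.Err = reason
    end

    -- Failure or rejection: the fallback, if any, supplies Value.
    if fallback then
        result.Value = fallback(reason)
    end
    return result
end

local ok = execute(false, function() return 42 end)
print(ok.Ok, ok.Value) -- true and 42

local failed = execute(false, function() error("boom") end, function()
    return "default"
end)
print(failed.Ok, failed.Value) -- false and "default"

local rejected = execute(true, function() return 42 end, function(reason)
    return reason
end)
print(rejected.Rejected, rejected.Value) -- true and "circuit open"
```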
Per-Key Configuration
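Each circuit key can have its own threshold settings that override the defaults. Call Configure before the first Execute on that key so the overrides apply from the first call; overrides set later are merged into the existing circuit's config.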
local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 10,
        ResetTimeout = 60,
    },
})

-- "payments" is more sensitive than the default
cb:Configure("payments", {
    FailureThreshold = 2,
    ResetTimeout = 120,
    ProbeCount = 1,
    ProbeSuccessRate = 1.0, -- need 100% probe success
})

-- "analytics" is more tolerant
cb:Configure("analytics", {
    FailureThreshold = 20,
    ResetTimeout = 10,
})
Per-key settings merge with defaults. You only need to specify the fields you want to override.
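The merge behaviour can be pictured as a shallow copy of the defaults with the per-key fields written on top. The `mergeConfig` helper below is a hypothetical illustration of that idea and is not the module's internal function.

```lua
-- Shallow merge: start from the defaults, then apply overrides.
-- Neither input table is modified.
local function mergeConfig(defaults, overrides)
    local merged = {}
    for field, value in pairs(defaults) do
        merged[field] = value
    end
    for field, value in pairs(overrides or {}) do
        merged[field] = value
    end
    return merged
end

local defaults = { FailureThreshold = 10, ResetTimeout = 60 }
local payments = mergeConfig(defaults, { FailureThreshold = 2, ProbeCount = 1 })

print(payments.FailureThreshold) -- 2  (overridden)
print(payments.ResetTimeout)     -- 60 (inherited from defaults)
print(payments.ProbeCount)       -- 1  (only set per key)
print(defaults.FailureThreshold) -- 10 (defaults untouched)
```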
EventBus Integration
Pass an EventBus instance to the constructor and CircuitBreaker will emit events on every state change.
local EventBus = require(game.ReplicatedStorage.EventBus)
local CircuitBreaker = require(game.ReplicatedStorage.CircuitBreaker)
local bus = EventBus.new()
local cb = CircuitBreaker.new({ EventBus = bus })
bus:On("circuit.**", function(payload, meta)
    print(meta.Event, payload.Key)
end)
-- Events emitted:
-- circuit.state_changed { Key, From, To, Time }
-- circuit.opened { Key, Failures }
-- circuit.closed { Key }
-- circuit.probing { Key }
State change events are emitted synchronously inside _transition so subscribers see them before the next line of your code runs. The OnStateChange callback runs in a separate thread via task.spawn since it is arbitrary user code that might yield.
API Reference
CircuitBreaker.new
CircuitBreaker.new(config: CircuitBreakerConfig?): CircuitBreaker
Creates a new CircuitBreaker instance. Config is optional; all fields have defaults.
Execute
cb:Execute(key: string, fn: () -> any, fallback: ((reason: string) -> any)?): CallResult
Runs fn under the circuit for key. Errors raised by fn are caught and reported in the result instead of propagating to the caller. Returns a CallResult:
| Field | Type | Description |
|---|---|---|
| Ok | boolean | true if the call succeeded |
| Value | any | return value of fn, or fallback return value |
| Err | string? | error message on failure or timeout |
| Rejected | boolean | true if the circuit was open |
| TimedOut | boolean | true if CallTimeout was exceeded |
| Elapsed | number | wall time in seconds for the call |
GetState
cb:GetState(key: string): CircuitState?
Returns the current state for a key, or nil if no circuit exists for it yet.
GetMetrics
cb:GetMetrics(key: string): CircuitMetrics
Returns a metrics table for a key. Creates the circuit if it doesn't exist yet.
| Field | Type | Description |
|---|---|---|
| State | string | Current circuit state |
| TotalCalls | number | Total calls including rejections |
| Successes | number | Total successful calls |
| Failures | number | Total failed calls |
| ConsecutiveFails | number | Current consecutive failure count |
| TotalRejected | number | Calls rejected while circuit was open |
| LastFailure | number? | os.time() of last failure |
| LastSuccess | number? | os.time() of last success |
| OpenedAt | number? | os.clock() when circuit last opened |
GetAll
cb:GetAll(): {[string]: CircuitState}
Returns a table of all known keys and their current states.
IsAvailable
cb:IsAvailable(key: string): boolean
Returns true if a call on this key would be executed right now. False if the circuit is open or all half-open probe slots are taken. Returns true for keys that have no circuit yet.
Configure
cb:Configure(key: string, config: BreakerConfig)
Set per-key config overrides. If called before first Execute, the overrides are merged when the circuit is created. If called after, they are merged into the existing config immediately.
ForceState
cb:ForceState(key: string, state: CircuitState)
Manually set the state of a circuit. Creates it first if it doesn't exist. Fires the normal transition logic including EventBus events and OnStateChange callbacks.
Reset
cb:Reset(key: string)
Clears all counters and sets the circuit back to closed. Metrics are zeroed. Does not fire OnStateChange.
Destroy
cb:Destroy()
Clears all circuits. Any subsequent call to Execute or ForceState will throw.
Configuration
CircuitBreakerConfig (top-level)
| Field | Type | Default | Description |
|---|---|---|---|
| Defaults | BreakerConfig | see below | Default config applied to all circuits |
| EventBus | EventBus? | nil | EventBus instance for state change events |
| MaxCircuits | number | 512 | Max number of keys before new keys throw |
| CircuitTTLSeconds | number? | nil | Evict closed idle circuits after this many seconds |
| OnError | function? | warn | Called with internal error messages |
BreakerConfig (per-circuit)
| Field | Type | Default | Description |
|---|---|---|---|
| FailureThreshold | number | 5 | Consecutive failures before opening |
| ResetTimeout | number | 30 | Seconds to wait before probing after open |
| ProbeCount | number | 3 | Number of probe calls allowed in half-open |
| ProbeSuccessRate | number | 0.6 | Fraction of probes that must succeed to close |
| CallTimeout | number | 10 | Seconds before a call is considered timed out |
| WindowSize | number? | nil | Enable sliding window with this many entries |
| FailureRate | number? | nil | Failure rate threshold for sliding window (0.0 to 1.0) |
| WindowTTLSeconds | number? | nil | Discard window entries older than this |
| OnStateChange | function? | nil | Called as (key, from, to) on any state transition |
| OnRejected | function? | nil | Called as (key) when a call is rejected |
| OnTimeout | function? | nil | Called as (key, elapsed) on call timeout |
Examples
DataStore Wrapper
local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 3,
        ResetTimeout = 60,
        CallTimeout = 5,
        ProbeCount = 1,
        ProbeSuccessRate = 1.0,
        OnStateChange = function(key, from, to)
            if to == "open" then
                warn("DataStore circuit opened for key:", key)
            end
        end,
    },
})

local function safeGet(key)
    return cb:Execute("datastore_get", function()
        return DataStore:GetAsync(key)
    end, function(reason)
        return nil -- return nil on failure, caller can check
    end)
end

local result = safeGet("player_123")
if result.Ok then
    -- use result.Value
end
Multiple Services
-- one CircuitBreaker instance, multiple keys
local cb = CircuitBreaker.new({
    Defaults = { FailureThreshold = 5, ResetTimeout = 30 },
})
cb:Configure("payments", { FailureThreshold = 1, ResetTimeout = 120 })
cb:Configure("analytics", { FailureThreshold = 20, ResetTimeout = 5 })
local paymentResult = cb:Execute("payments", function() ... end)
local analyticsResult = cb:Execute("analytics", function() ... end)
local datastoreResult = cb:Execute("datastore", function() ... end) -- uses defaults
Dashboard / Health Check
local all = cb:GetAll()
for key, state in pairs(all) do
    local m = cb:GetMetrics(key)
    print(string.format(
        "[%s] %s | calls: %d | failures: %d | rejected: %d",
        key, state, m.TotalCalls, m.Failures, m.TotalRejected
    ))
end
Troubleshooting
Circuit opens too fast
A lower FailureThreshold trips the circuit sooner. If the circuit trips on normal transient errors, raise the threshold, optionally pairing it with a sliding window so that a sustained failure rate still opens the circuit while a short run of bad requests does not.
Circuit won't recover
Check that your probe call is actually succeeding. If ProbeSuccessRate is set to 1.0 and the service is still flaky, probes will keep failing and the circuit will stay in the open/half-open cycle. Lower the rate or increase ProbeCount to get a better sample.
Timeouts are not firing
CallTimeout defaults to 10 seconds, so set it explicitly in the config if you need a shorter deadline. A call that completes before the deadline through normal Roblox yield resolution is never reported as a timeout, even if it was slow.
Calls still going through when circuit is open
Check IsAvailable before calling if you need to gate other logic. Execute does enforce the open state, but the check only happens at call time. If the circuit opened while a call was already running, that call completes normally.
Getting help
- Check GetMetrics for counts
- Use OnStateChange to log transitions
- Use EventBus integration for a complete event stream
- Contact @chi0sk on Discord