chi0sk
github.com/chi0sk

CircuitBreaker v1.1

Per-resource fault isolation for Roblox. Wraps any external call and stops hammering a failing service, giving it time to recover.

Three-State Automaton

Closed, open, and half-open with automatic transitions

Two Failure Modes

Consecutive threshold and sliding window failure rate

Probe Recovery

Configurable probe count and success rate for safe recovery

Call Timeouts

Race any call against a deadline, count slow calls as failures

Fallbacks

Optional fallback function called on failure or rejection

EventBus Integration

Emit state change events to any EventBus instance

Why CircuitBreaker?

In a Roblox game, external calls fail. DataStores hit rate limits, MessagingService goes down, HTTP requests time out. Without circuit breaking, a failing dependency gets called on every single request, adding latency and burning retries that have no chance of succeeding.

CircuitBreaker wraps any function call and tracks its health. After enough failures it stops calling the function entirely for a cooldown period, then lets a small number of test calls through to check if the service recovered. Every call returns a result table so you always know what happened.

Zero dependencies. CircuitBreaker is self-contained. The EventBus integration is optional and only activated if you pass an EventBus instance in config.

Getting Started

Installation

Drop CircuitBreaker.lua in ReplicatedStorage.

Basic Usage

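local DataStoreService = game:GetService("DataStoreService")
local CircuitBreaker   = require(game.ReplicatedStorage.CircuitBreaker)

-- "PlayerData" and the key below are placeholders; substitute your own.
local DataStore = DataStoreService:GetDataStore("PlayerData")
local key       = "player_123"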

local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 5,   -- open after 5 consecutive failures
        ResetTimeout     = 30,  -- try recovery after 30s
    },
})

local result = cb:Execute("datastore", function()
    return DataStore:GetAsync(key)
end)

if result.Ok then
    print("got value:", result.Value)
elseif result.Rejected then
    print("circuit is open, skipping call")
else
    print("call failed:", result.Err)
end

With Fallback

local result = cb:Execute("datastore",
    function()
        return DataStore:GetAsync(key)
    end,
    function(reason)
        -- called on failure OR when circuit is open
        warn("DataStore unavailable:", reason)
        return defaultValue
    end
)

-- result.Value is the fallback return when the call failed or was rejected
print(result.Value)

Circuit States

Every circuit key has one of three states. Transitions happen automatically based on call results and timeouts.

Closed

Normal operation. All calls go through. Failures are tracked. The circuit opens when the failure threshold is crossed.

Open

The circuit is tripped. All calls are rejected immediately without executing the function. After ResetTimeout seconds, the circuit moves to half-open to test recovery.

Half-Open

A limited number of probe calls (ProbeCount) are allowed through. If enough of them succeed (>= ProbeSuccessRate), the circuit closes. If any probe fails, it trips back to open immediately.

Probes beyond the ProbeCount limit are rejected while the existing probes are still running. Once the last probe slot completes, recovery is evaluated and the circuit transitions to closed or open.

print(cb:GetState("datastore"))   -- "closed" | "open" | "half_open"
print(cb:IsAvailable("datastore")) -- false when open (and probe slots exhausted)
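
The transitions described above can be sketched in a few lines of plain Lua. This is a simplified illustration, not the library's source: every name in it (newCircuit, allow, record) is hypothetical, it uses a single probe instead of ProbeCount, and the clock is injected so the example runs outside Roblox.

```lua
-- Minimal sketch of the closed -> open -> half_open -> closed cycle.
-- Illustrative names only; a single probe stands in for ProbeCount.
local function newCircuit(failureThreshold, resetTimeout, now)
    return {
        state = "closed",
        consecutiveFails = 0,
        openedAt = nil,
        failureThreshold = failureThreshold,
        resetTimeout = resetTimeout,
        now = now, -- injected clock so the sketch is testable
    }
end

-- Returns true if a call may run right now. Moves open -> half_open
-- once resetTimeout seconds have passed since the circuit opened.
local function allow(c)
    if c.state == "open" and c.now() - c.openedAt >= c.resetTimeout then
        c.state = "half_open"
    end
    return c.state ~= "open"
end

-- Records the outcome of a call that was allowed through.
local function record(c, ok)
    if ok then
        c.consecutiveFails = 0
        if c.state == "half_open" then
            c.state = "closed"
        end
        return
    end
    c.consecutiveFails = c.consecutiveFails + 1
    if c.state == "half_open" or c.consecutiveFails >= c.failureThreshold then
        c.state = "open"
        c.openedAt = c.now()
    end
end

-- Usage with a fake clock
local t = 0
local c = newCircuit(3, 30, function() return t end)
for _ = 1, 3 do
    record(c, false)
end
print(c.state)  -- open
t = 31
print(allow(c)) -- true
print(c.state)  -- half_open
record(c, true)
print(c.state)  -- closed
```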

Failure Modes

Two independent mechanisms can open the circuit. Either one alone is sufficient.

Consecutive Threshold

The simplest mode. After N failures in a row with no successes between them, the circuit opens. This is always active regardless of whether sliding window is configured.

local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 3, -- open after 3 consecutive failures
        ResetTimeout     = 15,
    },
})

A single success resets the consecutive counter back to zero. If you have a dependency that flaps - failing then succeeding then failing - use the sliding window instead.
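
To see why a flapping dependency slips past the consecutive counter, here is a small plain-Lua sketch. The helper tripIndex is hypothetical and only models the counter described above, not the library's internals.

```lua
-- Returns the call index at which a consecutive-failure counter would trip,
-- or nil if it never does. `outcomes` is a list of booleans (true = success).
local function tripIndex(outcomes, threshold)
    local streak = 0
    for i, ok in ipairs(outcomes) do
        if ok then
            streak = 0 -- a single success resets the counter
        else
            streak = streak + 1
        end
        if streak >= threshold then
            return i
        end
    end
    return nil
end

local burst    = { false, false, false, true }
local flapping = { false, false, true, false, false, true, false, false }

print(tripIndex(burst, 3))    -- 3
print(tripIndex(flapping, 3)) -- nil
```

The flapping sequence fails 6 times out of 8 yet never reaches three failures in a row, which is the case the sliding window is meant to catch.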

Sliding Window

Tracks a rolling window of the last N calls. When the window is full and the failure rate exceeds FailureRate, the circuit opens. Stale entries can be pruned by TTL with WindowTTLSeconds.

local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 999,        -- disable consecutive-only mode
        WindowSize       = 20,         -- track last 20 calls
        FailureRate      = 0.5,        -- open at 50% failure rate
        WindowTTLSeconds = 60,         -- discard calls older than 60s
        ResetTimeout     = 30,
    },
})

Combining both modes: If you set both FailureThreshold and WindowSize/FailureRate, either condition can open the circuit. With FailureThreshold = 3, for example, a burst of 3 consecutive failures trips it before the window fills, while a sustained 50% failure rate trips it once the window is full. Whichever comes first wins.
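
A rolling window like the one described in this section can be kept in a fixed-size ring buffer. The sketch below is an illustration under assumptions, not the library's code: newWindow and push are hypothetical names, TTL pruning is left out, and the comparison uses >= (the text says "exceeds" in one place and "open at 50%" in another, so the exact boundary behaviour is not confirmed here).

```lua
-- Ring buffer of the last `size` outcomes (true = success).
local function newWindow(size, failureRate)
    return {
        size = size,
        failureRate = failureRate,
        entries = {},
        nextSlot = 1,
        count = 0,
        failures = 0,
    }
end

-- Records one outcome. Returns true when the window is full and the
-- failure rate has reached the threshold (assumed >=, see lead-in).
local function push(w, ok)
    local evicted = w.entries[w.nextSlot]
    if evicted == false then
        w.failures = w.failures - 1 -- an old failure leaves the window
    end
    w.entries[w.nextSlot] = ok
    if not ok then
        w.failures = w.failures + 1
    end
    w.nextSlot = w.nextSlot % w.size + 1
    w.count = math.min(w.count + 1, w.size)
    if w.count < w.size then
        return false -- not enough samples yet
    end
    return w.failures / w.size >= w.failureRate
end

local w = newWindow(4, 0.5)
print(push(w, false)) -- false (window not full)
print(push(w, true))  -- false
print(push(w, false)) -- false
print(push(w, false)) -- true (3 of 4 calls failed)
```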

Recovery / Half-Open

After ResetTimeout seconds in the open state, the circuit moves to half-open and starts accepting probe calls.

local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 3,
        ResetTimeout     = 30,   -- wait 30s before probing
        ProbeCount       = 3,    -- allow 3 test calls through
        ProbeSuccessRate = 0.66, -- need 2 out of 3 to succeed (2/3 is about 0.667)
    },
})

How Probe Evaluation Works

Probe slots are claimed atomically before the call executes, so concurrent calls from multiple task.spawn threads cannot all slip through the ProbeCount gate at the same time.

  • Slot claimed before the function runs
  • After the last slot completes, success rate is evaluated
  • If rate meets threshold: transition to closed
  • If rate misses threshold: transition back to open, reset timer
  • Any failure during half-open trips back to open immediately
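
The slot-claiming and rate check in the list above can be sketched as follows. This is a hedged illustration with hypothetical names (newProbeGate, claim, finish). It models only the claim step and the rate evaluation after the last slot completes, and leaves out the immediate trip on failure from the last bullet.

```lua
local function newProbeGate(probeCount, probeSuccessRate)
    return {
        probeCount = probeCount,
        rate = probeSuccessRate,
        claimed = 0,
        done = 0,
        succeeded = 0,
    }
end

-- Claim a slot before running the probe. There is no yield between the
-- check and the increment, so cooperative threads cannot over-claim.
local function claim(g)
    if g.claimed >= g.probeCount then
        return false -- all slots taken: reject this call
    end
    g.claimed = g.claimed + 1
    return true
end

-- Report a probe result. Returns "closed" or "open" once the last
-- slot completes, and nil while probes are still outstanding.
local function finish(g, ok)
    g.done = g.done + 1
    if ok then
        g.succeeded = g.succeeded + 1
    end
    if g.done < g.probeCount then
        return nil
    end
    if g.succeeded / g.probeCount >= g.rate then
        return "closed"
    end
    return "open"
end

local gate = newProbeGate(3, 0.6)
assert(claim(gate) and claim(gate) and claim(gate)) -- three slots granted
print(claim(gate))         -- false (a fourth concurrent probe is rejected)
print(finish(gate, true))  -- nil (two probes still outstanding)
print(finish(gate, false)) -- nil
print(finish(gate, true))  -- closed (2 of 3 is at least 0.6)
```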

Forcing State

For testing or manual recovery:

cb:ForceState("datastore", "half_open") -- start probing immediately
cb:ForceState("datastore", "closed")    -- manually recover
cb:ForceState("datastore", "open")      -- manually trip

Call Timeouts

CallTimeout races the function against a deadline. If the function doesn't return in time, the call is counted as a failure and the result has TimedOut = true.

local cb = CircuitBreaker.new({
    Defaults = {
        CallTimeout      = 5,  -- 5 second deadline per call
        FailureThreshold = 3,
        ResetTimeout     = 30,
    },
})

local result = cb:Execute("slow_service", function()
    task.wait(10) -- longer than timeout
    return "done"
end)

print(result.TimedOut) -- true
print(result.Err)      -- "timeout"
print(result.Ok)       -- false

Roblox limitation: Timed-out coroutines cannot be cancelled in Roblox. The underlying thread keeps running after the timeout is declared. This is an unavoidable platform constraint - the late result is simply discarded.
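
One common way to get this "late result is discarded" behaviour is a first-settle-wins latch shared by the call and the timer. The plain-Lua sketch below shows only that latch. newRace is a hypothetical name, and the scheduling that would drive the two sides in Roblox (presumably task.spawn for the call and task.delay for the deadline) is an assumption and is left out so the example runs anywhere.

```lua
-- First-settle-wins latch: whichever side settles first decides the result.
local function newRace()
    local settled = false
    local result = nil

    local function settle(r)
        if settled then
            return false -- late arrival: the value is dropped
        end
        settled = true
        result = r
        return true
    end

    local function get()
        return result
    end

    return settle, get
end

local settle, get = newRace()

-- The timer fires first...
print(settle({ Ok = false, TimedOut = true, Err = "timeout" })) -- true
-- ...and the slow function returns afterwards. Its value is ignored.
print(settle({ Ok = true, Value = "done" }))                    -- false
print(get().Err)                                                -- timeout
```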

Fallbacks

Pass a fallback function as the third argument to Execute. It is called with the failure reason whenever:

  • The function throws an error
  • The function times out
  • The circuit is open and the call is rejected

local result = cb:Execute("datastore",
    function()
        return DataStore:GetAsync(key)
    end,
    function(reason)
        -- reason is "circuit open" on rejection,
        -- or the actual error string on failure
        if reason == "circuit open" then
            return cachedValue
        end
        return defaultValue
    end
)

-- result.Value is the fallback return value
-- result.Rejected is true if the circuit was open

If no fallback is provided, result.Value is nil on failure and rejection.
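
The failure path can be pictured as a pcall around the function, with the fallback filling in Value. This is a minimal plain-Lua sketch under that assumption. runWithFallback is a hypothetical helper, not the library's Execute, and it covers only the "function throws" case.

```lua
-- Runs fn and never throws. On error, the optional fallback supplies Value.
local function runWithFallback(fn, fallback)
    local ok, valueOrErr = pcall(fn)
    if ok then
        return { Ok = true, Value = valueOrErr }
    end
    local result = { Ok = false, Err = tostring(valueOrErr) }
    if fallback then
        result.Value = fallback(result.Err)
    end
    return result
end

local failed = runWithFallback(function()
    error("datastore unavailable", 0) -- level 0: no file/line prefix
end, function(reason)
    return "default"
end)

print(failed.Ok)    -- false
print(failed.Err)   -- datastore unavailable
print(failed.Value) -- default

-- Without a fallback, Value stays nil on failure.
local bare = runWithFallback(function()
    error("boom", 0)
end)
print(bare.Value)   -- nil
```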

Per-Key Configuration

Each circuit key can have its own threshold settings that override the defaults. Call Configure before the first Execute on that key.

local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 10,
        ResetTimeout     = 60,
    },
})

-- "payments" is more sensitive than the default
cb:Configure("payments", {
    FailureThreshold = 2,
    ResetTimeout     = 120,
    ProbeCount       = 1,
    ProbeSuccessRate = 1.0, -- need 100% probe success
})

-- "analytics" is more tolerant
cb:Configure("analytics", {
    FailureThreshold = 20,
    ResetTimeout     = 10,
})

Per-key settings merge with defaults. You only need to specify the fields you want to override.
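
Merging of this kind is typically a shallow copy of the defaults with the overrides applied on top. The sketch below assumes that approach. mergeConfig is a hypothetical helper and may differ from what the library does internally.

```lua
-- Shallow merge: start from defaults, then apply per-key overrides.
local function mergeConfig(defaults, overrides)
    local merged = {}
    for field, value in pairs(defaults) do
        merged[field] = value
    end
    for field, value in pairs(overrides or {}) do
        merged[field] = value
    end
    return merged
end

local defaults = { FailureThreshold = 10, ResetTimeout = 60 }
local payments = mergeConfig(defaults, { FailureThreshold = 2 })

print(payments.FailureThreshold) -- 2
print(payments.ResetTimeout)     -- 60
print(defaults.FailureThreshold) -- 10
```

Copying into a fresh table keeps one key's overrides from leaking into the shared defaults.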

EventBus Integration

Pass an EventBus instance to the constructor and CircuitBreaker will emit events on every state change.

local EventBus       = require(game.ReplicatedStorage.EventBus)
local CircuitBreaker = require(game.ReplicatedStorage.CircuitBreaker)

local bus = EventBus.new()
local cb  = CircuitBreaker.new({ EventBus = bus })

bus:On("circuit.**", function(payload, meta)
    print(meta.Event, payload.Key)
end)

-- Events emitted:
--   circuit.state_changed  { Key, From, To, Time }
--   circuit.opened         { Key, Failures }
--   circuit.closed         { Key }
--   circuit.probing        { Key }

State change events are emitted synchronously inside _transition so subscribers see them before the next line of your code runs. The OnStateChange callback runs in a separate thread via task.spawn since it is arbitrary user code that might yield.

API Reference

CircuitBreaker.new

CircuitBreaker.new(config: CircuitBreakerConfig?): CircuitBreaker

Creates a new CircuitBreaker instance. Config is optional; all fields have defaults.

Execute

cb:Execute(key: string, fn: () -> any, fallback: ((reason: string) -> any)?): CallResult

Runs fn under the circuit for key. Never throws. Returns a CallResult:

Field     Type     Description
Ok        boolean  true if the call succeeded
Value     any      return value of fn, or fallback return value
Err       string?  error message on failure or timeout
Rejected  boolean  true if the circuit was open
TimedOut  boolean  true if CallTimeout was exceeded
Elapsed   number   wall time in seconds for the call

GetState

cb:GetState(key: string): CircuitState?

Returns the current state for a key, or nil if no circuit exists for it yet.

GetMetrics

cb:GetMetrics(key: string): CircuitMetrics

Returns a metrics table for a key. Creates the circuit if it doesn't exist yet.

Field             Type     Description
State             string   Current circuit state
TotalCalls        number   Total calls including rejections
Successes         number   Total successful calls
Failures          number   Total failed calls
ConsecutiveFails  number   Current consecutive failure count
TotalRejected     number   Calls rejected while circuit was open
LastFailure       number?  os.time() of last failure
LastSuccess       number?  os.time() of last success
OpenedAt          number?  os.clock() when circuit last opened
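
As a small example of using these fields, the sketch below derives an overall failure rate from a metrics table. failureRate is a hypothetical helper, the sample table is made up for illustration, and the calculation assumes Successes and Failures count only calls that actually executed (rejections excluded).

```lua
-- Fraction of executed calls that failed, given a table shaped like
-- the metrics above. Returns 0 when nothing has executed yet.
local function failureRate(m)
    local executed = m.Successes + m.Failures
    if executed == 0 then
        return 0
    end
    return m.Failures / executed
end

-- Made-up sample: 8 executed calls plus 4 rejections.
local sample = {
    State = "closed",
    TotalCalls = 12,
    Successes = 6,
    Failures = 2,
    TotalRejected = 4,
}

print(failureRate(sample)) -- 0.25
```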

GetAll

cb:GetAll(): {[string]: CircuitState}

Returns a table of all known keys and their current states.

IsAvailable

cb:IsAvailable(key: string): boolean

Returns true if a call on this key would be executed right now. False if the circuit is open or all half-open probe slots are taken. Returns true for keys that have no circuit yet.

Configure

cb:Configure(key: string, config: BreakerConfig)

Set per-key config overrides. If called before first Execute, the overrides are merged when the circuit is created. If called after, they are merged into the existing config immediately.

ForceState

cb:ForceState(key: string, state: CircuitState)

Manually set the state of a circuit. Creates it first if it doesn't exist. Fires the normal transition logic including EventBus events and OnStateChange callbacks.

Reset

cb:Reset(key: string)

Clears all counters and sets the circuit back to closed. Metrics are zeroed. Does not fire OnStateChange.

Destroy

cb:Destroy()

Clears all circuits. Any subsequent call to Execute or ForceState will throw.

Configuration

CircuitBreakerConfig (top-level)

Field              Type           Default    Description
Defaults           BreakerConfig  see below  Default config applied to all circuits
EventBus           EventBus?      nil        EventBus instance for state change events
MaxCircuits        number         512        Max number of keys before new keys throw
CircuitTTLSeconds  number?        nil        Evict closed idle circuits after this many seconds
OnError            function?      warn       Called with internal error messages

BreakerConfig (per-circuit)

Field             Type       Default  Description
FailureThreshold  number     5        Consecutive failures before opening
ResetTimeout      number     30       Seconds to wait before probing after open
ProbeCount        number     3        Number of probe calls allowed in half-open
ProbeSuccessRate  number     0.6      Fraction of probes that must succeed to close
CallTimeout       number     10       Seconds before a call is considered timed out
WindowSize        number?    nil      Enable sliding window with this many entries
FailureRate       number?    nil      Failure rate threshold for sliding window (0.0 to 1.0)
WindowTTLSeconds  number?    nil      Discard window entries older than this
OnStateChange     function?  nil      Called as (key, from, to) on any state transition
OnRejected        function?  nil      Called as (key) when a call is rejected
OnTimeout         function?  nil      Called as (key, elapsed) on call timeout

Examples

DataStore Wrapper

local cb = CircuitBreaker.new({
    Defaults = {
        FailureThreshold = 3,
        ResetTimeout     = 60,
        CallTimeout      = 5,
        ProbeCount       = 1,
        ProbeSuccessRate = 1.0,
        OnStateChange    = function(key, from, to)
            if to == "open" then
                warn("DataStore circuit opened for key:", key)
            end
        end,
    },
})

local function safeGet(key)
    return cb:Execute("datastore_get", function()
        return DataStore:GetAsync(key)
    end, function(reason)
        return nil -- return nil on failure, caller can check
    end)
end

local result = safeGet("player_123")
if result.Ok then
    -- use result.Value
end

Multiple Services

-- one CircuitBreaker instance, multiple keys
local cb = CircuitBreaker.new({
    Defaults = { FailureThreshold = 5, ResetTimeout = 30 },
})

cb:Configure("payments", { FailureThreshold = 1, ResetTimeout = 120 })
cb:Configure("analytics", { FailureThreshold = 20, ResetTimeout = 5 })

local paymentResult   = cb:Execute("payments",   function() ... end)
local analyticsResult = cb:Execute("analytics",  function() ... end)
local datastoreResult = cb:Execute("datastore",  function() ... end) -- uses defaults

Dashboard / Health Check

local all = cb:GetAll()
for key, state in pairs(all) do
    local m = cb:GetMetrics(key)
    print(string.format(
        "[%s] %s | calls: %d | failures: %d | rejected: %d",
        key, state, m.TotalCalls, m.Failures, m.TotalRejected
    ))
end

Troubleshooting

Circuit opens too fast

A lower FailureThreshold trips the circuit sooner. If the circuit trips on normal transient errors, raise the threshold. The consecutive check is always active, so to rely on a sliding window instead, set FailureThreshold high enough that only the window's failure rate opens the circuit.

Circuit won't recover

Check that your probe call is actually succeeding. If ProbeSuccessRate is set to 1.0 and the service is still flaky, probes will keep failing and the circuit will stay in the open/half-open cycle. Lower the rate or increase ProbeCount to get a better sample.

Timeouts are not firing

CallTimeout defaults to 10 seconds, so a call only times out if it runs longer than that. Set a lower CallTimeout if you need a tighter deadline. A function that returns before the deadline is never marked as timed out, even if it was slow.

Calls still going through when circuit is open

Check IsAvailable before calling if you need to gate other logic. Execute does enforce the open state, but the check only happens at call time. If the circuit opened while a call was already running, that call completes normally.

Getting help

  • Check GetMetrics for counts
  • Use OnStateChange to log transitions
  • Use EventBus integration for a complete event stream
  • Contact @chi0sk on Discord