chi0sk
github.com/chi0sk

TaskScheduler v2.0

Production-ready distributed task scheduling for Roblox with dual execution modes, priority queues, and automatic recovery.

Dual Execution Modes

At-least-once (fast) OR exactly-once (guaranteed)

Priority Queues

Critical, high, normal, and low priority execution

Auto Retry

Exponential backoff with configurable retry limits

Auto-Scaling

Adaptive rate limiting for 100+ servers

Work Stealing

Automatic cross-server load balancing

Persistence

Survive server crashes with DataStore recovery

Why TaskScheduler?

TaskScheduler provides production-ready background job processing:

  • Flexibility: Choose at-least-once OR exactly-once per task
  • Scalability: Server-level coordination for 100+ servers
  • Reliability: Automatic retries, exponential backoff, and crash recovery
  • Performance: Adaptive rate limiting prevents MessagingService overload
  • Observability: Comprehensive statistics, hooks, and event logging
  • Developer Experience: Simple API, configuration presets, and dependency tracking

Execution Modes: TaskScheduler supports both at-least-once (the default, for high throughput) and exactly-once (opt-in, for critical tasks), selectable per task. The default at-least-once semantics match those of AWS SQS, Apache Kafka, and RabbitMQ.

What's New in v2.0

Exactly-Once Execution Mode

Optional distributed locking for critical tasks like payments and economy transactions.

scheduler:SubmitTask({
    Name = "process_payment",
    ExactlyOnce = true, -- NEW!
    Payload = { amount = 1000 },
})

Server-Level Heartbeats

MessagingService usage is reduced by 80-90%, so the scheduler now supports 100+ servers (previously 12-15).

  • One heartbeat per server (not per worker)
  • Adaptive rate limiting with automatic backoff
  • Smart work stealing based on server health

Configuration Presets

One-line setup for any game size.

-- Small game (10-100 CCU)
local scheduler = TaskScheduler.newWithPreset("SMALL")

-- Large game (1000+ CCU, 20+ servers)
local scheduler = TaskScheduler.newWithPreset("LARGE")

Breaking Changes

None! v2.0 is 100% backward compatible. Existing code works as-is.

Getting Started

Installation

Place TaskScheduler.lua in ReplicatedStorage.

Quick Start (Small Game)

local TaskScheduler = require(game.ReplicatedStorage.TaskScheduler)

-- Use preset for easy setup
local scheduler, preset = TaskScheduler.newWithPreset("SMALL", {
    PersistTasks = true,
    ObservabilityEnabled = true,
})

-- Create worker
local workerId = scheduler:CreateWorker({
    Handlers = {
        ["send_notification"] = function(task)
            local userId = task.Payload.userId
            local message = task.Payload.message
            
            print("Sending notification to", userId)
            -- Send logic here
            
            return true -- success
        end,
    },
})

-- Submit task
local taskId = scheduler:SubmitTask({
    Name = "send_notification",
    Priority = "high",
    Payload = {
        userId = 123,
        message = "Welcome!",
    },
})

-- Check status
local task = scheduler:GetTask(taskId)
print("Status:", task.Status)

Quick Start (Large Game - 1000+ CCU)

-- Optimized for 20+ servers
local scheduler, preset = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true, -- for critical tasks
})

-- Create fewer workers with higher concurrency
scheduler:CreateWorker({
    MaxConcurrent = 15, -- higher than small games
    Handlers = {
        ["process_payment"] = function(task)
            -- Critical handler
            return true
        end,
        ["log_analytics"] = function(task)
            -- Non-critical handler
            return true
        end,
    },
})

-- Fast tasks: standard mode
scheduler:SubmitTask({
    Name = "log_analytics",
    Payload = { event = "player_joined" },
})

-- Critical tasks: exactly-once mode
scheduler:SubmitTask({
    Name = "process_payment",
    ExactlyOnce = true, -- guaranteed execution
    Payload = { amount = 1000 },
})

Execution Modes

TaskScheduler supports two execution modes. Choose the right one for each task type.

Standard Mode (Default)

scheduler:SubmitTask({
    Name = "send_notification",
    Priority = "normal",
    Payload = { ... },
    -- ExactlyOnce defaults to false
})

Characteristics

  • At-least-once delivery (same as AWS SQS, Kafka, RabbitMQ)
  • Zero coordination overhead
  • Optimal performance for high-throughput workloads
  • Tasks may execute multiple times under failure scenarios
  • Handlers should be idempotent (industry best practice)

Best For

  • Sending notifications (duplicate is harmless)
  • Logging analytics (an occasional duplicate count is tolerable)
  • Cache invalidation (safe to repeat)
  • Background processing (95% of tasks)

Performance

  • Latency: ~5-10ms
  • Throughput: 1000s of tasks per second
  • Overhead: None

Exactly-Once Mode (Optional)

scheduler:SubmitTask({
    Name = "process_payment",
    Priority = "critical",
    ExactlyOnce = true, -- Enable distributed locking
    Payload = { transactionId = "txn_123", amount = 1000 },
})

Characteristics

  • Exactly-once execution guarantee via distributed locks
  • Prevents duplicate processing across all servers
  • Automatic lock acquisition and release
  • Built-in deadlock prevention with TTL
  • Slight latency overhead (~50-100ms)

How It Works

  1. Worker acquires DataStore-based lock before execution
  2. If lock is held by another server, task is skipped
  3. Lock auto-extends during execution
  4. Lock auto-releases on completion or timeout
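
The four steps above can be sketched as follows. This is a minimal illustration, not TaskScheduler's actual implementation: an in-memory table stands in for the DataStore, and the names tryAcquire, extend, release, and LOCK_TTL are hypothetical. In a live game the read-modify-write inside tryAcquire would have to be atomic across servers, for example by running it inside GlobalDataStore:UpdateAsync.

```lua
-- In-memory stand-in for a DataStore: task ID -> { owner, expiresAt }.
local locks = {}
local LOCK_TTL = 30 -- seconds (hypothetical value)

-- Steps 1 and 2: take the lock unless another server holds a live one.
local function tryAcquire(taskId, serverId, now)
    local lock = locks[taskId]
    if lock and lock.owner ~= serverId and lock.expiresAt > now then
        return false -- held elsewhere: the caller skips the task
    end
    locks[taskId] = { owner = serverId, expiresAt = now + LOCK_TTL }
    return true
end

-- Step 3: push the expiry forward while the handler is still running.
local function extend(taskId, serverId, now)
    local lock = locks[taskId]
    if lock and lock.owner == serverId and lock.expiresAt > now then
        lock.expiresAt = now + LOCK_TTL
        return true
    end
    return false
end

-- Step 4: release on completion. A lock whose owner crashed is never
-- released, but it expires and is overwritten by the next tryAcquire.
local function release(taskId, serverId)
    local lock = locks[taskId]
    if lock and lock.owner == serverId then
        locks[taskId] = nil
    end
end
```

The TTL is what keeps a crashed server from blocking a task forever: once expiresAt has passed, any other server can take the lock over.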

Best For

  • Processing payments (can't double-charge)
  • Economy transactions (can't duplicate currency)
  • Badge awards (want true deduplication)
  • Critical operations (5% of tasks)

Performance

  • Latency: ~50-100ms (lock overhead)
  • Throughput: Same as standard mode
  • Overhead: 2-3 DataStore operations per task

Cost vs Benefit: Use exactly-once sparingly. Most tasks (95%) don't need it. Idempotent handlers in standard mode are often sufficient and much faster.

When to Use Each Mode

| Task Type | Mode | Reasoning |
|---|---|---|
| Send notification | Standard | Duplicate notification is harmless |
| Log analytics event | Standard | An occasional duplicate count is tolerable |
| Invalidate cache | Standard | Running twice is safe |
| Award badge | Standard* | Use idempotency check in handler |
| Process payment | Exactly-Once | Cannot double-charge users |
| Grant currency | Exactly-Once | Cannot duplicate economy |
| Critical transaction | Exactly-Once | Requires guarantee |

* For badge awards, you can use either mode with proper idempotency checks

Scaling Guide

TaskScheduler auto-scales with configuration presets. Choose based on your player count and server count.

Small Games (10-100 CCU, 1-5 servers)

local scheduler, preset = TaskScheduler.newWithPreset("SMALL", {
    PersistTasks = true,
})

Configuration

  • Workers per server: 3
  • Max concurrent per worker: 5
  • Heartbeat interval: 5 seconds
  • Poll interval: 0.5 seconds

Messaging Load

10 servers × 1 heartbeat every 5s = 2 publishes/sec (a conservative bound; at the preset's 5-server ceiling the rate is 1 publish/sec)

Medium Games (100-1000 CCU, 5-20 servers)

local scheduler, preset = TaskScheduler.newWithPreset("MEDIUM", {
    PersistTasks = true,
    ObservabilityEnabled = true,
})

Configuration

  • Workers per server: 5
  • Max concurrent per worker: 8
  • Heartbeat interval: 10 seconds
  • Poll interval: 0.5 seconds

Messaging Load

20 servers × 1 heartbeat every 10s = 2 publishes/sec

Large Games (1000+ CCU, 20+ servers)

local scheduler, preset = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true, -- for critical tasks
})

Configuration

  • Workers per server: 2-3 (fewer workers, higher concurrency each)
  • Max concurrent per worker: 15
  • Heartbeat interval: 15 seconds
  • Poll interval: 1 second
  • Adaptive rate limiting: Enabled

Messaging Load

50 servers × 1 heartbeat every 15s = 3.3 publishes/sec

100 servers × 1 heartbeat every 15s (with backoff) = ~4 publishes/sec

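MessagingService Limit: ~2.5 publishes/sec per topic. SMALL and MEDIUM sit at about 80% of that limit. The nominal LARGE figures above exceed it, so adaptive backoff keeps lengthening the heartbeat interval until the publish rate falls back under the limit.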
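
The messaging-load figures in this guide follow from one division: servers divided by heartbeat interval. The sketch below reproduces that arithmetic and also inverts it to find the shortest interval that fits under a given limit. The function names are hypothetical, and the 2.5 publishes/sec figure is the limit quoted in this guide, not a value read from the engine.

```lua
-- Publishes per second when each server sends one heartbeat per interval.
local function publishesPerSecond(servers, heartbeatSeconds)
    return servers / heartbeatSeconds
end

-- Shortest heartbeat interval (seconds) that keeps the whole cluster
-- at or under limitPerSecond publishes per second.
local function minIntervalFor(servers, limitPerSecond)
    return servers / limitPerSecond
end

-- LARGE preset at 50 servers with a 15 s heartbeat.
print(string.format("%.1f", publishesPerSecond(50, 15))) -- 3.3

-- Interval needed for 100 servers to fit under 2.5 publishes/sec.
print(string.format("%.0f", minIntervalFor(100, 2.5)))   -- 40
```

Under these assumptions, 100 servers only fit under the quoted limit once the heartbeat interval has grown to about 40 seconds, which is the kind of adjustment adaptive backoff would have to make.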

Scaling Comparison

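| Preset | Servers | Workers/Server | Heartbeat | Messaging Load | Status |
|---|---|---|---|---|---|
| SMALL | 1-5 | 3 | 5s | 2/sec | Good |
| MEDIUM | 5-20 | 5 | 10s | 2/sec | Good |
| LARGE | 20-100+ | 2-3 | 15s | 3-4/sec | Good (with backoff) |
| v1.0 (old) | 12-15 | 5 | 5s/worker | 20/sec | Over Limit |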

Core Concepts

Tasks

A task is a unit of work with a name, payload, and configuration. Tasks move through states:

  • pending: Queued, waiting for worker
  • running: Currently executing
  • completed: Successfully finished
  • failed: Failed but will retry
  • dead: Permanently failed (moved to dead letter queue)

Workers

Workers pull tasks from the queue and execute handler functions. Each worker:

  • Runs up to MaxConcurrent tasks simultaneously
  • Polls the queue at PollInterval
  • Requests work stealing when underutilized
  • Is tracked by the server-level heartbeat

Priority Levels

Tasks are executed in priority order:

  1. critical: Urgent tasks (executed first)
  2. high: Important tasks
  3. normal: Default priority
  4. low: Background tasks
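
One way to picture this ordering is as a sort comparator. The sketch below is illustrative only: the rank table and runsBefore are hypothetical names, and breaking ties by submission time (the SubmittedAt field) is an assumption, since the documentation does not say how tasks of equal priority are ordered.

```lua
-- Lower rank runs first; ranks follow the list above.
local PRIORITY_RANK = { critical = 1, high = 2, normal = 3, low = 4 }

-- Comparator for table.sort: priority first, then submission order.
local function runsBefore(a, b)
    local ra, rb = PRIORITY_RANK[a.Priority], PRIORITY_RANK[b.Priority]
    if ra ~= rb then
        return ra < rb
    end
    return a.SubmittedAt < b.SubmittedAt -- assumed FIFO within a priority
end

local queue = {
    { Name = "cleanup", Priority = "low", SubmittedAt = 1 },
    { Name = "notify", Priority = "normal", SubmittedAt = 2 },
    { Name = "payment", Priority = "critical", SubmittedAt = 3 },
    { Name = "reward", Priority = "high", SubmittedAt = 4 },
}

table.sort(queue, runsBefore)

for _, job in ipairs(queue) do
    print(job.Priority, job.Name)
end
```

After sorting, the queue runs payment, reward, notify, cleanup, even though payment was submitted third.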

API Reference

TaskScheduler.new

TaskScheduler.new(config: SchedulerConfig): TaskScheduler

Creates a new task scheduler instance with custom configuration.

Config Options

| Field | Type | Default | Description |
|---|---|---|---|
| PersistTasks | boolean | false | Enable DataStore persistence and recovery |
| ObservabilityEnabled | boolean | false | Enable detailed event logging |
| EnableExactlyOnce | boolean | false | New in v2.0. Enable exactly-once execution mode |
| DeadLetterRetention | number | 86400 | Seconds to keep dead tasks (24 hours) |
| CleanupInterval | number | 300 | Seconds between cleanup runs |
| PersistenceBatchSize | number | 10 | Batch DataStore writes |
| PersistenceBatchTimeout | number | 2 | Flush persistence every N seconds |

TaskScheduler.newWithPreset

TaskScheduler.newWithPreset(preset: string, config: SchedulerConfig?): (TaskScheduler, PresetConfig)

New in v2.0. Creates a scheduler with preset configuration for easy scaling.

Presets

  • "SMALL" - Small games (10-100 CCU, 1-5 servers)
  • "MEDIUM" - Medium games (100-1000 CCU, 5-20 servers)
  • "LARGE" - Large games (1000+ CCU, 20+ servers)
  • "SINGLE_SERVER" - Testing / single server (disables messaging)

local scheduler, preset = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    EnableExactlyOnce = true,
})

print("Workers per server:", preset.workers)
print("Max concurrent:", preset.maxConcurrent)

CreateWorker

scheduler:CreateWorker(config: WorkerConfig): string

Creates a worker that executes tasks.

Config Options

| Field | Type | Default | Description |
|---|---|---|---|
| MaxConcurrent | number | 5 | Max tasks running simultaneously |
| Handlers | table | required | Map of task names to handler functions |
| PollInterval | number | 0.5 | Seconds between queue polls |
| StealThreshold | number | 10 | Request work stealing if queue below this |

SubmitTask

scheduler:SubmitTask(config: TaskConfig): string

Submits a task to the queue.

Config Options

| Field | Type | Default | Description |
|---|---|---|---|
| Name | string | required | Handler name to execute |
| Priority | string | "normal" | critical, high, normal, or low |
| ExactlyOnce | boolean | false | New in v2.0. Enable distributed locking |
| Payload | table | {} | Data passed to handler |
| MaxRetries | number | 3 | Retry attempts before dead |
| RetryDelay | number | 5 | Base delay (exponential: 5s, 10s, 20s...) |
| Timeout | number | 60 | Seconds before timeout |
| DependsOn | {string} | {} | Array of task IDs that must complete first |
| ScheduledFor | number | nil | Unix timestamp to delay execution |
| LocalIdempotencyKey | string | nil | Deduplicate on same server only |
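
The RetryDelay progression (5s, 10s, 20s...) is a doubling sequence. The sketch below shows that arithmetic together with a possible hand-off from failed to dead. The helper names are hypothetical, and treating MaxRetries as "retries after the first run" is an assumption about how the counter is interpreted.

```lua
-- Delay before retry number `attempt` (1-based): base, 2x base, 4x base...
local function retryDelay(baseDelay, attempt)
    return baseDelay * 2 ^ (attempt - 1)
end

-- Decide what happens after a failed run.
-- `failedRuns` counts how many times the task has failed so far.
-- Returns the next status and, for "failed", the delay before retrying.
local function afterFailure(failedRuns, maxRetries, baseDelay)
    if failedRuns > maxRetries then
        return "dead", nil -- retries exhausted: dead letter queue
    end
    return "failed", retryDelay(baseDelay, failedRuns)
end

-- With the defaults (MaxRetries = 3, RetryDelay = 5):
for failedRuns = 1, 4 do
    print(failedRuns, afterFailure(failedRuns, 3, 5))
end
```

With the defaults this gives waits of 5, 10, and 20 seconds after the first three failures, and the fourth failure moves the task to dead.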

GetTask

scheduler:GetTask(taskId: string): Task?

Retrieves a task by ID. Returns nil if not found.

CancelTask

scheduler:CancelTask(taskId: string): boolean

Cancels a pending task. Returns false if task is running or complete.

GetStats

scheduler:GetStats(): table

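Returns a table of scheduler statistics, including the pending and dead counts used in the monitoring example under Best Practices.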

Configuration

Recommended Configurations by Scale

Small Game

local scheduler = TaskScheduler.newWithPreset("SMALL", {
    PersistTasks = true,
})

Medium Game

local scheduler = TaskScheduler.newWithPreset("MEDIUM", {
    PersistTasks = true,
    ObservabilityEnabled = true,
})

Large Game

local scheduler = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true,
})

Custom Configuration

local scheduler = TaskScheduler.new({
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true,
    DeadLetterRetention = 172800, -- 48 hours
    CleanupInterval = 600, -- 10 minutes
    PersistenceBatchSize = 20,
})

Best Practices

1. Write Idempotent Handlers

Even with exactly-once mode, handlers should be idempotent:

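-- GOOD: Idempotent handler
-- (DataStore stands for a GlobalDataStore obtained from DataStoreService,
-- and BadgeService for game:GetService("BadgeService").)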
["award_badge"] = function(task)
    local userId = task.Payload.userId
    local badgeId = task.Payload.badgeId
    
    -- Check if already awarded
    local key = userId .. "_" .. badgeId
    if DataStore:GetAsync(key) then
        return true -- already done
    end
    
    -- Award badge
    BadgeService:AwardBadge(userId, badgeId)
    DataStore:SetAsync(key, true)
    
    return true
end

2. Use Exactly-Once Sparingly

-- Standard mode for most tasks (95%)
scheduler:SubmitTask({
    Name = "send_notification",
    Payload = { ... },
})

-- Exactly-once only for critical tasks (5%)
scheduler:SubmitTask({
    Name = "process_payment",
    ExactlyOnce = true,
    Payload = { ... },
})

3. Choose the Right Preset

-- Small game
local scheduler = TaskScheduler.newWithPreset("SMALL")

-- Large game (don't use SMALL preset!)
local scheduler = TaskScheduler.newWithPreset("LARGE")

4. Monitor Statistics

task.spawn(function()
    while true do
        task.wait(60)
        local stats = scheduler:GetStats()
        
        if stats.dead > 10 then
            warn("High dead letter count:", stats.dead)
        end
        
        if stats.pending > 500 then
            warn("Queue backlog detected:", stats.pending)
        end
    end
end)

Examples

Mixed Execution Modes

local scheduler = TaskScheduler.newWithPreset("MEDIUM", {
    EnableExactlyOnce = true,
})

scheduler:CreateWorker({
    Handlers = {
        ["log_analytics"] = function(task)
            -- Non-critical: standard mode is fine
            AnalyticsStore:IncrementAsync("events", 1)
            return true
        end,
        
        ["grant_currency"] = function(task)
            -- Critical: ensure idempotency
            local userId = task.Payload.userId
            local amount = task.Payload.amount
            
            -- Check if already granted
            local key = "currency_" .. task.Id
            if DataStore:GetAsync(key) then
                return true
            end
            
            -- Grant currency
            local profile = ProfileStore:LoadProfileAsync(userId)
            profile.Data.Currency += amount
            DataStore:SetAsync(key, true)
            
            return true
        end,
    },
})

-- Fast: standard mode
scheduler:SubmitTask({
    Name = "log_analytics",
    Payload = { event = "player_joined" },
})

-- Guaranteed: exactly-once mode
scheduler:SubmitTask({
    Name = "grant_currency",
    ExactlyOnce = true,
    Payload = { userId = 123, amount = 100 },
})

Dependency Chains

local task1 = scheduler:SubmitTask({
    Name = "fetch_data",
})

local task2 = scheduler:SubmitTask({
    Name = "process_data",
    DependsOn = {task1},
})

local task3 = scheduler:SubmitTask({
    Name = "save_results",
    DependsOn = {task2},
    ExactlyOnce = true, -- ensure final save is guaranteed
})
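
A task with DependsOn only becomes eligible once every listed task has completed. A minimal sketch of that check, assuming tasks are kept in a table keyed by ID with a Status field as described under Core Concepts (dependenciesMet and tasksById are hypothetical names, not part of the API):

```lua
-- Returns true when every task ID in job.DependsOn has status "completed".
-- A missing dependency is treated as unmet.
local function dependenciesMet(job, tasksById)
    for _, depId in ipairs(job.DependsOn or {}) do
        local dep = tasksById[depId]
        if not dep or dep.Status ~= "completed" then
            return false
        end
    end
    return true
end

-- The three-step chain from the example, partway through.
local tasksById = {
    t1 = { Name = "fetch_data", Status = "completed" },
    t2 = { Name = "process_data", Status = "pending", DependsOn = { "t1" } },
    t3 = { Name = "save_results", Status = "pending", DependsOn = { "t2" } },
}

print(dependenciesMet(tasksById.t2, tasksById)) -- true
print(dependenciesMet(tasksById.t3, tasksById)) -- false
```

Here process_data can run because fetch_data has completed, while save_results keeps waiting on process_data.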

Scheduled Tasks

-- Daily reset: this example schedules the run for midnight on 17 February 2026
local midnight = os.time({
    year = 2026,
    month = 2,
    day = 17,
    hour = 0,
})

scheduler:SubmitTask({
    Name = "daily_reset",
    ScheduledFor = midnight,
    ExactlyOnce = true, -- ensure it only runs once
})
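
The example above hard-codes a date. To avoid that, the next midnight can be derived from the current Unix timestamp. The sketch below assumes the reset should happen at 00:00 UTC; nextUtcMidnight is a hypothetical helper, not part of TaskScheduler.

```lua
local SECONDS_PER_DAY = 86400

-- Next 00:00:00 UTC strictly after `now` (a Unix timestamp in seconds).
local function nextUtcMidnight(now)
    return now - (now % SECONDS_PER_DAY) + SECONDS_PER_DAY
end

-- Timestamp that could be passed as ScheduledFor.
local scheduledFor = nextUtcMidnight(os.time())
print(scheduledFor)
```

Because ScheduledFor takes a single timestamp, a reset that recurs every day would presumably need the handler to submit the following day's task itself.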

Performance

Throughput

| Mode | Latency | Throughput | Overhead |
|---|---|---|---|
| Standard | 5-10ms | 1000s/sec | None |
| Exactly-Once | 50-100ms | Same | 2-3 DataStore ops |

Messaging Load by Scale

| Scale | Servers | Publishes/sec | vs Limit (2.5/sec) |
|---|---|---|---|
| SMALL | 10 | 2 | 80% |
| MEDIUM | 20 | 2 | 80% |
| LARGE | 50 | 3.3 | 132% (backoff kicks in) |
| LARGE (backoff) | 100 | ~4 | 160% (further backoff) |

Adaptive backoff automatically lengthens heartbeat intervals to bring the publish rate back under the limit.
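
The documentation does not spell out the backoff policy, so the following is only one plausible shape for it: grow the heartbeat interval multiplicatively after a rate-limited publish and shrink it slowly after clean ones. Every constant and name here is hypothetical except the 15-second base, which is the LARGE preset's heartbeat interval.

```lua
local BASE_INTERVAL = 15   -- seconds; LARGE preset heartbeat interval
local MAX_INTERVAL = 60    -- hypothetical cap
local BACKOFF_FACTOR = 1.5 -- hypothetical growth factor
local RECOVERY_STEP = 1    -- hypothetical seconds removed per clean publish

-- Returns the heartbeat interval to use after the latest publish attempt.
local function nextInterval(current, wasRateLimited)
    if wasRateLimited then
        return math.min(current * BACKOFF_FACTOR, MAX_INTERVAL)
    end
    return math.max(current - RECOVERY_STEP, BASE_INTERVAL)
end

-- Two rate-limited publishes followed by two clean ones.
local interval = BASE_INTERVAL
for _, limited in ipairs({ true, true, false, false }) do
    interval = nextInterval(interval, limited)
    print(interval)
end
```

Backing off quickly and recovering slowly keeps a large cluster from oscillating around the limit.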

v1.0 vs v2.0

| Metric | v1.0 | v2.0 | Improvement |
|---|---|---|---|
| Max servers | 12-15 | 100+ | 600%+ |
| Messaging load | 20/sec | 3-4/sec | 80% reduction |
| Execution modes | 1 | 2 | Exactly-once added |
| Setup complexity | Manual | Presets | One-line |

Troubleshooting

Tasks Not Executing

Possible causes:

  • No worker with matching handler
  • All workers at MaxConcurrent limit
  • Task has unmet dependencies
  • Task is scheduled for future
  • Exactly-once lock is held by another server

High Dead Letter Count

-- Inspect dead letters
for taskId, task in pairs(scheduler.DeadLetterQueue) do
    print("Failed:", task.Name)
    print("Error:", task.LastError)
    print("Attempts:", task.Attempts)
    print("ExactlyOnce:", task.ExactlyOnce)
end

MessagingService Rate Limits

If you see RateLimitBackoff events frequently:

  • Use a larger preset (MEDIUM → LARGE)
  • Reduce worker count
  • Adaptive backoff will handle it automatically

Exactly-Once Tasks Skipped

Check for TaskLockFailed events in observability logs. This means another server is already executing the task (working as intended).

Getting Help

  • Enable ObservabilityEnabled for detailed logs
  • Check task Status and LastError fields
  • Monitor GetStats() output
  • Contact @chi0sk on Discord

TaskScheduler v2.0 is created and maintained by sam (@chi0sk)

Documentation last updated: February 2026

GitHub | License (GPL-3.0)