TaskScheduler v2.0

Production-ready distributed task scheduling for Roblox with dual execution modes, priority queues, and automatic recovery.

Dual Execution Modes

At-least-once (fast) OR exactly-once (guaranteed)

Priority Queues

Critical, high, normal, and low priority execution

Auto Retry

Exponential backoff with configurable retry limits

Auto-Scaling

Adaptive rate limiting for 100+ servers

Work Stealing

Automatic cross-server load balancing

Persistence

Survive server crashes with DataStore recovery

Why TaskScheduler?

TaskScheduler provides production-ready background job processing:

Flexibility: Choose at-least-once OR exactly-once per task
Scalability: Server-level coordination for 100+ servers
Reliability: Automatic retries, exponential backoff, and crash recovery
Performance: Adaptive rate limiting prevents MessagingService overload
Observability: Comprehensive statistics, hooks, and event logging
Developer Experience: Simple API, configuration presets, and dependency tracking

Execution Modes: TaskScheduler supports both at-least-once (default, high performance) and exactly-once (optional, for critical tasks). Choose the right mode for each task type. The default mode follows the at-least-once delivery model familiar from AWS SQS, Apache Kafka, and RabbitMQ.

What's New in v2.0

Exactly-Once Execution Mode

Optional distributed locking for critical tasks like payments and economy transactions.

scheduler:SubmitTask({
    Name = "process_payment",
    ExactlyOnce = true, -- NEW!
    Payload = { amount = 1000 },
})

Server-Level Heartbeats

Reduced MessagingService usage by 80-90%. Now supports 100+ servers (was: 12-15 servers).

One heartbeat per server (not per worker)
Adaptive rate limiting with automatic backoff
Smart work stealing based on server health

Configuration Presets

One-line setup for any game size.

-- Small game (10-100 CCU)
local scheduler = TaskScheduler.newWithPreset("SMALL")

-- Large game (1000+ CCU, 20+ servers)
local scheduler = TaskScheduler.newWithPreset("LARGE")

Breaking Changes

None! v2.0 is 100% backward compatible. Existing code works as-is.

Getting Started

Installation

Place TaskScheduler.lua in ReplicatedStorage.

Quick Start (Small Game)

local TaskScheduler = require(game.ReplicatedStorage.TaskScheduler)

-- Use preset for easy setup
local scheduler, preset = TaskScheduler.newWithPreset("SMALL", {
    PersistTasks = true,
    ObservabilityEnabled = true,
})

-- Create worker
local workerId = scheduler:CreateWorker({
    Handlers = {
        ["send_notification"] = function(task)
            local userId = task.Payload.userId
            local message = task.Payload.message
            
            print("Sending notification to", userId)
            -- Send logic here
            
            return true -- success
        end,
    },
})

-- Submit task
local taskId = scheduler:SubmitTask({
    Name = "send_notification",
    Priority = "high",
    Payload = {
        userId = 123,
        message = "Welcome!",
    },
})

-- Check status
local task = scheduler:GetTask(taskId)
print("Status:", task.Status)

Quick Start (Large Game - 1000+ CCU)

-- Optimized for 20+ servers
local scheduler, preset = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true, -- for critical tasks
})

-- Create fewer workers with higher concurrency
scheduler:CreateWorker({
    MaxConcurrent = 15, -- higher than small games
    Handlers = {
        ["process_payment"] = function(task)
            -- Critical handler
            return true
        end,
        ["log_analytics"] = function(task)
            -- Non-critical handler
            return true
        end,
    },
})

-- Fast tasks: standard mode
scheduler:SubmitTask({
    Name = "log_analytics",
    Payload = { event = "player_joined" },
})

-- Critical tasks: exactly-once mode
scheduler:SubmitTask({
    Name = "process_payment",
    ExactlyOnce = true, -- no duplicate execution across servers
    Payload = { amount = 1000 },
})

Execution Modes

TaskScheduler supports two execution modes. Choose the right one for each task type.

Standard Mode (Default)

scheduler:SubmitTask({
    Name = "send_notification",
    Priority = "normal",
    Payload = { ... },
    -- ExactlyOnce defaults to false
})

Characteristics

At-least-once delivery (same as AWS SQS, Kafka, RabbitMQ)
Zero coordination overhead
Optimal performance for high-throughput workloads
Tasks may execute multiple times under failure scenarios
Handlers should be idempotent (industry best practice)

Best For

Sending notifications (duplicate is harmless)
Logging analytics (an occasional duplicate count is tolerable)
Cache invalidation (safe to repeat)
Background processing (95% of tasks)

Performance

Latency: ~5-10ms
Throughput: 1000s of tasks per second
Overhead: None

Exactly-Once Mode (Optional)

scheduler:SubmitTask({
    Name = "process_payment",
    Priority = "critical",
    ExactlyOnce = true, -- Enable distributed locking
    Payload = { transactionId = "txn_123", amount = 1000 },
})

Characteristics

Exactly-once execution guarantee via distributed locks
Prevents duplicate processing across all servers
Automatic lock acquisition and release
Built-in deadlock prevention with TTL
Slight latency overhead (~50-100ms)

How It Works

Worker acquires DataStore-based lock before execution
If lock is held by another server, task is skipped
Lock auto-extends during execution
Lock auto-releases on completion or timeout
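
The four steps above can be sketched with a lock record that carries an owner and an expiry time. This is a minimal illustration, not TaskScheduler's internal code: the `locks` table is an in-memory stand-in for a DataStore, and `tryAcquire`, `release`, and the `owner`/`expiresAt` fields are hypothetical names.

```lua
-- In-memory stand-in for a DataStore, for illustration only.
local locks = {}

-- Try to take the lock for `taskId`. Succeeds if no lock exists, the
-- existing lock has expired, or this server already holds it (extension).
local function tryAcquire(taskId, serverId, now, ttl)
    local lock = locks[taskId]
    if lock and lock.owner ~= serverId and lock.expiresAt > now then
        return false -- held by another server: skip the task
    end
    locks[taskId] = { owner = serverId, expiresAt = now + ttl }
    return true
end

-- Release only if this server still owns the lock.
local function release(taskId, serverId)
    local lock = locks[taskId]
    if lock and lock.owner == serverId then
        locks[taskId] = nil
        return true
    end
    return false
end
```

On a real Roblox server the read-check-write in `tryAcquire` would need to be atomic across servers, for example by doing it inside a `GlobalDataStore:UpdateAsync` callback. The TTL is what prevents a crashed server from holding a lock forever.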

Best For

Processing payments (can't double-charge)
Economy transactions (can't duplicate currency)
Badge awards (want true deduplication)
Critical operations (5% of tasks)

Performance

Latency: ~50-100ms (lock overhead)
Throughput: Same as standard mode
Overhead: 2-3 DataStore operations per task

Cost vs Benefit: Use exactly-once sparingly. Most tasks (95%) don't need it. Idempotent handlers in standard mode are often sufficient and much faster.

When to Use Each Mode

Task Type	Mode	Reasoning
Send notification	Standard	Duplicate notification is harmless
Log analytics event	Standard	An occasional duplicate count is tolerable
Invalidate cache	Standard	Running twice is safe
Award badge	Standard*	Use idempotency check in handler
Process payment	Exactly-Once	Cannot double-charge users
Grant currency	Exactly-Once	Cannot duplicate economy
Critical transaction	Exactly-Once	Requires guarantee

* For badge awards, you can use either mode with proper idempotency checks

Scaling Guide

TaskScheduler auto-scales with configuration presets. Choose based on your player count and server count.

Small Games (10-100 CCU, 1-5 servers)

local scheduler, preset = TaskScheduler.newWithPreset("SMALL", {
    PersistTasks = true,
})

Configuration

Workers per server: 3
Max concurrent per worker: 5
Heartbeat interval: 5 seconds
Poll interval: 0.5 seconds

Messaging Load

10 servers × 1 heartbeat every 5s = 2 publishes/sec

Medium Games (100-1000 CCU, 5-20 servers)

local scheduler, preset = TaskScheduler.newWithPreset("MEDIUM", {
    PersistTasks = true,
    ObservabilityEnabled = true,
})

Configuration

Workers per server: 5
Max concurrent per worker: 8
Heartbeat interval: 10 seconds
Poll interval: 0.5 seconds

Messaging Load

20 servers × 1 heartbeat every 10s = 2 publishes/sec

Large Games (1000+ CCU, 20+ servers)

local scheduler, preset = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true, -- for critical tasks
})

Configuration

Workers per server: 2-3 (fewer workers, higher concurrency each)
Max concurrent per worker: 15
Heartbeat interval: 15 seconds
Poll interval: 1 second
Adaptive rate limiting: Enabled

Messaging Load

50 servers × 1 heartbeat every 15s = 3.3 publishes/sec

100 servers × 1 heartbeat every 15s (with backoff) = ~4 publishes/sec

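MessagingService Limit: ~2.5 publishes/sec per topic. The SMALL and MEDIUM examples above sit at about 80% of that budget. The LARGE examples exceed it at the nominal 15-second interval, so the LARGE preset relies on adaptive backoff to lengthen heartbeat intervals under load.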
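
The messaging-load figures in this section follow from one division: servers divided by heartbeat interval. The sketch below reproduces that arithmetic and, assuming the 2.5 publishes/sec budget quoted above, estimates the shortest interval that fits the budget. The function names are illustrative and not part of the TaskScheduler API.

```lua
-- Publishes per second when each server sends one heartbeat per interval.
local function publishesPerSecond(servers, heartbeatInterval)
    return servers / heartbeatInterval
end

-- Shortest heartbeat interval (seconds) that keeps the load within `limit`.
local function minInterval(servers, limit)
    return servers / limit
end
```

Under these assumptions, 50 servers would need a heartbeat interval of about 20 seconds and 100 servers about 40 seconds, which gives a rough idea of how far backoff has to stretch the nominal 15-second interval.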

Scaling Comparison

Preset	Servers	Workers/Server	Heartbeat	Messaging Load	Status
SMALL	1-5	3	5s	2/sec	Good
MEDIUM	5-20	5	10s	2/sec	Good
LARGE	20-100+	2-3	15s	3-4/sec	Good (with backoff)
v1.0 (old)	12-15	5	5s/worker	20/sec	Over Limit

Core Concepts

Tasks

A task is a unit of work with a name, payload, and configuration. Tasks move through states:

pending: Queued, waiting for worker
running: Currently executing
completed: Successfully finished
failed: Failed but will retry
dead: Permanently failed (moved to dead letter queue)
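
One way to picture the lifecycle above is as a small transition table. The following is a minimal sketch, not TaskScheduler's internal code: the `TRANSITIONS` table, both helper functions, and the rule that a failed task returns to pending for its retry are assumptions made for illustration.

```lua
-- Hypothetical transition table for the five states listed above.
local TRANSITIONS = {
    pending   = { running = true },
    running   = { completed = true, failed = true },
    failed    = { pending = true, dead = true }, -- retry, or give up
    completed = {},
    dead      = {},
}

local function canTransition(from, to)
    local allowed = TRANSITIONS[from]
    return allowed ~= nil and allowed[to] == true
end

-- After a failed attempt, decide whether the task is retried or buried.
-- Assumes `attempts` counts every execution so far, so a task gets one
-- initial run plus `maxRetries` retries.
local function stateAfterFailure(attempts, maxRetries)
    if attempts > maxRetries then
        return "dead"
    end
    return "failed"
end
```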

Workers

Workers pull tasks from the queue and execute handler functions. Each worker:

Runs up to MaxConcurrent tasks simultaneously
Polls the queue at PollInterval
Requests work stealing when underutilized
Tracked by server-level heartbeat
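
The first and third points reduce to two small calculations per poll. This is a hedged sketch of that bookkeeping, using hypothetical helper names. It assumes work stealing is requested when the local queue length drops below the worker's `StealThreshold`.

```lua
-- How many queued tasks a worker may start on this poll.
local function slotsToFill(queuedCount, runningCount, maxConcurrent)
    local free = maxConcurrent - runningCount
    if free < 0 then
        free = 0
    end
    return math.min(queuedCount, free)
end

-- Whether the worker should ask other servers for work.
local function shouldRequestSteal(queuedCount, stealThreshold)
    return queuedCount < stealThreshold
end
```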

Priority Levels

Tasks are executed in priority order:

critical: Urgent tasks (executed first)
high: Important tasks
normal: Default priority
low: Background tasks
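
As an illustration of this ordering, the sketch below sorts a queue by priority rank and breaks ties by submission order. The `PRIORITY_RANK` table and the `seq` field are assumptions for the example. The real queue implementation may differ.

```lua
local PRIORITY_RANK = { critical = 1, high = 2, normal = 3, low = 4 }

-- `seq` is a hypothetical submission counter used to break ties, since
-- table.sort is not a stable sort.
local function byPriority(a, b)
    local ra, rb = PRIORITY_RANK[a.Priority], PRIORITY_RANK[b.Priority]
    if ra ~= rb then
        return ra < rb
    end
    return a.seq < b.seq
end

local queue = {
    { Name = "log_analytics",     Priority = "low",      seq = 1 },
    { Name = "send_notification", Priority = "high",     seq = 2 },
    { Name = "process_payment",   Priority = "critical", seq = 3 },
    { Name = "cache_refresh",     Priority = "normal",   seq = 4 },
    { Name = "send_email",        Priority = "high",     seq = 5 },
}
table.sort(queue, byPriority)
```

After sorting, the critical task comes first and the two high-priority tasks keep their submission order.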

API Reference

TaskScheduler.new

TaskScheduler.new(config: SchedulerConfig): TaskScheduler

Creates a new task scheduler instance with custom configuration.

Config Options

Field	Type	Default	Description
PersistTasks	boolean	false	Enable DataStore persistence and recovery
ObservabilityEnabled	boolean	false	Enable detailed event logging
EnableExactlyOnce	boolean	false	Enable exactly-once execution mode (new in v2.0)
DeadLetterRetention	number	86400	Seconds to keep dead tasks (24 hours)
CleanupInterval	number	300	Seconds between cleanup runs
PersistenceBatchSize	number	10	Batch DataStore writes
PersistenceBatchTimeout	number	2	Flush persistence every N seconds
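
PersistenceBatchSize and PersistenceBatchTimeout together describe a size-or-time flush rule. A minimal sketch of such a rule follows. It assumes a flush happens when either threshold is reached, and `shouldFlush` is a hypothetical helper rather than part of the API.

```lua
-- Decide whether buffered DataStore writes should be flushed now.
local function shouldFlush(pendingWrites, secondsSinceFlush, batchSize, batchTimeout)
    if pendingWrites == 0 then
        return false -- nothing to write
    end
    return pendingWrites >= batchSize or secondsSinceFlush >= batchTimeout
end
```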

TaskScheduler.newWithPreset

TaskScheduler.newWithPreset(preset: string, config: SchedulerConfig?): (TaskScheduler, PresetConfig)

New in v2.0: creates a scheduler with a preset configuration for easy scaling.

Presets

"SMALL" - Small games (10-100 CCU, 1-5 servers)
"MEDIUM" - Medium games (100-1000 CCU, 5-20 servers)
"LARGE" - Large games (1000+ CCU, 20+ servers)
"SINGLE_SERVER" - Testing / single server (disables messaging)

local scheduler, preset = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    EnableExactlyOnce = true,
})

print("Workers per server:", preset.workers)
print("Max concurrent:", preset.maxConcurrent)

CreateWorker

scheduler:CreateWorker(config: WorkerConfig): string

Creates a worker that executes tasks.

Config Options

Field	Type	Default	Description
MaxConcurrent	number	5	Max tasks running simultaneously
Handlers	table	required	Map of task names to handler functions
PollInterval	number	0.5	Seconds between queue polls
StealThreshold	number	10	Request work stealing if queue below this

SubmitTask

scheduler:SubmitTask(config: TaskConfig): string

Submits a task to the queue.

Config Options

Field	Type	Default	Description
Name	string	required	Handler name to execute
Priority	string	"normal"	critical, high, normal, or low
ExactlyOnce	boolean	false	Enable distributed locking (new in v2.0)
Payload	table	{}	Data passed to handler
MaxRetries	number	3	Retry attempts before dead
RetryDelay	number	5	Base delay (exponential: 5s, 10s, 20s...)
Timeout	number	60	Seconds before timeout
DependsOn	{string}	{}	Array of task IDs that must complete first
ScheduledFor	number	nil	Unix timestamp to delay execution
LocalIdempotencyKey	string	nil	Deduplicate on same server only
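
The RetryDelay sequence in the table (5s, 10s, 20s) is consistent with doubling the base delay on each retry. The sketch below assumes that formula. `retryDelay` is an illustrative helper, and the scheduler may add jitter or caps that are not shown here.

```lua
-- Delay before retry number `attempt` (1-based), doubling each time.
local function retryDelay(baseDelay, attempt)
    return baseDelay * 2 ^ (attempt - 1)
end

-- Delays for a task with RetryDelay = 5 and MaxRetries = 3.
local delays = {}
for attempt = 1, 3 do
    delays[attempt] = retryDelay(5, attempt)
end
```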

GetTask

scheduler:GetTask(taskId: string): Task?

Retrieves a task by ID. Returns nil if not found.

CancelTask

scheduler:CancelTask(taskId: string): boolean

Cancels a pending task. Returns false if the task is already running or has completed.

GetStats

scheduler:GetStats(): table

Returns scheduler statistics.

Configuration

Recommended Configurations by Scale

Small Game

local scheduler = TaskScheduler.newWithPreset("SMALL", {
    PersistTasks = true,
})

Medium Game

local scheduler = TaskScheduler.newWithPreset("MEDIUM", {
    PersistTasks = true,
    ObservabilityEnabled = true,
})

Large Game

local scheduler = TaskScheduler.newWithPreset("LARGE", {
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true,
})

Custom Configuration

local scheduler = TaskScheduler.new({
    PersistTasks = true,
    ObservabilityEnabled = true,
    EnableExactlyOnce = true,
    DeadLetterRetention = 172800, -- 48 hours
    CleanupInterval = 600, -- 10 minutes
    PersistenceBatchSize = 20,
})

Best Practices

1. Write Idempotent Handlers

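Even with exactly-once mode, handlers should be idempotent, because a lock that times out while a slow handler is still running can let another server pick up the same task: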

-- GOOD: Idempotent handler
-- Assumes DataStore is a GlobalDataStore (from DataStoreService:GetDataStore)
-- and BadgeService is game:GetService("BadgeService").
["award_badge"] = function(task)
    local userId = task.Payload.userId
    local badgeId = task.Payload.badgeId

    -- Check if already awarded
    local key = userId .. "_" .. badgeId
    if DataStore:GetAsync(key) then
        return true -- already done
    end

    -- Award badge, then record it so a retry becomes a no-op
    BadgeService:AwardBadge(userId, badgeId)
    DataStore:SetAsync(key, true)

    return true
end
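
The same check-then-record pattern can be factored into a reusable wrapper. The version below is a plain-Lua sketch that uses an in-memory table in place of a DataStore. `makeIdempotent`, `seen`, and `keyOf` are hypothetical names. The handler argument is called `job` here so that it does not shadow Roblox's global `task` library.

```lua
-- Wrap a handler so that its side effects run at most once per key.
-- `seen` stands in for a persistent store; `keyOf` derives the key.
local function makeIdempotent(seen, keyOf, handler)
    return function(job)
        local key = keyOf(job)
        if seen[key] then
            return true -- already done: report success, skip side effects
        end
        local ok = handler(job)
        if ok then
            seen[key] = true
        end
        return ok
    end
end

local awarded = 0
local awardBadge = makeIdempotent({}, function(job)
    return job.Payload.userId .. "_" .. job.Payload.badgeId
end, function(job)
    awarded = awarded + 1 -- placeholder for the real side effect
    return true
end)

local job = { Payload = { userId = 123, badgeId = 456 } }
awardBadge(job)
awardBadge(job) -- duplicate delivery: the counter does not change
```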

2. Use Exactly-Once Sparingly

-- Standard mode for most tasks (95%)
scheduler:SubmitTask({
    Name = "send_notification",
    Payload = { ... },
})

-- Exactly-once only for critical tasks (5%)
scheduler:SubmitTask({
    Name = "process_payment",
    ExactlyOnce = true,
    Payload = { ... },
})

3. Choose the Right Preset

-- Small game
local scheduler = TaskScheduler.newWithPreset("SMALL")

-- Large game (don't use SMALL preset!)
local scheduler = TaskScheduler.newWithPreset("LARGE")

4. Monitor Statistics

task.spawn(function()
    while true do
        task.wait(60)
        local stats = scheduler:GetStats()
        
        if stats.dead > 10 then
            warn("High dead letter count:", stats.dead)
        end
        
        if stats.pending > 500 then
            warn("Queue backlog detected:", stats.pending)
        end
    end
end)

Examples

Mixed Execution Modes

local scheduler = TaskScheduler.newWithPreset("MEDIUM", {
    EnableExactlyOnce = true,
})

scheduler:CreateWorker({
    Handlers = {
        ["log_analytics"] = function(task)
            -- Non-critical: standard mode is fine
            AnalyticsStore:IncrementAsync("events", 1)
            return true
        end,
        
        ["grant_currency"] = function(task)
            -- Critical: ensure idempotency
            local userId = task.Payload.userId
            local amount = task.Payload.amount
            
            -- Check if already granted
            local key = "currency_" .. task.Id
            if DataStore:GetAsync(key) then
                return true
            end
            
            -- Grant currency
            local profile = ProfileStore:LoadProfileAsync(userId)
            profile.Data.Currency += amount
            DataStore:SetAsync(key, true)
            
            return true
        end,
    },
})

-- Fast: standard mode
scheduler:SubmitTask({
    Name = "log_analytics",
    Payload = { event = "player_joined" },
})

-- Guaranteed: exactly-once mode
scheduler:SubmitTask({
    Name = "grant_currency",
    ExactlyOnce = true,
    Payload = { userId = 123, amount = 100 },
})

Dependency Chains

local task1 = scheduler:SubmitTask({
    Name = "fetch_data",
})

local task2 = scheduler:SubmitTask({
    Name = "process_data",
    DependsOn = {task1},
})

local task3 = scheduler:SubmitTask({
    Name = "save_results",
    DependsOn = {task2},
    ExactlyOnce = true, -- ensure final save is guaranteed
})
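
A dependency check of this kind can be sketched as a loop over `DependsOn`. The example assumes a task becomes eligible only when every listed task has reached the completed state. `dependenciesMet` and the `tasks` table are hypothetical and only illustrate the rule.

```lua
-- True when every task ID in `DependsOn` refers to a completed task.
local function dependenciesMet(job, tasksById)
    for _, depId in ipairs(job.DependsOn or {}) do
        local dep = tasksById[depId]
        if dep == nil or dep.Status ~= "completed" then
            return false
        end
    end
    return true
end

local tasks = {
    t1 = { Status = "completed" },
    t2 = { Status = "running", DependsOn = { "t1" } },
    t3 = { Status = "pending", DependsOn = { "t2" } },
}
```

Here `t2` is eligible because `t1` has completed, while `t3` must wait until `t2` finishes.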

Scheduled Tasks

-- Daily reset at midnight
local midnight = os.time({
    year = 2026,
    month = 2,
    day = 17,
    hour = 0,
})

scheduler:SubmitTask({
    Name = "daily_reset",
    ScheduledFor = midnight,
    ExactlyOnce = true, -- ensure it only runs once
})
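
Instead of hard-coding a date, the next midnight can be derived from the current Unix time. The helper below is a sketch that assumes the reset should happen at midnight UTC. `nextUtcMidnight` is an illustrative name, not part of TaskScheduler.

```lua
local SECONDS_PER_DAY = 86400

-- Unix timestamp of the next UTC midnight strictly after `now`.
local function nextUtcMidnight(now)
    return now - (now % SECONDS_PER_DAY) + SECONDS_PER_DAY
end
```

The result can be passed as `ScheduledFor = nextUtcMidnight(os.time())`. A handler that resubmits the task for the following midnight would turn this into a recurring daily job.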

Performance

Throughput

Mode	Latency	Throughput	Overhead
Standard	5-10ms	1000s/sec	None
Exactly-Once	50-100ms	Same	2-3 DataStore ops

Messaging Load by Scale

Scale	Servers	Publishes/sec	vs Limit (2.5/sec)
SMALL	10	2	80%
MEDIUM	20	2	80%
LARGE	50	3.3	132% (backoff kicks in)
LARGE (backoff)	100	~4	160% (further backoff)

Adaptive backoff automatically lengthens heartbeat intervals to bring the load back under the limit.

v1.0 vs v2.0

Metric	v1.0	v2.0	Improvement
Max servers	12-15	100+	About 7-8x
Messaging load	20/sec	3-4/sec	80% reduction
Execution modes	1	2	Exactly-once added
Setup complexity	Manual	Presets	One-line

Troubleshooting

Tasks Not Executing

Possible causes:

No worker with matching handler
All workers at MaxConcurrent limit
Task has unmet dependencies
Task is scheduled for future
Exactly-once lock is held by another server

High Dead Letter Count

-- Inspect dead letters
for taskId, task in pairs(scheduler.DeadLetterQueue) do
    print("Failed:", task.Name)
    print("Error:", task.LastError)
    print("Attempts:", task.Attempts)
    print("ExactlyOnce:", task.ExactlyOnce)
end

MessagingService Rate Limits

If you see RateLimitBackoff events frequently:

Use a larger preset (MEDIUM → LARGE)
Reduce worker count
Adaptive backoff will handle it automatically
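
For intuition, adaptive backoff of this kind is often implemented by doubling the interval after a rate-limited publish and shrinking it again once publishes succeed. The sketch below shows that generic policy. It is an assumption about the general technique, not a description of TaskScheduler's actual algorithm, and `nextInterval` is a hypothetical helper.

```lua
-- Next heartbeat interval, given whether the last publish was rate limited.
local function nextInterval(current, base, max, rateLimited)
    if rateLimited then
        return math.min(current * 2, max) -- back off, capped at `max`
    end
    -- Recover: halve the interval, but never go below the base interval.
    return math.max(base, current / 2)
end
```

With a 15-second base and a 120-second cap, two consecutive rate-limit events move the interval from 15 to 30 to 60 seconds, and successful publishes bring it back down toward 15.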

Exactly-Once Tasks Skipped

Check for TaskLockFailed events in observability logs. This means another server is already executing the task (working as intended).

Getting Help

Enable ObservabilityEnabled for detailed logs
Check task Status and LastError fields
Monitor GetStats() output
Contact @chi0sk on Discord