Specification for the Declarative Scheduler

This document provides a normative specification for the backend declarative scheduler's public interface and externally observable operational semantics.

Introduction & Normative Language

This specification defines the externally observable behavior of the Declarative Scheduler. It serves as:

An integration contract for other teams
A foundation for conformance testing
A guide for independent re-implementations
A behavioral lock for future refactors

Normative Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.

Scope

In scope: Public interface (initialize, stop) and all externally observable behaviors including timing, persistence, logging, errors, and state transitions.

Out of scope: Internal module structure, private helper APIs, storage engine internals beyond externally visible effects.

Public API Surface

The scheduler exposes exactly two public methods forming a minimal lifecycle interface.

Scheduler Interface

A Scheduler instance MUST provide the following interface:

interface Scheduler {
    initialize(registrations: Registration[]): Promise<void>
    stop(): Promise<void>
}

Registration Format

A Registration MUST be a 4-tuple array with the following structure:

type Registration = [
    string,     // Task name (unique identifier)
    string,     // Cron expression (POSIX format)
    Callback,   // Async function to execute
    Duration    // Retry delay duration
]

Where:

Task name MUST be a non-empty string unique within the registration set
Cron expression MUST be a valid POSIX 5-field cron expression
Callback MUST be an async function () => Promise<void>
Duration MUST be a non-negative time duration
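
As an illustration of the tuple shape, the sketch below builds a small registration set and checks the four rules above. The Duration representation ({ milliseconds }), the helper name checkRegistrationShape, and the task names are assumptions made for this example; this specification does not define them. The check covers shape only and does not validate cron syntax or name uniqueness.

```typescript
// Hypothetical Duration representation; the specification does not define one.
type Duration = { milliseconds: number };
type Callback = () => Promise<void>;
type Registration = [string, string, Callback, Duration];

// Illustrative shape check for a single registration (not the scheduler's own validator).
function checkRegistrationShape(value: unknown): value is Registration {
    if (!Array.isArray(value) || value.length !== 4) return false;
    const [name, cron, callback, retryDelay] = value;
    return (
        typeof name === "string" && name.length > 0 &&
        typeof cron === "string" &&
        typeof callback === "function" &&
        typeof retryDelay === "object" && retryDelay !== null &&
        typeof retryDelay.milliseconds === "number" &&
        retryDelay.milliseconds >= 0
    );
}

// Example registration set with hypothetical task names.
const registrations: Registration[] = [
    ["nightly-report", "0 0 * * *", async () => {}, { milliseconds: 5 * 60 * 1000 }],
    ["weekday-sync", "15 3 * * 1-5", async () => {}, { milliseconds: 0 }],
];
```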

Operational Model & Time Semantics

Time Provider and Granularity

The scheduler MUST:

Use the host system's local clock as the authoritative time source
Operate at minute-level granularity
Evaluate task schedules at each minute boundary
Consider a minute boundary to occur at the start of each minute (seconds = 0)
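
A minimal sketch of the minute-boundary arithmetic follows. The function names are hypothetical. It assumes the host timezone's UTC offset is a whole number of minutes, so that boundaries computed on epoch milliseconds coincide with local minute boundaries.

```typescript
const MS_PER_MINUTE = 60000;

// Start of the minute containing `t` (seconds = 0, milliseconds = 0).
function minuteStart(t: Date): Date {
    return new Date(Math.floor(t.getTime() / MS_PER_MINUTE) * MS_PER_MINUTE);
}

// First minute boundary strictly after `t`.
function nextMinuteBoundary(t: Date): Date {
    return new Date(minuteStart(t).getTime() + MS_PER_MINUTE);
}
```

A polling loop could use nextMinuteBoundary to decide how long to sleep before the next evaluation, although the polling strategy itself is left to the implementation.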

Timely Behavior

The scheduler MUST:

Handle multiple due tasks in parallel with no ordering guarantees

The scheduler MUST:

Use the host system's local timezone for all time calculations
Handle Daylight Saving Time (DST) transitions according to the host system's clock
Consider a minute that does not exist during DST transitions (e.g., 2:30 AM during "spring forward") as automatically skipped
Execute tasks multiple times for minutes that occur twice during DST transitions (e.g., 2:30 AM during "fall back")
Continue normal scheduling after DST transitions without requiring restart

DST Transition Behavior:

Spring Forward (Lost Hour): Tasks scheduled during the skipped hour MUST NOT execute that day
Fall Back (Repeated Hour): Tasks scheduled during the repeated hour MUST execute both times the minute occurs
Next Execution Calculation: MUST correctly account for DST transitions when calculating future occurrences

No Make-Up Execution Policy

Critical Invariant: When tasks miss multiple scheduled executions due to downtime, the scheduler MUST execute each task at most once when resuming, regardless of how many executions were missed.

Rationale: This prevents resource overwhelming and maintains predictable load patterns.

Example: A task scheduled 15,30,45,0 * * * * (every 15 minutes) that misses 4 executions during a 1-hour outage MUST run only once when the scheduler resumes, not 4 times.
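
The example above can be sketched as follows. Both function names are hypothetical, and the occurrence counter is deliberately simplified: it looks only at the minute field (all other fields are assumed to be *) and uses the UTC minute-of-hour so that the result does not depend on the host timezone.

```typescript
// Counts minute boundaries in (fromMs, toMs] whose minute-of-hour is listed in `minutes`.
// Simplification: hour, day, month, and weekday fields are assumed to be "*".
function countMissedOccurrences(minutes: number[], fromMs: number, toMs: number): number {
    const msPerMinute = 60000;
    let count = 0;
    const first = (Math.floor(fromMs / msPerMinute) + 1) * msPerMinute;
    for (let t = first; t <= toMs; t += msPerMinute) {
        const minuteOfHour = Math.floor(t / msPerMinute) % 60;
        if (minutes.indexOf(minuteOfHour) !== -1) count++;
    }
    return count;
}

// No make-up policy: any number of missed occurrences collapses into at most one run.
function runsOnResume(missedOccurrences: number): number {
    return missedOccurrences > 0 ? 1 : 0;
}
```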

Startup Semantics

First Startup (No Persisted State)

For every task that has not yet been persisted, the scheduler MUST:

Evaluate the task's cron expression against the current time
Execute the task immediately if and only if the cron expression exactly matches the current minute
Schedule the task for its next future occurrence

Subsequent Restarts (With Persisted State)

When persisted state exists, the scheduler MUST:

Load the previous execution state
Apply persistence override logic (see Persistence Semantics)
Continue normal scheduling based on last known attempt/success/failure times

Scheduler Lifecycle

Scheduler State Model

The scheduler MUST exist in exactly one of the following states:

Scheduler State Definitions

Uninitialized: Scheduler created but not yet initialized
Initializing: Processing registrations, applying overrides, starting tasks
Running: Normal operation with polling active and tasks scheduled
Stopping: Graceful shutdown in progress, waiting for running tasks
Stopped: Cleanup complete, no active polling or tasks

Scheduler State Transitions

Uninitialized to Initializing

Guard: initialize(registrations) called with valid input

Action: Begin validation, persistence override resolution

Events: SchedulerInitializationStarted

Initializing to Running

Guard: All registrations validated, overrides applied, tasks scheduled

Action: Start polling loop, mark scheduler as active

Events: SchedulerInitializationCompleted

Initializing to Uninitialized

Guard: Initialization fails due to validation errors

Action: Clean up partial state, reset to uninitialized

Events: SchedulerInitializationFailed

Initializing to Initializing (Error State)

Guard: initialize(registrations) called while scheduler is already initializing

Action: Throw SchedulerAlreadyActiveError with state "initializing"

Events: None (error thrown synchronously)

Running to Running (Error State)

Guard: initialize(registrations) called while scheduler is running

Action: Throw SchedulerAlreadyActiveError with state "running"

Events: None (error thrown synchronously)

Running to Stopping

Guard: stop() called

Action: Stop accepting new polls, wait for running tasks

Events: SchedulerStopRequested

Stopping to Stopped

Guard: All running tasks complete, polling stopped

Action: Final cleanup, release resources

Events: SchedulerStopped
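
The transitions above can be summarized as a small transition function. This is a sketch only: the input names (initialize, initializationDone, and so on) are hypothetical labels for the guards listed above, events are omitted, and combinations not listed in this section are left unchanged because this specification does not define them.

```typescript
type SchedulerState = "uninitialized" | "initializing" | "running" | "stopping" | "stopped";

// Hypothetical labels for the guards described in this section.
type SchedulerInput = "initialize" | "initializationDone" | "initializationFailed" | "stop" | "tasksDrained";

class SchedulerAlreadyActiveError extends Error {
    readonly currentState: string;
    constructor(currentState: string) {
        super(`Cannot initialize scheduler: scheduler is already ${currentState}`);
        this.name = "SchedulerAlreadyActiveError";
        this.currentState = currentState;
    }
}

function nextSchedulerState(state: SchedulerState, input: SchedulerInput): SchedulerState {
    switch (state) {
        case "uninitialized":
            if (input === "initialize") return "initializing";
            break;
        case "initializing":
            if (input === "initialize") throw new SchedulerAlreadyActiveError("initializing");
            if (input === "initializationDone") return "running";
            if (input === "initializationFailed") return "uninitialized";
            break;
        case "running":
            if (input === "initialize") throw new SchedulerAlreadyActiveError("running");
            if (input === "stop") return "stopping";
            break;
        case "stopping":
            if (input === "tasksDrained") return "stopped";
            break;
        case "stopped":
            break;
    }
    // Combinations not listed in this section: the sketch keeps the current state.
    return state;
}
```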

Task Lifecycle

Task State Model

Each task MUST exist in exactly one of the following states:

Task State Definitions

AwaitingRun: Task is waiting for its next cron occurrence
Running: Task callback is currently executing
AwaitingRetry: Task failed and is waiting for retry delay to pass

Task State Transitions

From AwaitingRun to Running

Guard:

Either Case A or Case B holds.

Case A: The current minute matches the cron expression AND the task has not yet run during this minute.
Case B: The cron expression matched at some earlier minute that the scheduler missed. This case does not apply if the task has never run before.

Action: Invoke task callback, record attempt timestamp

Events: TaskRunStarted

From Running to AwaitingRun

Guard: Task callback completes successfully

Action: Record success timestamp, clear any pending retry

Events: TaskRunCompleted

From Running to AwaitingRetry

Guard: Task callback throws an error or rejects

Action: Record failure timestamp, calculate pendingRetryUntil = now + retryDelay

Events: TaskRunFailed

From AwaitingRetry to Running

Guard: Current time ≥ pendingRetryUntil

Action: Clear retry state, invoke task callback, record attempt timestamp

Events: TaskRetryStarted

From AwaitingRetry to AwaitingRun

Guard: New cron occurrence is due while task is in retry state

Action: Clear retry state, proceed with cron execution

Events: TaskRetryPreempted, TaskRunStarted
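
The bookkeeping performed by these transitions can be sketched as pure functions over a task record. The record layout, the function names, the lastFailureAt field name, and the use of epoch milliseconds are assumptions made for this example. Events and the retry preemption transition are omitted.

```typescript
type TaskStateName = "AwaitingRun" | "Running" | "AwaitingRetry";

// Hypothetical in-memory record; timestamps are epoch milliseconds, null when unset.
interface TaskRecord {
    state: TaskStateName;
    lastAttemptAt: number | null;
    lastSuccessAt: number | null;
    lastFailureAt: number | null;
    pendingRetryUntil: number | null;
}

// AwaitingRun -> Running and AwaitingRetry -> Running: clear retry state, record the attempt.
function startRun(task: TaskRecord, now: number): TaskRecord {
    return { ...task, state: "Running", lastAttemptAt: now, pendingRetryUntil: null };
}

// Running -> AwaitingRun: record the success and clear any pending retry.
function completeRun(task: TaskRecord, now: number): TaskRecord {
    return { ...task, state: "AwaitingRun", lastSuccessAt: now, pendingRetryUntil: null };
}

// Running -> AwaitingRetry: record the failure and compute pendingRetryUntil = now + retryDelay.
function failRun(task: TaskRecord, now: number, retryDelayMs: number): TaskRecord {
    return { ...task, state: "AwaitingRetry", lastFailureAt: now, pendingRetryUntil: now + retryDelayMs };
}

// Guard for AwaitingRetry -> Running: current time >= pendingRetryUntil.
function retryIsDue(task: TaskRecord, now: number): boolean {
    return task.state === "AwaitingRetry" && task.pendingRetryUntil !== null && now >= task.pendingRetryUntil;
}
```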

Timestamp Management

The scheduler MUST maintain the following timestamps for each task:

lastAttemptAt: Timestamp of most recent execution attempt (success or failure)
lastSuccessAt: Timestamp of most recent successful execution (if any)
pendingRetryUntil: Timestamp when retry is allowed (if in AwaitingRetry state)

Polling Lifecycle

Polling State Model

State Definitions

Inactive: No polling loop running, no tasks scheduled
Active: Polling loop running, evaluating scheduled tasks
Stopping: Stop requested, waiting for running tasks to complete

State Transitions

Inactive to Active

Guard: First task is scheduled via schedule() call

Action: Start polling loop

Events: PollingStarted

Active to Inactive

Guard: Last scheduled task is cancelled via cancel() call

Action: Stop polling loop

Events: PollingStopped

Active to Stopping

Guard: stopLoop() is called

Action: Mark scheduler as stopping, complete current poll cycle

Events: PollingStopRequested

Stopping to Inactive

Guard: All currently running tasks complete execution

Action: Final cleanup, release resources

Events: PollingStopped

Cron Language Specification

The scheduler MUST accept strictly POSIX-compliant cron expressions as defined in IEEE Std 1003.1.

Formal Grammar (EBNF)

cron-expr    = SP* minute SP+ hour SP+ day SP+ month SP+ weekday SP* ;
minute       = field-content ;
hour         = field-content ;
day          = field-content ;
month        = field-content ;
weekday      = field-content ;

field-content = "*" / element-list ;
element-list  = element ("," element)* ;
element       = number / range ;
range         = number "-" number ;
number        = DIGIT+ ;

SP           = " " / "\t" ;
DIGIT        = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" ;

Field Ranges

minute: 0–59
hour: 0–23
day: 1–31
month: 1–12
weekday: 0–6 (0 = Sunday, 6 = Saturday)

Validation Rules

The scheduler MUST reject expressions that:

Do not contain exactly 5 fields separated by whitespace
Contain step syntax (/N)
Contain named values (jan, mon, sunday)
Contain macro expressions (@daily, @hourly)
Contain Quartz-specific tokens (?, L, W, #)
Contain weekday value 7 (use 0 for Sunday)
Contain wrap-around ranges where start > end
Contain values outside the valid range for each field
Contain non-decimal numeric formats (scientific notation, hex, signs)
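
A validator following the grammar, field ranges, and rejection rules above might look like the sketch below. The function name and the wording of the returned reasons are assumptions made for this example; a conforming implementation reports failures through the error types defined in the Error Model.

```typescript
// [field name, minimum, maximum] in expression order.
const FIELD_RANGES: Array<[string, number, number]> = [
    ["minute", 0, 59],
    ["hour", 0, 23],
    ["day", 1, 31],
    ["month", 1, 12],
    ["weekday", 0, 6],
];

// Returns null when the expression is acceptable, otherwise a short reason.
function validateCronExpression(expr: string): string | null {
    // Only spaces and tabs count as separators, per the grammar's SP rule.
    const fields = expr.replace(/^[ \t]+|[ \t]+$/g, "").split(/[ \t]+/);
    if (fields.length !== 5) return "expected exactly 5 fields";
    for (let i = 0; i < 5; i++) {
        const [name, min, max] = FIELD_RANGES[i];
        const field = fields[i];
        if (field === "*") continue;
        for (const element of field.split(",")) {
            // A number or a range of plain decimal digits; rejects steps, names, macros, signs, hex.
            const match = /^(\d+)(?:-(\d+))?$/.exec(element);
            if (match === null) return `${name} field has invalid element "${element}"`;
            const start = Number(match[1]);
            const end = match[2] === undefined ? start : Number(match[2]);
            if (start > end) return `${name} field has wrap-around range "${element}"`;
            if (start < min || end > max) return `${name} field is outside ${min}-${max}`;
        }
    }
    return null;
}
```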

Day-of-Month/Day-of-Week Semantics

When both day-of-month (DOM) and day-of-week (DOW) are restricted (not *), the scheduler MUST execute the task if either condition matches (OR logic).

Truth Table:

DOM	DOW	Logic	Example	Runs On
`*`	`*`	AND	`0 0 * * *`	Every day at midnight
`*`	restricted	DOW only	`0 0 * * 1`	Every Monday at midnight
restricted	`*`	DOM only	`0 0 15 * *`	15th of every month at midnight
restricted	restricted	OR	`0 0 1,15 * 1`	1st, 15th, OR every Monday at midnight
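
The day-matching rule in the truth table can be sketched as follows. The function name and the use of null to represent an unrestricted field (*) are assumptions made for this example.

```typescript
// `null` stands for an unrestricted field ("*"); otherwise the list of allowed values.
function dayMatches(domAllowed: number[] | null, dowAllowed: number[] | null, date: Date): boolean {
    const domOk = domAllowed === null || domAllowed.indexOf(date.getDate()) !== -1;
    const dowOk = dowAllowed === null || dowAllowed.indexOf(date.getDay()) !== -1;
    // Both restricted: either match is sufficient (OR logic).
    if (domAllowed !== null && dowAllowed !== null) return domOk || dowOk;
    // At most one restricted: the unrestricted side is always true, so the restricted field decides.
    return domOk && dowOk;
}
```

For the expression 0 0 1,15 * 1, the call would be dayMatches([1, 15], [1], date), which holds on the 1st, on the 15th, and on every Monday.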

Examples

Valid expressions:

0 0 * * * - Daily at midnight
15 3 * * 1-5 - 3:15 AM on weekdays
0,30 * * * * - Every 30 minutes
0 12 14 2 * - Noon on February 14th

Invalid expressions:

*/15 * * * * - Step syntax not allowed
0 0 * * mon - Named values not allowed
@daily - Macros not allowed
0 0 ? * * - Quartz tokens not allowed

Error Model

Error Taxonomy

The scheduler MUST throw the following error types under the specified conditions:

Registration Validation Errors

RegistrationsNotArrayError

When: initialize() called with non-array registrations parameter
Message: "Registrations must be an array"
Details: None

RegistrationShapeError

When: Registration tuple has wrong length or invalid types
Message: "Invalid registration shape: expected [string, string, function, Duration]"
Details: { registrationIndex: number, received: any }

InvalidRegistrationError

When: Registration contains invalid data beyond shape issues
Message: Varies based on specific validation failure
Details: { field: string, value: any, reason: string }

ScheduleDuplicateTaskError

When: Multiple registrations have the same task name
Message: "Task with name \"<name>\" is already scheduled"
Details: { taskName: string }

SchedulerAlreadyActiveError

When: initialize() called while scheduler is already initializing or running
Message: "Cannot initialize scheduler: scheduler is already <state>" where state is "initializing" or "running"
Details: { currentState: string }

CronExpressionInvalidError

When: Cron expression fails validation
Message: "Invalid cron expression \"<expr>\": <field> field <reason>"
Details: { expression: string, field: string, reason: string }

NegativeRetryDelayError

When: Retry delay is negative
Message: "Retry delay must be non-negative"
Details: { retryDelayMs: number }

Cron Expression Parsing Errors

InvalidCronExpressionError (from expression module)

When: Cron parsing fails due to syntax errors
Message: "Invalid cron expression \"<expr>\": <field> field <reason>"
Details: { expression: string, field: string, reason: string }

FieldParseError

When: Individual field parsing fails within cron expression
Message: Specific to field validation failure
Details: { fieldValue: string, fieldName: string }

Cron Calculation Errors

CronCalculationError

When: Date calculation fails because no future or previous occurrence exists
Message: "Failed to calculate next occurrence: <cause>"
Details: { expression: string, currentTime: string, cause: Error }

Task State Management Errors

TaskTryDeserializeError (Base class)

When: Task state deserialization fails
Message: Varies based on specific failure
Details: { field: string, value: any, expectedType: string }

TaskMissingFieldError

When: Required field missing from persisted task state
Message: "Missing required field: <field>"
Details: { field: string }

TaskInvalidTypeError

When: Field has wrong type in persisted task state
Message: "Invalid type for field '<field>': expected <expected>, got <actual>"
Details: { field: string, value: any, expectedType: string, actualType: string }

TaskInvalidValueError

When: Field has invalid value in persisted task state
Message: "Invalid value for field '<field>': <reason>"
Details: { field: string, value: any, reason: string }

TaskInvalidStructureError

When: Task state structure is fundamentally invalid
Message: Varies based on structural issue
Details: { value: any }

Error Throwing Guarantees

The scheduler MUST:

Throw validation errors synchronously during initialize() before any scheduling begins
Wrap and re-throw unexpected errors with enhanced context
Preserve original error information in details.cause when wrapping
Use consistent error names and message formats across versions
Include sufficient detail in error messages for debugging without exposing security-sensitive information
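
The wrapping guarantee can be illustrated with CronCalculationError. This is a sketch under stated assumptions: the helper withCronContext is hypothetical, the message is built from the cause's message, and currentTime is serialized as an ISO 8601 string. None of these choices is mandated by this specification.

```typescript
class CronCalculationError extends Error {
    readonly details: { expression: string; currentTime: string; cause: Error };
    constructor(expression: string, currentTime: Date, cause: Error) {
        super(`Failed to calculate next occurrence: ${cause.message}`);
        this.name = "CronCalculationError";
        // The original error is preserved in details.cause.
        this.details = { expression, currentTime: currentTime.toISOString(), cause };
    }
}

// Hypothetical helper: runs `compute` and wraps any failure with cron context.
function withCronContext<T>(expression: string, now: Date, compute: () => T): T {
    try {
        return compute();
    } catch (error) {
        const cause = error instanceof Error ? error : new Error(String(error));
        throw new CronCalculationError(expression, now, cause);
    }
}
```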

Persistence Semantics & Overrides

Override Resolution

When initialize() is called, the scheduler MUST compare provided registrations against persisted state and categorize each task as:

Classification Types

New Task: Exists in registrations but not in persisted state

Action: Create new task state, apply first startup semantics

Preserved Task: Exists in both with identical configuration

Action: Load existing state, continue normal scheduling

Overridden Task: Exists in both but with different configuration

Action: Update persisted state with new configuration, keep execution history (attempts, successes, failures)

Orphaned Task: Exists in both, was started by a previous scheduler instance, and did not finish under that instance

Action: Update persisted state with new configuration and mark the task to restart immediately

Configuration Comparison

Tasks are considered identical if and only if:

Task name matches exactly
Cron expression string matches exactly
Retry delay duration matches exactly

Any difference in the above fields MUST trigger override behavior.
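
The comparison rule can be sketched as a classification function. The TaskConfig layout, the retryDelayMs field, and the function name are assumptions made for this example. Orphan detection is omitted because it depends on the scheduler identifier and on persisted run state, not on configuration alone.

```typescript
// Hypothetical configuration record; retry delay is expressed in milliseconds.
interface TaskConfig {
    name: string;
    cronExpression: string;
    retryDelayMs: number;
}

type Classification = "new" | "preserved" | "overridden";

function classifyTask(registered: TaskConfig, persisted: ReadonlyArray<TaskConfig>): Classification {
    for (const previous of persisted) {
        if (previous.name !== registered.name) continue;
        // Same name: identical only if cron string and retry delay both match exactly.
        const identical =
            previous.cronExpression === registered.cronExpression &&
            previous.retryDelayMs === registered.retryDelayMs;
        return identical ? "preserved" : "overridden";
    }
    // No persisted task with this name.
    return "new";
}
```

Note that the cron comparison is on the expression string, so 0,30 * * * * and 30,0 * * * * count as different configurations even though they describe the same schedule.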

Scheduler Identifier

The scheduler SHOULD:

Generate a unique identifier on first initialization
Use this identifier to detect orphaned tasks from other scheduler instances

Override Atomicity

All persistence override operations MUST be applied atomically. If any override operation fails, the scheduler MUST restore the previous state and throw an error.

Concurrency & Reentrancy

Parallel Execution

The scheduler MUST:

Allow multiple tasks to execute concurrently
Provide no ordering guarantees between simultaneous task executions
Ensure each individual task executes serially (no concurrent executions of the same task)

Reentrancy Protection

The scheduler MUST:

Reject concurrent calls to initialize(): the first call MUST proceed, and subsequent calls MUST throw SchedulerAlreadyActiveError
Allow multiple concurrent calls to stop()
Allow stop() to be called during initialize()
Ensure stop() waits for any in-progress initialize() to complete

Resource Management

The scheduler MUST:

Wait for all running tasks to complete before stop() returns
Clean up polling resources regardless of task completion success
Handle task execution failures without affecting other running tasks

Determinism & Idempotency

Deterministic Behavior

Given identical inputs, the scheduler SHOULD produce deterministic outputs:

Same registrations + same persisted state + same wall clock time = same execution decisions
Task execution order within a poll MAY vary but task selection MUST be deterministic

Idempotency Guarantees

initialize() Idempotency:

Multiple calls with identical registrations MUST have no additional effect
Subsequent calls MUST NOT duplicate task scheduling
Override detection MUST work correctly across multiple calls

State Persistence Idempotency:

Writing the same state multiple times MUST be safe
Partial failures MUST NOT corrupt state
Recovery from crashes MUST restore consistent state

Non-Deterministic Elements

The following behaviors MAY vary between equivalent runs:

Exact execution timing within the same minute
Task execution order within a single poll
Specific polling interval timing (as long as all minutes are covered)

Formal Theory of Observable Behavior

This specification is accompanied by a formal, mathematical model of the scheduler's observable behavior. The model is defined in the companion document scheduler-theory.md.

Real-time bounds

These are operational timing requirements for implementations and operators. They are engineering targets.

R1: Scheduling latency target. When the scheduler is running and a task is due according to the cron layer (i.e., the system clock reaches the minute boundary specified by the task's cron expression), the implementation SHOULD start the task's callback within approximately 1 minute of that minute boundary, assuming no deliberate stop is in progress and the task is not already running. This upper bound SHOULD scale at most linearly with the number of scheduled tasks (e.g., 10 minutes for 100,000 tasks, 100 minutes for 1,000,000 tasks), and implementations SHOULD keep it as low as possible through efficient scheduling algorithms.

R2: Post-restart recovery target. If the scheduler process restarts while a task callback was in flight, then after restart and once the task is present in the active registrations and eligible to run, the implementation SHOULD start the task's callback within approximately 1 minute of the next eligible minute boundary, assuming no deliberate stop is in progress and the task is not already running. This upper bound SHOULD scale at most linearly with the number of scheduled tasks (e.g., 10 minutes for 100,000 tasks, 100 minutes for 1,000,000 tasks), and implementations SHOULD keep it as low as possible through efficient scheduling algorithms.

Assumptions & Notes

External factors such as OS suspension, VM pauses, heavy load, or administrative throttling can and will extend observed latencies beyond these targets; implementations SHOULD surface such deviations in metrics/logs so operators can take corrective action.

References & Glossary

References

RFC 2119 - Key words for use in RFCs to Indicate Requirement Levels
RFC 8174 - Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
POSIX crontab - The Open Group Base Specifications
POSIX Programmer's Manual - crontab(1p)

Glossary

Cron Expression: A POSIX-compliant 5-field time specification string

Declarative Configuration: Task definitions provided as data rather than imperative commands

Make-Up Execution: Executing missed occurrences after downtime (explicitly NOT supported)

Override: Replacing persisted task configuration with new registration data

Polling: Periodic evaluation of task schedules to determine execution

Registration: A 4-tuple defining a scheduled task's identity, schedule, callback, and retry behavior

Task: A scheduled unit of work with associated execution state

Temporal Granularity: The minimum time resolution for scheduling (1 minute)
