Specification for the Declarative Scheduler
This document provides a normative specification for the backend declarative scheduler's public interface and externally observable operational semantics.
Introduction & Normative Language
This specification defines the externally observable behavior of the Declarative Scheduler. It serves as:
- An integration contract for other teams
- A foundation for conformance testing
- A guide for independent re-implementations
- A behavioral lock for future refactors
Normative Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.
Scope
In scope: Public interface (initialize, stop) and all externally observable behaviors including timing, persistence, logging, errors, and state transitions.
Out of scope: Internal module structure, private helper APIs, storage engine internals beyond externally visible effects.
Public API Surface
The scheduler exposes exactly two public methods forming a minimal lifecycle interface.
Scheduler Interface
A Scheduler instance MUST provide the following interface:
interface Scheduler {
  initialize(registrations: Registration[]): Promise<void>
  stop(): Promise<void>
}
Registration Format
A Registration MUST be a 4-tuple array with the following structure:
type Registration = [
  string,   // Task name (unique identifier)
  string,   // Cron expression (POSIX format)
  Callback, // Async function to execute
  Duration  // Retry delay duration
]
Where:
- Task name MUST be a non-empty string unique within the registration set
- Cron expression MUST be a valid POSIX 5-field cron expression
- Callback MUST be an async function () => Promise<void>
- Duration MUST be a non-negative time duration
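The constraints above can be illustrated with a minimal sketch. The `Duration` shape (`{ milliseconds: number }`) and the `checkRegistrations` helper are assumptions made for illustration only; the specification fixes neither, and cron expression validation is deliberately left out here.

```typescript
// Hypothetical Duration shape; the specification only requires a
// non-negative time duration and does not fix a concrete type.
interface Duration {
  milliseconds: number;
}

type Callback = () => Promise<void>;
type Registration = [string, string, Callback, Duration];

// Illustrative helper, not part of the public interface: collects
// human-readable problems instead of throwing, to keep the sketch short.
function checkRegistrations(registrations: Registration[]): string[] {
  const problems: string[] = [];
  const seenNames: string[] = [];
  registrations.forEach((registration, index) => {
    const taskName = registration[0];
    const retryDelay = registration[3];
    if (taskName.length === 0) {
      problems.push("registration " + index + ": task name is empty");
    }
    if (seenNames.indexOf(taskName) !== -1) {
      problems.push("registration " + index + ": duplicate task name");
    }
    seenNames.push(taskName);
    if (retryDelay.milliseconds < 0) {
      problems.push("registration " + index + ": retry delay is negative");
    }
  });
  return problems;
}

// Two registrations that share a name; the second also has a negative delay.
const exampleRegistrations: Registration[] = [
  ["nightly-report", "0 0 * * *", async () => {}, { milliseconds: 60000 }],
  ["nightly-report", "0 12 * * *", async () => {}, { milliseconds: -1 }],
];

const registrationProblems = checkRegistrations(exampleRegistrations);
```

A conforming implementation would report these conditions through the error types listed in the Error Model rather than through a list of strings.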
Operational Model & Time Semantics
Time Provider and Granularity
The scheduler MUST:
- Use the host system's local clock as the authoritative time source
- Operate at minute-level granularity
- Evaluate task schedules at each minute boundary
- Consider a minute boundary to occur at the start of each minute (seconds = 0)
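As a rough sketch of the minute-boundary rule, the helpers below compute the boundary of the current minute and the delay until the next one from the host's local clock. The function names are hypothetical and not part of the scheduler's interface.

```typescript
// Returns the start of the minute containing `instant`
// (seconds = 0, milliseconds = 0) in the host's local time.
function minuteBoundaryAtOrBefore(instant: Date): Date {
  const boundary = new Date(instant.getTime());
  boundary.setSeconds(0, 0);
  return boundary;
}

// Milliseconds from `instant` until the next minute boundary.
// When `instant` is exactly on a boundary, the result is a full 60000.
function msUntilNextMinuteBoundary(instant: Date): number {
  return minuteBoundaryAtOrBefore(instant).getTime() + 60000 - instant.getTime();
}
```

A polling loop could use the second helper to sleep until the next evaluation point, although the specification does not mandate any particular polling strategy.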
Timely Behavior
The scheduler MUST:
- Handle multiple due tasks in parallel with no ordering guarantees
The scheduler MUST:
- Use the host system's local timezone for all time calculations
- Handle Daylight Saving Time (DST) transitions according to the host system's clock
- Consider a minute that does not exist during DST transitions (e.g., 2:30 AM during "spring forward") as automatically skipped
- Execute tasks multiple times for minutes that occur twice during DST transitions (e.g., 2:30 AM during "fall back")
- Continue normal scheduling after DST transitions without requiring restart
DST Transition Behavior:
- Spring Forward (Lost Hour): Tasks scheduled during the skipped hour MUST NOT execute that day
- Fall Back (Repeated Hour): Tasks scheduled during the repeated hour MUST execute both times the minute occurs
- Next Execution Calculation: MUST correctly account for DST transitions when calculating future occurrences
No Make-Up Execution Policy
Critical Invariant: When tasks miss multiple scheduled executions due to downtime, the scheduler MUST execute each task at most once when resuming, regardless of how many executions were missed.
Rationale: This prevents resource overwhelming and maintains predictable load patterns.
Example: A task scheduled 0,15,30,45 * * * * (every 15 minutes) that misses 4 executions during a 1-hour outage MUST run only once when the scheduler resumes, not 4 times.
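The example can be worked through in code. This is a simplified sketch that only looks at the minute field; `missedOccurrences` and `runsToStartOnResume` are hypothetical names, and the outage window is an assumed one-hour span.

```typescript
// Collects the minute boundaries in (downAt, upAt] whose minute-of-hour is
// listed in `minutes`, i.e. the occurrences missed during the outage.
function missedOccurrences(minutes: number[], downAt: Date, upAt: Date): Date[] {
  const missed: Date[] = [];
  const cursor = new Date(downAt.getTime());
  cursor.setSeconds(0, 0);
  cursor.setTime(cursor.getTime() + 60000);
  while (cursor.getTime() <= upAt.getTime()) {
    if (minutes.indexOf(cursor.getMinutes()) !== -1) {
      missed.push(new Date(cursor.getTime()));
    }
    cursor.setTime(cursor.getTime() + 60000);
  }
  return missed;
}

// No make-up policy: however many occurrences were missed,
// at most one run is started when the scheduler resumes.
function runsToStartOnResume(missedCount: number): number {
  return missedCount > 0 ? 1 : 0;
}

// Assumed outage from 10:01 to 11:01 local time.
const outageStart = new Date(2024, 0, 15, 10, 1, 0, 0);
const outageEnd = new Date(2024, 0, 15, 11, 1, 0, 0);
const missedDuringOutage = missedOccurrences([0, 15, 30, 45], outageStart, outageEnd);
const runsOnResume = runsToStartOnResume(missedDuringOutage.length);
```

Here four occurrences fall inside the outage (10:15, 10:30, 10:45, 11:00), yet only one run is started on resume.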
Startup Semantics
First Startup (No Persisted State)
For every task that has not yet been persisted, the scheduler MUST:
- Evaluate the task's cron expression against the current time
- Execute immediately if and only if the cron expression exactly matches the current minute
- Schedule the task for its next future occurrence
Subsequent Restarts (With Persisted State)
When persisted state exists, the scheduler MUST:
- Load the previous execution state
- Apply persistence override logic (see Persistence Semantics)
- Continue normal scheduling based on last known attempt/success/failure times
Scheduler Lifecycle
Scheduler State Model
The scheduler MUST exist in exactly one of the following states:
Scheduler State Definitions
- Uninitialized: Scheduler created but not yet initialized
- Initializing: Processing registrations, applying overrides, starting tasks
- Running: Normal operation with polling active and tasks scheduled
- Stopping: Graceful shutdown in progress, waiting for running tasks
- Stopped: Cleanup complete, no active polling or tasks
Scheduler State Transitions
Uninitialized to Initializing
Guard: initialize(registrations) called with valid input
Action: Begin validation, persistence override resolution
Events: SchedulerInitializationStarted
Initializing to Running
Guard: All registrations validated, overrides applied, tasks scheduled
Action: Start polling loop, mark scheduler as active
Events: SchedulerInitializationCompleted
Initializing to Uninitialized
Guard: Initialization fails due to validation errors
Action: Clean up partial state, reset to uninitialized
Events: SchedulerInitializationFailed
Initializing to Initializing (Error State)
Guard: initialize(registrations) called while scheduler is already initializing
Action: Throw SchedulerAlreadyActiveError with state "initializing"
Events: None (error thrown synchronously)
Running to Running (Error State)
Guard: initialize(registrations) called while scheduler is running
Action: Throw SchedulerAlreadyActiveError with state "running"
Events: None (error thrown synchronously)
Running to Stopping
Guard: stop() called
Action: Stop accepting new polls, wait for running tasks
Events: SchedulerStopRequested
Stopping to Stopped
Guard: All running tasks complete, polling stopped
Action: Final cleanup, release resources
Events: SchedulerStopped
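The transitions listed above can be summarized as a small transition function. This is a sketch only: the trigger names are hypothetical, and leaving the state unchanged for combinations the specification does not list is an assumption.

```typescript
type SchedulerState = "uninitialized" | "initializing" | "running" | "stopping" | "stopped";

// Hypothetical trigger names; the specification describes guards in prose.
type SchedulerTrigger =
  | "initializeCalled"
  | "initializationSucceeded"
  | "initializationFailed"
  | "stopCalled"
  | "runningTasksCompleted";

function nextSchedulerState(state: SchedulerState, trigger: SchedulerTrigger): SchedulerState {
  if (trigger === "initializeCalled") {
    if (state === "initializing" || state === "running") {
      // Error name and message format follow the Error Model section.
      const error = new Error("Cannot initialize scheduler: scheduler is already " + state);
      error.name = "SchedulerAlreadyActiveError";
      throw error;
    }
    if (state === "uninitialized") return "initializing";
  }
  if (state === "initializing" && trigger === "initializationSucceeded") return "running";
  if (state === "initializing" && trigger === "initializationFailed") return "uninitialized";
  if (state === "running" && trigger === "stopCalled") return "stopping";
  if (state === "stopping" && trigger === "runningTasksCompleted") return "stopped";
  // Assumption: combinations the specification does not list leave the state unchanged.
  return state;
}
```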
Task Lifecycle
Task State Model
Each task MUST exist in exactly one of the following states:
Task State Definitions
- AwaitingRun: Task is waiting for its next cron occurrence
- Running: Task callback is currently executing
- AwaitingRetry: Task failed and is waiting for retry delay to pass
Task State Transitions
From AwaitingRun to Running
Guard: Either Case A or Case B holds.
- Case A: The current minute matches the cron expression AND the task has not been run this minute.
- Case B: The cron expression matched at some time in the past and the scheduler missed it, unless the task has never run before.
Action: Invoke task callback, record attempt timestamp
Events: TaskRunStarted
From Running to AwaitingRun
Guard: Task callback completes successfully
Action: Record success timestamp, clear any pending retry
Events: TaskRunCompleted
From Running to AwaitingRetry
Guard: Task callback throws an error or rejects
Action: Record failure timestamp, calculate pendingRetryUntil = now + retryDelay
Events: TaskRunFailed
From AwaitingRetry to Running
Guard: Current time >= pendingRetryUntil
Action: Clear retry state, invoke task callback, record attempt timestamp
Events: TaskRetryStarted
From AwaitingRetry to AwaitingRun
Guard: New cron occurrence is due while task is in retry state
Action: Clear retry state, proceed with cron execution
Events: TaskRetryPreempted, TaskRunStarted
Timestamp Management
The scheduler MUST maintain the following timestamps for each task:
- lastAttemptAt: Timestamp of most recent execution attempt (success or failure)
- lastSuccessAt: Timestamp of most recent successful execution (if any)
- pendingRetryUntil: Timestamp when retry is allowed (if in AwaitingRetry state)
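A minimal sketch of how the task states and timestamps might fit together follows. The `TaskRecord` shape, the helper names, and the `lastFailureAt` field name are assumptions; timestamps are epoch milliseconds, and the retry guard assumes a retry becomes due once the current time reaches pendingRetryUntil.

```typescript
type TaskStatus = "AwaitingRun" | "Running" | "AwaitingRetry";

interface TaskRecord {
  status: TaskStatus;
  lastAttemptAt: number | null;
  lastSuccessAt: number | null;
  lastFailureAt: number | null; // assumed name for the recorded failure timestamp
  pendingRetryUntil: number | null;
}

// AwaitingRun or AwaitingRetry -> Running: record the attempt, clear retry state.
function startRun(task: TaskRecord, now: number): TaskRecord {
  return {
    status: "Running",
    lastAttemptAt: now,
    lastSuccessAt: task.lastSuccessAt,
    lastFailureAt: task.lastFailureAt,
    pendingRetryUntil: null,
  };
}

// Running -> AwaitingRun: record the success, clear any pending retry.
function completeRun(task: TaskRecord, now: number): TaskRecord {
  return {
    status: "AwaitingRun",
    lastAttemptAt: task.lastAttemptAt,
    lastSuccessAt: now,
    lastFailureAt: task.lastFailureAt,
    pendingRetryUntil: null,
  };
}

// Running -> AwaitingRetry: pendingRetryUntil = now + retryDelay.
function failRun(task: TaskRecord, now: number, retryDelayMs: number): TaskRecord {
  return {
    status: "AwaitingRetry",
    lastAttemptAt: task.lastAttemptAt,
    lastSuccessAt: task.lastSuccessAt,
    lastFailureAt: now,
    pendingRetryUntil: now + retryDelayMs,
  };
}

// Guard for AwaitingRetry -> Running.
function retryIsDue(task: TaskRecord, now: number): boolean {
  return task.status === "AwaitingRetry"
    && task.pendingRetryUntil !== null
    && now >= task.pendingRetryUntil;
}
```

Returning a fresh record from each transition keeps the sketch easy to test; an implementation is free to mutate persisted state in place.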
Polling Lifecycle
Polling State Model
State Definitions
- Inactive: No polling loop running, no tasks scheduled
- Active: Polling loop running, evaluating scheduled tasks
- Stopping: Stop requested, waiting for running tasks to complete
State Transitions
Inactive to Active
Guard: First task is scheduled via schedule() call
Action: Start polling loop
Events: PollingStarted
Active to Inactive
Guard: Last scheduled task is cancelled via cancel() call
Action: Stop polling loop
Events: PollingStopped
Active to Stopping
Guard: stopLoop() is called
Action: Mark scheduler as stopping, complete current poll cycle
Events: PollingStopRequested
Stopping to Inactive
Guard: All currently running tasks complete execution
Action: Final cleanup, release resources
Events: PollingStopped
Cron Language Specification
The scheduler MUST accept strictly POSIX-compliant cron expressions as defined in IEEE Std 1003.1.
Formal Grammar (EBNF)
cron-expr = SP* minute SP+ hour SP+ day SP+ month SP+ weekday SP* ;
minute = field-content ;
hour = field-content ;
day = field-content ;
month = field-content ;
weekday = field-content ;
field-content = "*" | element-list ;
element-list = element ("," element)* ;
element = number | range ;
range = number "-" number ;
number = DIGIT+ ;
SP = " " | "\t" ;
DIGIT = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
Field Ranges
- minute: 0–59
- hour: 0–23
- day: 1–31
- month: 1–12
- weekday: 0–6 (0 = Sunday, 6 = Saturday)
Validation Rules
The scheduler MUST reject expressions that:
- Do not contain exactly 5 fields separated by whitespace
- Contain step syntax (/N)
- Contain named values (jan, mon, sunday)
- Contain macro expressions (@daily, @hourly)
- Contain Quartz-specific tokens (?, L, W, #)
- Contain weekday value 7 (use 0 for Sunday)
- Contain wrap-around ranges where start > end
- Contain values outside the valid range for each field
- Contain non-decimal numeric formats (scientific notation, hex, signs)
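The grammar, field ranges, and validation rules can be combined into a small parser. The following is a sketch under stated assumptions: the names `parseCronField` and `parseCronExpression` are hypothetical, plain `Error` objects stand in for the error types of the Error Model, and `null` is used to represent an unrestricted field.

```typescript
interface FieldSpec {
  label: string;
  min: number;
  max: number;
}

const CRON_FIELDS: FieldSpec[] = [
  { label: "minute", min: 0, max: 59 },
  { label: "hour", min: 0, max: 23 },
  { label: "day", min: 1, max: 31 },
  { label: "month", min: 1, max: 12 },
  { label: "weekday", min: 0, max: 6 },
];

// null means "*" (unrestricted); otherwise a sorted list of allowed values.
type ParsedField = number[] | null;

function parseCronField(text: string, spec: FieldSpec): ParsedField {
  if (text === "*") return null;
  const values: number[] = [];
  text.split(",").forEach((element) => {
    // Decimal digits only: rejects steps, names, macros, Quartz tokens, signs, hex.
    const match = /^(\d+)(?:-(\d+))?$/.exec(element);
    if (match === null) {
      throw new Error(spec.label + " field has invalid element \"" + element + "\"");
    }
    const start = parseInt(match[1], 10);
    const end = match[2] === undefined ? start : parseInt(match[2], 10);
    if (start < spec.min || start > spec.max || end < spec.min || end > spec.max) {
      throw new Error(spec.label + " field value out of range " + spec.min + "-" + spec.max);
    }
    if (start > end) {
      throw new Error(spec.label + " field has wrap-around range \"" + element + "\"");
    }
    for (let value = start; value <= end; value++) {
      if (values.indexOf(value) === -1) values.push(value);
    }
  });
  return values.sort((a, b) => a - b);
}

function parseCronExpression(expression: string): ParsedField[] {
  // SP is space or tab; leading and trailing whitespace is allowed by the grammar.
  const fields = expression.split(/[ \t]+/).filter((part) => part.length > 0);
  if (fields.length !== 5) {
    throw new Error("expected exactly 5 fields, got " + fields.length);
  }
  return fields.map((text, index) => parseCronField(text, CRON_FIELDS[index]));
}
```

For instance, `parseCronExpression("15 3 * * 1-5")` yields a restricted minute, hour, and weekday field and leaves day and month unrestricted, while `parseCronExpression("*/15 * * * *")` throws because step syntax fails the digits-only check.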
Day-of-Month/Day-of-Week Semantics
When both day-of-month (DOM) and day-of-week (DOW) are restricted (not *), the scheduler MUST execute the task if either condition matches (OR logic).
Truth Table:
| DOM | DOW | Logic | Example | Runs On |
|---|---|---|---|---|
| * | * | AND | 0 0 * * * | Every day at midnight |
| * | restricted | DOW only | 0 0 * * 1 | Every Monday at midnight |
| restricted | * | DOM only | 0 0 15 * * | 15th of every month at midnight |
| restricted | restricted | OR | 0 0 1,15 * 1 | 1st, 15th, OR every Monday at midnight |
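The truth table can be expressed as a single predicate. This is a sketch with a hypothetical `dayMatches` helper in which `null` stands for an unrestricted field (`*`) and arrays list the allowed values; it relies on `Date.getDay()` returning 0 for Sunday, which agrees with the weekday range above.

```typescript
function dayMatches(dom: number[] | null, dow: number[] | null, date: Date): boolean {
  const domHit = dom === null || dom.indexOf(date.getDate()) !== -1;
  const dowHit = dow === null || dow.indexOf(date.getDay()) !== -1;
  if (dom !== null && dow !== null) {
    // Both fields restricted: either match is sufficient (OR logic).
    return domHit || dowHit;
  }
  // At most one field restricted: the unrestricted one always matches.
  return domHit && dowHit;
}
```

With day-of-month `[1, 15]` and day-of-week `[1]`, a Monday that is neither the 1st nor the 15th still matches, which is the last row of the table.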
Examples
Valid expressions:
- 0 0 * * *: Daily at midnight
- 15 3 * * 1-5: 3:15 AM on weekdays
- 0,30 * * * *: Every 30 minutes
- 0 12 14 2 *: Noon on February 14th
Invalid expressions:
- */15 * * * *: Step syntax not allowed
- 0 0 * * mon: Named values not allowed
- @daily: Macros not allowed
- 0 0 ? * *: Quartz tokens not allowed
Error Model
Error Taxonomy
The scheduler MUST throw the following error types under the specified conditions:
Registration Validation Errors
RegistrationsNotArrayError
- When: initialize() called with non-array registrations parameter
- Message: "Registrations must be an array"
- Details: None
RegistrationShapeError
- When: Registration tuple has wrong length or invalid types
- Message: "Invalid registration shape: expected [string, string, function, Duration]"
- Details: { registrationIndex: number, received: any }
InvalidRegistrationError
- When: Registration contains invalid data beyond shape issues
- Message: Varies based on specific validation failure
- Details: { field: string, value: any, reason: string }
ScheduleDuplicateTaskError
- When: Multiple registrations have the same task name
- Message: "Task with name \"<name>\" is already scheduled"
- Details: { taskName: string }
SchedulerAlreadyActiveError
- When: initialize() called while scheduler is already initializing or running
- Message: "Cannot initialize scheduler: scheduler is already <state>", where <state> is "initializing" or "running"
- Details: { currentState: string }
CronExpressionInvalidError
- When: Cron expression fails validation
- Message: "Invalid cron expression \"<expr>\": <field> field <reason>"
- Details: { expression: string, field: string, reason: string }
NegativeRetryDelayError
- When: Retry delay is negative
- Message: "Retry delay must be non-negative"
- Details: { retryDelayMs: number }
Cron Expression Parsing Errors
InvalidCronExpressionError (from expression module)
- When: Cron parsing fails due to syntax errors
- Message: "Invalid cron expression \"<expr>\": <field> field <reason>"
- Details: { expression: string, field: string, reason: string }
FieldParseError
- When: Individual field parsing fails within cron expression
- Message: Specific to field validation failure
- Details: { fieldValue: string, fieldName: string }
Cron Calculation Errors
CronCalculationError
- When: Date calculation fails because no future or previous occurrence exists
- Message: "Failed to calculate next occurrence: <cause>"
- Details: { expression: string, currentTime: string, cause: Error }
Task State Management Errors
TaskTryDeserializeError (Base class)
- When: Task state deserialization fails
- Message: Varies based on specific failure
- Details: { field: string, value: any, expectedType: string }
TaskMissingFieldError
- When: Required field missing from persisted task state
- Message: "Missing required field: <field>"
- Details: { field: string }
TaskInvalidTypeError
- When: Field has wrong type in persisted task state
- Message: "Invalid type for field '<field>': expected <expected>, got <actual>"
- Details: { field: string, value: any, expectedType: string, actualType: string }
TaskInvalidValueError
- When: Field has invalid value in persisted task state
- Message: "Invalid value for field '<field>': <reason>"
- Details: { field: string, value: any, reason: string }
TaskInvalidStructureError
- When: Task state structure is fundamentally invalid
- Message: Varies based on structural issue
- Details: { value: any }
Error Throwing Guarantees
The scheduler MUST:
- Throw validation errors synchronously during initialize() before any scheduling begins
- Wrap and re-throw unexpected errors with enhanced context
- Preserve original error information in details.cause when wrapping
- Use consistent error names and message formats across versions
- Include sufficient detail in error messages for debugging without exposing security-sensitive information
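One possible way to wrap unexpected errors while preserving the original in details.cause is sketched below. The class name `WrappedSchedulerError`, the `context` field, and the `wrapUnexpected` helper are hypothetical; only `details.cause` is named by this specification.

```typescript
// Hypothetical wrapper type used for illustration.
class WrappedSchedulerError extends Error {
  details: { cause: unknown; context: string };

  constructor(message: string, cause: unknown, context: string) {
    super(message);
    this.name = "WrappedSchedulerError";
    this.details = { cause: cause, context: context };
  }
}

// Runs `action`; any error it throws is re-thrown with added context,
// keeping the original error reachable through `details.cause`.
function wrapUnexpected<T>(context: string, action: () => T): T {
  try {
    return action();
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new WrappedSchedulerError(context + ": " + reason, error, context);
  }
}
```

A caller can then log the enriched message while still inspecting the underlying failure through `details.cause`.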
Persistence Semantics & Overrides
Override Resolution
When initialize() is called, the scheduler MUST compare provided registrations against persisted state and categorize each task as:
Classification Types
New Task: Exists in registrations but not in persisted state
- Action: Create new task state, apply first startup semantics
Preserved Task: Exists in both with identical configuration
- Action: Load existing state, continue normal scheduling
Overridden Task: Exists in both but with different configuration
- Action: Update persisted state with new configuration, keep execution history (attempts, successes, failures)
Orphaned Task: Exists in both, and its most recent run was started by a previous scheduler instance but did not finish under that instance
- Action: Update persisted state with new configuration and mark the task to restart immediately
Configuration Comparison
Tasks are considered identical if and only if:
- Task name matches exactly
- Cron expression string matches exactly
- Retry delay duration matches exactly
Any difference in the above fields MUST trigger override behavior.
Scheduler Identifier
The scheduler SHOULD:
- Generate a unique identifier on first initialization
- Use this identifier to detect orphaned tasks from other scheduler instances
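The classification and comparison rules above might be sketched as follows. The persisted shape, the `startedBySchedulerId` field, and the precedence of the orphaned check over the configuration comparison are assumptions for illustration; the specification does not define the storage layout.

```typescript
type TaskClassification = "new" | "preserved" | "overridden" | "orphaned";

interface TaskConfig {
  taskName: string;
  cronExpression: string;
  retryDelayMs: number;
}

// Hypothetical persisted shape: `startedBySchedulerId` is set while a run is
// in flight and reset to null when that run finishes.
interface PersistedTask extends TaskConfig {
  startedBySchedulerId: string | null;
}

function classifyTask(
  registration: TaskConfig,
  persisted: PersistedTask | null,
  schedulerId: string
): TaskClassification {
  if (persisted === null) return "new";
  // Assumption: an unfinished run owned by another instance takes precedence
  // over the configuration comparison.
  if (persisted.startedBySchedulerId !== null && persisted.startedBySchedulerId !== schedulerId) {
    return "orphaned";
  }
  const identical =
    persisted.taskName === registration.taskName &&
    persisted.cronExpression === registration.cronExpression &&
    persisted.retryDelayMs === registration.retryDelayMs;
  return identical ? "preserved" : "overridden";
}
```

Note that the cron expression is compared as a string, so two expressions that describe the same schedule but are written differently count as a configuration change.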
Override Atomicity
All persistence override operations MUST be applied atomically. If any override operation fails, the scheduler MUST restore the previous state and throw an error.
Concurrency & Reentrancy
Parallel Execution
The scheduler MUST:
- Allow multiple tasks to execute concurrently
- Provide no ordering guarantees between simultaneous task executions
- Ensure each individual task executes serially (no concurrent executions of the same task)
Reentrancy Protection
The scheduler MUST:
- Reject multiple concurrent calls to initialize(). Only the first call MUST proceed; subsequent calls MUST throw SchedulerAlreadyActiveError.
- Allow multiple concurrent calls to stop()
- Allow stop() to be called during initialize()
- Ensure stop() waits for any in-progress initialize() to complete
Resource Management
The scheduler MUST:
- Wait for all running tasks to complete before stop() returns
- Clean up polling resources regardless of task completion success
- Handle task execution failures without affecting other running tasks
Determinism & Idempotency
Deterministic Behavior
Given identical inputs, the scheduler SHOULD produce deterministic outputs:
- Same registrations + same persisted state + same wall clock time = same execution decisions
- Task execution order within a poll MAY vary but task selection MUST be deterministic
Idempotency Guarantees
initialize() Idempotency:
- Multiple calls with identical registrations MUST have no additional effect
- Subsequent calls MUST NOT duplicate task scheduling
- Override detection MUST work correctly across multiple calls
State Persistence Idempotency:
- Writing the same state multiple times MUST be safe
- Partial failures MUST NOT corrupt state
- Recovery from crashes MUST restore consistent state
Non-Deterministic Elements
The following behaviors MAY vary between equivalent runs:
- Exact execution timing within the same minute
- Task execution order within a single poll
- Specific polling interval timing (as long as all minutes are covered)
Formal Theory of Observable Behavior
This specification is accompanied by a formal mathematical model of the scheduler's observable behavior. The model is defined in the companion document scheduler-theory.md.
Real-time bounds
These are operational timing requirements for implementations and operators. They are engineering targets.
R1 — Scheduling latency target. When the scheduler is running and a task is due according to the cron layer (i.e., the system clock reaches the minute boundary specified by the task's cron expression), the implementation SHOULD start the task's callback within approximately 1 minute of that minute boundary, assuming no deliberate stop is in progress and the task is not running already. This upper bound SHOULD scale linearly with the number of scheduled tasks (e.g., 100000 tasks = 10 minutes, 1000000 tasks = 100 minutes), but implementations SHOULD keep this as low as possible through efficient scheduling algorithms.
R2 — Post‑restart recovery target. If the scheduler process restarts while a task callback was in flight, then after restart and once the task is present in the active registrations and eligible to run, the implementation SHOULD start the task's callback within approximately 1 minute of the next eligible minute boundary, assuming no deliberate stop is in progress and the task is not running already. This upper bound SHOULD scale linearly with the number of scheduled tasks (e.g., 100000 tasks = 10 minutes, 1000000 tasks = 100 minutes), but implementations SHOULD keep this as low as possible through efficient scheduling algorithms.
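The scaling examples in R1 and R2 (100000 tasks = 10 minutes, 1000000 tasks = 100 minutes) correspond to one minute per 10000 scheduled tasks. One way to read the targets is sketched below; the function name is hypothetical, and treating the 1-minute figure as a floor for small task counts is an interpretation, not something the targets state explicitly.

```typescript
// Latency target in minutes for a given number of scheduled tasks:
// linear at 1 minute per 10000 tasks, never below the base 1-minute target.
function latencyTargetMinutes(taskCount: number): number {
  return Math.max(1, taskCount / 10000);
}
```

Operators could compare observed scheduling delay against this value when deciding whether a deviation is worth surfacing in metrics or logs.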
Assumptions & Notes
External factors such as OS suspension, VM pauses, heavy load, or administrative throttling can and will extend observed latencies beyond these targets; implementations SHOULD surface such deviations in metrics/logs so operators can take corrective action.
References & Glossary
References
- RFC 2119 - Key words for use in RFCs to Indicate Requirement Levels
- RFC 8174 - Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
- POSIX crontab - The Open Group Base Specifications
- POSIX Programmer's Manual - crontab(1p)
Glossary
Cron Expression: A POSIX-compliant 5-field time specification string
Declarative Configuration: Task definitions provided as data rather than imperative commands
Make-Up Execution: Executing missed occurrences after downtime (explicitly NOT supported)
Override: Replacing persisted task configuration with new registration data
Polling: Periodic evaluation of task schedules to determine execution
Registration: A 4-tuple defining a scheduled task's identity, schedule, callback, and retry behavior
Task: A scheduled unit of work with associated execution state
Temporal Granularity: The minimum time resolution for scheduling (1 minute)