We built a distributed timer service capable of handling 100,000 timer creations per second with high precision and at-least-once delivery guarantees. The architecture separates concerns between a stateless Timer Service API (for CRUD operations) and horizontally scalable Timer Processors (for expiration handling). Workers scan their partitions for soon-to-expire timers (a 2-3 minute look-ahead window), load them into in-memory data structures for precise firing, then publish notifications to Kafka. ZooKeeper coordinates partition ownership among workers, preventing duplicate processing through ephemeral nodes and automatic rebalancing. DynamoDB provides the storage layer, with a GSI design that combines time-bucketing and worker assignment for efficient scanning. Key innovations include temporal partitioning via time buckets, a two-stage scan-and-fire mechanism, ZooKeeper-based coordination, checkpoint-based recovery, and at-least-once delivery semantics.
Tech Stack: DynamoDB, Kafka, ZooKeeper
\
In today's microservices landscape, countless applications need to schedule delayed actions: sending reminder emails, expiring user sessions, triggering scheduled workflows, or managing SLA-based notifications. Yet despite this universal need, most teams either build bespoke solutions or rely on heavyweight job schedulers that aren't optimized for high-throughput timer management.
What if we could build a generic, horizontally scalable timer service that handles 100,000 timer creations per second while maintaining high precision and reliability? Let's dive into the architecture.
\
Our timer service needs to support four core operations:
Create Timer: Allow users to schedule a timer with custom expiration times and notification metadata
Retrieve Timer: Query existing timer details by ID
Delete Timer: Cancel timers before they fire
Notify on Expiration: Reliably deliver notifications when timers expire
\
The real challenge lies in the non-functional requirements:
High Throughput: Support ~100,000 timer creations per second
Precision: Maintain accuracy in timer expiration (minimize drift)
Scalability: Handle burst scenarios where thousands of timers fire simultaneously
Availability: Ensure the timer creation service remains highly available
\
The system consists of four main components working in concert:
\
The Timer Service exposes a RESTful API for timer management:
Create Timer
POST /createTimer

```json
{
  "UserDrivenTimerID": "user-defined-id",
  "Namespace": "payment-reminders",
  "timerExpiration": "2025-11-10T18:00:00Z",
  "notificationChannelMetadata": {
    "topic": "payment-notifications",
    "context": { "orderId": "12345" }
  }
}
```
Retrieve Timer
GET /timer?timerId=<system-generated-id>
Delete Timer
DELETE /timer?timerId=<system-generated-id>
The API layer sits behind a load balancer, distributing requests across multiple service instances for horizontal scalability.
We use DynamoDB for its ability to handle high write throughput with predictable performance. The table is structured for our access patterns:
Primary Key: namespace:UserDrivenTimerID:uuid
This composite key ensures even distribution across partitions while allowing user-defined identifiers.
Key Attributes:
expiration_timestamp: Human-readable expiration time
time_bucket: Temporal partitioning for efficient scanning
workerId: Worker assignment for load distribution
MessageMetadata: JSON containing Kafka topic and context data

Global Secondary Index (GSI): timers_scan_gsi
Partition Key: time_bucket:workerId
Sort Key: expiration_timestamp

This GSI is the secret sauce enabling efficient timer scanning. By combining time buckets with worker IDs, each processor can issue a single narrow range query over exactly the slice of soon-to-expire timers it owns.
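To make the schema concrete, here is a hedged sketch of the table definition using the AWS SDK for JavaScript v3. Attribute names like timer_id and time_bucket_worker are assumptions chosen to match the query shown later; the design above only specifies the key shapes.

```typescript
import { DynamoDBClient, CreateTableCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

// Sketch: composite primary key plus the scan GSI described above.
await client.send(new CreateTableCommand({
  TableName: "Timers",
  AttributeDefinitions: [
    { AttributeName: "timer_id", AttributeType: "S" },           // namespace:UserDrivenTimerID:uuid
    { AttributeName: "time_bucket_worker", AttributeType: "S" }, // time_bucket:workerId
    { AttributeName: "expiration_timestamp", AttributeType: "S" },
  ],
  KeySchema: [{ AttributeName: "timer_id", KeyType: "HASH" }],
  GlobalSecondaryIndexes: [{
    IndexName: "timers_scan_gsi",
    KeySchema: [
      { AttributeName: "time_bucket_worker", KeyType: "HASH" },
      { AttributeName: "expiration_timestamp", KeyType: "RANGE" },
    ],
    Projection: { ProjectionType: "ALL" },
  }],
  BillingMode: "PAY_PER_REQUEST",
}));
```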
A separate checkpoint table, keyed by worker, tracks how far each processor has scanned.
Primary Key: worker_id
Each timer processor maintains a checkpoint containing:
{ "time_bucket": "2025-11-10-18", "expiration_time": "2025-11-10T18:30:45Z" }
This enables crash recovery and prevents duplicate processing.
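A minimal sketch of checkpoint persistence, assuming a DynamoDB table named TimerCheckpoints keyed by worker_id (the table name is an assumption):

```typescript
import { DynamoDBClient, PutItemCommand, GetItemCommand } from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({});

// Persist scan progress so a replacement worker can resume after a crash.
async function saveCheckpoint(workerId: string, timeBucket: string, expirationTime: string) {
  await db.send(new PutItemCommand({
    TableName: "TimerCheckpoints", // assumed table name
    Item: {
      worker_id: { S: workerId },
      time_bucket: { S: timeBucket },
      expiration_time: { S: expirationTime },
    },
  }));
}

async function loadCheckpoint(workerId: string) {
  const res = await db.send(new GetItemCommand({
    TableName: "TimerCheckpoints",
    Key: { worker_id: { S: workerId } },
  }));
  return res.Item; // undefined on a worker's first run
}
```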
Before processors can scan partitions, they need to coordinate who owns what. This is where ZooKeeper comes in.
ZooKeeper manages partition ownership to ensure each partition is processed by exactly one worker at any time, preventing duplicate processing and wasted resources.
How it works:
Each worker registers an ephemeral node when it starts (e.g., /workers/worker-1)
Workers watch the /workers path and participate in partition rebalancing when workers join or leave
Partition ownership is recorded under dedicated nodes (e.g., /partitions/partition-5/owner → worker-2)

Rebalancing Example:
```
Initial: 10 partitions, 2 workers
  Worker-1: partitions [0,1,2,3,4]
  Worker-2: partitions [5,6,7,8,9]

Worker-3 joins → rebalance triggered
  Worker-1: partitions [0,1,2,3]
  Worker-2: partitions [4,5,6]
  Worker-3: partitions [7,8,9]
```
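ZooKeeper watches trigger the reassignment, but the assignment itself can be as simple as handing out contiguous ranges. A pure-function sketch (no ZooKeeper client involved) that reproduces the example above:

```typescript
// Contiguous-range assignment: earlier workers take one extra partition
// when the count doesn't divide evenly.
function assignPartitions(workers: string[], partitionCount: number): Map<string, number[]> {
  const assignment = new Map<string, number[]>();
  const base = Math.floor(partitionCount / workers.length);
  let extra = partitionCount % workers.length;
  let next = 0;
  for (const worker of workers) {
    const take = base + (extra-- > 0 ? 1 : 0);
    assignment.set(worker, Array.from({ length: take }, (_, i) => next + i));
    next += take;
  }
  return assignment;
}

// assignPartitions(["worker-1", "worker-2"], 10)
//   → worker-1: [0,1,2,3,4], worker-2: [5,6,7,8,9]
// assignPartitions(["worker-1", "worker-2", "worker-3"], 10)
//   → worker-1: [0,1,2,3], worker-2: [4,5,6], worker-3: [7,8,9]
```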
Benefits:
Exactly one worker owns each partition at any time, so no partition is scanned twice
Ephemeral nodes disappear when a worker dies, so its partitions are automatically reassigned
Load stays evenly spread as workers join and leave
Timer processors are the workhorses of the system. Each processor follows a two-stage approach: scan and schedule, then fire and notify.
Stage 1: Scan and Schedule (every 30-60 seconds)
```
// DynamoDB Query using the timers_scan_gsi
{
  TableName: "Timers",
  IndexName: "timers_scan_gsi",
  KeyConditionExpression: "time_bucket_worker = :tbw AND expiration_timestamp BETWEEN :checkpoint AND :lookahead",
  ExpressionAttributeValues: {
    ":tbw": "2025-11-10-18:worker-1",
    ":checkpoint": last_checkpoint_time,    // e.g., "2025-11-10T18:42:00Z"
    ":lookahead": current_time + 3_minutes  // e.g., "2025-11-10T18:48:00Z"
  }
}
```
Each matching timer is loaded into a lightweight in-memory representation:

```
InMemoryTimer {
  timerId: "abc-123",
  expirationTime: "2025-11-10T18:45:30Z",
  messageMetadata: {...}
}
```
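Putting Stage 1 together, a hedged sketch using the AWS SDK for JavaScript v3 (fireTimer is assumed here and sketched under Stage 2):

```typescript
import { DynamoDBClient, QueryCommand, AttributeValue } from "@aws-sdk/client-dynamodb";

const db = new DynamoDBClient({});
const LOOKAHEAD_MS = 3 * 60 * 1000; // 3-minute look-ahead window

// Sketched under Stage 2 below.
declare function fireTimer(item: Record<string, AttributeValue>): Promise<void>;

// Stage 1 sketch: query the GSI for soon-to-expire timers, arm in-memory timers.
async function scanAndSchedule(workerId: string, timeBucket: string, checkpoint: string) {
  const lookahead = new Date(Date.now() + LOOKAHEAD_MS).toISOString();
  const res = await db.send(new QueryCommand({
    TableName: "Timers",
    IndexName: "timers_scan_gsi",
    KeyConditionExpression:
      "time_bucket_worker = :tbw AND expiration_timestamp BETWEEN :checkpoint AND :lookahead",
    ExpressionAttributeValues: {
      ":tbw": { S: `${timeBucket}:${workerId}` },
      ":checkpoint": { S: checkpoint },
      ":lookahead": { S: lookahead },
    },
  }));
  for (const item of res.Items ?? []) {
    const delay = Date.parse(item.expiration_timestamp.S!) - Date.now();
    setTimeout(() => void fireTimer(item), Math.max(0, delay)); // arm in-memory timer
  }
  // The checkpoint advances only after timers actually fire and are deleted,
  // so a crash between scan and fire just causes a harmless re-scan.
}
```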
Stage 2: Fire and Notify (continuous)
When an in-memory timer fires, the processor publishes the notification payload to the configured Kafka topic and then deletes the timer record from DynamoDB (sketched below, after the timeline).
At-Least-Once Delivery Guarantee
The system guarantees at-least-once delivery through several mechanisms:
Timers are deleted from DynamoDB only after a successful Kafka publish
Checkpoints let a replacement worker re-scan anything a crashed worker had not finished
Kafka's replicated log durably stores notifications once published
Example Timeline:
```
T+0s:   Scan finds timer expiring at T+120s
T+0s:   Create in-memory timer, update checkpoint
T+120s: In-memory timer fires
T+120s: Publish to Kafka
T+121s: Async delete from DynamoDB (batch)
```
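A sketch of the fire-and-notify step with kafkajs (the broker list and helper wiring are assumptions). Note the ordering, publish first and delete second, which is exactly what makes delivery at-least-once:

```typescript
import { Kafka } from "kafkajs";
import { DynamoDBClient, DeleteItemCommand, AttributeValue } from "@aws-sdk/client-dynamodb";

const kafka = new Kafka({ brokers: ["localhost:9092"] }); // broker list is an assumption
const producer = kafka.producer();
await producer.connect();

const db = new DynamoDBClient({});

// Stage 2 sketch: publish first, delete second. A crash between the two steps
// leaves the timer in DynamoDB, so it will be re-scanned and possibly
// re-published — duplicates are possible, hence at-least-once delivery.
async function fireTimer(item: Record<string, AttributeValue>): Promise<void> {
  const metadata = JSON.parse(item.MessageMetadata.S!); // { topic, context }
  await producer.send({
    topic: metadata.topic,
    messages: [{ key: item.timer_id.S!, value: JSON.stringify(metadata.context) }],
  });
  await db.send(new DeleteItemCommand({
    TableName: "Timers",
    Key: { timer_id: item.timer_id },
  }));
  // The worker's checkpoint would be advanced past this timer here.
}
```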
If the worker crashes at T+90s, the replacement worker will:
Take over the partition via ZooKeeper rebalancing
Reload the last checkpoint and re-scan the look-ahead window
Find the timer still in DynamoDB (it was never deleted) and recreate the in-memory timer
Fire it at T+120s as if nothing had happened
The processors run continuously, scanning their assigned partitions at regular intervals (30-60 seconds) while the in-memory timers fire with millisecond precision.
Processed timers are published to Kafka topics, where user-owned consumers can subscribe and handle notifications according to their business logic. This decoupling provides:
Flexibility: Users define their own notification handlers
Reliability: Kafka's durability ensures messages aren't lost
Scalability: Consumer groups can scale independently
\
The architecture deliberately separates the Timer Service (write path) from Timer Processors (read/process path). This separation enables:
Independent scaling of the write path (API instances) and the process path (workers)
Failure isolation: a processor outage delays notifications but never blocks timer creation
Time buckets are crucial for managing scan efficiency. Consider bucketing by hour:
2025-11-10T18:45:00Z → bucket 2025-11-10-18
2025-11-10T19:15:00Z → bucket 2025-11-10-19

Benefits:
Each scan is a narrow range query within a single bucket, keeping read costs small and predictable
Old buckets drain naturally as their timers fire and are deleted
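A minimal sketch of this hour-granularity derivation, matching the examples above:

```typescript
// Derive the hourly time bucket from an expiration timestamp.
function timeBucket(expiration: Date): string {
  const iso = expiration.toISOString();       // "2025-11-10T18:45:00.000Z"
  return iso.slice(0, 13).replace("T", "-");  // "2025-11-10-18"
}

timeBucket(new Date("2025-11-10T18:45:00Z")); // → "2025-11-10-18"
timeBucket(new Date("2025-11-10T19:15:00Z")); // → "2025-11-10-19"
```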
The combination of workerId field and ZooKeeper coordination enables robust horizontal scaling:
During Timer Creation:
workerId = hash(namespace:UserDrivenTimerID) % worker_count
The resulting workerId is stored with the timer for routing

During Timer Processing:
Each worker queries the GSI using only its own time_bucket:workerId keys

Preventing Duplicate Work:
ZooKeeper guarantees that each workerId partition has exactly one live owner at any moment
This design eliminates race conditions and ensures each timer is scanned and fired by exactly one worker at a time (end-to-end delivery remains at-least-once).
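To make the creation-time routing concrete, here is a sketch; MD5 is an arbitrary stand-in for whatever stable hash the service actually uses:

```typescript
import { createHash } from "node:crypto";

// Creation-time routing: hash the user-visible identity to a worker slot.
function assignWorkerId(namespace: string, userDrivenTimerId: string, workerCount: number): string {
  const digest = createHash("md5").update(`${namespace}:${userDrivenTimerId}`).digest();
  return `worker-${(digest.readUInt32BE(0) % workerCount) + 1}`;
}

// assignWorkerId("payment-reminders", "user-defined-id", 10) → e.g. "worker-4"
```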
100K writes/second across DynamoDB:
The composite primary key (namespace:UserDrivenTimerID:uuid) spreads writes evenly across partitions, avoiding hot keys

Simultaneous expiration handling:
Bursts of expirations are split across time buckets and workers, so no single processor absorbs the whole spike

Memory considerations:
A processor holds at most the look-ahead window (2-3 minutes) of upcoming timers in memory, bounding its footprint
At-least-once delivery impact:
Duplicate notifications are rare (only on worker crashes during the look-ahead window)
Consumers can implement idempotency using timer IDs
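For example, a consumer-side dedupe keyed on the timer ID might look like this (a minimal in-memory sketch; the topic and group names are assumptions):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ brokers: ["localhost:9092"] }); // assumption
const consumer = kafka.consumer({ groupId: "payment-reminders-handler" }); // assumed group

// In-memory dedupe keyed on timer ID; a real deployment would use a
// persistent store with a TTL at least as long as the look-ahead window.
const seen = new Set<string>();

await consumer.connect();
await consumer.subscribe({ topics: ["payment-notifications"] });
await consumer.run({
  eachMessage: async ({ message }) => {
    const timerId = message.key?.toString();
    if (!timerId || seen.has(timerId)) return; // duplicate delivery — skip
    seen.add(timerId);
    // ... business logic for this notification ...
  },
});
```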
\
There's a small window between timer creation and processor visibility (DynamoDB GSI replication lag, typically milliseconds). For most use cases, this is acceptable.
The two-stage approach (scan → in-memory → fire) creates interesting trade-offs:
Scan Interval (30-60 seconds):
Longer intervals reduce DynamoDB read load, but a newly created timer may wait up to one full interval before its processor notices it

Look-ahead Window (2-3 minutes):
A wider window tolerates slow scans and worker failover, at the cost of holding more timers in memory
In-Memory Timer Precision:
Once loaded in memory, timers fire with millisecond precision
Uses efficient data structures (timing wheels or priority queues); a toy sketch follows this list
End-to-end latency: database polling interval + Kafka publish time
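As an illustration of the priority-queue idea (not the production data structure; a timing wheel or a real binary heap would replace the sorted array):

```typescript
// Minimal priority-queue scheduler: each tick fires everything that is due.
type PendingTimer = { timerId: string; expirationTime: number; fire: () => void };

class TimerQueue {
  private queue: PendingTimer[] = [];

  add(timer: PendingTimer): void {
    this.queue.push(timer);
    this.queue.sort((a, b) => a.expirationTime - b.expirationTime); // earliest first
  }

  tick(now: number = Date.now()): void {
    while (this.queue.length > 0 && this.queue[0].expirationTime <= now) {
      this.queue.shift()!.fire();
    }
  }
}

const timers = new TimerQueue();
timers.add({ timerId: "abc-123", expirationTime: Date.now() + 500, fire: () => console.log("fired") });
setInterval(() => timers.tick(), 10); // ~10 ms firing granularity
```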
\
Redis with sorted sets (using the expiration timestamp as the score) is a popular alternative. However:
The entire pending-timer set must fit in memory, which gets expensive for long-lived timers at this scale
Durability depends on replication and persistence configuration, a weaker guarantee than DynamoDB provides
Sharding the sorted sets and polling them still requires a coordination layer to avoid duplicate firing
Using Kafka itself as the timer store, leaning on its timestamp-based retention, is an interesting idea, but:
Requires custom consumer logic for time-based processing
Doesn't support easy retrieval and deletion of pending timers
Retention policies may conflict with timer expiration times
\
Building a distributed timer service that handles 100,000 operations per second requires careful consideration of data modeling, partitioning strategies, and component separation. By leveraging DynamoDB's scalability, implementing smart time-bucketing, and separating concerns between creation and processing, we can build a robust, horizontally scalable timer service.
The architecture described here provides a solid foundation that can be adapted to various use cases: from simple reminder systems to complex workflow orchestration engines. The key is understanding your specific requirements around precision, throughput, and consistency, then tuning the system accordingly.
What timer-based challenges are you solving in your systems? How would you extend this architecture for your use case?
\

