docs/infra/EVENT_SYSTEMS_ARCHITECTURE.md
Event Systems Architecture
Overview
Wallcrawler uses three complementary event systems to enable synchronous browser session creation on serverless infrastructure:
- EventBridge - AWS infrastructure events
- DynamoDB Streams - Application state changes
- SNS - Ready notifications that let the waiting API Lambda respond synchronously
Why Three Event Systems?
The Core Challenge
The API provides synchronous session creation (POST /v1/sessions) where clients wait up to 45 seconds for a fully ready browser with connection details. This requires coordinating:
- Container lifecycle (AWS infrastructure)
- Application readiness (Chrome + CDP proxy)
- API response delivery (Lambda coordination)
Each System's Role
EventBridge
- Source: AWS ECS service
- Events: Task state changes (PROVISIONING → RUNNING → STOPPED)
- Purpose: Monitor container lifecycle and failures
- Consumer: ecs-task-processor Lambda
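To make the EventBridge role concrete, here is a minimal sketch of how a consumer such as ecs-task-processor might turn an ECS task state change event into a session update. The `source`, `detail-type`, `lastStatus`, `taskArn`, and `stoppedReason` fields follow the standard ECS event shape; the function name, the status mapping, and the returned dictionary are assumptions for illustration, not the project's actual code.

```python
from typing import Optional

# Assumed mapping from ECS lastStatus values to the session statuses
# named in this document. Other ECS states are ignored.
ECS_TO_SESSION_STATUS = {
    "RUNNING": "RUNNING",
    "STOPPED": "STOPPED",
}


def session_update_from_ecs_event(event: dict) -> Optional[dict]:
    """Return the session update implied by an ECS event, or None to skip it."""
    if event.get("source") != "aws.ecs":
        return None
    if event.get("detail-type") != "ECS Task State Change":
        return None

    detail = event.get("detail", {})
    status = ECS_TO_SESSION_STATUS.get(detail.get("lastStatus"))
    if status is None:
        # e.g. PROVISIONING or PENDING: nothing worth recording yet
        return None

    update = {"taskArn": detail.get("taskArn"), "status": status}
    if status == "STOPPED" and detail.get("stoppedReason"):
        # Keep the failure reason so a failed container can be diagnosed
        update["reason"] = detail["stoppedReason"]
    return update
```

How the task ARN is correlated with a session record is not specified here, so the sketch leaves that lookup to the caller.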
DynamoDB Streams
- Source: Session table updates
- Events: All session state changes
- Purpose: Capture application-level state transitions
- Consumer: sessions-stream-processor Lambda
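As an illustration of the stream consumer's job, the sketch below scans a DynamoDB Streams batch for sessions that have just transitioned to READY. The record layout (`Records`, `eventName`, `dynamodb.OldImage`, `dynamodb.NewImage`, typed attribute values) is the standard stream format and requires the stream to be configured with both old and new images. The attribute names `sessionId` and `status` and the function name are assumptions.

```python
from typing import List


def ready_transitions(stream_event: dict) -> List[str]:
    """Return the IDs of sessions whose status just changed to READY."""
    ready_ids = []
    for record in stream_event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            # INSERT and REMOVE records cannot be a transition into READY
            continue
        images = record.get("dynamodb", {})
        old_status = images.get("OldImage", {}).get("status", {}).get("S")
        new_status = images.get("NewImage", {}).get("status", {}).get("S")
        # Only react to the edge into READY, not to later updates of a
        # session that is already READY
        if new_status == "READY" and old_status != "READY":
            ready_ids.append(images["NewImage"]["sessionId"]["S"])
    return ready_ids
```

Filtering on the transition rather than on the current status avoids publishing a second notification when a READY session is updated for an unrelated reason.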
SNS
- Source: Stream processor Lambda
- Events: Session ready notifications
- Purpose: Wake up waiting Lambda functions
- Consumer: sessions-create Lambda
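The ready notification could be built along these lines. This is a sketch only: the message fields (`sessionId`, `status`, `connectUrl`) and the function name are hypothetical, while `TopicArn`, `Message`, and `MessageAttributes` are the standard parameters of the SNS Publish call.

```python
import json


def build_ready_notification(topic_arn: str, session_id: str, connect_url: str) -> dict:
    """Build keyword arguments for an SNS publish call announcing a ready session."""
    return {
        "TopicArn": topic_arn,
        # Body carries what the waiting function needs to answer the client
        "Message": json.dumps(
            {"sessionId": session_id, "status": "READY", "connectUrl": connect_url}
        ),
        # Duplicating the session ID as a message attribute allows
        # subscribers to filter without parsing the body
        "MessageAttributes": {
            "sessionId": {"DataType": "String", "StringValue": session_id},
        },
    }
```

With boto3 available, the result would typically be passed on as `boto3.client("sns").publish(**build_ready_notification(...))`.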
Session Creation Flow
sequenceDiagram
participant Client
participant API as sessions-create Lambda
participant DB as DynamoDB
participant ECS as ECS Task
participant EB as EventBridge
participant EBL as ecs-task-processor
participant Stream as DynamoDB Streams
participant SP as sessions-stream-processor
participant SNS as SNS Topic
Client->>API: POST /v1/sessions
API->>DB: Create session (CREATING)
API->>ECS: RunTask
API->>SNS: Subscribe & Wait
Note over ECS: Container starts
ECS->>EB: Task RUNNING event
EB->>EBL: Process event
EBL->>DB: Update (RUNNING)
Note over ECS: Chrome initializes
ECS->>DB: Update (READY)
DB->>Stream: Status change event
Stream->>SP: Process stream
SP->>SNS: Publish ready notification
SNS->>API: Deliver notification
API->>Client: Return session details
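The "Subscribe & Wait" step in the flow above amounts to a deadline-bound wait for one specific session's notification. The sketch below models that wait in-process with a `queue.Queue` standing in for the notification channel; how notifications actually reach the waiting function is not described here, so the queue, the function name, and the message shape are all assumptions. Only the 45-second limit comes from the API contract.

```python
import queue
import time
from typing import Optional

SESSION_READY_TIMEOUT_SECONDS = 45.0  # upper bound stated for POST /v1/sessions


def wait_for_session(
    notifications: "queue.Queue",
    session_id: str,
    timeout: float = SESSION_READY_TIMEOUT_SECONDS,
) -> Optional[dict]:
    """Block until a notification for session_id arrives or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # deadline passed: caller reports a timeout
        try:
            message = notifications.get(timeout=remaining)
        except queue.Empty:
            return None
        # Ignore notifications that belong to other in-flight requests
        if message.get("sessionId") == session_id:
            return message
```

Using a single deadline rather than a fresh timeout per message keeps unrelated notifications from extending the total wait.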
Key Design Decisions
Container Updates vs EventBridge Updates
- EventBridge: Knows when container is RUNNING (infrastructure ready)
- Container: Knows when Chrome is READY (application ready)
- Both update DynamoDB for complete visibility
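Because two writers update the same record, a late RUNNING event could in principle arrive after the container has already reported READY. The document does not say how this is handled. One possible guard, sketched here under that assumption, is to only accept updates that move a session forward through an assumed status order (in DynamoDB this kind of check would normally be expressed as a conditional write).

```python
# Assumed forward-only ordering of the session statuses used in this document
STATUS_RANK = {"CREATING": 0, "RUNNING": 1, "READY": 2, "STOPPED": 3}


def should_apply(current_status: str, incoming_status: str) -> bool:
    """Accept an update only if it advances the session's status."""
    return STATUS_RANK[incoming_status] > STATUS_RANK[current_status]
```

For example, `should_apply("READY", "RUNNING")` is false, so a delayed infrastructure event would not overwrite application readiness.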
Why Not Just Polling?
- Would require multiple DynamoDB reads
- Higher latency (polling intervals)
- More expensive and less efficient
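A rough back-of-the-envelope calculation shows the trade-off. The helper below is illustrative only and assumes a fixed polling interval, one DynamoDB read per poll, and a readiness time that is evenly spread between polls.

```python
import math
from typing import Tuple


def polling_cost(max_wait_seconds: float, interval_seconds: float) -> Tuple[int, float]:
    """Return (worst-case reads per request, average added latency in seconds)."""
    max_reads = math.ceil(max_wait_seconds / interval_seconds)
    # On average, readiness is noticed half an interval after it happens
    average_added_latency = interval_seconds / 2
    return max_reads, average_added_latency
```

With the 45-second limit and a 0.5-second interval, `polling_cost(45, 0.5)` gives up to 90 reads per request and about 0.25 seconds of added latency; lengthening the interval reduces reads but increases latency proportionally.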
Why Not Just EventBridge?
- EventBridge alone can't wake a waiting Lambda
- No way to correlate container events with API requests
- Missing application-level readiness signals
Benefits of This Architecture
- Loose Coupling: Each component has minimal dependencies
- Reliability: Multiple checkpoints ensure session readiness
- Observability: Every state change is tracked
- Real-time: Push-based notifications minimize latency
- Fault Tolerance: Failed containers are detected via EventBridge
Alternative Approaches Considered
- Step Functions: Added complexity for simple wait pattern
- SQS Long Polling: Required message correlation logic
- Direct Container-to-SNS: Tighter coupling, more permissions
- Pure Polling: Inefficient and higher latency
The current design provides synchronous session creation on serverless infrastructure by relying on push-based notifications rather than polling, while keeping each component loosely coupled.
