wallcrawler

Created: June 10, 2025

Last commit: October 3, 2025

Languages: Go 68.3%, TypeScript 23.6%, Shell 5.4%, Makefile 1.1%, Dockerfile 0.8%, plus one more.

README.md

Wallcrawler Monorepo

Wallcrawler is a self‑hosted, AWS‑backed remote browser platform with Stagehand LLM browsing and Browserbase‑compatible APIs. This monorepo contains the infrastructure, backend services, SDK, and UI components needed to run Wallcrawler in your own AWS account.

Quick links

Architecture: docs/infra/ARCHITECTURE.md
Event Systems: docs/infra/EVENT_SYSTEMS_ARCHITECTURE.md
DynamoDB Schema: docs/infra/DYNAMODB_SCHEMA.md
Multi‑Arch Docker: docs/infra/MULTI_ARCH_DOCKER_SOLUTION.md
Deployment Guide: docs/deploy/DEPLOYMENT_GUIDE.md
API Endpoints Reference: docs/api/api-endpoints-reference.md
SDK Integration Guide: docs/api/sdk-integration-guide.md
Sessions:
- JWT Signing Key Flow: docs/api/sessions/jwt-signing-key-flow.md
- Container Lifecycle: docs/api/sessions/wallcrawler-container-lifecycle.md
- CloudWatch Logging Best Practices: docs/api/sessions/cloudwatch-logging-best-practices.md

Packages

@wallcrawler/aws-cdk — AWS CDK app defining all infrastructure (API Gateway, Lambda, ECS/Fargate, EventBridge, DynamoDB, Redis, etc.)
- Source: packages/aws-cdk/
- See: docs/infra/ARCHITECTURE.md and docs/deploy/DEPLOYMENT_GUIDE.md
@wallcrawler/backend-go — Go Lambda handlers and services for SDK‑compatible endpoints and orchestration
- Source: packages/backend-go/
- README: packages/backend-go/README.md
@wallcrawler/sdk-node — TypeScript/Node client for Wallcrawler’s REST API (Browserbase‑compatible)
- Source: packages/sdk-node/
- README: packages/sdk-node/README.md
- API: packages/sdk-node/api.md
@wallcrawler/stagehand — Stagehand fork used by Wallcrawler for LLM‑powered browsing
- Source: packages/stagehand/
- README: packages/stagehand/README.md
@wallcrawler/components — UI components (e.g., BrowserViewport) for embedding live sessions
- Source: packages/components/

Prerequisites

Node.js >= 18 and pnpm >= 8
Go >= 1.21 (for backend)
AWS CLI configured for your target account
Docker (for local builds and multi‑arch images)

Getting started

# 1) Initialize submodules (if any)
pnpm install:submodules

# 2) Install dependencies
pnpm install

# 3) Build everything
pnpm build

# 4) Generate local env (CDK helpers)
pnpm generate-env

# 5) Deploy (see Deployment Guide for environments/config)
pnpm deploy

Additional scripts:

Lint: pnpm lint
Tests: pnpm test
Dev (runs the dev script in every workspace package): pnpm -r dev
CDK Toolkit: pnpm cdk

Configuration

The backend reads several environment variables at runtime:

WALLCRAWLER_MAX_SESSION_TIMEOUT — Maximum allowed session duration in seconds (defaults to 3600).
PROJECTS_TABLE_NAME, API_KEYS_TABLE_NAME, CONTEXTS_TABLE_NAME — Automatically injected by the CDK stack for the Lambda functions.
CONTEXTS_BUCKET_NAME — S3 bucket that stores browser context archives for persisted sessions.
SESSIONS_TABLE_NAME — Sessions table (wallcrawler-sessions by default).
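
As a rough illustration of the WALLCRAWLER_MAX_SESSION_TIMEOUT default described above, the sketch below resolves the cap from an environment map and applies it to a requested duration. The helper names are hypothetical, and the fallback and clamping rules are assumptions; the actual Go backend may validate or reject out-of-range values differently.

```typescript
// Documented default for WALLCRAWLER_MAX_SESSION_TIMEOUT, in seconds.
const DEFAULT_MAX_SESSION_TIMEOUT_SECONDS = 3600;

type Env = Record<string, string | undefined>;

// Hypothetical helper: read the cap from an environment map.
// Assumption: unset, non-integer, or non-positive values fall back to the default.
function resolveMaxSessionTimeout(env: Env): number {
  const raw = env["WALLCRAWLER_MAX_SESSION_TIMEOUT"];
  const parsed = raw === undefined ? NaN : Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    return DEFAULT_MAX_SESSION_TIMEOUT_SECONDS;
  }
  return parsed;
}

// Hypothetical helper: cap a requested session duration at the configured maximum.
// Assumption: over-long requests are clamped rather than rejected.
function clampSessionTimeout(requestedSeconds: number, env: Env): number {
  return Math.min(requestedSeconds, resolveMaxSessionTimeout(env));
}
```

In a Node process, passing process.env as the environment map would be the typical call; taking the map as a parameter keeps the helper easy to test.
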
Contexts (browser profiles) remain project-scoped. If you expose contexts to end users, ensure your application filters by both projectId and your own user identifier before forwarding requests to Wallcrawler.
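
One way to apply the filtering described above is to keep your own ownership records and check them before forwarding a context ID to Wallcrawler. This is a minimal sketch on the application side: the record shape, the ownerUserId field, and the function name are all hypothetical and not part of Wallcrawler's API.

```typescript
// Hypothetical application-side record; only projectId comes from the README.
interface ContextRecord {
  contextId: string;
  projectId: string;
  ownerUserId: string; // per-user ownership metadata maintained by your app
}

// Return the context only if it belongs to both the project and the user.
// A caller would refuse to forward the request when this returns undefined.
function findOwnedContext(
  records: ContextRecord[],
  contextId: string,
  projectId: string,
  userId: string
): ContextRecord | undefined {
  return records.find(
    (r) =>
      r.contextId === contextId &&
      r.projectId === projectId &&
      r.ownerUserId === userId
  );
}
```

Matching on both identifiers means a user cannot reach another user's context even when both belong to the same project.
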
API keys can be associated with multiple projects. When a key has more than one project, include x-wc-project-id on each request to select the target project; the authorizer denies access if the requested project is not in the key's allowlist.
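
The project-selection rule above can be sketched as a small decision function. This illustrates the described behavior and is not the authorizer's actual code: the function and type names are hypothetical, and the handling of a missing header (defaulting for single-project keys, denying for multi-project keys) is an assumption.

```typescript
// Hypothetical result type for the sketch.
interface ProjectDecision {
  allow: boolean;
  projectId?: string;
  reason?: string;
}

// allowedProjectIds: the projects attached to the API key (its allowlist).
// requestedProjectId: the value of the x-wc-project-id header, if present.
function selectProject(
  allowedProjectIds: string[],
  requestedProjectId?: string
): ProjectDecision {
  if (requestedProjectId !== undefined) {
    // Documented rule: deny when the requested project is not in the allowlist.
    return allowedProjectIds.includes(requestedProjectId)
      ? { allow: true, projectId: requestedProjectId }
      : { allow: false, reason: "project not in key allowlist" };
  }
  // Assumption: a key with exactly one project needs no header.
  const [onlyProject] = allowedProjectIds;
  if (allowedProjectIds.length === 1 && onlyProject !== undefined) {
    return { allow: true, projectId: onlyProject };
  }
  // Assumption: an ambiguous or empty allowlist without a header is denied.
  return { allow: false, reason: "x-wc-project-id header required" };
}
```

On the client side, this amounts to sending x-wc-project-id on every request made with a multi-project key.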

Data Stores

DynamoDB
- wallcrawler-sessions — Session metadata, lifecycle history, and connection info.
- wallcrawler-projects — Project configuration (default timeout, concurrency limits, billing tier).
- wallcrawler-api-keys — SHA-256 hashed API keys mapped to one or more projects (projectIds attribute) with status flags.
- wallcrawler-contexts — Browser context metadata and S3 object keys. Add per-user ownership metadata in your app if you need user-level isolation.
S3
- wallcrawler-contexts-* — Stores compressed Chrome user data directories for persisted contexts.
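
To make the hashed-key storage in wallcrawler-api-keys concrete, here is a minimal sketch of hashing a raw key with SHA-256 and looking it up. The hex encoding, the absence of a salt, the in-memory Map standing in for the table, and the active field standing in for the status flags are all assumptions; only the use of SHA-256 and the projectIds attribute come from the description above.

```typescript
import { createHash } from "node:crypto";

// Hash a raw API key. Assumption: unsalted SHA-256, hex-encoded.
function hashApiKey(rawKey: string): string {
  return createHash("sha256").update(rawKey, "utf8").digest("hex");
}

// Hypothetical record shape for the sketch.
interface ApiKeyRecord {
  projectIds: string[]; // attribute name taken from the table description
  active: boolean; // stand-in for the table's status flags
}

// Look up a key by its hash; the Map stands in for the DynamoDB table.
// Returns undefined for unknown or inactive keys.
function lookupApiKey(
  table: Map<string, ApiKeyRecord>,
  rawKey: string
): ApiKeyRecord | undefined {
  const record = table.get(hashApiKey(rawKey));
  return record !== undefined && record.active ? record : undefined;
}
```

Storing only the hash means the raw key never needs to be kept at rest; a presented key is hashed the same way and compared by lookup.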

API compatibility

Wallcrawler provides Browserbase‑compatible APIs and Stagehand endpoints. For exact routes, request/response shapes, and streaming behavior, see:

docs/api/api-endpoints-reference.md
docs/api/sdk-integration-guide.md

Architecture overview

High‑level design, event flows, and data models are covered in the docs listed under Quick links. For a diagram, see docs/infra/wallcrawler-aws-architecture.png.