wallcrawler

Created: June 10, 2025
Last commit: October 3, 2025
Languages: Go 68.3%, TypeScript 23.6%, Shell 5.4%, Makefile 1.1%, Dockerfile 0.8%, +1 more

Wallcrawler Monorepo

Self‑hosted, AWS‑backed remote browser platform with Stagehand LLM browsing, compatible with Browserbase APIs. This monorepo contains the infrastructure, backend services, SDK, and UI components to run Wallcrawler in your own AWS account.

Prerequisites

  • Node.js >= 18 and pnpm >= 8
  • Go >= 1.21 (for backend)
  • AWS CLI configured for your target account
  • Docker (for local builds and multi‑arch images)

Getting started

# 1) Initialize submodules (if any)
pnpm install:submodules

# 2) Install dependencies
pnpm install

# 3) Build everything
pnpm build

# 4) Generate local env (CDK helpers)
pnpm generate-env

# 5) Deploy (see Deployment Guide for environments/config)
pnpm deploy

Additional scripts:

  • Lint: pnpm lint
  • Tests: pnpm test
  • Dev (runs the dev script in every workspace package): pnpm -r dev
  • CDK Toolkit: pnpm cdk

Configuration

The backend reads several environment variables at runtime:

  • WALLCRAWLER_MAX_SESSION_TIMEOUT — Maximum allowed session duration in seconds (defaults to 3600).
  • PROJECTS_TABLE_NAME, API_KEYS_TABLE_NAME, CONTEXTS_TABLE_NAME — Automatically injected by the CDK stack for the Lambda functions.
  • CONTEXTS_BUCKET_NAME — S3 bucket that stores browser context archives for persisted sessions.
  • SESSIONS_TABLE_NAME — Sessions table (wallcrawler-sessions by default).

Access-control notes:

  • Contexts (browser profiles) remain project-scoped. If you expose contexts to end users, ensure your application filters by both projectId and your own user identifier before forwarding requests to Wallcrawler.
  • API keys can be associated with multiple projects. When a key has more than one project, include x-wc-project-id on each request to select the target project; the authorizer denies access if the requested project is not in the key's allowlist.
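
As a minimal sketch of how WALLCRAWLER_MAX_SESSION_TIMEOUT might be applied, the Go snippet below parses the variable, falls back to the documented 3600-second default, and caps a requested session duration at that maximum. The function names, and the handling of invalid or non-positive values, are assumptions made for illustration and are not taken from the Wallcrawler backend.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMaxSessionTimeout mirrors the documented default of 3600 seconds.
const defaultMaxSessionTimeout = 3600

// maxSessionTimeout parses a raw WALLCRAWLER_MAX_SESSION_TIMEOUT value.
// Falling back to the default for empty, non-numeric, or non-positive
// input is an assumption of this sketch.
func maxSessionTimeout(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return defaultMaxSessionTimeout
	}
	return n
}

// clampTimeout caps a requested session timeout (in seconds) at the limit.
// Treating a non-positive request as "use the limit" is also an assumption.
func clampTimeout(requested, limit int) int {
	if requested <= 0 || requested > limit {
		return limit
	}
	return requested
}

func main() {
	limit := maxSessionTimeout(os.Getenv("WALLCRAWLER_MAX_SESSION_TIMEOUT"))
	fmt.Println("max session timeout (s):", limit)
	fmt.Println("effective timeout for a 7200 s request:", clampTimeout(7200, limit))
}
```

Keeping the parsing separate from the clamping makes both pieces easy to test without touching the process environment.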

Data Stores

  • DynamoDB
    • wallcrawler-sessions — Session metadata, lifecycle history, and connection info.
    • wallcrawler-projects — Project configuration (default timeout, concurrency limits, billing tier).
    • wallcrawler-api-keys — SHA-256 hashed API keys mapped to one or more projects (projectIds attribute) with status flags.
    • wallcrawler-contexts — Browser context metadata and S3 object keys. Add per-user ownership metadata in your app if you need user-level isolation.
  • S3
    • wallcrawler-contexts-* — Stores compressed Chrome user data directories for persisted contexts.
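
The access checks implied by the wallcrawler-api-keys and wallcrawler-contexts tables can be sketched in Go as follows. This is an illustration under stated assumptions, not the actual authorizer: the record types, the hex encoding of the SHA-256 digest, the OwnerID field, and the behavior when x-wc-project-id is missing are all hypothetical. Only the projectIds allowlist, the SHA-256 hashing, and the denial of projects outside the allowlist come from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// apiKeyRecord is a hypothetical shape for an item in wallcrawler-api-keys.
type apiKeyRecord struct {
	ProjectIDs []string // the documented projectIds attribute
	Active     bool     // stand-in for the documented status flags
}

// contextRecord is a hypothetical shape for an item in wallcrawler-contexts.
// OwnerID is app-level ownership metadata that you would add yourself.
type contextRecord struct {
	ID, ProjectID, OwnerID string
}

var errDenied = errors.New("access denied")

// hashAPIKey returns the hex-encoded SHA-256 digest of a raw API key.
// Hex encoding is an assumption; the source only states SHA-256.
func hashAPIKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// resolveProject picks the target project for a request. requested is the
// value of the x-wc-project-id header, or "" when the header is absent.
func resolveProject(rec apiKeyRecord, requested string) (string, error) {
	if !rec.Active || len(rec.ProjectIDs) == 0 {
		return "", errDenied
	}
	if requested == "" {
		// Assumption: a missing header is only accepted for single-project keys.
		if len(rec.ProjectIDs) == 1 {
			return rec.ProjectIDs[0], nil
		}
		return "", errDenied
	}
	for _, id := range rec.ProjectIDs {
		if id == requested {
			return id, nil
		}
	}
	return "", errDenied // requested project is not in the key's allowlist
}

// visibleContexts keeps only contexts matching both the project and the user.
func visibleContexts(all []contextRecord, projectID, userID string) []contextRecord {
	var out []contextRecord
	for _, c := range all {
		if c.ProjectID == projectID && c.OwnerID == userID {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// In-memory stand-in for the DynamoDB table, keyed by the hashed API key.
	table := map[string]apiKeyRecord{
		hashAPIKey("wc_example_key"): {ProjectIDs: []string{"proj_a", "proj_b"}, Active: true},
	}
	rec := table[hashAPIKey("wc_example_key")]

	project, err := resolveProject(rec, "proj_b")
	fmt.Println("resolved:", project, "error:", err)

	contexts := []contextRecord{
		{ID: "ctx_1", ProjectID: "proj_b", OwnerID: "user_1"},
		{ID: "ctx_2", ProjectID: "proj_b", OwnerID: "user_2"},
	}
	fmt.Println("visible to user_1:", visibleContexts(contexts, project, "user_1"))
}
```

Storing only the digest means a leaked table does not reveal usable keys, and filtering contexts in your own application is what provides the user-level isolation that the project scope alone does not.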

API compatibility

Wallcrawler provides Browserbase‑compatible APIs and Stagehand endpoints. For exact routes, request/response shapes, and streaming behavior, see:

  • docs/api/api-endpoints-reference.md
  • docs/api/sdk-integration-guide.md

Architecture overview

High‑level design, event flows, and data models are covered in the docs referenced above. For a visual, see docs/infra/wallcrawler-aws-architecture.png.