joshua

Human data for autonomous agents.

Reproducible demonstrations, rigorous evaluations, and human feedback - so your agents act with precision.

What we make

We capture how people actually work on the web (clicks, keystrokes, timing, and outcomes), then package it as clean trajectories for training and testing agents.

Demos

JSONL + MP4 + DOM metadata

Eval

pass/fail, action scoring, Playwright replays

RLHF

pairwise rankings with rationales

Safety

adversarial and jailbreak scenarios

Deterministic replays ≥90% · First-pass acceptance ≥95% · PII-safe by design.

Example: Frontier Lab Order

A recent large-scale dataset we delivered

75,000

videos delivered

Covering diverse web interactions

3,750

hours of demonstrations

Task-level human interactions

Multi-domain

coverage

E-commerce, SaaS, government portals, travel, professional services

Request a Sample

See real examples of our human-generated web agent training data. Get access to a curated dataset with 200 task demonstrations.

Includes JSONL trajectories, MP4 recordings, and evaluation metrics

Process

Define

Targets, constraints, success criteria.

Record & review

Humans complete tasks. Dual-pass QA. PII scrubbed. Deterministic replays.

Deliver

JSONL + MP4 + metrics. Ready for training, eval, or fine-tuning.

Services

Tailored data solutions for your agent development needs

Human Web Demonstrations

Screen recordings with precise action sequences for training web agents

JSONL trajectories
MP4 recordings
DOM metadata
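
As a rough illustration of how a JSONL trajectory can be consumed, the sketch below parses one action per line and derives the time between actions. The field names (`task_id`, `step`, `t_ms`, `action`, `selector`, `value`) and the sample records are assumptions made up for this example, not the delivered schema.

```python
import json

# Hypothetical JSONL trajectory: one recorded action per line.
# Field names and values are illustrative assumptions, not the delivered schema.
SAMPLE_JSONL = """\
{"task_id": "demo-001", "step": 0, "t_ms": 0, "action": "click", "selector": "#search"}
{"task_id": "demo-001", "step": 1, "t_ms": 840, "action": "type", "selector": "#search", "value": "blue kettle"}
{"task_id": "demo-001", "step": 2, "t_ms": 1910, "action": "press", "selector": "#search", "value": "Enter"}
"""


def load_trajectory(text):
    """Parse JSONL text into a list of step dicts, ordered by step index."""
    steps = [json.loads(line) for line in text.splitlines() if line.strip()]
    return sorted(steps, key=lambda s: s["step"])


def step_durations_ms(steps):
    """Time elapsed between consecutive actions, in milliseconds."""
    return [b["t_ms"] - a["t_ms"] for a, b in zip(steps, steps[1:])]


trajectory = load_trajectory(SAMPLE_JSONL)
durations = step_durations_ms(trajectory)
```

Keeping one action per line means a trajectory can be streamed or truncated at any step without re-parsing the whole file.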

Complex Workflow Capture

Multi-step processes across enterprise applications and platforms

End-to-end workflows
Error handling paths
Edge case coverage

Agent Evaluation Sets

Comprehensive test suites to validate agent performance

Pass/fail criteria
Action scoring
Playwright replays
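
One simple way to think about action scoring and pass/fail criteria is to compare an agent's actions against a human reference, step by step. The sketch below is a minimal example under that assumption. The position-wise matching rule, the `(action, selector)` tuple format, and the default threshold are all hypothetical choices, not the scoring method used in delivered evaluation sets.

```python
def action_score(reference, predicted):
    """Fraction of reference steps the agent matched at the same position.

    Extra predicted steps beyond the reference length are ignored here;
    a stricter scorer could penalize them.
    """
    if not reference:
        return 0.0
    matches = sum(1 for r, p in zip(reference, predicted) if r == p)
    return matches / len(reference)


def passed(reference, predicted, threshold=1.0):
    """Pass/fail verdict: the score must reach the (assumed) threshold."""
    return action_score(reference, predicted) >= threshold


# Hypothetical (action, selector) pairs for a human demo and an agent run.
reference = [("click", "#search"), ("type", "#search"), ("press", "#search")]
predicted = [("click", "#search"), ("type", "#search"), ("click", "#submit")]

score = action_score(reference, predicted)
```

Here the agent matches two of three reference steps, so it fails under a threshold of 1.0 but would pass with a looser threshold such as 0.6.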

RLHF & Safety Data

Human feedback for reinforcement learning and safety alignment

Pairwise rankings
Rationale generation
Adversarial scenarios
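
To make the idea of pairwise rankings with rationales concrete, the sketch below stores each human judgment as a record and computes a per-trajectory win rate. The record fields (`a`, `b`, `preferred`, `rationale`), the trajectory IDs, and the rationale text are invented for illustration and should not be read as the delivered format.

```python
from collections import Counter

# Hypothetical pairwise judgments: a rater compares two agent trajectories
# and records which one is preferred, plus a short rationale.
comparisons = [
    {"a": "traj-1", "b": "traj-2", "preferred": "traj-1",
     "rationale": "Completed checkout without revisiting the cart."},
    {"a": "traj-2", "b": "traj-3", "preferred": "traj-3",
     "rationale": "Recovered cleanly from a form validation error."},
    {"a": "traj-1", "b": "traj-3", "preferred": "traj-1",
     "rationale": "Fewer redundant clicks for the same outcome."},
]


def win_rates(records):
    """Share of comparisons each trajectory won, out of those it appeared in."""
    wins = Counter(r["preferred"] for r in records)
    appearances = Counter()
    for r in records:
        appearances[r["a"]] += 1
        appearances[r["b"]] += 1
    return {tid: wins[tid] / n for tid, n in appearances.items()}


rates = win_rates(comparisons)
```

Win rates are only a summary view. Preference-based training typically consumes the raw pairs, with the rationales kept alongside for auditing.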

Frontier labs · Agent platforms · Enterprise AI teams.

Contact Us

Ready to enhance your agent with high-quality human data? Let's discuss your specific needs.