Saad Tarhi

Professional

Hardening AI Sandboxes in Shipper

Stabilizing AI generated code and sandbox routing

Challenge: AI generated edits and proxy issues left some Shipper apps broken or unrecoverable.

Solution: Implemented a validation gate for AI edits, wired it into the existing auto-fixing system, and fixed proxy middleware behavior to recover projects safely.

Result: Previously broken sandboxes became accessible again, and AI edits stopped silently corrupting projects, giving the team a more reliable base to iterate on.

  • Node.js
  • AI Code Sandboxes (Modal, Daytona)
  • Proxy Middleware
  • Next.js
  • TypeScript
  • Vite
Shipper interface showing AI generated app running inside an isolated sandbox
Making AI generated apps reliable inside isolated sandboxes

Context

Stabilizing AI Generated Apps In Sandboxes

Why this work mattered and the environment it lived in.

Shipper is an AI assisted app builder that provisions isolated sandboxes, runs a standard Vite + React + TypeScript template, and lets an AI agent iteratively edit the code. While the core workflow worked, reliability was fragile:
  • Some apps stopped loading entirely due to proxy errors between the web client and the sandbox.
  • Other apps were silently broken by malformed AI output or invalid TypeScript applied directly to the project.
  • An auto-fixing subsystem existed but was not systematically invoked, so bad edits could corrupt projects instead of being repaired.
The goal of my work was to:
  1. Treat AI generated edits as untrusted input and validate them before they touch the filesystem.
  2. Integrate the existing auto-fixing system into the main pipeline instead of leaving it unused.
  3. Fix proxy middleware behavior so apps broken by infrastructure issues could be recovered without manual intervention.

Snapshot

Highlights, Challenges, Skills

A concise readout for quick evaluators.

Highlights

  • Implemented a gate that validates AI generated edits before they touch the project
  • Integrated the existing auto-fixing system into the main AI editing pipeline
  • Repaired proxy middleware behavior that left otherwise valid apps unreachable

Challenges

  • Diagnosing failures that could originate in AI output, proxy routing, or sandbox behavior
  • Designing a validation strategy that catches clearly bad edits without blocking normal iteration
  • Integrating an underused auto-fixing system into a live pipeline without regressing behavior

Skills Demonstrated

  • TypeScript debugging in complex pipelines
  • Designing defensive layers for AI generated code
  • Production log analysis and incident debugging

Role

My Role & Contributions

I was brought in as a senior engineer to debug broken apps, harden the AI editing pipeline, and repair proxy behavior so Shipper could safely run AI generated code in isolated sandboxes.
  • Pipeline Hardening: Designed and implemented a gate for AI generated edits and integrated it with the existing auto-fixing subsystem
  • Production Debugging: Investigated logs and sandbox behavior to identify why some apps were broken or unreachable
  • Proxy Fixes: Patched proxy middleware so apps broken by proxy errors became recoverable instead of permanently dead
  • Responsibility Boundaries: Distinguished platform failures from genuine TypeScript errors in user projects
  • Testing Strategy: Defined practical flows to validate fixes on both existing and newly created apps
  • Consulting Input: Provided advisory feedback on storage, multi-tenant auth, and hiring assessments when asked

Key Contributions

Key Contributions & Technical Challenges

Open narratives that explain the impact of each slice of work.

Contribution 01

AI Edit Gating And Auto-fix Integration

Introduced a validation layer between AI output and the project filesystem.

Before my changes, AI generated diffs were applied directly to projects as if they were trusted. If the model produced malformed or incomplete code, the result was often a broken sandbox with no automatic recovery. I implemented a gating layer that inspects AI edits for obvious structural and syntax problems, rejects invalid changes, and forwards them into the existing auto-fixing system. This turned the auto fixer from unused code into an active part of the pipeline and stopped clearly bad edits from corrupting projects.
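The real gate lives inside Shipper's pipeline, but its control flow is simple to sketch. Below is a minimal, illustrative TypeScript version; `AiEdit`, `validateEdit`, `runAutoFix`, and the single-file edit shape are assumptions made for the sketch, not Shipper's actual API:

```typescript
import { promises as fs } from "node:fs";
import path from "node:path";

// Hypothetical shape for a proposed AI edit: one file, full new contents.
interface AiEdit {
  filePath: string; // relative to the project root
  contents: string;
}

interface GateResult {
  ok: boolean;
  errors: string[];
}

// The validator and auto fixer are injected; in Shipper they are existing
// subsystems, so this sketch only shows how the gate wires them together.
async function applyAiEdit(
  projectRoot: string,
  edit: AiEdit,
  validateEdit: (edit: AiEdit) => GateResult,
  runAutoFix: (edit: AiEdit, errors: string[]) => Promise<AiEdit | null>,
): Promise<boolean> {
  let candidate = edit;
  let verdict = validateEdit(candidate);

  if (!verdict.ok) {
    // Clearly bad edits never reach the filesystem; they go to the auto fixer.
    const repaired = await runAutoFix(candidate, verdict.errors);
    if (repaired === null) return false; // unrecoverable: leave the project untouched
    candidate = repaired;
    verdict = validateEdit(candidate); // re-check the repaired edit
    if (!verdict.ok) return false;
  }

  await fs.writeFile(path.join(projectRoot, candidate.filePath), candidate.contents, "utf8");
  return true;
}
```

The property that mattered in practice is that a failed validation or a failed repair leaves the project exactly as it was, so a bad edit can no longer corrupt a sandbox.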

Contribution 02

Fixing Proxy Middleware To Recover Broken Apps

Repaired proxy behavior that made valid apps appear dead and unreachable.

Several user apps were reported as "broken," but the root cause was not in their code. By reading logs and correlating failures with sandbox traffic, I traced the problem to the proxy middleware between the UI and each sandbox. I adjusted how the proxy handled routing and error responses so that sandboxes were contacted correctly and transient proxy failures no longer left apps in a permanently broken state. After this change, three previously dead apps became accessible again, while one remaining broken app was correctly identified as having a genuine TypeScript error.
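Shipper's proxy layer is internal, so the snippet below only sketches the resulting behavior, assuming an Express front end with http-proxy-middleware (v2-style options); the route shape, `sandboxUrlFor`, and the error payload are illustrative rather than the production code:

```typescript
import express from "express";
import { createProxyMiddleware } from "http-proxy-middleware";

const app = express();

// Hypothetical lookup from a project id to its current sandbox URL.
function sandboxUrlFor(projectId: string): string {
  return `https://${projectId}.sandboxes.example.internal`;
}

app.use(
  "/preview/:projectId",
  createProxyMiddleware({
    target: "http://127.0.0.1", // placeholder; the router below picks the real target
    changeOrigin: true,
    // Resolve the sandbox per request so routing follows the live sandbox
    // instead of a target cached at startup.
    router: (req) => sandboxUrlFor((req as express.Request).params.projectId),
    // Report proxy failures as transient 502s so the client can retry,
    // rather than letting them look like a permanently dead app.
    onError: (_err, _req, res) => {
      res.writeHead(502, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "sandbox_unreachable", retryable: true }));
    },
  }),
);

app.listen(3000);
```

The two changes that mattered are visible here: the sandbox target is resolved per request instead of being fixed at startup, and a proxy failure is reported as an explicit, retryable error rather than an opaque dead end.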

Contribution 03

Clarifying Testing Flows For Existing And New Apps

Defined how to validate the new behavior across both legacy and freshly generated projects.

To make the fixes actionable for the team, I documented how to test them on two fronts:
  • Existing projects that had been broken in production.
  • Newly generated apps created by the AI agent.
For existing apps, the flow focused on loading the project, asking the AI to perform edits, and verifying that changes applied cleanly without breaking the preview. For new apps, the emphasis was on ensuring that invalid AI output was gated and routed to the auto fixer rather than corrupting the initial template.
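Both flows were exercised manually against Shipper itself, but the "new app" half lends itself to an automated check as well. The sketch below assumes Vitest (a natural fit for the Vite template, though the test runner is my assumption) and reduces the gate to a syntax-only validator built on the TypeScript compiler; Shipper's real checks are broader:

```typescript
import { describe, expect, it } from "vitest";
import ts from "typescript";

// Reduced version of the gate's syntax check: reject contents that do not parse.
function isSyntacticallyValid(contents: string): boolean {
  const result = ts.transpileModule(contents, {
    fileName: "Edit.tsx",
    reportDiagnostics: true,
    compilerOptions: { jsx: ts.JsxEmit.Preserve },
  });
  return (result.diagnostics ?? []).length === 0;
}

describe("AI edit gate", () => {
  it("lets a well-formed component edit through", () => {
    const edit = "export function Badge() { return <span>ok</span>; }";
    expect(isSyntacticallyValid(edit)).toBe(true);
  });

  it("rejects a truncated edit before it reaches the project", () => {
    const edit = "export function Badge() { return <span>ok</span>;"; // missing closing brace
    expect(isSyntacticallyValid(edit)).toBe(false);
  });
});
```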

Technical Deep Dive

Technical Implementation Details

The Shipper stack combines an AI agent, a TypeScript codebase, and isolated sandboxes. My work sat in the critical path between AI output, filesystem mutations, and sandbox routing.

AI Gating Architecture
I inserted a validation step between the AI agent and the project filesystem. The gate:
  • Examines generated diffs for obviously invalid or incomplete code.
  • Rejects edits that fail these checks.
  • Passes rejected edits into the existing auto-fixing subsystem.
This design treats AI output as untrusted input, similar to validation layers in security sensitive systems.
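What counts as "obviously invalid" has to be conservative so that normal iteration is never blocked. The heuristics below are examples of the kind of cheap structural checks that can run before any syntax-level validation; the specific rules are illustrative, not Shipper's exact list:

```typescript
// Cheap structural checks applied to raw AI output before syntax validation.
// The heuristics are illustrative; the intent is to reject only edits that
// cannot possibly be valid, never to second-guess working code.
function structuralErrors(filePath: string, contents: string): string[] {
  const errors: string[] = [];

  if (contents.trim().length === 0) {
    errors.push(`${filePath}: edit is empty`);
  }
  if (/^(<{7}|={7}|>{7})/m.test(contents)) {
    errors.push(`${filePath}: contains unresolved merge-conflict markers`);
  }
  if (contents.includes("`".repeat(3))) {
    errors.push(`${filePath}: contains Markdown fences copied from the model's reply`);
  }

  // Truncation heuristic: far more openers than closers usually means the
  // model stopped emitting mid-file.
  const opens = (contents.match(/[({[]/g) ?? []).length;
  const closes = (contents.match(/[)}\]]/g) ?? []).length;
  if (opens > closes + 2) {
    errors.push(`${filePath}: looks truncated (unbalanced brackets)`);
  }

  return errors;
}
```

Edits that fail checks of this kind are rejected and handed to the auto fixer, exactly as in the gate flow sketched earlier.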

Auto-fix Integration
The auto fixer existed but was not systematically triggered. By routing rejected edits into it, I:
  • Exercised the auto fixer in real scenarios.
  • Turned hidden flakiness into visible bugs that can be improved over time.
  • Ensured that some syntax issues were repaired automatically in my tests instead of breaking projects.

Proxy Middleware Repairs
On the infrastructure side, I:
  • Used sandbox and proxy logs to trace why certain apps showed generic proxy errors.
  • Identified misbehavior in the proxy layer rather than in user projects.
  • Adjusted routing and error handling so proxy failures no longer left apps permanently unreachable.

Together, these changes moved Shipper closer to a robust AI product where AI generated code can iterate quickly without compromising stability or recoverability.

Growth

Leadership & Technical Growth

  • Reinforced the importance of treating AI output as untrusted and validating it like any other external input
  • Learned how small changes in proxy middleware can completely change the perceived health of sandboxed apps
  • Gained experience integrating an existing auto-fixing system into a live AI editing pipeline
  • Improved at separating platform responsibilities from user code responsibilities when debugging
  • Strengthened my ability to turn scattered failures and logs into a coherent, testable fix strategy

Toolkit

Technologies & Tools

Core Stack

  • Node.js
  • Next.js
  • TypeScript
  • Vite

Libraries

  • Sandbox APIs for isolated code execution (Modal, Daytona)
  • Internal AI agent and auto-fix subsystems

Tools

  • Proxy middleware and routing layer
  • Sandbox and server logs for debugging
  • Git based workflows for deploying and testing fixes