Participant Guide
Competition Instructions
Everything you need to build, submit, and score your ticket-routing agent.
Overview
The STLabs Compe-ticket-ion challenges you to build a container-based agent that classifies support tickets into ordered (integration, method) pairs. Your agent runs inside a sandboxed environment with a fixed 15-minute timeout, and is scored on correctness, speed, and cost efficiency.
1. Getting Started
- Sign in with your @mit.edu email using a one-time magic link on the login page.
- View your registry credentials on the Submissions page under “Registry Access”. You’ll receive a private ECR namespace and push-only credentials.
- Build your agent as a Docker container targeting `linux/amd64`. Your container’s `CMD`/`ENTRYPOINT` is executed as-is inside the sandbox.
2. Input / Output Protocol
Your agent receives ticket data on stdin and must emit results on stdout, one JSON object per line.
Input format (one line per ticket)
{ "ticket_id": "t001", "text": "Customer reports Slack integration not sending notifications..." }

Output format (one line per ticket)

{ "ticket_id": "t001", "pairs": [{ "integration": "slack", "method": "send_message" }] }

Ordering matters
The pairs array must be returned in the same order the integrations/methods appear in the ticket text.
Scoring uses a longest-common-subsequence (LCS) algorithm on the ordered list, so swapping the order of correct pairs will lower your accuracy score.
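To see concretely why ordering matters, here is an illustrative LCS sketch (not the platform’s actual scorer) on a hypothetical two-pair ticket. The ticket content and pair values are made up for illustration:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

# Hypothetical ground truth for a ticket mentioning Slack first, then Jira.
truth = [("slack", "send_message"), ("jira", "create_issue")]

correct_order = [("slack", "send_message"), ("jira", "create_issue")]
swapped_order = [("jira", "create_issue"), ("slack", "send_message")]

print(lcs_len(truth, correct_order))  # 2 -> both pairs credited
print(lcs_len(truth, swapped_order))  # 1 -> swapping correct pairs loses credit
```

Both outputs contain exactly the right pairs, but the swapped ordering only matches an LCS of length 1, so it scores half the accuracy of the correctly ordered output.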
Semantic matching
Integration and method names in the truth set may not be identical to the phrasing in the ticket text — they can be semantically similar.
For example, a ticket mentioning “posting a message in Slack” might map to { "integration": "slack", "method": "send_message" }. Your agent should normalize to the canonical integration/method vocabulary rather than echoing the exact words from the ticket.
Rules
- Each output line must include `ticket_id` and a `pairs` array with `integration` and `method` fields.
- First valid output per `ticket_id` wins; duplicates trigger a warning but are not penalized.
- Unknown ticket IDs are silently ignored.
- Unknown integration or method values are treated as mismatches.
- Partial output is scored; there is no failure penalty.
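The protocol above can be sketched as a minimal agent loop. The `classify` stub is a placeholder for your real model-backed logic; everything else follows the stdin/stdout contract described in this section:

```python
import json
import sys

def classify(text):
    # Placeholder: replace with your real (model-backed) classification.
    # Must return [{"integration": ..., "method": ...}, ...] in the order
    # the integrations/methods appear in the ticket text.
    return []

def handle_line(line):
    """Turn one input JSON line into one output JSON line."""
    ticket = json.loads(line)
    return json.dumps({"ticket_id": ticket["ticket_id"],
                       "pairs": classify(ticket["text"])})

if __name__ == "__main__":
    for line in sys.stdin:
        line = line.strip()
        if line:
            # Flush after every ticket: partial output is scored, so results
            # already written count even if the run later times out.
            print(handle_line(line), flush=True)
```

Flushing per line matters under the 15-minute timeout: anything already on stdout is scored even if the run is cut off.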
3. Runtime Environment
Your container runs in an E2B sandbox with a 15-minute timeout and restricted network egress (platform domain only).
Environment variables available to your agent
| Variable | Description |
|---|---|
| OPENROUTER_BASE_URL | Base URL for the OpenRouter proxy provided by the platform. |
| OPENROUTER_API_KEY | Per-submission signed proxy token for model API calls. |
| RUN_PROTOCOL_VERSION | Protocol version string for forward compatibility. |
4. LLM Access
You cannot call your own models or any external LLM API directly.
The sandbox firewall blocks all outbound traffic except to the platform domain. All model calls must go through the platform’s OpenRouter proxy at the URL provided in OPENROUTER_BASE_URL. Attempts to reach OpenAI, Anthropic, or any other provider directly will fail with a network error.
The proxy is fully compatible with the OpenAI Python SDK. Point base_url at the environment variable and authenticate with the provided proxy token:
Python example using the OpenAI SDK
import os
from openai import OpenAI
client = OpenAI(
base_url=os.environ["OPENROUTER_BASE_URL"],
api_key=os.environ["OPENROUTER_API_KEY"],
)

ticket_text = "Customer reports Slack integration not sending notifications..."  # example ticket

response = client.chat.completions.create(
model="openai/gpt-4o-mini", # must be an allowlisted model
messages=[
{"role": "system", "content": "You are a ticket routing assistant."},
{"role": "user", "content": ticket_text},
],
)
print(response.choices[0].message.content)

This works because OpenRouter exposes an OpenAI-compatible /chat/completions endpoint. Simply point base_url at the proxy and you’re set; no other code changes are required.
- Only allowlisted models may be used; requests for other models will be rejected by the proxy. (All models are currently allowed.)
- A per-submission cost cap is enforced — exceeding it will terminate your run immediately.
5. Submitting Your Agent
- Build and tag your container for `linux/amd64`.
- Push the image to your assigned ECR namespace using the credentials shown on the Submissions page.
- The platform automatically detects pushes and enqueues an evaluation run. You can track status on the Submissions page.
- Up to 3 queued submissions are allowed at a time. You can cancel queued submissions before they start running.
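The build-and-push steps above might look like the following. This is a sketch of the standard Docker/ECR flow, not platform-specific instructions: the registry host, region, and repository name below are placeholders, so substitute the values shown under “Registry Access” on the Submissions page.

```shell
# Build for linux/amd64 regardless of your host architecture.
docker buildx build --platform linux/amd64 -t my-agent:latest .

# Standard ECR login flow (placeholders: region, account, namespace).
aws ecr get-login-password --region <region> | \
  docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com

# Tag and push to your assigned namespace; the push triggers evaluation.
docker tag my-agent:latest <account>.dkr.ecr.<region>.amazonaws.com/<your-namespace>:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/<your-namespace>:latest
```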
6. Scoring
Each submission is scored using the formula:
final_score = (w1 * completion_rate)
            + (w2 * accuracy)
            + (w3 * speed_score)
            + (w4 * cost_score)

- Accuracy is LCS-based on ordered `(integration, method)` pairs per ticket.
- Speed score = min(1, baseline_ms / mean_time_ms)
- Cost score = min(1, baseline_usd / cost_per_ticket)
- Fixed baselines are used; there is no peer normalization.
Scoring weights (w1–w4) are set before the competition begins and we do not plan to change them during the event. In the unlikely event that a weight adjustment is necessary, all leaderboard submissions will be automatically re-scored to ensure fairness.
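The formula above can be computed directly. The weights and baseline values in this sketch are illustrative placeholders, not the real competition parameters:

```python
def final_score(completion_rate, accuracy, mean_time_ms, cost_per_ticket,
                baseline_ms, baseline_usd, w1, w2, w3, w4):
    """Sketch of the published scoring formula; weights are set by the organizers."""
    speed_score = min(1, baseline_ms / mean_time_ms)      # capped at 1
    cost_score = min(1, baseline_usd / cost_per_ticket)   # capped at 1
    return (w1 * completion_rate
            + w2 * accuracy
            + w3 * speed_score
            + w4 * cost_score)

# Illustrative numbers only -- the real w1..w4 and baselines are announced
# before the event. Here the run beats both baselines, so speed and cost
# scores cap at 1.
score = final_score(completion_rate=1.0, accuracy=0.9,
                    mean_time_ms=2000, cost_per_ticket=0.005,
                    baseline_ms=4000, baseline_usd=0.01,
                    w1=0.25, w2=0.5, w3=0.15, w4=0.10)
print(round(score, 4))
```

Note that the speed and cost terms are capped at 1, so running faster or cheaper than the baseline yields no additional credit; accuracy and completion are where further gains come from.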
7. Leaderboard
- Your best score under the active scoring version is shown.
- Failed or timed-out runs are excluded.
- Tie-breaker: earliest `completed_at` timestamp.
- During the event, only aggregate metrics are shown. Per-ticket breakdowns are not revealed.
8. Rules and Constraints
- Your container must target `linux/amd64` and use digest-pinned image references.
- Network access is restricted to the platform domain only; no external API calls.
- Submissions are evaluated against a fixed 200-ticket set that is disjoint from any samples or templates.
- Ground truth is never revealed to participants.
- Credential rotation has a cooldown period. Plan key rotations accordingly.
9. Prizes
First Place
$500 Amazon Gift Card
Second Place
$200 Amazon Gift Card
Third Place
$50 Amazon Gift Card