The Case for a Layered Event Matcher


1. Where the Current Flat Approach Breaks Down

The current path from webhook to agent execution is:

HTTP POST → parseEvent() → WebhookEvent → matchPipelines() → assembleContext() → executePipeline()

matchPipelines() is the entire routing brain:

// pipeline.ts:55–67
const triggers: string[] = pipeline.trigger[source] ?? [];
if (triggers.some((t) => event.event === t || event.event.startsWith(t + "."))) {
  matches.push([name, pipeline]);
}

That is a string prefix test. It is the only filter in the system. Every crack in the current design flows from that single fact.

1.1 Multi-repo is structurally impossible

When GitHub sends pull_request.opened, the payload contains event.payload.repository.full_name — e.g. acme/frontend or acme/payments. The current matcher never looks at it. You cannot write two pr-review pipelines that differ only by repo without naming them differently and duplicating every other field. There is no filter.repos concept anywhere in PipelineTrigger.

Concretely: once events arrive from more than one place (a second GitHub App, a monorepo alongside other repos), every pull_request.opened matches every pr-review-type pipeline simultaneously. N incoming events against M such pipelines produce N×M agent invocations instead of N.
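
A minimal reproduction makes the fan-out concrete. The types below are simplified stand-ins for WebhookEvent and PipelineConfig (only the fields the matcher reads), and the pipeline names are hypothetical; the prefix test itself is the one quoted from pipeline.ts above.

```typescript
// Simplified stand-ins for the real types; only the fields the matcher reads.
interface FlatEvent {
  source: "github" | "linear" | "webhook";
  event: string;                          // e.g. "pull_request.opened"
  payload: Record<string, unknown>;
}

interface FlatPipeline {
  trigger: Partial<Record<FlatEvent["source"], string[]>>;
  agent: string;
}

// Same prefix test as the snippet quoted from pipeline.ts.
function matchFlat(
  incoming: FlatEvent,
  pipelines: Record<string, FlatPipeline>,
): string[] {
  const matched: string[] = [];
  for (const [pipelineName, pipeline] of Object.entries(pipelines)) {
    const triggers = pipeline.trigger[incoming.source] ?? [];
    if (triggers.some((t) => incoming.event === t || incoming.event.startsWith(t + "."))) {
      matched.push(pipelineName);
    }
  }
  return matched;
}

// Two pipelines intended for different repos (hypothetical names).
const flatPipelines: Record<string, FlatPipeline> = {
  "pr-review-frontend": { trigger: { github: ["pull_request"] }, agent: "reviewer" },
  "pr-review-payments": { trigger: { github: ["pull_request"] }, agent: "security-agent" },
};

const fromPayments: FlatEvent = {
  source: "github",
  event: "pull_request.opened",
  payload: { repository: { full_name: "acme/payments" } },
};

// payload.repository is never consulted, so both pipelines fire.
const fired = matchFlat(fromPayments, flatPipelines);
```

Swapping the payload for an acme/frontend event gives the same two matches, which is exactly the problem: the matcher has no input that could tell the repos apart.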

1.2 Context assembly can't do its job without project scope

assembleContext() in context.ts has two stubs that deliberately return null:

async function gatherGitHistory(_event: WebhookEvent): Promise<string | null> {
  // TODO: use git log on the affected files
  return null;
}
 
async function gatherRelatedPRs(_event: WebhookEvent): Promise<string | null> {
  // TODO: query GitHub API for related PRs
  return null;
}

These are not stubs because the implementation is hard. They are stubs because the WebhookEvent passed in does not carry the information needed to do the work: no repo URL, no installation token, no default branch, no clone path. The context layer would have to re-parse event.payload.repository — buried in an untyped Record<string, unknown> — with no guarantee that field exists (e.g. Linear events have no repo at all).

The context builder is structurally blocked until there is a typed project scope object passed alongside the event.
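
For illustration, this is roughly the narrowing every consumer of the raw payload has to repeat today. readRepoFullName is a hypothetical helper, not something in the codebase; the field path follows the event.payload.repository.full_name example from 1.1.

```typescript
// Narrowing helper: true only for plain, non-null, non-array objects.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: digs the repo name out of an untyped payload.
// Returns null when the field is absent or has an unexpected type.
function readRepoFullName(payload: Record<string, unknown>): string | null {
  const repository = payload["repository"];
  if (!isRecord(repository)) return null;   // e.g. Linear events carry no repository
  const fullName = repository["full_name"];
  return typeof fullName === "string" ? fullName : null;
}
```

Every gatherer that needs the repo would carry its own copy of this check, and each copy has to decide independently what to do when the answer is null.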

1.3 Agent selection is static and cannot be rule-driven

PipelineConfig has a single agent: string field. Every invocation of that pipeline uses the same agent. You cannot express:

  • "use security-agent for anything in acme/payments"
  • "use fast-reviewer for draft PRs, thorough-reviewer for PRs targeting main"
  • "fall back to reviewer when no specialist matches"

The only workaround is to create separate pipelines with separate agent fields, duplicating all prompt and guardrail config. That duplication grows O(repos × agent-types).

1.4 The trigger schema can't compose filters

PipelineTrigger is { github?: string[], linear?: string[], webhook?: string[] }. It is a flat list of event name strings. There is no place to express:

  • "only for PRs targeting main or release/*"
  • "except when the author is dependabot[bot]"
  • "only for repos in the acme org"
  • "and only when the PR has the needs-review label"

Adding Slack would require modifying the PipelineTrigger interface, the config type, the matching code, and the per-source branch in formatEvent() — all in separate files with no clear ownership boundary.

1.5 formatEvent() has implicit source knowledge baked in

// context.ts:60–76
function formatEvent(event: WebhookEvent): string {
  if (event.source === "github") {
    const pr = event.payload.pull_request as Record<string, unknown>;
    // ...
  }
}

Every new domain source (Slack, Sentry, JIRA) requires a new if branch here, another in assembleContext(), and a new trigger key for matchPipelines() to read. The domain logic is fractured across three files with no extraction point.


2. The Layered Architecture

The key insight: matching, context, and execution have fundamentally different information needs. Mixing them prevents each from working correctly.

Layer 1: Domain Adapter     — HTTP → normalized WebhookEvent (exists)
Layer 2: Event Classifier   — WebhookEvent → typed ClassifiedEvent<T>
Layer 3: Project Router     — ClassifiedEvent → ProjectContext
Layer 4: Pipeline Matcher   — event + project → [MatchedPipeline]
Layer 5: Context Builder    — event + project + pipeline → EnrichedContext
Layer 6: Executor           — agent + prompt + context → spawn (exists)

Layers 2–5 are the missing infrastructure. Each has a clear, narrow contract. None leaks into the others.
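
As a rough sketch of how the layers compose, assume each one is a function behind a small interface. The shapes here are reduced placeholders for the interfaces proposed in section 3, and routeEvent, matchAll, and Layers are hypothetical names used only for this illustration.

```typescript
// Reduced placeholder shapes; the full interfaces are proposed in section 3.
interface RawEvent { source: string; event: string; payload: Record<string, unknown> }
interface Classified { raw: RawEvent; qualifiedName: string; data: unknown }
interface Project { repoFullName?: string }
interface Matched { pipelineName: string; project: Project }
interface Enriched { summary: string; project: Project }

interface Layers {
  classify(raw: RawEvent): Classified | null;                             // Layer 2
  extractProject(classified: Classified): Project;                        // Layer 3
  matchAll(classified: Classified, project: Project): Matched[];          // Layer 4
  assemble(classified: Classified, matched: Matched): Promise<Enriched>;  // Layer 5
}

// Hypothetical orchestrator: Layers 2-4 run synchronously for every event,
// Layer 5 runs only for pipelines that actually matched.
async function routeEvent(raw: RawEvent, layers: Layers): Promise<Enriched[]> {
  const classified = layers.classify(raw);
  if (classified === null) return [];          // unrecognized or malformed event
  const project = layers.extractProject(classified);
  const matched = layers.matchAll(classified, project);
  return Promise.all(matched.map((m) => layers.assemble(classified, m)));
}
```

The only await sits behind the match step, so an event that matches nothing never reaches the I/O-bound layer.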


3. Concrete Interface Proposal

Layer 2 — Event Classifier

// packages/core/src/classifier.ts
 
export type EventDomain = 'github' | 'linear' | 'slack' | 'generic';
 
export type GitHubEventType =
  | 'pull_request'
  | 'push'
  | 'issues'
  | 'issue_comment'
  | 'pull_request_review'
  | 'check_run'
  | 'deployment';
 
export type LinearEventType = 'Issue' | 'Comment' | 'Project' | 'Cycle';
 
// Typed, normalized GitHub PR payload — no more Record<string, unknown> casting
export interface GitHubPRData {
  number: number;
  title: string;
  url: string;
  author: string;
  authorType: 'user' | 'bot';     // detect dependabot, renovate, etc.
  base: string;
  head: string;
  body: string;
  draft: boolean;
  labels: string[];
  repoFullName: string;           // 'acme/frontend' — the critical routing key
  repoDefaultBranch: string;
  installationId?: string;        // for per-repo GitHub App auth
  orgLogin: string;               // 'acme'
}
 
export interface GitHubPushData {
  repoFullName: string;
  ref: string;
  branch: string;
  pusher: string;
  commitCount: number;
  compareUrl: string;
  installationId?: string;
}
 
export interface LinearIssueData {
  id: string;
  title: string;
  description: string;
  teamKey: string;                // 'HAR' — the critical routing key
  projectId?: string;
  priority: number;
  assigneeId?: string;
  workspaceId: string;
}
 
export interface ClassifiedEvent<T = unknown> {
  raw: WebhookEvent;              // always available for fallback
  domain: EventDomain;
  type: string;                   // 'pull_request', 'Issue', etc.
  action: string;                 // 'opened', 'create', 'synchronize'
  qualifiedName: string;          // 'pull_request.opened' — the current event string
  data: T;                        // typed normalized payload
  receivedAt: Date;
}
 
export interface EventClassifier {
  /**
   * Classify a raw webhook event into a typed, normalized form.
   * Returns null if the event is unrecognized or malformed.
   */
  classify(event: WebhookEvent): ClassifiedEvent | null;
}

Why this matters: ClassifiedEvent<GitHubPRData> has .data.repoFullName as a typed string. There is no more event.payload.repository?.full_name as string scattered across the codebase. The classifier owns all the payload-parsing risk in one place.
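
A classifier for one event family might look like the sketch below. It assumes a minimal WebhookEvent shape (source, event, payload) and uses a trimmed copy of GitHubPRData to stay short; classifyPullRequest and asRecord are hypothetical names. The payload paths (pull_request.user.login, pull_request.base.ref, repository.full_name, and so on) follow GitHub's pull_request webhook payload.

```typescript
// Assumed minimal shape of the existing WebhookEvent; the real interface may differ.
interface WebhookEvent {
  source: string;
  event: string;                        // e.g. "pull_request.opened"
  payload: Record<string, unknown>;
}

// Trimmed copy of the proposed GitHubPRData (routing-relevant fields only).
interface PRData {
  number: number;
  title: string;
  author: string;
  authorType: "user" | "bot";
  base: string;
  draft: boolean;
  labels: string[];
  repoFullName: string;
}

interface ClassifiedPR {
  raw: WebhookEvent;
  domain: "github";
  type: "pull_request";
  action: string;
  qualifiedName: string;
  data: PRData;
}

function asRecord(value: unknown): Record<string, unknown> | null {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : null;
}

// Returns null instead of throwing when a required field is missing or mistyped.
function classifyPullRequest(incoming: WebhookEvent): ClassifiedPR | null {
  if (incoming.source !== "github" || !incoming.event.startsWith("pull_request.")) return null;
  const pr = asRecord(incoming.payload["pull_request"]);
  const repo = asRecord(incoming.payload["repository"]);
  const user = asRecord(pr?.["user"]);
  const baseRef = asRecord(pr?.["base"])?.["ref"];
  if (!pr || !repo || !user) return null;

  const prNumber = pr["number"];
  const title = pr["title"];
  const login = user["login"];
  const fullName = repo["full_name"];
  if (
    typeof prNumber !== "number" || typeof title !== "string" ||
    typeof login !== "string" || typeof baseRef !== "string" ||
    typeof fullName !== "string"
  ) return null;

  const rawLabels = Array.isArray(pr["labels"]) ? (pr["labels"] as unknown[]) : [];
  const labels = rawLabels
    .map((label) => asRecord(label)?.["name"])
    .filter((labelName): labelName is string => typeof labelName === "string");

  return {
    raw: incoming,
    domain: "github",
    type: "pull_request",
    action: incoming.event.slice("pull_request.".length),
    qualifiedName: incoming.event,
    data: {
      number: prNumber,
      title,
      author: login,
      authorType: user["type"] === "Bot" ? "bot" : "user",
      base: baseRef,
      draft: pr["draft"] === true,
      labels,
      repoFullName: fullName,
    },
  };
}
```

All the casts and typeof checks live in this one function; everything downstream reads data.repoFullName as a plain string.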


Layer 3 — Project Router

// packages/core/src/project.ts
 
export interface ProjectContext {
  // Identity
  domain: EventDomain;
  org?: string;                   // 'acme'
  repo?: string;                  // 'frontend'
  repoFullName?: string;          // 'acme/frontend'
  repoCloneUrl?: string;          // for git operations in context builder
  defaultBranch?: string;
 
  // Linear
  teamKey?: string;               // 'HAR'
  linearWorkspaceId?: string;
 
  // Auth
  installationId?: string;        // GitHub App installation for this repo
  tokenRef?: string;              // secret reference for API calls
 
  // Metadata
  tags?: string[];                // inherited from bento.yaml project config
}
 
export interface ProjectRouter {
  /**
   * Extract project scope from a classified event.
   * Never throws — returns a partial context if info is unavailable.
   */
  extractProject(event: ClassifiedEvent): ProjectContext;
}

Why this matters: assembleContext() currently receives a WebhookEvent and must do event.payload.repository as Record<string,unknown> before it can fetch git history. With ProjectContext, the context builder receives project.repoCloneUrl directly. The two stubs become implementable immediately.
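
A sketch of the router for the GitHub and Linear cases follows. RoutableEvent and ProjectScope are trimmed stand-ins for ClassifiedEvent and ProjectContext, extractProjectScope is a hypothetical name, and deriving the clone URL from repoFullName assumes github.com HTTPS remotes, which the proposal does not specify.

```typescript
// Trimmed, assumed shapes; see the interfaces above for the full proposal.
interface RoutableEvent {
  domain: "github" | "linear" | "slack" | "generic";
  data: {
    repoFullName?: string;
    repoDefaultBranch?: string;
    installationId?: string;
    teamKey?: string;
    workspaceId?: string;
  };
}

interface ProjectScope {
  domain: RoutableEvent["domain"];
  org?: string;
  repo?: string;
  repoFullName?: string;
  repoCloneUrl?: string;
  defaultBranch?: string;
  installationId?: string;
  teamKey?: string;
  linearWorkspaceId?: string;
}

// Pure field mapping: no I/O, never throws, returns a partial scope
// when the event does not carry the relevant information.
function extractProjectScope(classified: RoutableEvent): ProjectScope {
  const { domain, data } = classified;
  const scope: ProjectScope = { domain };

  if (domain === "github" && data.repoFullName) {
    const [org, repo] = data.repoFullName.split("/");
    if (org && repo) {
      scope.org = org;
      scope.repo = repo;
      scope.repoFullName = data.repoFullName;
      // Assumption: github.com HTTPS remotes; GitHub Enterprise hosts would differ.
      scope.repoCloneUrl = `https://github.com/${data.repoFullName}.git`;
    }
    scope.defaultBranch = data.repoDefaultBranch;
    scope.installationId = data.installationId;
  }

  if (domain === "linear") {
    scope.teamKey = data.teamKey;
    scope.linearWorkspaceId = data.workspaceId;
  }
  return scope;
}
```

Because the mapping only copies fields already present on the classified event, it stays on the cheap side of the boundary described in section 5.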


Layer 4 — Pipeline Matcher (replaces matchPipelines())

// services/daemon/src/matcher.ts
 
// Extended trigger config (backwards compatible)
export interface PipelineTriggerFilter {
  repos?: string[];               // ['acme/frontend', 'acme/*'] — glob patterns
  branches?: string[];            // ['main', 'release/*']
  authors?: string[];             // ['!dependabot[bot]'] — '!' prefix = exclude
  labels?: string[];              // PR must have at least one of these
  draft?: boolean;                // true = only drafts, false = only non-drafts
  teams?: string[];               // Linear team keys: ['HAR', 'ENG']
}
 
// This extends PipelineConfig.trigger — no breaking change
export interface PipelineTriggerV2 extends PipelineTrigger {
  filter?: PipelineTriggerFilter;
}
 
export interface MatchedPipeline {
  name: string;
  config: PipelineConfig;
  agent: DiscoveredAgent;         // resolved here — executor never looks up by name again
  project: ProjectContext;
  matchScore: number;             // higher = more specific match (repo-filter beats no-filter)
}
 
export interface PipelineMatcher {
  /**
   * Find all pipelines that match this event and project context.
   * Returns matches sorted by specificity (most specific first).
   */
  match(
    event: ClassifiedEvent,
    project: ProjectContext,
    pipelines: Record<string, PipelineConfig>,
    agents: Map<string, DiscoveredAgent>,
  ): MatchedPipeline[];
}

Match scoring example: A pipeline with filter.repos: ['acme/payments'] scores higher than one with no filter when the event comes from acme/payments. This prevents a "catch-all" pipeline from shadowing a specialized one, and makes the tie-breaking rule explicit rather than dependent on YAML ordering.
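
The filter clauses themselves can be evaluated with a few pure functions. This is a sketch under stated assumptions: '*' matches any run of characters (including '/'), branches is tested against the PR's base branch, and when a list mixes plain and '!' patterns the value must match a plain pattern and none of the excluded ones. globToRegExp, matchesPatterns, filterMatches, and PRFacts are hypothetical names.

```typescript
// Convert a simple glob ('acme/*', 'release/*') to an anchored RegExp.
// Assumption: '*' matches any run of characters, including '/'.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Plain patterns: value must match at least one (if any are given).
// '!'-prefixed patterns: value must match none.
function matchesPatterns(value: string, patterns: string[] | undefined): boolean {
  if (!patterns || patterns.length === 0) return true;    // clause not configured
  const excludes = patterns.filter((p) => p.startsWith("!")).map((p) => p.slice(1));
  const includes = patterns.filter((p) => !p.startsWith("!"));
  if (excludes.some((p) => globToRegExp(p).test(value))) return false;
  return includes.length === 0 || includes.some((p) => globToRegExp(p).test(value));
}

interface TriggerFilter {
  repos?: string[];
  branches?: string[];
  authors?: string[];
  labels?: string[];
  draft?: boolean;
}

interface PRFacts {
  repoFullName: string;
  base: string;
  author: string;
  labels: string[];
  draft: boolean;
}

// Every configured clause must pass; unset clauses are ignored.
function filterMatches(filter: TriggerFilter | undefined, pr: PRFacts): boolean {
  if (!filter) return true;                                // trigger-only pipeline
  if (!matchesPatterns(pr.repoFullName, filter.repos)) return false;
  if (!matchesPatterns(pr.base, filter.branches)) return false;
  if (!matchesPatterns(pr.author, filter.authors)) return false;
  if (filter.draft !== undefined && filter.draft !== pr.draft) return false;
  if (filter.labels && filter.labels.length > 0 &&
      !filter.labels.some((wanted) => pr.labels.indexOf(wanted) !== -1)) return false;
  return true;
}
```

Nothing here touches the network or the filesystem, so the same event and config always produce the same answer.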


Layer 5 — Context Builder (replaces assembleContext())

// services/daemon/src/context.ts (enhanced)
 
export interface ContextBuilderOptions {
  pipeline: PipelineConfig;
  event: ClassifiedEvent;
  project: ProjectContext;
  logger: Logger;
}
 
export interface EnrichedContext {
  // Existing
  summary: string;
  sections: ContextSection[];
 
  // New — available to executor for prompt rendering
  project: ProjectContext;
 
  // Structured event data for typed template vars (not just payload paths)
  eventData: Record<string, unknown>;
}
 
export interface ContextBuilder {
  /**
   * Assemble context for a pipeline run.
   * Receives typed event + project — no payload-parsing needed here.
   */
  assemble(options: ContextBuilderOptions): Promise<EnrichedContext>;
}

Why event-aware context building is necessary: Git history for a PR requires:

  1. Repo clone URL (project.repoCloneUrl) — from Layer 3
  2. PR head SHA (event.data.head for GitHub) — from Layer 2
  3. Installation ID (project.installationId), exchanged for an auth token — from Layer 3

None of these were available before this refactor. Context building was fundamentally blocked by the flat architecture. With these layers, gatherGitHistory() becomes:

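async function gatherGitHistory(
  event: ClassifiedEvent<GitHubPRData>,
  project: ProjectContext,
): Promise<string | null> {
  // No clone URL or installation means there is nothing to fetch from.
  if (!project.repoCloneUrl || !project.installationId) return null;
  const token = await resolveInstallationToken(project.installationId);
  // Next step, now actually implementable: authenticate with `token`, run
  //   git log --oneline -20 HEAD..<event.data.head>
  // against project.repoCloneUrl, and return the output.
  return null; // placeholder until the git call is wired in
}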

4. How Multi-Project Support Works End-to-End

With these layers, bento.yaml can express per-repo pipelines without duplication:

pipelines:
  # Catches all PRs — low specificity
  pr-review-default:
    trigger:
      github: [pull_request.opened, pull_request.synchronize]
    agent: reviewer
    prompt: "Review this PR..."
 
  # Catches only acme/payments PRs targeting main — higher specificity, wins over default
  pr-review-payments:
    trigger:
      github: [pull_request.opened, pull_request.synchronize]
      filter:
        repos: ['acme/payments']
        branches: ['main']
    agent: security-reviewer      # different agent
    prompt: "Review this payment-service PR with security focus..."
 
  # Only non-bot PRs on frontend, targeting main
  pr-review-frontend:
    trigger:
      github: [pull_request.opened]
      filter:
        repos: ['acme/frontend']
        authors: ['!dependabot[bot]', '!renovate[bot]']
        draft: false
    agent: reviewer
    context:
      git_history: true           # now actually works via ProjectContext

The PipelineMatcher scores pr-review-payments above pr-review-default for acme/payments events because its filter is more specific. The correct agent is resolved at match time and passed to the executor — no secondary lookup.
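
One possible scoring rule, shown only as an assumption since the proposal leaves the exact formula open: one point per configured filter clause, with ties broken by pipeline name. specificity, rankMatches, and Candidate are hypothetical names; the two candidates are the pipelines from the YAML above that would accept a PR to main in acme/payments.

```typescript
interface Filter {
  repos?: string[];
  branches?: string[];
  authors?: string[];
  draft?: boolean;
}

interface Candidate {
  pipelineName: string;
  agent: string;
  filter?: Filter;
}

// Assumed scoring rule: one point per configured filter clause, so a
// pipeline with no filter scores 0 and any filtered pipeline outranks it.
function specificity(filter: Filter | undefined): number {
  if (!filter) return 0;
  return Object.values(filter).filter((clause) => clause !== undefined).length;
}

// Most specific first; ties broken by name so the order never depends
// on where a pipeline happens to sit in the YAML file.
function rankMatches(candidates: Candidate[]): Array<Candidate & { matchScore: number }> {
  return candidates
    .map((c) => ({ ...c, matchScore: specificity(c.filter) }))
    .sort((a, b) => b.matchScore - a.matchScore || a.pipelineName.localeCompare(b.pipelineName));
}

const ranked = rankMatches([
  { pipelineName: "pr-review-default", agent: "reviewer" },
  {
    pipelineName: "pr-review-payments",
    agent: "security-reviewer",
    filter: { repos: ["acme/payments"], branches: ["main"] },
  },
]);
// ranked[0] is pr-review-payments (score 2); ranked[1] is pr-review-default (score 0).
```

A weighted scheme (for example, an exact repo name outranking a glob) would slot into specificity without changing the sort.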


5. The Abstraction Boundary

The critical boundary is between Layers 4 and 5 — between matching and context building.

Matching (Layers 1–4) must be:

  • Cheap — runs for every event, most will not match
  • Stateless — no I/O, no API calls
  • Deterministic — same event always produces the same matches

Context building (Layer 5) must be:

  • Rich — fetch git history, query APIs, recall sessions
  • Event-type-aware — PRs need diff context, issues need related ticket context
  • Project-scope-aware — needs repo URL, auth tokens, branch info

The current code blurs this boundary: handleEvent() calls matchPipelines() (cheap), then immediately calls assembleContext() (expensive, I/O-bound) — and passes the same untyped WebhookEvent to both. The expensive work can't use the information it needs because that information was never extracted from the event at the matching stage.

The clean rule: anything that requires I/O lives in Layer 5 or later (context building and execution). Anything that only reads the event lives in Layer 4 or earlier.
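
On the Layer 5 side, event-type awareness could be as simple as a registry of gatherers keyed by classified event type. This is a sketch, not the proposed ContextBuilder: the ContextSection shape, the Gatherer type, and assembleSections are assumptions, and dropping a failed gatherer rather than failing the whole run is one design choice among several.

```typescript
// Assumed shape; the document references ContextSection without defining it.
interface ContextSection { title: string; body: string }
interface BuildInput { type: string; repoCloneUrl?: string }
type Gatherer = (input: BuildInput) => Promise<ContextSection | null>;

// Hypothetical registry: which gatherers run for which classified event type.
type GathererRegistry = Record<string, Gatherer[]>;

// Layer 5 is the only place that awaits anything. A gatherer that rejects
// is treated as "no section" so one flaky call does not sink the run.
async function assembleSections(
  input: BuildInput,
  registry: GathererRegistry,
): Promise<ContextSection[]> {
  const gatherers = registry[input.type] ?? [];
  const settled = await Promise.all(
    gatherers.map((gather) => gather(input).catch(() => null)),
  );
  return settled.filter((section): section is ContextSection => section !== null);
}

// Stand-in gatherers; real ones would shell out to git or call an API.
const gitHistory: Gatherer = async (input) =>
  input.repoCloneUrl
    ? { title: "Git history", body: `log for ${input.repoCloneUrl}` }
    : null;
const relatedTickets: Gatherer = async () => ({ title: "Related tickets", body: "(none)" });

const registry: GathererRegistry = {
  pull_request: [gitHistory],
  Issue: [relatedTickets],
};
```

Adding a new event type then means registering gatherers for it, with no change to the matcher.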


6. What Does Not Change

  • WebhookEvent interface — Layer 2 wraps it rather than replacing it
  • WebhookServer — still parses and emits WebhookEvent
  • PipelineQueue / executePipeline() — executor is clean, unchanged
  • DiscoveredAgent / discoverAgents() — SOUL.md discovery is fine
  • bento.yaml structure — the existing pipeline format is valid; filter is additive
  • All existing tests — matchPipelines() behavior is preserved for trigger-only pipelines

The refactor is additive. Layers 2 and 3 are new files. Layer 4 is a replacement for the 12-line matchPipelines() method. Layer 5 is an enhancement to the existing assembleContext() signature. Nothing is deleted until the old code can be shadowed by the new.


7. Summary

Failing scenario                        Root cause in current code           Resolved by layer
Two repos, different agents             matchPipelines() ignores payload     Layer 3 + Layer 4 filter
Git history is always null              assembleContext() lacks repo URL     Layer 3 → Layer 5
Adding Slack breaks 3 files             Domain logic split across files      Layer 2 classifier
Can't exclude bots from triggers        No filter clause in PipelineTrigger  Layer 4 filter schema
Agent can't vary by repo                agent is a static string field       Layer 4 match scoring
Prompt template can't use typed fields  renderPrompt only walks raw payload  Layer 5 eventData

The layered architecture does not make the code more complex. It makes the current hidden complexity explicit, gives each concern a named owner, and unblocks the features — git history, multi-repo routing, per-repo agents — that are already stubbed out or missing from the codebase.