RubiPi 2  |  Learning by doing


So here the doing begins. As the last step of RubiPi 1 | How pi works, I let the clanker create a project scaffolding (only folders and empty files) and write a plan of what should be built. Since this is mainly for learning and personal use, there is going to be a lot of simplification:

  • RubiPi will be built in Ruby, not TypeScript. Why? Because Ruby is a lot easier to read.
  • Compared to the original `pi-mono`, RubiPi intentionally removes most “product surface area” so the core ideas are easy to see: we’ll start with
    • one provider
    • one simple CLI mode
    • just a few tools
  • We’ll be skipping
    • UI polish
    • extensions
    • OAuth
    • multi-provider support
    • session trees
    • auto-compaction

The result should be a small codebase where you can follow the entire agent flow (message → stream → tool call → tool result → continue) without jumping across dozens of files and subsystems.
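That entire flow can be sketched in a few lines of Ruby. This is a hypothetical shape of the loop, not the actual RubiPi API; every method and key name here is a placeholder:

```ruby
# Hypothetical sketch of the core agent flow; names are placeholders,
# not the real RubiPi API.
def agent_loop(messages, tools, client)
  loop do
    reply = client.stream(messages)          # message -> (streamed) reply
    messages << reply
    return reply unless reply[:tool_calls]   # no tool calls: we're done
    reply[:tool_calls].each do |call|        # tool call -> tool result
      tool   = tools.fetch(call[:name])
      result = tool.call(call[:arguments])
      messages << { role: "tool", content: result }
    end
  end                                        # continue with results appended
end
```

The whole agent fits in one function: everything else in the plan below is just filling in `client`, `tools`, and the rendering around this loop.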

Our plan


© Thomas Jansen 2025

# RubiPi – Minimal learning-focused build plan

## Goal
Build a **small, readable Ruby port** of the core “pi” experience to learn how an agent works end-to-end:
- send messages to an LLM
- stream the response
- handle tool calls (function calling)
- execute local tools
- feed tool results back to the model
- repeat until the model finishes

This is not a feature-complete clone. It’s a “core loop + thin CLI” project.

## Non-goals (for the minimal version)
- Full-screen terminal UI (TUI), fancy rendering, themes
- Extensions / plugins / skills packages system
- OAuth login flows
- Multi-provider support (start with one provider)
- Session tree navigation (forking/branching)
- Auto-compaction / summarization
- Web UI / Slack bot / pod management

## Architecture (3 layers, one gem)
Keep everything in one repo/gem, but separate concerns by namespace/folders:

1) **`RubiPi::AI`** (LLM connector)
   - Models, messages, tools, and a unified streaming interface
   - A provider registry/dispatcher (even if we only register one provider initially)

2) **`RubiPi::AgentCore`** (the agent loop)
   - The loop that:
     - asks the model
     - executes tool calls
     - appends tool results
     - continues until done
   - Emits a stream of events for UI/CLI to render

3) **`RubiPi::CodingAgent`** (product harness)
   - CLI parsing and “modes” (start with print mode)
   - Tool implementations (read/write/bash)
   - Optional minimal session persistence (later milestone)

This mirrors the TypeScript monorepo split:
- `packages/ai` → `RubiPi::AI`
- `packages/agent` → `RubiPi::AgentCore`
- `packages/coding-agent` → `RubiPi::CodingAgent`

## Minimal feature set (what we will actually build)
### A. CLI (print mode)
- Command: `bin/rubipi "message..."` (or stdin)
- Streams assistant text to stdout
- If the assistant calls tools, execute them and continue automatically
- Exits when the assistant is done

### B. One provider (OpenAI-compatible)
- Implement **one** provider using an OpenAI-compatible endpoint:
  - Chat Completions compatible streaming (SSE) with tool calling
  - Environment variable for API key
- Model config provided via CLI flags or defaults:
  - `--base-url`, `--model`, `--api-key` (or `OPENAI_API_KEY`)
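The request side of this provider could look like the following sketch, using only Ruby's stdlib `Net::HTTP`. Building the request is split from sending it so the former can be tested without a network; the endpoint path and payload shape follow the public Chat Completions API, everything else is an assumption:

```ruby
require "net/http"
require "json"
require "uri"

# Build a streaming Chat Completions request for an OpenAI-compatible
# endpoint. Split out from sending so it is testable offline.
def build_chat_request(base_url:, model:, api_key:, messages:)
  uri = URI("#{base_url}/chat/completions")
  req = Net::HTTP::Post.new(uri)
  req["Authorization"] = "Bearer #{api_key}"
  req["Content-Type"]  = "application/json"
  req.body = JSON.generate(model: model, messages: messages, stream: true)
  [uri, req]
end

# Send the request and hand raw SSE chunks to the caller's block.
def stream_chat(uri, req, &on_chunk)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(req) { |res| res.read_body { |chunk| on_chunk.call(chunk) } }
  end
end
```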

### C. Tool calling
- Define a Ruby tool interface:
  - name, description, JSON schema-ish parameters (minimal at first), execute(args)
- Tools to implement:
  - `read`: read a file
  - `write`: write/overwrite a file
  - `bash`: run a shell command (careful, but fine for local learning)
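A minimal version of that tool interface might be a plain `Struct`. The names and the shape of `parameters` are illustrative, not the final design:

```ruby
# Minimal tool interface sketch: name, description, JSON-schema-ish
# parameters, and an execute callable. Names are illustrative.
Tool = Struct.new(:name, :description, :parameters, :execute) do
  def call(args)
    execute.call(args)
  end
end

# Example instance: the `read` tool as a one-liner.
READ_TOOL = Tool.new(
  "read",
  "Read a file and return its contents",
  { "path" => "string" },               # minimal on purpose
  ->(args) { File.read(args["path"]) }
)
```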

### D. Event stream (for learning + testability)
Emit events like:
- `agent_start`, `turn_start`
- `message_start`, `message_delta`, `message_end`
- `tool_start`, `tool_end`
- `turn_end`, `agent_end`

The CLI prints from these events; later a TUI could also consume them.
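The idea can be sketched as the loop yielding tagged events while the consumer decides how to render them. Event names follow the list above; the rest is a placeholder:

```ruby
# Sketch of the event stream idea: the turn yields tagged events and the
# CLI (or a test) subscribes with a block. Only the names above are real.
def run_turn(text_chunks, &emit)
  emit.call([:turn_start, nil])
  emit.call([:message_start, nil])
  text_chunks.each { |chunk| emit.call([:message_delta, chunk]) }
  emit.call([:message_end, nil])
  emit.call([:turn_end, nil])
end
```

A CLI subscriber would print each `:message_delta` payload to stdout as it arrives; a test subscriber can simply collect the events into an array and assert on them.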

## Build sequence (milestones)

### Milestone 1: “Hello streaming”
**Outcome:** `rubipi "hi"` streams model text and exits cleanly.
- Implement `RubiPi::AI::Stream` with a single provider adapter
- Parse SSE chunks, produce text deltas, finish with a final message

**Done when:**
- Text appears incrementally (not only at the end)
- Exit code is 0 on success; non-0 with a readable error on failure
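The SSE parsing at the heart of this milestone can be sketched per line: OpenAI-style streams send `data: {json}` lines and finish with `data: [DONE]`. Real chunks carry more fields; this keeps only the delta text:

```ruby
require "json"

# Sketch of parsing one OpenAI-style Chat Completions SSE line into a
# text delta. Returns the delta string, :done, or nil for other lines.
def parse_sse_line(line)
  return nil unless line.start_with?("data: ")
  payload = line.delete_prefix("data: ").strip
  return :done if payload == "[DONE]"
  json = JSON.parse(payload)
  json.dig("choices", 0, "delta", "content")
end
```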

### Milestone 2: “Tool call round-trip”
**Outcome:** model requests a tool → RubiPi executes it → model continues.
- Implement `RubiPi::AgentCore::AgentLoop`
- Implement minimal tool registry + 1 tool (`read`)
- Support: assistant → toolCall → toolResult → assistant

**Done when:**
- Prompt “Read RubiPi/plan.md and summarize it” results in a tool call + summary

### Milestone 3: “Useful tool set”
**Outcome:** basic coding-agent-ish actions are possible.
- Add `write` and `bash` tools
- Add basic safety rails (see below)

**Done when:**
- Prompt “Create a file X with Y content” causes `write`
- Prompt “Run `ruby -v`” causes `bash`

### Milestone 4: “Usable CLI ergonomics”
**Outcome:** feels like a small CLI tool, not a script.
- Support stdin piping
- Add flags:
  - `--model`, `--base-url`, `--api-key`
  - `--json` (optional) to output event stream as JSONL for debugging
- Clear help output

**Done when:**
- `echo "hello" | rubipi` works
- `rubipi --help` shows options
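Ruby's stdlib `OptionParser` covers all of this; a sketch with the flags from the plan (the defaults here are guesses, not decisions):

```ruby
require "optparse"

# Sketch of Milestone 4 flag parsing. Flag names match the plan;
# default values are placeholders.
def parse_flags(argv)
  options = {
    model:    "gpt-4o-mini",                 # placeholder default
    base_url: "https://api.openai.com/v1",
    api_key:  ENV["OPENAI_API_KEY"],
    json:     false
  }
  OptionParser.new do |o|
    o.banner = "Usage: rubipi [options] [message]"
    o.on("--model MODEL",  "Model name")              { |v| options[:model] = v }
    o.on("--base-url URL", "API base URL")            { |v| options[:base_url] = v }
    o.on("--api-key KEY",  "API key")                 { |v| options[:api_key] = v }
    o.on("--json",         "Emit events as JSONL")    { options[:json] = true }
  end.parse!(argv)
  options   # any remaining argv entries form the message
end
```

`parse!` removes the flags from `argv`, so whatever is left (or stdin, if empty) becomes the user message.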

### Milestone 5 (optional): “Minimal sessions”
**Outcome:** persist conversation to a JSONL file.
- Append messages and tool results
- Support `--session <path>` and `--continue`

**Done when:**
- You can run two commands and the second run continues prior context
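JSONL makes this milestone almost trivial: one message per line, append on write, replay on load. The file layout here is an assumption, not a spec:

```ruby
require "json"

# Sketch of minimal JSONL session persistence: append each message as
# one JSON line, replay the file to restore context.
def append_message(path, message)
  File.open(path, "a") { |f| f.puts(JSON.generate(message)) }
end

def load_session(path)
  return [] unless File.exist?(path)
  File.readlines(path).map { |line| JSON.parse(line) }
end
```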

## Safety rails (minimal)
Even for a personal learning project, add a few guardrails early:
- `read`/`write` limited to a project root (default: current working dir)
- `bash` runs with a timeout and caps captured output size (truncating anything large)
- Never auto-execute “dangerous” commands by default (optional prompt/allowlist later)
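The path confinement for `read`/`write` is a few lines: expand the requested path against the project root and reject anything that escapes it. The helper name is illustrative:

```ruby
# Sketch of confining file tools to a project root: expand the path and
# reject anything outside the root. Helper name is illustrative.
def safe_path!(root, relative)
  root = File.expand_path(root)
  full = File.expand_path(relative, root)
  unless full == root || full.start_with?(root + File::SEPARATOR)
    raise ArgumentError, "path escapes project root: #{relative}"
  end
  full
end
```

Note that `File.expand_path` resolves `..` segments, so `"../outside.txt"` is caught even though the string itself never mentions the root.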

## Testing approach (keep it lightweight)
- Unit test:
  - SSE parsing (feed chunks → expect deltas)
  - tool argument decoding
  - agent loop transitions (tool call → tool result)
- Avoid end-to-end provider tests unless you want to provide real keys in CI.
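The style throughout is the same: canned input in, assertion on the result, no network. A sketch using `minitest` (which ships with Ruby), with a hypothetical tool-argument decoder inlined to keep the example self-contained:

```ruby
require "minitest/autorun"
require "json"

# Hypothetical tool-argument decoder under test, inlined for brevity.
def decode_tool_args(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  {}  # start permissive: bad JSON becomes "no arguments"
end

class ToolArgsTest < Minitest::Test
  def test_valid_json
    assert_equal({ "path" => "a.txt" }, decode_tool_args('{"path":"a.txt"}'))
  end

  def test_garbage_falls_back_to_empty_hash
    assert_equal({}, decode_tool_args("not json"))
  end
end
```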

## Open questions (decide as we implement)
- Choose the first API format:
  - OpenAI Chat Completions compatible streaming is the simplest starting point
  - Later: OpenAI Responses / other providers
- How strict to be on tool parameter validation in Ruby (start permissive, tighten later)