RubiPi 2  |  Learning by doing


So here the doing begins. As the last step of RubiPi 1 | How pi works, I let the clanker create a project scaffolding (only folders and empty files) and write a plan of what should be built. Since this is mainly for learning and personal use, there is going to be a lot of simplification:

  • RubiPi will be built in Ruby, not TypeScript. Why? Because Ruby is a lot easier to read.
  • Compared to the original `pi-mono`, RubiPi intentionally removes most “product surface area” so the core ideas are easy to see: we’ll start with
    • one provider
    • one simple CLI mode
    • just a few tools
  • We’ll be skipping
    • UI polish
    • extensions
    • OAuth
    • multi-provider support
    • session trees
    • auto-compaction

The result should be a small codebase where you can follow the entire agent flow (message → stream → tool call → tool result → continue) without jumping across dozens of files and subsystems.
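That entire flow can be sketched in a few lines of Ruby. This is a hypothetical shape of the loop, not the actual RubiPi API; every method and key name here is a placeholder:

```ruby
# Hypothetical sketch of the core agent flow; names are placeholders,
# not the real RubiPi API.
def agent_loop(messages, tools, client)
  loop do
    reply = client.stream(messages)          # message -> (streamed) reply
    messages << reply
    return reply unless reply[:tool_calls]   # no tool calls: we're done
    reply[:tool_calls].each do |call|        # tool call -> tool result
      tool   = tools.fetch(call[:name])
      result = tool.call(call[:arguments])
      messages << { role: "tool", content: result }
    end
  end                                        # continue with results appended
end
```

The whole agent fits in one function: everything else in the plan below is just filling in `client`, `tools`, and the rendering around this loop.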

Our plan


© Thomas Jansen 2025

# RubiPi – Minimal learning-focused build plan

## Goal
Build a **small, readable Ruby port** of the core “pi” experience to learn how an agent works end-to-end:
- send messages to an LLM
- stream the response
- handle tool calls (function calling)
- execute local tools
- feed tool results back to the model
- repeat until the model finishes

This is not a feature-complete clone. It’s a “core loop + thin CLI” project.

## Non-goals (for the minimal version)
- Full-screen terminal UI (TUI), fancy rendering, themes
- Extensions / plugins / skills packages system
- OAuth login flows
- Multi-provider support (start with one provider)
- Session tree navigation (forking/branching)
- Auto-compaction / summarization
- Web UI / Slack bot / pod management

## Architecture (3 layers, one gem)
Keep everything in one repo/gem, but separate concerns by namespace/folders:

1) **`RubiPi::AI`** (LLM connector)
   - Models, messages, tools, and a unified streaming interface
   - A provider registry/dispatcher (even if we only register one provider initially)

2) **`RubiPi::AgentCore`** (the agent loop)
   - The loop that:
     - asks the model
     - executes tool calls
     - appends tool results
     - continues until done
   - Emits a stream of events for UI/CLI to render

3) **`RubiPi::CodingAgent`** (product harness)
   - CLI parsing and “modes” (start with print mode)
   - Tool implementations (read/write/bash)
   - Optional minimal session persistence (later milestone)

This mirrors the TypeScript monorepo split:
- `packages/ai` → `RubiPi::AI`
- `packages/agent` → `RubiPi::AgentCore`
- `packages/coding-agent` → `RubiPi::CodingAgent`

## Minimal feature set (what we will actually build)
### A. CLI (print mode)
- Command: `bin/rubipi "message..."` (or stdin)
- Streams assistant text to stdout
- If the assistant calls tools, execute them and continue automatically
- Exits when the assistant is done

### B. One provider (OpenAI-compatible)
- Implement **one** provider using an OpenAI-compatible endpoint:
  - Chat Completions compatible streaming (SSE) with tool calling
  - Environment variable for API key
- Model config provided via CLI flags or defaults:
  - `--base-url`, `--model`, `--api-key` (or `OPENAI_API_KEY`)
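The request side of this provider could look like the following sketch, using only Ruby's stdlib `Net::HTTP`. Building the request is split from sending it so the former can be tested without a network; the endpoint path and payload shape follow the public Chat Completions API, everything else is an assumption:

```ruby
require "net/http"
require "json"
require "uri"

# Build a streaming Chat Completions request for an OpenAI-compatible
# endpoint. Split out from sending so it is testable offline.
def build_chat_request(base_url:, model:, api_key:, messages:)
  uri = URI("#{base_url}/chat/completions")
  req = Net::HTTP::Post.new(uri)
  req["Authorization"] = "Bearer #{api_key}"
  req["Content-Type"]  = "application/json"
  req.body = JSON.generate(model: model, messages: messages, stream: true)
  [uri, req]
end

# Send the request and hand raw SSE chunks to the caller's block.
def stream_chat(uri, req, &on_chunk)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(req) { |res| res.read_body { |chunk| on_chunk.call(chunk) } }
  end
end
```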

### C. Tool calling
- Define a Ruby tool interface:
  - name, description, JSON schema-ish parameters (minimal at first), execute(args)
- Tools to implement:
  - `read`: read a file
  - `write`: write/overwrite a file
  - `bash`: run a shell command (careful, but fine for local learning)
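A minimal version of that tool interface might be a plain `Struct`. The names and the shape of `parameters` are illustrative, not the final design:

```ruby
# Minimal tool interface sketch: name, description, JSON-schema-ish
# parameters, and an execute callable. Names are illustrative.
Tool = Struct.new(:name, :description, :parameters, :execute) do
  def call(args)
    execute.call(args)
  end
end

# Example instance: the `read` tool as a one-liner.
READ_TOOL = Tool.new(
  "read",
  "Read a file and return its contents",
  { "path" => "string" },               # minimal on purpose
  ->(args) { File.read(args["path"]) }
)
```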

### D. Event stream (for learning + testability)
Emit events like:
- `agent_start`, `turn_start`
- `message_start`, `message_delta`, `message_end`
- `tool_start`, `tool_end`
- `turn_end`, `agent_end`

The CLI prints from these events; later a TUI could also consume them.
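The idea can be sketched as the loop yielding tagged events while the consumer decides how to render them. Event names follow the list above; the rest is a placeholder:

```ruby
# Sketch of the event stream idea: the turn yields tagged events and the
# CLI (or a test) subscribes with a block. Only the names above are real.
def run_turn(text_chunks, &emit)
  emit.call([:turn_start, nil])
  emit.call([:message_start, nil])
  text_chunks.each { |chunk| emit.call([:message_delta, chunk]) }
  emit.call([:message_end, nil])
  emit.call([:turn_end, nil])
end
```

A CLI subscriber would print each `:message_delta` payload to stdout as it arrives; a test subscriber can simply collect the events into an array and assert on them.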

## Build sequence (milestones)

### Milestone 1: “Hello streaming”
**Outcome:** `rubipi "hi"` streams model text and exits cleanly.
- Implement `RubiPi::AI::Stream` with a single provider adapter
- Parse SSE chunks, produce text deltas, finish with a final message

**Done when:**
- Text appears incrementally (not only at the end)
- Exit code is 0 on success; non-0 with a readable error on failure
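The SSE parsing at the heart of this milestone can be sketched per line: OpenAI-style streams send `data: {json}` lines and finish with `data: [DONE]`. Real chunks carry more fields; this keeps only the delta text:

```ruby
require "json"

# Sketch of parsing one OpenAI-style Chat Completions SSE line into a
# text delta. Returns the delta string, :done, or nil for other lines.
def parse_sse_line(line)
  return nil unless line.start_with?("data: ")
  payload = line.delete_prefix("data: ").strip
  return :done if payload == "[DONE]"
  json = JSON.parse(payload)
  json.dig("choices", 0, "delta", "content")
end
```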

### Milestone 2: “Tool call round-trip”
**Outcome:** model requests a tool → RubiPi executes it → model continues.
- Implement `RubiPi::AgentCore::AgentLoop`
- Implement minimal tool registry + 1 tool (`read`)
- Support: assistant → toolCall → toolResult → assistant

**Done when:**
- Prompt “Read RubiPi/plan.md and summarize it” results in a tool call + summary

### Milestone 3: “Useful tool set”
**Outcome:** basic coding-agent-ish actions are possible.
- Add `write` and `bash` tools
- Add basic safety rails (see below)

**Done when:**
- Prompt “Create a file X with Y content” causes `write`
- Prompt “Run `ruby -v`” causes `bash`

### Milestone 4: “Usable CLI ergonomics”
**Outcome:** feels like a small CLI tool, not a script.
- Support stdin piping
- Add flags:
  - `--model`, `--base-url`, `--api-key`
  - `--json` (optional) to output event stream as JSONL for debugging
- Clear help output

**Done when:**
- `echo "hello" | rubipi` works
- `rubipi --help` shows options
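Ruby's stdlib `OptionParser` covers all of this; a sketch with the flags from the plan (the defaults here are guesses, not decisions):

```ruby
require "optparse"

# Sketch of Milestone 4 flag parsing. Flag names match the plan;
# default values are placeholders.
def parse_flags(argv)
  options = {
    model:    "gpt-4o-mini",                 # placeholder default
    base_url: "https://api.openai.com/v1",
    api_key:  ENV["OPENAI_API_KEY"],
    json:     false
  }
  OptionParser.new do |o|
    o.banner = "Usage: rubipi [options] [message]"
    o.on("--model MODEL",  "Model name")              { |v| options[:model] = v }
    o.on("--base-url URL", "API base URL")            { |v| options[:base_url] = v }
    o.on("--api-key KEY",  "API key")                 { |v| options[:api_key] = v }
    o.on("--json",         "Emit events as JSONL")    { options[:json] = true }
  end.parse!(argv)
  options   # any remaining argv entries form the message
end
```

`parse!` removes the flags from `argv`, so whatever is left (or stdin, if empty) becomes the user message.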

### Milestone 5 (optional): “Minimal sessions”
**Outcome:** persist conversation to a JSONL file.
- Append messages and tool results
- Support `--session <path>` and `--continue`

**Done when:**
- You can run two commands and the second run continues prior context
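JSONL makes this milestone almost trivial: one message per line, append on write, replay on load. The file layout here is an assumption, not a spec:

```ruby
require "json"

# Sketch of minimal JSONL session persistence: append each message as
# one JSON line, replay the file to restore context.
def append_message(path, message)
  File.open(path, "a") { |f| f.puts(JSON.generate(message)) }
end

def load_session(path)
  return [] unless File.exist?(path)
  File.readlines(path).map { |line| JSON.parse(line) }
end
```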

## Safety rails (minimal)
Even for a personal learning project, add a few guardrails early:
- `read`/`write` limited to a project root (default: current working dir)
- `bash` runs with a timeout and caps captured output size (truncating anything large)
- Never auto-execute “dangerous” commands by default (optional prompt/allowlist later)
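The path confinement for `read`/`write` is a few lines: expand the requested path against the project root and reject anything that escapes it. The helper name is illustrative:

```ruby
# Sketch of confining file tools to a project root: expand the path and
# reject anything outside the root. Helper name is illustrative.
def safe_path!(root, relative)
  root = File.expand_path(root)
  full = File.expand_path(relative, root)
  unless full == root || full.start_with?(root + File::SEPARATOR)
    raise ArgumentError, "path escapes project root: #{relative}"
  end
  full
end
```

Note that `File.expand_path` resolves `..` segments, so `"../outside.txt"` is caught even though the string itself never mentions the root.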

## Testing approach (keep it lightweight)
- Unit test:
  - SSE parsing (feed chunks → expect deltas)
  - tool argument decoding
  - agent loop transitions (tool call → tool result)
- Avoid end-to-end provider tests unless you want to provide real keys in CI.
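The style throughout is the same: canned input in, assertion on the result, no network. A sketch using `minitest` (which ships with Ruby), with a hypothetical tool-argument decoder inlined to keep the example self-contained:

```ruby
require "minitest/autorun"
require "json"

# Hypothetical tool-argument decoder under test, inlined for brevity.
def decode_tool_args(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  {}  # start permissive: bad JSON becomes "no arguments"
end

class ToolArgsTest < Minitest::Test
  def test_valid_json
    assert_equal({ "path" => "a.txt" }, decode_tool_args('{"path":"a.txt"}'))
  end

  def test_garbage_falls_back_to_empty_hash
    assert_equal({}, decode_tool_args("not json"))
  end
end
```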

## Open questions (decide as we implement)
- Choose the first API format:
  - OpenAI Chat Completions compatible streaming is the simplest starting point
  - Later: OpenAI Responses / other providers
- How strict to be on tool parameter validation in Ruby (start permissive, tighten later)