
Scry — Development Phases

Phase Overview

Phase      Focus                                                  Status
Phase 0    Scaffolding                                            Shipped
Phase 1    AI Chat MVP — text/voice/image, ~70 MCP tools, Claude  Shipped
Phase 1.5  Multi-provider + parallel + watchers + UX polish       Shipped
Phase 2    Browse hub + Dashboard + Logs + TF + Processes         Shipped
Phase 3    Rich rendering + live mini-panels + monitors + fleet   Shipped
Phase 4    Reliability + onboarding + packaging                   In progress
Phase 5    Testing + Release                                      Pending

The roadmap was originally written as a six-phase timeline. Phase 3 grew into a much larger surface than the original "camera + plot" plan: a full rich-renderer subsystem (sensor panels, scene snapshots, BT inline, live mini-panels), multi-step diagnostic plans, edge-triggered background monitors, fleet overview, robot comparison, and a robot switcher. The phase table above reflects what's actually shipped.


Phase 0: Scaffolding — Shipped

Robot Side (scry-connect)

  • Python project with pyproject.toml and package structure
  • MCP server skeleton using mcp SDK on Streamable HTTP :5339
  • ros_list_topics proves MCP works
  • rclpy node + manager facade
  • Test with MCP Inspector tool

Android Side

  • Project setup (Kotlin, Compose, Hilt, Room, OkHttp)
  • Theme (graphite + green dark palette, Material 3, JetBrains Mono for data)
  • Navigation scaffold (5-tab bottom nav: Fleets / Robot / Scry / ROS / Settings)
  • Rosbridge WebSocket client class stub (data/rosbridge/RosbridgeClient.kt) — present but dormant; SSE through scry-connect is the active path
  • Connection screen with saved-robot list

Gate

  • Phone connects to scry-connect on the robot
  • Phone displays list of topics
  • MCP server responds to tools/list
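The last gate item can be checked by hand with a single JSON-RPC 2.0 message, since MCP uses `tools/list` as the method name. The sketch below only builds the request body and reads tool names out of a response; the helper names are hypothetical, and the HTTP call itself (endpoint path, session setup) is left out because the source does not specify it.

```python
import json


def tools_list_request(request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 body for an MCP tools/list call."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tool_names(response_text: str) -> list:
    """Extract tool names from a tools/list response body (hypothetical helper)."""
    response = json.loads(response_text)
    if "error" in response:
        # JSON-RPC errors carry a code and a message instead of a result.
        raise RuntimeError(response["error"].get("message", "unknown error"))
    return [tool["name"] for tool in response["result"]["tools"]]
```

For the Phase 0 gate, a response listing `ros_list_topics` would be enough to show the server is wired up.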

Phase 1: AI Chat MVP — Shipped

Robot Side

  • ~99 MCP tools across per-verb manager classes: topic, service, node, param, action, lifecycle, pkg, component, ros2_control, daemon, multicast, doctor, extensions, interface, bag, dds/env, diagnostics, tf, process, system, watcher, behavior_tree, teleop, docker
  • Permission model: every tool tagged write=True/False. Mirrored on the Android side via McpToolCatalog.WRITE_TOOLS; CI parity test keeps the two in sync.
  • Shell-backed managers route through a single shell_runner with bounded timeouts and structured errors
  • rclpy-backed managers share one node + multi-threaded executor on a background thread
  • MCP resources: ros://system/info, ros://topics/{name}/schema, ros://nodes/{name}/info
  • Image handling: CompressedImage/Image → resize to 512×512 → base64 JPEG
  • Safety: rate limiting (token bucket), subprocess timeouts, structured errors (NotFound, Timeout, InvalidArgument, CliNotAvailable)
  • SSE streaming endpoint (GET /stream?topic=…)
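The token-bucket rate limiting listed under Safety can be illustrated with a small, generic sketch. This is not the scry-connect implementation: the class name, the `rate` and `capacity` parameters, and the injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so the bucket can be tested without sleeping
        self._tokens = capacity      # start full
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        now = self._clock()
        # Refill for the elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A tool handler would call `try_acquire()` before doing any work and return a structured error when it gets `False`, rather than queueing the request.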

Android Side

  • MCP client (HTTP Streamable transport, McpClient)
  • AI provider abstraction (AiClient interface)
  • Claude API client with streaming responses
  • Tool-call proxy loop (AiProxyLoop — AI → phone → MCP → phone → AI)
  • Chat UI: message list, text input, streaming text display
  • Tool call progress indicators (collapsible cards)
  • Write operation confirmation dialogs
  • Markdown rendering in chat messages
  • Voice input (SpeechRecognizer)
  • Image attachment (camera capture + gallery pick)
  • Chat history persistence (Room, per-robot)
  • Connection management (add/edit/delete robots, saved in Room)

Gate

  • User connects → text → AI calls MCP tools → diagnosis
  • Voice input works
  • Image attachment works
  • Write operations show confirmation dialog
  • Chat history persists across app restarts
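The write-confirmation gate depends on every tool carrying a write flag. One way to picture that tagging is a registration decorator, sketched below. The decorator, registry, and helper names are hypothetical and the tool bodies are placeholders; only the tool names `ros_list_topics` and `ros_set_parameter` and the `write=True/False` idea come from this document.

```python
from typing import Callable, Dict, Set

# Hypothetical in-memory registry: tool name -> metadata.
_REGISTRY: Dict[str, dict] = {}


def tool(name: str, *, write: bool) -> Callable:
    """Register a tool and record whether it mutates robot state."""
    def decorate(fn: Callable) -> Callable:
        _REGISTRY[name] = {"fn": fn, "write": write}
        return fn
    return decorate


@tool("ros_list_topics", write=False)
def ros_list_topics() -> list:
    return []  # placeholder body for the sketch


@tool("ros_set_parameter", write=True)
def ros_set_parameter(node: str, name: str, value: object) -> dict:
    return {"node": node, "name": name, "value": value}  # placeholder body


def write_tools() -> Set[str]:
    """Names of all tools tagged write=True, the set a client would mirror."""
    return {name for name, meta in _REGISTRY.items() if meta["write"]}


def needs_confirmation(name: str) -> bool:
    """True when a call to this tool should be held for user approval."""
    return _REGISTRY[name]["write"]
```

Making `write` a required keyword means a new tool cannot be registered without someone deciding which side of the line it falls on.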

Phase 1.5: Multi-provider + parallel + watchers + UX polish — Shipped

Robot side (connect)

  • WatcherManager — four long-running tools that observe ROS events for up to 60 s:
      • ros_watch_topic(topic, duration, condition?, max_events) — subscribes, optionally stops early when a condition like pose.x > 5 matches
      • ros_watch_diagnostics(duration, level_filter?) — captures /diagnostics with level filtering
      • ros_watch_node(node, duration, poll_interval) — detects appearance/disappearance (crash/restart)
      • ros_wait_for_topic(topic, timeout) — blocks until a topic appears (bringup debugging)
  • Safe condition expression parser (no eval, dotted paths, comparison operators)
  • Write tool registry kept in sync with Android McpToolCatalog.WRITE_TOOLS via drift test
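A condition such as `pose.x > 5` can be evaluated without `eval` by accepting only a dotted path, a comparison operator, and a numeric literal. The sketch below shows one way to do that. The grammar and function names are assumptions; the actual parser may support more, such as string literals or boolean combinations.

```python
import operator
import re
from typing import Any, Callable, List, Tuple

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}

# dotted.path <op> number ; path segments may not start with "_" or a digit,
# which keeps dunder attributes out of reach.
_PATTERN = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)*)"
    r"\s*(==|!=|>=|<=|>|<)\s*"
    r"(-?\d+(?:\.\d+)?)\s*$"
)


def parse_condition(expr: str) -> Tuple[List[str], Callable[[Any, Any], bool], float]:
    """Split an expression into (path segments, comparison function, threshold)."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    path, op, literal = match.groups()
    return path.split("."), _OPS[op], float(literal)


def resolve(msg: Any, path: List[str]) -> Any:
    """Walk a dotted path through nested dicts or message objects."""
    value = msg
    for key in path:
        value = value[key] if isinstance(value, dict) else getattr(value, key)
    return value


def matches(expr: str, msg: Any) -> bool:
    path, compare, threshold = parse_condition(expr)
    return bool(compare(resolve(msg, path), threshold))
```

Anything that does not fit the pattern is rejected up front, so arbitrary Python never reaches an interpreter.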

Android side

  • Parallel tool execution in AiProxyLoop. Read-only tools run with async+awaitAll; writes still run sequentially, each gated by a per-tool-id Approvals deferred
  • Multi-provider: OpenAiClient, GeminiClient, OllamaClient in addition to ClaudeClient. Each handles its own streaming wire format:
      • Anthropic: SSE + input_json_delta piecewise tool args
      • OpenAI: SSE + piecewise tool_calls.function.arguments
      • Gemini: SSE + atomic functionCall parts
      • Ollama: NDJSON + atomic tool_calls
  • Provider-agnostic history replay (buildHistory replays ToolUse/ToolResult so stateless providers work on session resume)
  • McpClient SSE parser fix — multi-line data: continuations, proper unwrap via jsonPrimitive.content. Dashboard and Topics parsers made envelope-tolerant.
  • VerbosityLevel (Terse / Normal / Detailed) setting injected into system prompt. System prompt rewritten: technical tone, no emojis, tables for structured data, parallel reads preferred
  • Hold-to-talk mic with SpeechRecognizer + live partial transcripts + auto-send on release
  • UI overhaul — graphite + green dark palette, softer accents, typographic hierarchy, suggestion chips, quick-action pills above the input bar, compact tool cards with expand-for-input
  • Settings redesigned — bottom-sheet API key entry, per-provider model pickers via top-bar chip, verbosity selector, Ollama URL field
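The SSE parser fix above concerns events whose payload spans several `data:` lines. Under the Server-Sent Events format, consecutive `data:` lines within one event are joined with a newline and the event ends at a blank line. The sketch below shows that rule in Python for illustration; the app-side parser lives in McpClient and is not reproduced here, and the function name is hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, joining multi-line data fields."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if any data was collected.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # only a single leading space is stripped
        if field == "data":
            buffer.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event that was never terminated by a blank line is dropped.
```

A parser that handles only the first `data:` line would truncate any JSON body the server chose to split across lines.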

Gate

  • Switch provider in top-bar chip → next chat uses the new provider without restart
  • AI asks multiple read questions in one turn → connect sees concurrent requests
  • Multiple write tool requests queue per-tool approval dialogs
  • Hold mic, speak, release → message auto-sends
  • Dashboard and Topics render real data (regression check for the v1 parser bug)
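The concurrency behaviour these gates describe (reads in parallel, writes one at a time behind an approval) can be sketched with asyncio. This is a Python analogue for illustration only: AiProxyLoop is Kotlin and uses async+awaitAll, and the call shape and the `execute` and `approve` callbacks here are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Set

Call = Dict[str, Any]  # assumed shape: {"id": ..., "name": ..., "args": ...}


async def run_tool_calls(
    calls: List[Call],
    write_tools: Set[str],
    execute: Callable[[Call], Awaitable[Any]],
    approve: Callable[[Call], Awaitable[bool]],
) -> Dict[str, Any]:
    """Run read-only calls concurrently, then writes sequentially after approval."""
    results: Dict[str, Any] = {}
    reads = [c for c in calls if c["name"] not in write_tools]
    writes = [c for c in calls if c["name"] in write_tools]

    # All reads are in flight at once; gather preserves input order.
    read_results = await asyncio.gather(*(execute(c) for c in reads))
    for call, result in zip(reads, read_results):
        results[call["id"]] = result

    # Each write waits for its own approval before it runs.
    for call in writes:
        if await approve(call):
            results[call["id"]] = await execute(call)
        else:
            results[call["id"]] = {"error": "denied by user"}
    return results
```

Keying results by call id lets the caller hand each tool result back to the provider regardless of completion order.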

Phase 2: Browse hub + Dashboard + Logs + TF + Processes — Shipped

Android Side

  • Dashboard ("Robot" tab) — sectioned, honest. Identity / Graph / Liveness / Diagnostics / DDS health (opt-in probe). No fabricated traffic-light verdicts; every value is a fact. See ui/dashboard/.
  • Browse hub ("ROS" tab) — single hub for ten entity families:
      • Topics — list + detail with live JSON tree + rolling Hz/bw via ros_read_topic polling loop
      • Nodes — ros_inspect_node, tap pubs/subs/srvs to drill in
      • Services — request/response schema tree
      • Actions — goal/result/feedback schema tree
      • Lifecycle — state badges + available transitions
      • Parameters — per-node parameter tree
      • Components — loaded plugins per container
      • Logs (/rosout) — live SSE + recent history, level/node/grep filters, severity stripe, auto-scroll-lock with "↓ N new" pill. Per-node logs reachable from Node Detail.
      • TF — depth-flattened tree from ros_tf_frames, broadcaster + rate per row, orphan-frame warning; Frame Detail with live 2 Hz ros_tf_lookup.
      • Processes — system-wide ROS-related ps view via ros_list_system_processes; sortable by cpu/mem/pid/name; stats card + stdout tail for Scry-launched.
  • All list screens share EntityListScaffold (search + sort + pin + refresh)
  • Pinned items per (kind, robotId) persist in SecurePrefs.pinnedItems
  • "Ask Scry about this" hand-off → seeded chat (Routes.chatWithSeed)
  • Dashboard diagnostic warnings tappable → seeded "Investigate" chat
  • Connect SSE rewrite: persistent subscription, real ROS Hz/bw/count, image throttle, /clock+/tf throttle
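The rolling Hz/bw figures mentioned for Topics and the SSE rewrite can be computed from message arrival times and sizes over a sliding window. The sketch below is one common way to do it; the class name, the 5 s window, and the exact averaging rule are assumptions rather than the project's code.

```python
from collections import deque
from typing import Deque, Tuple


class RollingRate:
    """Track message rate (Hz) and bandwidth (bytes/s) over a sliding time window."""

    def __init__(self, window_s: float = 5.0):
        self.window_s = window_s
        self._samples: Deque[Tuple[float, int]] = deque()  # (arrival time, size in bytes)

    def add(self, stamp: float, size_bytes: int) -> None:
        self._samples.append((stamp, size_bytes))
        cutoff = stamp - self.window_s
        # Drop samples that have fallen out of the window.
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def _span(self) -> float:
        if len(self._samples) < 2:
            return 0.0
        return self._samples[-1][0] - self._samples[0][0]

    def hz(self) -> float:
        span = self._span()
        # N samples delimit N - 1 intervals.
        return (len(self._samples) - 1) / span if span > 0 else 0.0

    def bytes_per_s(self) -> float:
        span = self._span()
        if span <= 0:
            return 0.0
        # Count the bytes that arrived after the first sample in the window.
        total = sum(size for _, size in list(self._samples)[1:])
        return total / span
```

Both values fall back to 0.0 until at least two messages are inside the window, which avoids dividing by a zero-length span.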

Gate

  • Dashboard shows real robot health data
  • Topic browser lists all topics with search, sort, pinning, hidden toggle
  • Topic detail shows live messages updating in real-time
  • JSON tree view renders nested message structures with copy-path
  • "Ask Scry about /topic" round-trip works end-to-end

Phase 3: Rich rendering + live mini-panels + monitors + fleet — Shipped

Originally scoped as "camera + plot" — grew into a full rich-renderer subsystem because the AI's natural output for spatial / temporal / multi-field data is unintelligible as raw JSON. Driving principle: trust the renderer. The AI's prose adds context and anomaly callouts; the card carries the data.

Rich-renderer subsystem (ui/chat/rich/)

  • RichDispatcher — routes tool results to inline blocks by render_hint or by tool name. See RichRenderer.kt.
  • GroupedList — namespace-grouped topic / node / service / process lists with pub/sub badges, 0 sub flagged red
  • Tree — TF tree with per-edge rate badges
  • EntityCard — TF lookup, node inspection, parameter description
  • StatusBanner — health / doctor / diagnostics with OK / WARN / ERROR tone
  • Metric card — single big number + sparkline, tone-coloured by threshold (Hz / bw / delay)
  • LineChart — multi-series rolling chart from ros_read_topic count≥2 or ros_watch_topic
  • LogViewer — level chips, search, virtualised list (for ros_get_recent_logs)
  • SceneSnapshot — composed top-down view: occupancy grid + base_link silhouette + scan dots + path overlay + scale bar (from ros_read_scene)
  • GpsView — OSM map + marker + trail (for sensor_msgs/NavSatFix)
  • ImagePreview — FeedTile chrome (topic chip, dim/format chip, tap-to-zoom)
  • SensorPanel — same renderer as the Viz tab: Imu attitude / Battery cells / Range cone / MagneticField arrow / Wrench bars / Joy sticks / scalar gauges
  • BtTreeView — full behaviour-tree map inline (auto-fit, kind-coloured nodes, status-coloured borders). Re-uses Viz-tab BtSnapshotCanvas for parity.
  • ConfirmationCard — auto-rendered when AI announces a write (per the writes skill); shows args with diff against current value for ros_set_parameter
  • JsonTreeView fallback

Live mini-panels (render_panel)

  • Phone-side meta tool render_panel(topic, kind, duration_s, fields) — embed 1–30 s SSE-driven mini-panel into chat
  • kind ∈ {sensor, plot, scene, gps, camera}; plot takes fields: [dot-path] for multi-series overlay
  • render_scene_live(map_topic?, pose_topic?, scan_topic?, path_topic?, duration_s) — composed live scene (parallel SSE per layer, single canvas)
  • Settles into Frozen final frame or Failed("no messages received")

Diagnostic plans (emit_plan)

  • Phone-side meta tool for any debugging request needing ≥3 tool calls
  • Checklist of {label, tool, status, outcome?} rendered as a PlanBlock
  • Re-emit with updated status + verdict once concluded

Background monitors (monitor_threshold / cancel_monitor)

  • Edge-triggered watch on a topic field — alerts fire only on entry to tripped state, not continuously
  • App-scoped MonitorRegistry (Hilt singleton, SupervisorJob + Dispatchers.Default) drives one SSE subscription per active monitor
  • Alerts post into chat as assistant messages via ChatRepository.append
  • MonitorChipStrip — sticky strip between header and chat showing every active monitor with cancel button
  • Returns id for explicit cancellation
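Edge triggering, as described in the first bullet, means an alert fires on the transition into the tripped state and stays silent while the condition remains true. A minimal sketch of that state machine follows, written in Python for illustration; the app-side MonitorRegistry is a Hilt singleton and is not shown, and the class name is hypothetical.

```python
from typing import Any, Callable


class EdgeTrigger:
    """Fire once when a predicate goes from false to true, not on every sample."""

    def __init__(self, predicate: Callable[[Any], bool]):
        self._predicate = predicate
        self._tripped = False

    def update(self, value: Any) -> bool:
        """Feed one sample; return True only on entry to the tripped state."""
        now_tripped = bool(self._predicate(value))
        fired = now_tripped and not self._tripped
        self._tripped = now_tripped
        return fired
```

With a predicate of `value < 20`, the battery samples 25, 19, 18, 22, 15 would fire on 19 and again on 15, because the value recovered to 22 in between.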

Fleet + comparison

  • fleet_overview — pings every saved robot in parallel via MCP.healthCheckTimed; renders FleetOverviewBlock with online dot, ping ms, per-robot summary
  • compare_robots(left_name, right_name, dimension, rows) — RobotComparisonBlock with two-column metric grid + diff-tinted right column
  • Robot switcher — chat header robot-name row is tappable → DropdownMenu lists every saved robot with the active one highlighted; switching swaps ActiveRobotStore and rebinds the session

Inline visualisation parity

  • Camera feeds, IMU, scans, GPS, battery, BT — all render inline in chat the same way they render on the Viz tab; same sensor renderers, same scene canvas, same BT canvas
  • scry://viz?section=…&topic=… deep-links from prose into the appropriate Viz section
  • Suggestion chips on empty chat surface every Phase 2/3 capability (39 prompts in assets/prompts/suggestions.txt)

Anomaly callouts

  • Auto-overlays on sensor cards: Battery (low <20 %, critical <10 %), Range (out of bounds), Imu (>3g critical), MagneticField (outside 10–100 µT), Wrench (50 N / 5 Nm envelope), scalar gauges (per ScalarConfig.spec)
  • Same overlays apply identically on Viz tab and in chat (anomalyFor / chatAnomalyFor)
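The thresholds listed above map directly onto small classification functions. The sketch below covers three of them. Function names, return labels, and the inclusive or exclusive handling of boundary values are assumptions; the units follow the bullet (percent, µT, g).

```python
from typing import Optional

STANDARD_GRAVITY = 9.80665  # m/s^2 per g


def battery_anomaly(percent: float) -> Optional[str]:
    """Classify a battery level given in percent (0-100)."""
    if percent < 10.0:
        return "critical"
    if percent < 20.0:
        return "low"
    return None


def magnetic_field_anomaly(magnitude_ut: float) -> Optional[str]:
    """Flag a field magnitude outside the 10-100 microtesla band."""
    return None if 10.0 <= magnitude_ut <= 100.0 else "out of range"


def imu_anomaly(accel_magnitude_ms2: float) -> Optional[str]:
    """Flag linear acceleration above 3 g, with the input given in m/s^2."""
    return "critical" if accel_magnitude_ms2 / STANDARD_GRAVITY > 3.0 else None
```

Sharing functions like these between the Viz tab and chat is one way to get the identical overlays the section calls for.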

Skills + Tier-0 prompt

  • presentation.md skill — reference for matching tool output to renderers; trimmed to ~1.5 K tokens
  • writes.md skill — write-op announcement protocol
  • System prompt updated with the 6 new core meta-tools

Gate

  • Tool result → rich block render works for every block class
  • render_panel / render_scene_live settle within duration_s + 1
  • monitor_threshold survives app backgrounding; alert appears as chat message on trip
  • Switching robot from chat header rebinds session without crash
  • Anomaly badges appear identically in chat and Viz tab

Phase 4: Reliability + onboarding + packaging — In progress

Android Side

  • Multi-robot quick switch — chat header dropdown (shipped as part of Phase 3)
  • All four AI providers wired and shipping (Claude, OpenAI, Gemini, Ollama — Phase 1.5)
  • Settings screen with credential entry per provider
  • Error handling: structured network errors, timeouts, reconnection with backoff
  • Connection health monitoring (periodic connect ping, auto-reconnect on flap)
  • App icon and splash screen
  • Onboarding flow (first launch → set up AI provider → connect robot)
  • Performance: memory profiling under long-running monitors + scene SSE, battery profiling, cold-start measurement on mid-range Android
  • Edge cases: very large messages, missing topics, slow networks, SSE reconnect with backoff
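Reconnection with backoff, which appears twice in the list above, usually means waiting longer after each failed attempt up to a ceiling. The generator below sketches capped exponential backoff with optional jitter. The base delay, growth factor, cap, and jitter scheme are illustrative assumptions, since the document does not state the values used.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    base_s: float = 0.5,
    factor: float = 2.0,
    cap_s: float = 30.0,
    jitter: float = 0.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield an endless sequence of retry delays in seconds."""
    delay = base_s
    while True:
        capped = min(delay, cap_s)
        # Symmetric jitter scales the delay by a factor in [1 - jitter, 1 + jitter].
        yield capped * (1.0 + jitter * (2.0 * rng() - 1.0))
        delay = capped * factor
```

A reconnect loop would take the next delay after each failure and create a fresh generator once a connection succeeds, so the next outage starts again from the base delay.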

Robot Side

  • Install script (robot-setup/install.sh)
  • Dockerfile (with --public-internet and --token flags wired through compose)
  • Optional systemd service file (scry.service)

Live BT status streaming (deferred from Phase 3)

  • render_bt_live phone-side tool — subscribe to /behavior_tree_log for ~10 s and replay status updates into the inline BT canvas. Canvas already accepts nodeStates; wiring is mechanical. (See note at end of Phase 3 — pattern matches render_scene_live.)

Gate

  • Switch between 2+ robots without crashes (already gated by Phase 3)
  • App handles network disconnection gracefully (TODO)
  • Settings persist correctly
  • First-run onboarding completes in <2 min (TODO)

Phase 5: Testing + Release — Pending

Testing

  • Drift detectors: tests/test_tools_registry.py (core-set parity + write classification), tests/test_skill_tool_references.py (skill → tool reference check, token budget)
  • Unit tests: ViewModels, protocol parsing, tool proxy logic
  • UI tests: chat rendering, navigation, topic list (Compose Test)
  • E2E tests: physical device + robot running turtlesim
  • Test with multiple robot types (TurtleBot, custom robots)
  • Test all MCP tools against real ROS 2 environment
  • Test with different DDS implementations (Fast-DDS, CycloneDDS)
  • Battery / performance testing on mid-range Android device
  • Security audit pass — see docs/SECURITY_AUDIT.md (all C/H closed, M-2/M-3 deferred with explicit trade-offs)
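The drift detectors in the first bullet of this list come down to comparing two sets and reporting any difference in a readable way. The helper below sketches the write-classification half of that idea. It does not reproduce tests/test_tools_registry.py; the function name and message wording are hypothetical.

```python
from typing import Iterable, List


def describe_drift(server_write_tools: Iterable[str], client_write_tools: Iterable[str]) -> List[str]:
    """Return one message per tool whose write classification differs between sides."""
    server, client = set(server_write_tools), set(client_write_tools)
    problems = [
        f"{name}: tagged write on the server but missing from the client list"
        for name in sorted(server - client)
    ]
    problems += [
        f"{name}: in the client list but not tagged write on the server"
        for name in sorted(client - server)
    ]
    return problems
```

A test would assert that the returned list is empty, so a failure message names the exact tool that drifted.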

Release Prep

  • Play Store listing (screenshots, description, icon)
  • PyPI package for scry-connect
  • User documentation (README, quick start guide)
  • Beta distribution (internal testing track on Play Store)

Gate

  • All tests pass
  • Beta tested with 3+ different robot setups
  • Play Store review submitted
  • scry-connect published on PyPI

Key Test Scenarios

  1. Happy path: Connect → ask question → get diagnosis with tool calls
  2. Network interruption: Robot disconnects mid-chat → graceful recovery
  3. High-frequency topic: Subscribe to 100 Hz IMU → throttling works, no OOM
  4. Large messages: PointCloud2 or large image → graceful handling
  5. Camera + chat: Streaming camera while chatting → no blocking
  6. Write confirmation: AI proposes publish / lifecycle transition / controller switch → user approves/denies → correct behaviour
  7. Multi-robot: Connect to 2 robots, switch from chat header → correct context isolation
  8. Long conversation: 50+ messages → tiered-context system keeps prompt under budget
  9. Ollama local: Works without internet connectivity
  10. Different DDS: Works with Fast-DDS, CycloneDDS, Zenoh — ros_get_dds_env surfaces the right variables per middleware
  11. ros2_control fleet: Ask "which controllers are active?" → connect returns manager state without needing shell access
  12. Lifecycle flow: AI proposes configure → activate on a Nav2 node → user approves each transition individually
  13. Live mini-panel: "Plot /cmd_vel linear.x and angular.z for 5 s" → multi-series chart embedded in chat
  14. Live scene: "Show the robot moving on the map for 10 s" → composed top-down view (map + pose + scan + path) embedded in chat
  15. Monitor: "Tell me if /battery drops below 20 %" → monitor strip appears, alert fires on trip, auto-disarms while still tripped
  16. Fleet overview: "How's the fleet?" with 2+ saved robots → online dot + ping ms per robot