Scry — Development Phases
Phase Overview
| Phase | Focus | Status |
|---|---|---|
| Phase 0 | Scaffolding | Shipped |
| Phase 1 | AI Chat MVP — text/voice/image, ~70 MCP tools, Claude | Shipped |
| Phase 1.5 | Multi-provider + parallel + watchers + UX polish | Shipped |
| Phase 2 | Browse hub + Dashboard + Logs + TF + Processes | Shipped |
| Phase 3 | Rich rendering + live mini-panels + monitors + fleet | Shipped |
| Phase 4 | Reliability + onboarding + packaging | In progress |
| Phase 5 | Testing + Release | Pending |
The roadmap was originally written as a six-phase timeline. Phase 3 grew into a much larger surface than the original "camera + plot" plan: a full rich-renderer subsystem (sensor panels, scene snapshots, BT inline, live mini-panels), multi-step diagnostic plans, edge-triggered background monitors, fleet overview, robot comparison, and a robot switcher. The phase table above reflects what's actually shipped.
Phase 0: Scaffolding — Shipped
Robot Side (scry-connect)
- Python project with `pyproject.toml` and package structure
- MCP server skeleton using `mcp` SDK on Streamable HTTP :5339
- `ros_list_topics` proves MCP works
- rclpy node + manager facade
- Test with MCP Inspector tool
Android Side
- Project setup (Kotlin, Compose, Hilt, Room, OkHttp)
- Theme (graphite + green dark palette, Material 3, JetBrains Mono for data)
- Navigation scaffold (5-tab bottom nav: Fleets / Robot / Scry / ROS / Settings)
- Rosbridge WebSocket client class stub (`data/rosbridge/RosbridgeClient.kt`) — present but dormant; SSE through scry-connect is the active path
- Connection screen with saved-robot list
Gate
- Phone connects to scry-connect on the robot
- Phone displays list of topics
- MCP server responds to `tools/list`
Phase 1: AI Chat MVP — Shipped
Robot Side
- ~99 MCP tools across per-verb manager classes: topic, service, node, param, action, lifecycle, pkg, component, ros2_control, daemon, multicast, doctor, extensions, interface, bag, dds/env, diagnostics, tf, process, system, watcher, behavior_tree, teleop, docker
- Permission model: every tool tagged `write=True/False`. Mirrored on the Android side via `McpToolCatalog.WRITE_TOOLS`; CI parity test keeps the two in sync.
- Shell-backed managers route through a single `shell_runner` with bounded timeouts and structured errors
- rclpy-backed managers share one node + multi-threaded executor on a background thread
- MCP resources: `ros://system/info`, `ros://topics/{name}/schema`, `ros://nodes/{name}/info`
- Image handling: CompressedImage/Image → resize to 512×512 → base64 JPEG
- Safety: rate limiting (token bucket), subprocess timeouts, structured errors (`NotFound`, `Timeout`, `InvalidArgument`, `CliNotAvailable`)
- SSE streaming endpoint (`GET /stream?topic=…`)
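The token-bucket rate limiting listed under Safety can be illustrated with a minimal sketch. The class name, parameters, and injectable clock below are assumptions made for illustration; they are not taken from the scry-connect source.

```python
import time


class TokenBucket:
    """Hypothetical limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so tests can control time
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would check `allow()` before dispatching a tool call and return a structured error when it is `False`; the burst size is bounded by `capacity` and the sustained rate by `rate`.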
Android Side
- MCP client (Streamable HTTP transport, `McpClient`)
- AI provider abstraction (`AiClient` interface)
- Claude API client with streaming responses
- Tool-call proxy loop (`AiProxyLoop` — AI → phone → MCP → phone → AI)
- Chat UI: message list, text input, streaming text display
- Tool call progress indicators (collapsible cards)
- Write operation confirmation dialogs
- Markdown rendering in chat messages
- Voice input (SpeechRecognizer)
- Image attachment (camera capture + gallery pick)
- Chat history persistence (Room, per-robot)
- Connection management (add/edit/delete robots, saved in Room)
Gate
- User connects → text → AI calls MCP tools → diagnosis
- Voice input works
- Image attachment works
- Write operations show confirmation dialog
- Chat history persists across app restarts
Phase 1.5: Multi-provider + parallel + watchers + UX polish — Shipped
Robot side (connect)
- `WatcherManager` — four long-running tools that observe ROS events for up to 60 s:
  - `ros_watch_topic(topic, duration, condition?, max_events)` — subscribes, optionally stops early when a condition like `pose.x > 5` matches
  - `ros_watch_diagnostics(duration, level_filter?)` — captures /diagnostics with level filtering
  - `ros_watch_node(node, duration, poll_interval)` — detects appearance/disappearance (crash/restart)
  - `ros_wait_for_topic(topic, timeout)` — blocks until a topic appears (bringup debugging)
- Safe condition expression parser (no eval, dotted paths, comparison operators)
- Write tool registry kept in sync with Android `McpToolCatalog.WRITE_TOOLS` via drift test
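A condition parser of the kind described above (no `eval`, dotted paths, comparison operators) might look like the following sketch. The function names, the numeric-only right-hand side, and the dict-shaped message are assumptions for illustration, not the actual scry-connect grammar.

```python
import operator
import re

# Comparison operators supported by this hypothetical grammar.
_OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

# <dotted.path> <op> <number>; anything else is rejected outright.
_COND = re.compile(
    r"^\s*([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s*"
    r"(>=|<=|==|!=|>|<)\s*"
    r"(-?\d+(?:\.\d+)?)\s*$"
)


def parse_condition(expr: str):
    """Split an expression into (path segments, operator function, threshold)."""
    match = _COND.match(expr)
    if match is None:
        raise ValueError(f"unsupported condition: {expr!r}")
    path, op, literal = match.groups()
    return path.split("."), _OPS[op], float(literal)


def resolve(message: dict, path: list):
    """Walk a dotted path through nested dicts (message assumed already converted)."""
    value = message
    for key in path:
        value = value[key]
    return value


def matches(expr: str, message: dict) -> bool:
    """Evaluate the condition against one message without calling eval()."""
    path, op, threshold = parse_condition(expr)
    return op(resolve(message, path), threshold)
```

Because the expression is matched against a fixed pattern and dispatched through an operator table, input such as a function call never reaches an interpreter; it simply fails to parse.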
Android side
- Parallel tool execution in `AiProxyLoop`. Read-only tools run with `async` + `awaitAll`; writes still sequentialise, each gated by per-tool-id `Approvals` deferred
- Multi-provider: `OpenAiClient`, `GeminiClient`, `OllamaClient` in addition to `ClaudeClient`. Each handles its own streaming wire format:
  - Anthropic: SSE + `input_json_delta` piecewise tool args
  - OpenAI: SSE + piecewise `tool_calls.function.arguments`
  - Gemini: SSE + atomic `functionCall` parts
  - Ollama: NDJSON + atomic `tool_calls`
- Provider-agnostic history replay (`buildHistory` replays `ToolUse` / `ToolResult` so stateless providers work on session resume)
- `McpClient` SSE parser fix — multi-line `data:` continuations, proper unwrap via `jsonPrimitive.content`. Dashboard and Topics parsers made envelope-tolerant.
- `VerbosityLevel` (Terse / Normal / Detailed) setting injected into system prompt. System prompt rewritten: technical tone, no emojis, tables for structured data, parallel reads preferred
- Hold-to-talk mic with `SpeechRecognizer` + live partial transcripts + auto-send on release
- UI overhaul — graphite + green dark palette, softer accents, typographic hierarchy, suggestion chips, quick-action pills above the input bar, compact tool cards with expand-for-input
- Settings redesigned — bottom-sheet API key entry, per-provider model pickers via top-bar chip, verbosity selector, Ollama URL field
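The read/write split in the tool-call loop can be sketched as follows. The app itself is Kotlin (`async` + `awaitAll`); this is a Python `asyncio` analogue, and the function name, the call tuple shape, the callbacks, and the choice to run reads before writes are all assumptions for illustration.

```python
import asyncio


async def run_tool_calls(calls, execute, is_write, approve):
    """Run one turn's tool calls.

    calls:    list of (call_id, tool_name, args) tuples (hypothetical shape)
    execute:  async callable (tool_name, args) -> result
    is_write: callable tool_name -> bool
    approve:  async callable (call_id, tool_name, args) -> bool
    """
    results = {}
    reads = [c for c in calls if not is_write(c[1])]
    writes = [c for c in calls if is_write(c[1])]

    # Read-only tools are safe to run concurrently.
    read_results = await asyncio.gather(
        *(execute(name, args) for _, name, args in reads)
    )
    for (call_id, _, _), result in zip(reads, read_results):
        results[call_id] = result

    # Writes run one at a time, each behind its own approval.
    for call_id, name, args in writes:
        if await approve(call_id, name, args):
            results[call_id] = await execute(name, args)
        else:
            results[call_id] = {"error": "denied by user"}
    return results
```

Keying results by call id lets the loop hand every result back to the provider regardless of completion order.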
Gate
- Switch provider in top-bar chip → next chat uses the new provider without restart
- AI asks multiple read questions in one turn → connect sees concurrent requests
- Multiple write tool requests queue per-tool approval dialogs
- Hold mic, speak, release → message auto-sends
- Dashboard and Topics render real data (regression from v1 parser bug)
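The parser regression behind the last gate item concerned multi-line `data:` continuations. In the server-sent events format, consecutive `data:` lines belong to one event and are joined with newlines, and a blank line dispatches the event. A minimal Python sketch of that rule is below; the function name is hypothetical and this is not the Kotlin `McpClient` code.

```python
def parse_sse(lines):
    """Yield (event_name, data) for each event in an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if it carried any data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data.append(value)  # continuation lines accumulate
        elif field == "event":
            event = value
```

A JSON payload split across two `data:` lines therefore arrives as one string that can be decoded in a single step.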
Phase 2: Browse hub + Dashboard + Logs + TF + Processes — Shipped
Android Side
- Dashboard ("Robot" tab) — sectioned, honest. Identity / Graph / Liveness / Diagnostics / DDS health (opt-in probe). No fabricated traffic-light verdicts; every value is a fact. See `ui/dashboard/`.
- Browse hub ("ROS" tab) — single hub for ten entity families:
  - Topics — list + detail with live JSON tree + rolling Hz/bw via `ros_read_topic` polling loop
  - Nodes — `ros_inspect_node`, tap pubs/subs/srvs to drill in
  - Services — request/response schema tree
  - Actions — goal/result/feedback schema tree
  - Lifecycle — state badges + available transitions
  - Parameters — per-node parameter tree
  - Components — loaded plugins per container
  - Logs (`/rosout`) — live SSE + recent history, level/node/grep filters, severity stripe, auto-scroll-lock with "↓ N new" pill. Per-node logs reachable from Node Detail.
  - TF — depth-flattened tree from `ros_tf_frames`, broadcaster + rate per row, orphan-frame warning; Frame Detail with live 2 Hz `ros_tf_lookup`.
  - Processes — system-wide ROS-related ps view via `ros_list_system_processes`; sortable by cpu/mem/pid/name; stats card + stdout tail for Scry-launched.
- All list screens share `EntityListScaffold` (search + sort + pin + refresh)
- Pinned items per `(kind, robotId)` persist in `SecurePrefs.pinnedItems`
- "Ask Scry about this" hand-off → seeded chat (`Routes.chatWithSeed`)
- Dashboard diagnostic warnings tappable → seeded "Investigate" chat
- Connect SSE rewrite: persistent subscription, real ROS Hz/bw/count, image throttle, /clock+/tf throttle
Gate
- Dashboard shows real robot health data
- Topic browser lists all topics with search, sort, pinning, hidden toggle
- Topic detail shows live messages updating in real-time
- JSON tree view renders nested message structures with copy-path
- "Ask Scry about /topic" round-trip works end-to-end
Phase 3: Rich rendering + live mini-panels + monitors + fleet — Shipped
Originally scoped as "camera + plot" — grew into a full rich-renderer subsystem because the AI's natural output for spatial / temporal / multi-field data is unintelligible as raw JSON. Driving principle: trust the renderer. The AI's prose adds context and anomaly callouts; the card carries the data.
Rich-renderer subsystem (`ui/chat/rich/`)
- RichDispatcher — routes tool results to inline blocks by `render_hint` or by tool name. See `RichRenderer.kt`.
- GroupedList — namespace-grouped topic / node / service / process lists with pub/sub badges, `0 sub` flagged red
- Tree — TF tree with per-edge rate badges
- EntityCard — TF lookup, node inspection, parameter description
- StatusBanner — health / doctor / diagnostics with OK / WARN / ERROR tone
- Metric card — single big number + sparkline, tone-coloured by threshold (Hz / bw / delay)
- LineChart — multi-series rolling chart from `ros_read_topic count≥2` or `ros_watch_topic`
- LogViewer — level chips, search, virtualised list (for `ros_get_recent_logs`)
- SceneSnapshot — composed top-down view: occupancy grid + base_link silhouette + scan dots + path overlay + scale bar (from `ros_read_scene`)
- GpsView — OSM map + marker + trail (for `sensor_msgs/NavSatFix`)
- ImagePreview — FeedTile chrome (topic chip, dim/format chip, tap-to-zoom)
- SensorPanel — same renderer as the Viz tab: Imu attitude / Battery cells / Range cone / MagneticField arrow / Wrench bars / Joy sticks / scalar gauges
- BtTreeView — full behaviour-tree map inline (auto-fit, kind-coloured nodes, status-coloured borders). Re-uses Viz-tab `BtSnapshotCanvas` for parity.
- ConfirmationCard — auto-rendered when AI announces a write (per the `writes` skill); shows args with diff against current value for `ros_set_parameter`
- JsonTreeView fallback
Live mini-panels (`render_panel`)
- Phone-side meta tool `render_panel(topic, kind, duration_s, fields)` — embed 1–30 s SSE-driven mini-panel into chat
- `kind ∈ {sensor, plot, scene, gps, camera}`; plot takes `fields: [dot-path]` for multi-series overlay
- `render_scene_live(map_topic?, pose_topic?, scan_topic?, path_topic?, duration_s)` — composed live scene (parallel SSE per layer, single canvas)
- Settles into Frozen final frame or Failed("no messages received")
Diagnostic plans (`emit_plan`)
- Phone-side meta tool for any debugging request needing ≥3 tool calls
- Checklist of `{label, tool, status, outcome?}` rendered as a `PlanBlock`
- Re-emit with updated status + `verdict` once concluded
Background monitors (`monitor_threshold` / `cancel_monitor`)
- Edge-triggered watch on a topic field — alerts fire only on entry to tripped state, not continuously
- App-scoped `MonitorRegistry` (Hilt singleton, `SupervisorJob + Dispatchers.Default`) drives one SSE subscription per active monitor
- Alerts post into chat as assistant messages via `ChatRepository.append`
- MonitorChipStrip — sticky strip between header and chat showing every active monitor with cancel button
- Returns `id` for explicit cancellation
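Edge triggering, as opposed to level triggering, can be shown in a few lines. The sketch below is in Python rather than the app's Kotlin; the class name is hypothetical, and re-arming once the value leaves the tripped state is an assumption about the intended behaviour.

```python
class EdgeTrigger:
    """Fires once when the predicate flips from False to True."""

    def __init__(self, predicate):
        self._predicate = predicate
        self._tripped = False

    def update(self, value) -> bool:
        """Feed one sample; return True only on entry to the tripped state."""
        now_tripped = bool(self._predicate(value))
        fired = now_tripped and not self._tripped
        self._tripped = now_tripped  # leaving the tripped state re-arms the trigger
        return fired
```

For a battery monitor with the predicate `value < 20`, a stream of 25, 19, 18, 22, 15 produces alerts at 19 and 15 only; the sample at 18 stays silent because the monitor is already tripped.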
Fleet + comparison
- `fleet_overview` — pings every saved robot in parallel via `MCP.healthCheckTimed`; renders FleetOverviewBlock with online dot, ping ms, per-robot summary
- `compare_robots(left_name, right_name, dimension, rows)` — RobotComparisonBlock with two-column metric grid + diff-tinted right column
- Robot switcher — chat header robot-name row is tappable → DropdownMenu lists every saved robot with the active one highlighted; switching swaps `ActiveRobotStore` and rebinds the session
Inline visualisation parity
- Camera feeds, IMU, scans, GPS, battery, BT — all render inline in chat the same way they render on the Viz tab; same sensor renderers, same scene canvas, same BT canvas
- `scry://viz?section=…&topic=…` deep-links from prose into the appropriate Viz section
- Suggestion chips on empty chat surface every Phase 2/3 capability (39 prompts in `assets/prompts/suggestions.txt`)
Anomaly callouts
- Auto-overlays on sensor cards: Battery (low <20 %, critical <10 %), Range (out of bounds), Imu (>3g critical), MagneticField (outside 10–100 µT), Wrench (50 N / 5 Nm envelope), scalar gauges (per `ScalarConfig.spec`)
- Same overlays apply identically on Viz tab and in chat (`anomalyFor` / `chatAnomalyFor`)
Skills + Tier-0 prompt
- `presentation.md` skill — reference for matching tool output to renderers; trimmed to ~1.5 K tokens
- `writes.md` skill — write-op announcement protocol
- System prompt updated with the 6 new core meta-tools
Gate
- Tool result → rich block render works for every block class
- `render_panel` / `render_scene_live` settle within `duration_s + 1`
- `monitor_threshold` survives app backgrounding; alert appears as chat message on trip
- Switching robot from chat header rebinds session without crash
- Anomaly badges appear identically in chat and Viz tab
Phase 4: Reliability + onboarding + packaging — In progress
Android Side
- Multi-robot quick switch — chat header dropdown (shipped as part of Phase 3)
- All four AI providers wired and shipping (Claude, OpenAI, Gemini, Ollama — Phase 1.5)
- Settings screen with credential entry per provider
- Error handling: structured network errors, timeouts, reconnection with backoff
- Connection health monitoring (periodic connect ping, auto-reconnect on flap)
- App icon and splash screen
- Onboarding flow (first launch → set up AI provider → connect robot)
- Performance: memory profiling under long-running monitors + scene SSE, battery profiling, cold-start measurement on mid-range Android
- Edge cases: very large messages, missing topics, slow networks, SSE reconnect with backoff
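Reconnection with backoff usually means a capped exponential delay schedule. A minimal sketch follows; the function name, the default values, and the optional jitter are assumptions, not settings from the app.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=30.0, jitter=0.0, rng=random.random):
    """Yield an endless sequence of retry delays in seconds.

    Delays grow by `factor` from `base` and never exceed `cap` before jitter.
    With jitter > 0, each delay is stretched by up to `jitter` * 100 percent
    so that many clients do not retry in lockstep.
    """
    delay = base
    while True:
        yield min(cap, delay) * (1 + jitter * rng())
        delay = min(cap, delay * factor)
```

A reconnect loop would take the next delay after each failed attempt and create a fresh generator once a connection succeeds, so the schedule restarts from `base`.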
Robot Side
- Install script (`robot-setup/install.sh`)
- Dockerfile (with `--public-internet` and `--token` flags wired through compose)
- Optional systemd service file (`scry.service`)
Live BT status streaming (deferred from Phase 3)
- `render_bt_live` phone-side tool — subscribe to `/behavior_tree_log` for ~10 s and replay status updates into the inline BT canvas. Canvas already accepts `nodeStates`; wiring is mechanical. (See note at end of Phase 3 — pattern matches `render_scene_live`.)
Gate
- Switch between 2+ robots without crashes (already gated by Phase 3)
- App handles network disconnection gracefully (TODO)
- Settings persist correctly
- First-run onboarding completes in <2 min (TODO)
Phase 5: Testing + Release — Pending
Testing
- Drift detectors — `tests/test_tools_registry.py` (core-set parity + write classification), `tests/test_skill_tool_references.py` (skill → tool reference check, token budget)
- Unit tests: ViewModels, protocol parsing, tool proxy logic
- UI tests: chat rendering, navigation, topic list (Compose Test)
- E2E tests: physical device + robot running turtlesim
- Test with multiple robot types (TurtleBot, custom robots)
- Test all MCP tools against real ROS 2 environment
- Test with different DDS implementations (Fast-DDS, CycloneDDS)
- Battery / performance testing on mid-range Android device
- Security audit pass — see `docs/SECURITY_AUDIT.md` (all C/H closed, M-2/M-3 deferred with explicit trade-offs)
Release Prep
- Play Store listing (screenshots, description, icon)
- PyPI package for scry-connect
- User documentation (README, quick start guide)
- Beta distribution (internal testing track on Play Store)
Gate
- All tests pass
- Beta tested with 3+ different robot setups
- Play Store review submitted
- scry-connect published on PyPI
Key Test Scenarios
- Happy path: Connect → ask question → get diagnosis with tool calls
- Network interruption: Robot disconnects mid-chat → graceful recovery
- High-frequency topic: Subscribe to 100 Hz IMU → throttling works, no OOM
- Large messages: PointCloud2 or large image → graceful handling
- Camera + chat: Streaming camera while chatting → no blocking
- Write confirmation: AI proposes publish / lifecycle transition / controller switch → user approves/denies → correct behaviour
- Multi-robot: Connect to 2 robots, switch from chat header → correct context isolation
- Long conversation: 50+ messages → tiered-context system keeps prompt under budget
- Ollama local: Works without internet connectivity
- Different DDS: Works with Fast-DDS, CycloneDDS, Zenoh — `ros_get_dds_env` surfaces the right variables per middleware
- ros2_control fleet: Ask "which controllers are active?" → connect returns manager state without needing shell access
- Lifecycle flow: AI proposes `configure → activate` on a Nav2 node → user approves each transition individually
- Live mini-panel: "Plot /cmd_vel linear.x and angular.z for 5 s" → multi-series chart embedded in chat
- Live scene: "Show the robot moving on the map for 10 s" → composed top-down view (map + pose + scan + path) embedded in chat
- Monitor: "Tell me if /battery drops below 20 %" → monitor strip appears, alert fires on trip, auto-disarms while still tripped
- Fleet overview: "How's the fleet?" with 2+ saved robots → online dot + ping ms per robot