tinyweb/CLAUDE.md
Derick Phan 4df0ef03f5
Add CLAUDE.md with project architecture and conventions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 08:17:38 -07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

What is TinyWeb

A personal, decentralized search engine built on the Reticulum mesh network. Users curate and search their own index of web pages, share collections over an encrypted mesh, and subscribe to friends' indexes. No algorithms, no tracking.

Running

pip install -r requirements.txt
python app.py          # Starts RNS server + HTTP gateway on 0.0.0.0:8080
python gateway.py <hash>  # Run as HTTP gateway to a remote TinyWeb instance

There are no tests, no linter, and no build step.

Architecture

Three modules form the request pipeline:

  • app.py — Boots Reticulum, loads/creates identity from tinyweb_identity, announces on mesh, starts HTTP gateway as a daemon thread, then loops handling RNS requests.
  • gateway.py — BaseHTTPRequestHandler that translates HTTP GET/POST into a request dict and dispatches it. When local_dispatch is set (the default when launched from app.py), it calls handlers directly; otherwise it sends requests over a Reticulum link.
  • handlers.py — Central router (handle_request) that pattern-matches the path and calls the appropriate handler. Every handler returns {"status", "content_type", "body", "headers"}.
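
The router contract described above can be sketched roughly as follows. This is a minimal illustration, not the code in handlers.py: the request dict keys ("method", "path"), the route patterns, the helper make_response, and the handler names handle_home and handle_page are all assumptions made for the example. Only handle_request and the four response keys come from the description above.

```python
import re


def make_response(body, status=200, content_type="text/html", headers=None):
    """Build a response dict with the four keys every handler returns.

    make_response is a hypothetical helper, not part of TinyWeb.
    """
    return {
        "status": status,
        "content_type": content_type,
        "body": body,
        "headers": headers or {},
    }


def handle_home(request):
    # Hypothetical handler for the root path.
    return make_response("<h1>TinyWeb</h1>")


def handle_page(page_id, request):
    # Hypothetical handler; page_id is the path segment captured by the router.
    return make_response(f"<p>Page {int(page_id)}</p>")


# (pattern, handler) pairs. These routes are invented for illustration.
ROUTES = [
    (re.compile(r"^/$"), handle_home),
    (re.compile(r"^/page/(\d+)$"), handle_page),
]


def handle_request(request):
    """Match the request path against ROUTES and call the first matching handler."""
    path = request.get("path", "/")
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            # Captured path segments are passed first, then the request dict.
            return handler(*match.groups(), request)
    return make_response("<h1>Not found</h1>", status=404)
```

Because every handler returns the same dict shape, the gateway can serialize a response the same way whether it came from a direct call or over a Reticulum link.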

Database (db.py → index.db)

SQLite with FTS5. Schema is initialized and migrated in init_db() on every startup.

Key tables: pages (indexed URLs), links (extracted same-domain links), tags/page_tags (many-to-many tagging), pages_fts (full-text search via triggers), subscriptions (remote instances), remote_pages/remote_pages_fts (synced content).
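
For orientation, here is a hedged sketch of how an FTS5 index can be kept in sync with a base table through triggers, using SQLite's external-content pattern. The column names (url, title, content), the trigger names, and the functions init_demo_db and search are assumptions for illustration; the real schema in db.py may differ. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical, simplified schema: only pages and pages_fts are shown.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts
    USING fts5(title, content, content='pages', content_rowid='id');

-- Mirror inserts, deletes, and updates into the FTS index.
CREATE TRIGGER IF NOT EXISTS pages_ai AFTER INSERT ON pages BEGIN
    INSERT INTO pages_fts(rowid, title, content)
    VALUES (new.id, new.title, new.content);
END;
CREATE TRIGGER IF NOT EXISTS pages_ad AFTER DELETE ON pages BEGIN
    INSERT INTO pages_fts(pages_fts, rowid, title, content)
    VALUES ('delete', old.id, old.title, old.content);
END;
CREATE TRIGGER IF NOT EXISTS pages_au AFTER UPDATE ON pages BEGIN
    INSERT INTO pages_fts(pages_fts, rowid, title, content)
    VALUES ('delete', old.id, old.title, old.content);
    INSERT INTO pages_fts(rowid, title, content)
    VALUES (new.id, new.title, new.content);
END;
"""


def init_demo_db(conn):
    """Create the demo schema; safe to call on every startup (IF NOT EXISTS)."""
    conn.executescript(SCHEMA)


def search(conn, query):
    """Return URLs of pages matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT p.url FROM pages_fts "
        "JOIN pages AS p ON p.id = pages_fts.rowid "
        "WHERE pages_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]
```

With triggers in place, application code only ever writes to pages; the index follows automatically.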

get_db() opens a fresh connection each call — no connection pooling.
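
Since each call returns a new connection, callers are responsible for closing it. A minimal sketch of that usage, assuming get_db() is a thin wrapper around sqlite3.connect (the path parameter, the row_factory setting, and the count_pages helper are assumptions, not taken from db.py):

```python
import sqlite3
from contextlib import closing

DB_PATH = "index.db"  # file name taken from the section heading


def get_db(path=DB_PATH):
    """Open a fresh connection on every call (no pooling)."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # assumption: rows accessible by column name
    return conn


def count_pages(path=DB_PATH):
    """Hypothetical helper showing the open, query, close cycle."""
    # closing() guarantees the connection is closed; using a sqlite3
    # connection directly as a context manager only commits or rolls back.
    with closing(get_db(path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
```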

Patterns to follow

  • All HTML output is built as inline strings in handlers.py; there is no template engine. Use templates.wrap_page(title, body_html) to wrap content with boilerplate and custom CSS.
  • Use esc() (html.escape) for all user-supplied content rendered in HTML.
  • Handlers receive (path_segment, ...) args extracted by the router and return a response dict.
  • Tags are linked to pages through the page_tags join table; orphaned rows in tags can accumulate, so always query through page_tags for accurate counts.
  • Link extraction (extract_links) only follows same-domain URLs and skips binary file extensions and Wikipedia special pages.
  • URL cleanup: fragments are stripped, tracking params (utm_*, fbclid, gclid, etc.) are removed before storing.
  • Settings are stored as key-value pairs in the settings table; access via get_setting(key, default).
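
Two of the conventions above, URL cleanup and HTML escaping, can be sketched as follows. The function names clean_url and render_result are hypothetical, and the tracking-parameter list covers only the names mentioned above; the actual list in the codebase is not known here.

```python
from html import escape as esc
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Only the parameters named in the list above; the real set may be longer.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def clean_url(url):
    """Strip the fragment and known tracking parameters from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    # An empty string as the last element drops the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def render_result(title, url):
    """Build an inline HTML link, escaping both user-supplied values."""
    return f'<a href="{esc(clean_url(url))}">{esc(title)}</a>'


print(clean_url("https://example.org/a?utm_source=x&id=3&fbclid=abc#top"))
# prints: https://example.org/a?id=3
```

Cleaning URLs before storage means the same page reached through different campaign links maps to a single row.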