diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..98ade52
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,43 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What is TinyWeb
+
+A personal, decentralized search engine built on the Reticulum mesh network. Users curate and search their own index of web pages, share collections over an encrypted mesh, and subscribe to friends' indexes. No algorithms, no tracking.
+
+## Running
+
+```bash
+pip install -r requirements.txt
+python app.py       # Starts the RNS server + HTTP gateway on 0.0.0.0:8080
+python gateway.py   # Runs a standalone HTTP gateway to a remote TinyWeb instance
+```
+
+There is no test suite, linter, or build step.
+
+## Architecture
+
+Three modules form the request pipeline:
+
+- **app.py** — Boots Reticulum, loads or creates the identity stored in `tinyweb_identity`, announces itself on the mesh, starts the HTTP gateway as a daemon thread, then loops handling RNS requests.
+- **gateway.py** — A `BaseHTTPRequestHandler` that translates HTTP GET/POST into a request dict and dispatches it. When `local_dispatch` is set (the default when launched from app.py), it calls the handlers directly; otherwise it sends requests over a Reticulum link.
+- **handlers.py** — Central router (`handle_request`) that pattern-matches the request path and calls the matching handler. Every handler returns a dict with the keys `status`, `content_type`, `body`, and `headers`.
+
+## Database (db.py → index.db)
+
+SQLite with FTS5. The schema is created and migrated by `init_db()` on every startup.
+
+Key tables: `pages` (indexed URLs), `links` (extracted same-domain links), `tags`/`page_tags` (many-to-many tagging), `pages_fts` (full-text search, kept in sync via triggers), `subscriptions` (remote instances), `remote_pages`/`remote_pages_fts` (synced content).
+
+`get_db()` opens a fresh connection on every call; there is no connection pooling.
+
+## Patterns to follow
+
+- All HTML output is built as inline strings in handlers.py; there is no template engine. Wrap page content with `templates.wrap_page(title, body_html)`, which adds the shared boilerplate and custom CSS.
+- Escape all user-supplied content rendered in HTML with `esc()` (`html.escape`).
+- Handlers receive `(path_segment, ...)` arguments extracted by the router and return a response dict.
+- Tags are stored in a join table (`page_tags`); orphaned rows in `tags` can accumulate, so always query through `page_tags` for accurate counts.
+- Link extraction (`extract_links`) only follows same-domain URLs and skips binary file extensions and Wikipedia special pages.
+- URL cleanup: before a URL is stored, its fragment is stripped and tracking parameters (`utm_*`, `fbclid`, `gclid`, etc.) are removed.
+- Settings are stored as key-value pairs in the `settings` table; read them with `get_setting(key, default)`.
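For illustration, the URL-cleanup rule in the CLAUDE.md above can be expressed with only the Python standard library. This is a minimal sketch and not TinyWeb's actual code. The function name `clean_url` is hypothetical. The contents of `TRACKING_PARAMS` are also an assumption, because the file lists only `utm_*`, `fbclid`, `gclid`, "etc.".

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of exact-match tracking parameters; the real list is not specified.
TRACKING_PARAMS = {"fbclid", "gclid"}


def clean_url(url: str) -> str:
    """Hypothetical helper: drop the fragment and tracking parameters from a URL."""
    parts = urlsplit(url)
    # Keep every query parameter except utm_* and the assumed tracking keys.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_") and key not in TRACKING_PARAMS
    ]
    # Rebuild the URL with the filtered query and an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


print(clean_url("https://example.com/a?utm_source=x&id=5&fbclid=abc#top"))
```

Because `urlencode` re-encodes the surviving parameters, this sketch may normalize percent-escapes in the query string. That matters only if stored URLs are compared byte-for-byte against unnormalized input.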