Add CLAUDE.md with project architecture and conventions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Derick Phan 2026-03-26 08:17:38 -07:00
parent e0a12272ed
commit 4df0ef03f5


@@ -0,0 +1,43 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What is TinyWeb
A personal, decentralized search engine built on the Reticulum mesh network. Users curate and search their own index of web pages, share collections over an encrypted mesh, and subscribe to friends' indexes. No algorithms, no tracking.
## Running
```bash
pip install -r requirements.txt
python app.py # Starts RNS server + HTTP gateway on 0.0.0.0:8080
python gateway.py <hash> # Run as HTTP gateway to a remote TinyWeb instance
```
There are no tests, no linter, and no build step.
## Architecture
Three entry points form a pipeline:
- **app.py** — Boots Reticulum, loads/creates identity from `tinyweb_identity`, announces on mesh, starts HTTP gateway as a daemon thread, then loops handling RNS requests.
- **gateway.py** — `BaseHTTPRequestHandler` that translates HTTP GET/POST into a request dict and dispatches it. When `local_dispatch` is set (the default when launched from app.py), it calls handlers directly; otherwise it sends requests over a Reticulum link.
- **handlers.py** — Central router (`handle_request`) that pattern-matches the path and calls the appropriate handler. Every handler returns `{"status", "content_type", "body", "headers"}`.
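
The routing contract described above can be sketched roughly as follows. This is a minimal illustration, not the code in handlers.py: the `/page/<id>` route, the `make_response` helper, the two handler functions, and the `"path"` key of the request dict are all assumptions made for the example. Only `handle_request` and the four response keys come from the description above.

```python
import html
import re


def make_response(body, status=200, content_type="text/html", headers=None):
    """Build the response dict shape that every handler returns."""
    return {
        "status": status,
        "content_type": content_type,
        "body": body,
        "headers": headers or {},
    }


def handle_page(page_id):
    # Hypothetical handler: receives the path segment captured by the router.
    return make_response(f"<h1>Page {html.escape(page_id)}</h1>")


def handle_not_found(path):
    # Hypothetical fallback for paths that match no route.
    return make_response(f"<p>No route for {html.escape(path)}</p>", status=404)


# Hypothetical route table: (compiled pattern, handler) pairs.
ROUTES = [
    (re.compile(r"^/page/([^/]+)$"), handle_page),
]


def handle_request(request):
    """Match the request path against each route and call its handler."""
    path = request.get("path", "/")  # assumed key name in the request dict
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(*match.groups())
    return handle_not_found(path)
```

Because the gateway and the RNS loop both go through the same function, a handler written this way should behave the same whether the request arrived over HTTP or over a Reticulum link.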
## Database (db.py → index.db)
SQLite with FTS5. Schema is initialized and migrated in `init_db()` on every startup.
Key tables: `pages` (indexed URLs), `links` (extracted same-domain links), `tags`/`page_tags` (many-to-many tagging), `pages_fts` (full-text search via triggers), `subscriptions` (remote instances), `remote_pages`/`remote_pages_fts` (synced content).
`get_db()` opens a fresh connection each call — no connection pooling.
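
As a rough sketch of how this layout can work, the example below opens a new connection per call, creates the schema idempotently, and keeps an FTS5 table in sync with an insert trigger. The column names (`url`, `title`, `body`), the trigger name, the `search` helper, and the `path` parameter are assumptions for illustration; the real schema in db.py may differ. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3

DB_PATH = "index.db"


def get_db(path=DB_PATH):
    """Open a fresh connection on every call; the caller must close it."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    return conn


def init_db(path=DB_PATH):
    """Create tables and triggers if missing, so it is safe on every startup."""
    conn = get_db(path)
    try:
        conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS pages (
                id INTEGER PRIMARY KEY,
                url TEXT UNIQUE NOT NULL,
                title TEXT,
                body TEXT
            );
            CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts USING fts5(
                title, body, content='pages', content_rowid='id'
            );
            -- Mirror each new row into the full-text index.
            CREATE TRIGGER IF NOT EXISTS pages_ai AFTER INSERT ON pages BEGIN
                INSERT INTO pages_fts(rowid, title, body)
                VALUES (new.id, new.title, new.body);
            END;
            """
        )
        conn.commit()
    finally:
        conn.close()


def search(query, path=DB_PATH):
    """Return URLs of pages whose title or body matches the FTS5 query."""
    conn = get_db(path)
    try:
        rows = conn.execute(
            "SELECT p.url FROM pages_fts "
            "JOIN pages p ON p.id = pages_fts.rowid "
            "WHERE pages_fts MATCH ?",
            (query,),
        ).fetchall()
        return [row["url"] for row in rows]
    finally:
        conn.close()
```

Since nothing pools or reuses connections, wrapping each use in `try`/`finally` (or `contextlib.closing`) keeps file handles from leaking. A complete version would also need update and delete triggers to keep the index consistent.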
## Patterns to follow
- All HTML output is built as inline strings in handlers.py; there is no template engine. Use `templates.wrap_page(title, body_html)` to wrap content with boilerplate and custom CSS.
- Use `esc()` (html.escape) for all user-supplied content rendered in HTML.
- Handlers receive `(path_segment, ...)` args extracted by the router and return a response dict.
- Tags are stored in a join table; orphaned rows in `tags` can accumulate — always query through `page_tags` for accurate counts.
- Link extraction (`extract_links`) only follows same-domain URLs and skips binary file extensions and Wikipedia special pages.
- URL cleanup: fragments are stripped, tracking params (utm_*, fbclid, gclid, etc.) are removed before storing.
- Settings are stored as key-value pairs in the `settings` table; access via `get_setting(key, default)`.
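
The URL cleanup rule above could look something like the sketch below. The function name `clean_url` and the two constants are hypothetical, and the parameter list covers only the names mentioned above (`utm_*`, `fbclid`, `gclid`); the project's actual list is longer ("etc.") and is not reproduced here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed, incomplete lists: only the parameters named in this document.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def is_tracking_param(name):
    """True if the query parameter name looks like a tracking parameter."""
    lowered = name.lower()
    return lowered.startswith(TRACKING_PREFIXES) or lowered in TRACKING_PARAMS


def clean_url(url):
    """Strip the fragment and tracking parameters before the URL is stored."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking_param(key)
    ]
    # An empty final component drops the fragment entirely.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
```

For example, `clean_url("https://example.org/post?id=7&utm_source=x#top")` returns `"https://example.org/post?id=7"`. Normalizing before the insert means the same page shared with different tracking parameters maps to a single row in `pages`.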