- Bulk delete and retag from browse page with checkboxes
- Select all / deselect all toggle
- Delete confirmation shows count of selected pages
- Auto-cleanup orphaned tags on delete, edit, and bulk actions
- Replace Google Fonts with system font stacks across all themes
- Add Referrer-Policy, X-Content-Type-Options, X-Frame-Options, CSP headers
- Add rel="noreferrer noopener" on all outbound links
- Add no-referrer and dns-prefetch-control meta tags to all themes
- Clean tracking params on outbound links from trusted/remote sources
- Remove Google domains from CSP whitelists
- Add pyinstaller.spec and GitHub/Forgejo CI workflows for cross-platform builds
- Add AGPLv3 license
- Move data storage to ~/.tinyweb/
- Add --version and --port CLI flags
- Add transport node selection in /style (smart regeneration preserves Reticulum config)
- Add discover more nodes link to rmap.world
- Add semantic_search setting to toggle AI-powered search on/off
- Skip embedding generation, hybrid search, and model preloading when disabled
- Use site owner's meta description as snippet instead of heuristic extraction
- Remove _generate_summary() and snippet() - no more generated snippets
- Show reranker/reindex controls grayed out when semantic search is off
- AI dependencies (onnxruntime, hnswlib, etc.) are now fully optional
Implements a three-stage search pipeline:
1. BM25 keyword search via FTS5 with column weights
2. Semantic search via Snowflake arctic-embed-s bi-encoder + HNSW index
3. Optional cross-encoder reranking (on by default, toggleable in settings)
Top 20 results are reranked for precision, next 10 appended from RRF
for coverage, giving 30 total results across 3 pages.
- New embeddings.py with ONNX Runtime inference, text chunking, HNSW
index management, RRF fusion, and cross-encoder reranking
- Meta description extraction for authentic page snippets with centroid
extractive fallback
- Stopword filtering in FTS5 queries to avoid overly strict matching
- /reindex page for batch embedding of existing pages
- Semantic embedding of remote pages during subscription sync
- ~125MB dependency footprint (onnxruntime, tokenizers, hnswlib, numpy)
- Models: 34MB bi-encoder + 22MB cross-encoder (downloaded on first use)
Browser textarea submissions convert \n to \r\n, causing the template
comparison against DEFAULT_TEMPLATE to always fail. This saved the bare
skeleton as a custom template, overriding the default navbar.
WAL + pooling:
- Enable WAL journal mode for concurrent read/write support
- Add connection pool (size 4) with return_db() to reuse connections
instead of opening/closing on every request
Pagination:
- Search results, /pages, and /tags/<name> now paginate at 50 per page
- Prev/next navigation links appear when results exceed one page
Delta sync:
- Pages table gains last_modified timestamp, set on insert/update
- /api/sites accepts ?since= param to return only changed pages
- Subscription sync uses last_sync timestamp for incremental fetches
- Remote pages upserted instead of delete-all/re-insert
- Full sync includes all_urls list for detecting remote deletions
- SSRF: disable automatic redirects, manually follow up to 5 hops with
IP re-validation at each step to prevent redirect-to-localhost bypass
- Identity file: enforce 0600 permissions on tinyweb_identity at load
and creation to prevent other users from reading the private key
- Error messages: replace raw exception strings with generic messages
to avoid leaking internal paths/hostnames to the UI
- DB connections: wrap all get_db() usage in try/finally to guarantee
close() even when handlers throw mid-operation
- Bookmark endpoint now requires a secret token (stored in settings)
- Style reset moved from GET to POST with CSRF protection
- Open redirect prevention in _redirect() helper
- Import capped at 100 URLs to prevent abuse
- page_tags cleaned up on delete + PRAGMA foreign_keys enabled
- CSP, X-Frame-Options, X-Content-Type-Options on all responses
- CSRF tokens now per-session via double-submit cookie pattern
- Tag names URL-decoded for special characters
- Gateway forwards cookies in request data
- CSRF: Generate random token at startup, include as hidden field in
all 11 POST forms, validate at top of POST dispatch (returns 403)
- SSRF: Block private/internal IP ranges (127/8, 10/8, 172.16/12,
192.168/16, 169.254/16, ::1, fc00::/7) by resolving hostname before
fetch. Remove verify=False from requests.get().
- DELETE: Change /delete/<id> from GET (instant delete) to GET
(confirmation page) + POST (actual delete) to prevent accidental
deletion from prefetchers/crawlers.
- FTS5: Wrap search input in double quotes to neutralize FTS5
operators (AND, OR, NOT, *, column:). Add try/except fallback.
- Replace CSS-only customization with full HTML template editing
- Users edit the entire page wrapper with {{content}} placeholder
- Add /style?reset escape hatch to recover from broken templates
- Move nav links to template, remove redundant nav from search page
- Delete remote pages when unsubscribing from an instance
Shows instance stats, destination hash for subscribing, and explains
the slow web movement and how TinyWeb works. Destination hash is
stored in settings on startup so the about page can display it.
URLs are cleaned of tracking parameters (utm_*, fbclid, gclid, etc.)
before indexing. Tags can be added when saving or editing pages,
browsed at /tags, and are included in search results. Tags are shared
via /api/sites and preserved when syncing/importing from subscriptions.
app.py now auto-starts the gateway HTTP server in a daemon thread,
so users only need `python app.py` to get everything running. The
gateway calls dispatch_request directly when co-located (local mode)
instead of trying to establish an RNS link to itself. Bookmarklet
hardcoded to localhost:8080. gateway.py still works standalone for
connecting to remote instances.
- Subscriptions now use Reticulum destination hashes instead of HTTP URLs
- All subscription syncing happens over encrypted RNS links (rns_client.py)
- Add remote_pages table for synced content from subscriptions
- Search results now include pages from synced subscriptions, grouped by source
- Remove HTTP dependency from subscription handlers
Replace HTTP server with Reticulum-native architecture. The server
now speaks only Reticulum, with a client-side gateway providing
browser access by translating HTTP to/from RNS requests.
- Extract db layer (db.py), templates (templates.py), handlers (handlers.py)
- app.py is now the RNS server with persistent identity and destination
- gateway.py bridges HTTP on localhost:8080 to RNS link requests
- Add rns dependency, add .gitignore