- Bulk delete and retag from browse page with checkboxes
- Select all / deselect all toggle
- Delete confirmation shows count of selected pages
- Auto-cleanup orphaned tags on delete, edit, and bulk actions
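
The orphan cleanup can be sketched as a single query run after any delete, edit, or bulk action. This is a minimal sketch, not the project's actual code: the `pages`, `tags`, and `page_tags` tables and both function names are hypothetical.

```python
import sqlite3


def cleanup_orphaned_tags(conn: sqlite3.Connection) -> int:
    """Delete tags that no page references; return how many were removed."""
    # Hypothetical schema: tags(id, name), page_tags(page_id, tag_id).
    cur = conn.execute(
        "DELETE FROM tags WHERE NOT EXISTS "
        "(SELECT 1 FROM page_tags WHERE page_tags.tag_id = tags.id)"
    )
    conn.commit()
    return cur.rowcount


def bulk_delete(conn: sqlite3.Connection, page_ids) -> int:
    """Delete the selected pages, then drop any tags left without a page."""
    ids = list(page_ids)
    if not ids:
        return 0
    marks = ",".join("?" * len(ids))  # one placeholder per selected page
    conn.execute(f"DELETE FROM page_tags WHERE page_id IN ({marks})", ids)
    conn.execute(f"DELETE FROM pages WHERE id IN ({marks})", ids)
    return cleanup_orphaned_tags(conn)
```

Running the cleanup as its own step lets the delete, edit, and bulk paths share one function.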
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace Google Fonts with system font stacks across all themes
- Add Referrer-Policy, X-Content-Type-Options, X-Frame-Options, CSP headers
- Add rel="noreferrer noopener" on all outbound links
- Add no-referrer and dns-prefetch-control meta tags to all themes
- Clean tracking params on outbound links from trusted/remote sources
- Remove Google domains from CSP whitelists
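
Tracking-parameter cleaning and the header set could look roughly like the sketch below. The parameter names, header values, and the `strip_tracking` helper are assumptions for illustration; the commit does not list them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; the real list may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}

# Assumed header values; only the header names come from the commit.
SECURITY_HEADERS = {
    "Referrer-Policy": "no-referrer",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Content-Security-Policy": "default-src 'self'",
}


def strip_tracking(url: str) -> str:
    """Return url with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_KEYS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Path, fragment, and all other query parameters pass through untouched, so links keep working after cleaning.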
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set min-height: 100vh on html/body so the cursor-bearing elements
fill the viewport even when content is short.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds pagination, meta, and success message styles, plus input
selectors for new form fields (edit page, manual entry, transport node).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add pyinstaller.spec and GitHub/Forgejo CI workflows for cross-platform builds
- Add AGPLv3 license
- Move data storage to ~/.tinyweb/
- Add --version and --port CLI flags
- Add transport node selection in /style (smart regeneration preserves Reticulum config)
- Add "discover more nodes" link to rmap.world
- Add semantic_search setting to toggle AI-powered search on/off
- Skip embedding generation, hybrid search, and model preloading when disabled
- Use site owner's meta description as snippet instead of heuristic extraction
- Remove _generate_summary() and snippet() - no more generated snippets
- Show reranker/reindex controls grayed out when semantic search is off
- AI dependencies (onnxruntime, hnswlib, etc.) are now fully optional
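
One way to gate the AI features is to check both the setting and whether the optional packages are installed. This is a sketch under assumptions: the `settings` dict and both helper names are hypothetical, and the module list is taken from the dependencies named in these commits.

```python
import importlib.util

# Optional packages named in the commits; treated here as the full AI set.
AI_MODULES = ("onnxruntime", "tokenizers", "hnswlib", "numpy")


def missing_ai_modules(modules=AI_MODULES):
    """List the optional modules that are not installed."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def semantic_enabled(settings, modules=AI_MODULES):
    """True only if the setting is on and every optional module is present."""
    if not settings.get("semantic_search", False):
        return False  # skip embeddings, hybrid search, and model preloading
    return not missing_ai_modules(modules)
```

Using `find_spec` checks for a package without importing it, so a disabled install never loads the heavy modules.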
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Case-insensitive meta description extraction (fixes sites like Lemmy
with capitalized "Description" meta name)
- Strip aside and noscript tags for cleaner body text
- Extract paragraph text separately for better sentence quality
- Prefer sentences mentioning the site name, then first quality
paragraph, then title as fallback
- Skip meta descriptions under 20 chars (e.g. just "Lemmy")
- Remove embedding/centroid dependency from summary generation
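
The case-insensitive match and the 20-character minimum can be sketched with the standard library parser. The class and function names are hypothetical, and the project's real extractor may use a different HTML library.

```python
from html.parser import HTMLParser

MIN_DESCRIPTION_CHARS = 20  # shorter values (e.g. just "Lemmy") are skipped


class MetaDescriptionParser(HTMLParser):
    """Collect the first usable <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr = {key: (value or "") for key, value in attrs}
        # Compare the name case-insensitively so "Description" also matches.
        if attr.get("name", "").strip().lower() == "description":
            content = attr.get("content", "").strip()
            if len(content) >= MIN_DESCRIPTION_CHARS:
                self.description = content


def extract_description(html: str):
    """Return the meta description, or None if absent or too short."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description
```

A `None` result is the signal to fall back to the sentence, paragraph, and title heuristics listed above.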
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lemmy and other JS-heavy sites include noscript fallback text like
"Javascript is disabled" that pollutes the stored body text and
generated snippets/summaries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously reindex skipped pages that already had chunks, leaving stale
embeddings in place. It also overwrote good meta description summaries
with auto-generated ones. Now it clears all chunks first so everything
is re-embedded, and only generates summaries for pages missing one.
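
The new reindex behaviour can be sketched with in-memory stand-ins. Everything named here is hypothetical: `pages` is a list of dicts, `chunks` is a dict keyed by page id, and `embed` and `summarize` are caller-supplied functions.

```python
def reindex(pages, chunks, embed, summarize):
    """Re-embed every page; generate summaries only where one is missing."""
    chunks.clear()  # drop stale embeddings first so nothing is skipped
    for page in pages:
        chunks[page["id"]] = embed(page["body"])
        if not page.get("summary"):
            # Existing summaries (e.g. from a meta description) are kept.
            page["summary"] = summarize(page["body"])
```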
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements a three-stage search pipeline:
1. BM25 keyword search via FTS5 with column weights
2. Semantic search via Snowflake arctic-embed-s bi-encoder + HNSW index
3. Optional cross-encoder reranking (on by default, toggleable in settings)
The top 20 fused results are reranked for precision and the next 10
are appended in RRF order for coverage, giving 30 results across 3 pages.
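
The fusion and rerank steps can be sketched as follows. This is an illustration, not the code in embeddings.py: the function names are hypothetical, `rerank_score` stands in for the cross-encoder, and `k=60` is the conventional RRF constant, assumed here.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over each ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Stable sort: ties keep the order in which documents were first seen.
    return sorted(scores, key=lambda doc_id: -scores[doc_id])


def search(bm25_ids, semantic_ids, rerank_score=None, rerank_top=20, extra=10):
    """Fuse both rankings, rerank the head, append the next few in RRF order."""
    fused = rrf_fuse([bm25_ids, semantic_ids])
    head = fused[:rerank_top]
    tail = fused[rerank_top:rerank_top + extra]
    if rerank_score is not None:  # reranking is optional
        head = sorted(head, key=rerank_score, reverse=True)
    return head + tail
```

For example, `rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])` returns `["b", "c", "a", "d"]`, because documents found by both searches outrank those found by only one.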
- New embeddings.py with ONNX Runtime inference, text chunking, HNSW
index management, RRF fusion, and cross-encoder reranking
- Meta description extraction for authentic page snippets with centroid
extractive fallback
- Stopword filtering in FTS5 queries to avoid overly strict matching
- /reindex page for batch embedding of existing pages
- Semantic embedding of remote pages during subscription sync
- ~125MB dependency footprint (onnxruntime, tokenizers, hnswlib, numpy)
- Models: 34MB bi-encoder + 22MB cross-encoder (downloaded on first use)
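
Stopword filtering for the FTS5 query can be sketched as below. The stopword list and the `build_fts_query` name are assumptions. Terms are quoted so they are read as plain FTS5 strings, and FTS5 treats space-separated terms as an implicit AND.

```python
import re

# Assumed sample list; the project's real stopword set is not shown.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on", "how"}


def build_fts_query(text: str) -> str:
    """Build an FTS5 MATCH string from the non-stopword terms in text."""
    terms = re.findall(r"\w+", text.lower())
    # If every term is a stopword, keep them all rather than send an empty query.
    kept = [term for term in terms if term not in STOPWORDS] or terms
    return " ".join(f'"{term}"' for term in kept)
```

Dropping stopwords matters because each extra required term narrows an AND query, so common words can exclude otherwise good matches.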
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>