Commit graph

6 commits

Author SHA1 Message Date
Derick Phan
1bc695f508
Harden network and privacy defaults; fix several bugs
Security:
- Bind HTTP gateway to 127.0.0.1 by default; add --bind for LAN opt-in
- Restrict Reticulum mesh surface to GET /api/sites only (CSRF cannot
  authenticate mesh callers, so gate by whitelist)
- Cap request body size at 16 MiB to prevent memory DoS
- Redact /bookmark query strings from request logs so the bookmark token
  and URLs do not land in stdout / docker / journal logs
- Tighten FTS5 sanitizer: strip colon, drop AND/OR/NOT/NEAR operator words
- Expand .dockerignore; document trust model in README

Features:
- Add sharing mode toggle (share everything except private vs share only
  public-tagged) with /share/preview so users can see what subscribers
  would receive before enabling sharing

Bugs:
- handle_export() crashed on every call (missing query kwarg)
- Dead float16 decompression branch in embeddings.py silently corrupted
  the HNSW index when compress_embeddings was on
- GATEWAY_PORT staleness: --port and find_available_port had no effect
  on the actual bind
- semantic_search default mismatched between db.py ("1") and the rest of
  the app ("0"), causing embeddings to be generated when the UI said off
- Connection pool returned connections with uncommitted transactions to
  the next consumer
- Gateway POST body decode 502'd on non-UTF-8 input
- ensure_rns_config clobbered user-edited ~/.reticulum/config; now only
  rewrites files it authored (sentinel-tagged)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:37:45 -07:00
lichenblankie
8ecb963be4 Optimized storage and updated readme
All checks were successful
/ build (push) Successful in 2m19s
2026-04-11 21:59:55 +00:00
Test User
57a79e5e8e Add PyInstaller builds, AGPLv3 license, transport node selection, and rmap.world link
- Add pyinstaller.spec and GitHub/Forgejo CI workflows for cross-platform builds
- Add AGPLv3 license
- Move data storage to ~/.tinyweb/
- Add --version and --port CLI flags
- Add transport node selection in /style (smart regeneration preserves Reticulum config)
- Add discover more nodes link to rmap.world
2026-04-08 04:36:28 +00:00
Derick Phan
c959ee98ae
Make semantic search and reranking optional, use site meta descriptions for snippets
- Add semantic_search setting to toggle AI-powered search on/off
- Skip embedding generation, hybrid search, and model preloading when disabled
- Use site owner's meta description as snippet instead of heuristic extraction
- Remove _generate_summary() and snippet() - no more generated snippets
- Show reranker/reindex controls grayed out when semantic search is off
- AI dependencies (onnxruntime, hnswlib, etc.) are now fully optional

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 20:58:04 -07:00
Derick Phan
fd20454fa4
Fix reindex to re-embed all pages and preserve existing summaries
Previously reindex skipped pages that already had chunks, leaving stale
embeddings in place. It also overwrote good meta description summaries
with auto-generated ones. Now it clears all chunks first so everything
is re-embedded, and only generates summaries for pages missing one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 14:08:04 -07:00
Derick Phan
395fc17092
Add hybrid semantic search with optional cross-encoder reranking
Implements a three-stage search pipeline:
1. BM25 keyword search via FTS5 with column weights
2. Semantic search via Snowflake arctic-embed-s bi-encoder + HNSW index
3. Optional cross-encoder reranking (on by default, toggleable in settings)

Top 20 results are reranked for precision, next 10 appended from RRF
for coverage, giving 30 total results across 3 pages.

- New embeddings.py with ONNX Runtime inference, text chunking, HNSW
  index management, RRF fusion, and cross-encoder reranking
- Meta description extraction for authentic page snippets with centroid
  extractive fallback
- Stopword filtering in FTS5 queries to avoid overly strict matching
- /reindex page for batch embedding of existing pages
- Semantic embedding of remote pages during subscription sync
- ~125MB dependency footprint (onnxruntime, tokenizers, hnswlib, numpy)
- Models: 34MB bi-encoder + 22MB cross-encoder (downloaded on first use)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 03:24:41 -07:00