tinyweb

Author	SHA1	Message	Date
lichenblankie	8205db9bc3	tightened network defaults, squashed bugs	2026-06-05 05:29:36 +00:00
    Security:
    - Bind HTTP gateway to 127.0.0.1 by default; add --bind for LAN opt-in
    - Restrict Reticulum mesh surface to GET /api/sites only (CSRF cannot authenticate mesh callers, so gate by whitelist)
    - Cap request body size at 16 MiB to prevent memory DoS
    - Redact /bookmark query strings from request logs so the bookmark token and URLs do not land in stdout / docker / journal logs
    - Tighten FTS5 sanitizer: strip colon, drop AND/OR/NOT/NEAR operator words
    - Expand .dockerignore; document trust model in README
    Features:
    - Add sharing mode toggle (share everything except private vs share only public-tagged) with /share/preview so users can see what subscribers would receive before enabling sharing
    Bugs:
    - handle_export() crashed on every call (missing query kwarg)
    - Dead float16 decompression branch in embeddings.py silently corrupted the HNSW index when compress_embeddings was on
    - GATEWAY_PORT staleness: --port and find_available_port had no effect on the actual bind
    - semantic_search default mismatched between db.py ("1") and the rest of the app ("0"), causing embeddings to be generated when the UI said off
    - Connection pool returned connections with uncommitted transactions to the next consumer
    - Gateway POST body decode 502'd on non-UTF-8 input
    - ensure_rns_config clobbered user-edited ~/.reticulum/config; now only rewrites files it authored (sentinel-tagged)
lichenblankie	30bc61212f	optimized storage, updated readme	2026-06-05 05:29:36 +00:00
lichenblankie	5b32d69863	added PyInstaller builds, AGPLv3, transport config	2026-06-05 05:29:36 +00:00
    - Add pyinstaller.spec and GitHub/Forgejo CI workflows for cross-platform builds
    - Add AGPLv3 license
    - Move data storage to ~/.tinyweb/
    - Add --version and --port CLI flags
    - Add transport node selection in /style (smart regeneration preserves Reticulum config)
    - Add discover more nodes link to rmap.world
lichenblankie	9bc5abd32f	made semantic search optional, use meta snippets	2026-06-05 05:29:35 +00:00
    - Add semantic_search setting to toggle AI-powered search on/off
    - Skip embedding generation, hybrid search, and model preloading when disabled
    - Use site owner's meta description as snippet instead of heuristic extraction
    - Remove _generate_summary() and snippet() - no more generated snippets
    - Show reranker/reindex controls grayed out when semantic search is off
    - AI dependencies (onnxruntime, hnswlib, etc.) are now fully optional
lichenblankie	3f8ebdab1d	fixed reindex, preserved summaries	2026-06-05 05:29:35 +00:00
    Previously reindex skipped pages that already had chunks, leaving stale embeddings in place. It also overwrote good meta description summaries with auto-generated ones. Now it clears all chunks first so everything is re-embedded, and only generates summaries for pages missing one.
lichenblankie	5ded9f1339	added hybrid semantic search with reranking	2026-06-05 05:29:35 +00:00
    Implements a three-stage search pipeline:
    1. BM25 keyword search via FTS5 with column weights
    2. Semantic search via Snowflake arctic-embed-s bi-encoder + HNSW index
    3. Optional cross-encoder reranking (on by default, toggleable in settings)
    Top 20 results are reranked for precision, next 10 appended from RRF for coverage, giving 30 total results across 3 pages.
    - New embeddings.py with ONNX Runtime inference, text chunking, HNSW index management, RRF fusion, and cross-encoder reranking
    - Meta description extraction for authentic page snippets with centroid extractive fallback
    - Stopword filtering in FTS5 queries to avoid overly strict matching
    - /reindex page for batch embedding of existing pages
    - Semantic embedding of remote pages during subscription sync
    - ~125MB dependency footprint (onnxruntime, tokenizers, hnswlib, numpy)
    - Models: 34MB bi-encoder + 22MB cross-encoder (downloaded on first use)
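
Commit 5ded9f1339 names RRF fusion as the step that merges the BM25 and semantic result lists. The following is a minimal sketch of Reciprocal Rank Fusion in general, not the code in embeddings.py. The function name `rrf_fuse`, the smoothing constant `k=60` (a commonly used default), and the `limit=30` cutoff (chosen to match the 30 total results mentioned in the commit) are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, limit=30):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Hypothetical helper: each document scores 1 / (k + rank) in every list
    where it appears, and the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))[:limit]


bm25_hits = ["a", "b", "c"]      # assumed keyword-search order
semantic_hits = ["b", "d", "a"]  # assumed embedding-search order
print(rrf_fuse([bm25_hits, semantic_hits]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only rank positions, the BM25 scores and the vector distances never need to be normalized onto a common scale before merging.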

6 commits
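
Commit 8205db9bc3 describes the tightened FTS5 sanitizer only as "strip colon, drop AND/OR/NOT/NEAR operator words". One way such a sanitizer could look is sketched below. It is an illustration under stated assumptions, not the project's implementation: the name `sanitize_fts5_query` is hypothetical, and removing all punctuation, matching operator words case-insensitively, and quoting each surviving term are choices made here rather than behavior confirmed by the log.

```python
import re

# Operator words listed in the commit message.
FTS5_OPERATORS = {"AND", "OR", "NOT", "NEAR"}


def sanitize_fts5_query(raw):
    """Turn free-form user input into a plain FTS5 term list (hypothetical)."""
    # Replace anything that is not a word character or whitespace with a
    # space. This removes the colon (column filters) along with quotes,
    # parentheses, and other characters that have meaning in FTS5 syntax.
    cleaned = re.sub(r"[^\w\s]", " ", raw)
    # Drop operator words (assumption: matched regardless of case).
    terms = [t for t in cleaned.split() if t.upper() not in FTS5_OPERATORS]
    # Quote each term so FTS5 treats it as a literal string.
    return " ".join(f'"{t}"' for t in terms)


print(sanitize_fts5_query("title:foo AND bar"))  # "title" "foo" "bar"
```

The sanitized string would then be passed as a bound parameter to a `MATCH` expression rather than interpolated into the SQL text.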