tinyweb

Author	SHA1	Message	Date
lichenblankie	5b32d69863	added PyInstaller builds, AGPLv3, transport config - Add pyinstaller.spec and GitHub/Forgejo CI workflows for cross-platform builds - Add AGPLv3 license - Move data storage to ~/.tinyweb/ - Add --version and --port CLI flags - Add transport node selection in /style (smart regeneration preserves Reticulum config) - Add discover more nodes link to rmap.world	2026-06-05 05:29:36 +00:00
lichenblankie	9bc5abd32f	made semantic search optional, use meta snippets - Add semantic_search setting to toggle AI-powered search on/off - Skip embedding generation, hybrid search, and model preloading when disabled - Use site owner's meta description as snippet instead of heuristic extraction - Remove _generate_summary() and snippet() - no more generated snippets - Show reranker/reindex controls grayed out when semantic search is off - AI dependencies (onnxruntime, hnswlib, etc.) are now fully optional	2026-06-05 05:29:35 +00:00
lichenblankie	3f8ebdab1d	fixed reindex, preserved summaries Previously reindex skipped pages that already had chunks, leaving stale embeddings in place. It also overwrote good meta description summaries with auto-generated ones. Now it clears all chunks first so everything is re-embedded, and only generates summaries for pages missing one.	2026-06-05 05:29:35 +00:00
lichenblankie	5ded9f1339	added hybrid semantic search with reranking Implements a three-stage search pipeline: 1. BM25 keyword search via FTS5 with column weights 2. Semantic search via Snowflake arctic-embed-s bi-encoder + HNSW index 3. Optional cross-encoder reranking (on by default, toggleable in settings) Top 20 results are reranked for precision, next 10 appended from RRF for coverage, giving 30 total results across 3 pages. - New embeddings.py with ONNX Runtime inference, text chunking, HNSW index management, RRF fusion, and cross-encoder reranking - Meta description extraction for authentic page snippets with centroid extractive fallback - Stopword filtering in FTS5 queries to avoid overly strict matching - /reindex page for batch embedding of existing pages - Semantic embedding of remote pages during subscription sync - ~125MB dependency footprint (onnxruntime, tokenizers, hnswlib, numpy) - Models: 34MB bi-encoder + 22MB cross-encoder (downloaded on first use)	2026-06-05 05:29:35 +00:00

4 commits