tinyweb

Author	SHA1	Message	Date
lichenblankie	8205db9bc3	tightened network defaults, squashed bugs Security: - Bind HTTP gateway to 127.0.0.1 by default; add --bind for LAN opt-in - Restrict Reticulum mesh surface to GET /api/sites only (CSRF cannot authenticate mesh callers, so gate by whitelist) - Cap request body size at 16 MiB to prevent memory DoS - Redact /bookmark query strings from request logs so the bookmark token and URLs do not land in stdout / docker / journal logs - Tighten FTS5 sanitizer: strip colon, drop AND/OR/NOT/NEAR operator words - Expand .dockerignore; document trust model in README Features: - Add sharing mode toggle (share everything except private vs share only public-tagged) with /share/preview so users can see what subscribers would receive before enabling sharing Bugs: - handle_export() crashed on every call (missing query kwarg) - Dead float16 decompression branch in embeddings.py silently corrupted the HNSW index when compress_embeddings was on - GATEWAY_PORT staleness: --port and find_available_port had no effect on the actual bind - semantic_search default mismatched between db.py ("1") and the rest of the app ("0"), causing embeddings to be generated when the UI said off - Connection pool returned connections with uncommitted transactions to the next consumer - Gateway POST body decode 502'd on non-UTF-8 input - ensure_rns_config clobbered user-edited ~/.reticulum/config; now only rewrites files it authored (sentinel-tagged)	2026-06-05 05:29:36 +00:00
lichenblankie	2dbbc5a538	fixed edge-case domains	2026-06-05 05:29:36 +00:00
lichenblankie	30bc61212f	optimized storage, updated readme	2026-06-05 05:29:36 +00:00
lichenblankie	5b32d69863	added PyInstaller builds, AGPLv3, transport config - Add pyinstaller.spec and GitHub/Forgejo CI workflows for cross-platform builds - Add AGPLv3 license - Move data storage to ~/.tinyweb/ - Add --version and --port CLI flags - Add transport node selection in /style (smart regeneration preserves Reticulum config) - Add discover more nodes link to rmap.world	2026-06-05 05:29:36 +00:00
lichenblankie	b112ee3660	added reticulum hash option to add page	2026-06-05 05:29:36 +00:00
lichenblankie	a1358c1f3d	added manual URL entry	2026-06-05 05:29:35 +00:00
lichenblankie	9bc5abd32f	made semantic search optional, use meta snippets - Add semantic_search setting to toggle AI-powered search on/off - Skip embedding generation, hybrid search, and model preloading when disabled - Use site owner's meta description as snippet instead of heuristic extraction - Remove _generate_summary() and snippet() - no more generated snippets - Show reranker/reindex controls grayed out when semantic search is off - AI dependencies (onnxruntime, hnswlib, etc.) are now fully optional	2026-06-05 05:29:35 +00:00
lichenblankie	e72afbb22e	improved snippet extraction (heuristic) - Case-insensitive meta description extraction (fixes sites like Lemmy with capitalized "Description" meta name) - Strip aside and noscript tags for cleaner body text - Extract paragraph text separately for better sentence quality - Prefer sentences mentioning the site name, then first quality paragraph, then title as fallback - Skip meta descriptions under 20 chars (e.g. just "Lemmy") - Remove embedding/centroid dependency from summary generation	2026-06-05 05:29:35 +00:00
lichenblankie	e8915fa381	stripped noscript tags from pages Lemmy and other JS-heavy sites include noscript fallback text like "Javascript is disabled" that pollutes the stored body text and generated snippets/summaries.	2026-06-05 05:29:35 +00:00
lichenblankie	5ded9f1339	added hybrid semantic search with reranking Implements a three-stage search pipeline: 1. BM25 keyword search via FTS5 with column weights 2. Semantic search via Snowflake arctic-embed-s bi-encoder + HNSW index 3. Optional cross-encoder reranking (on by default, toggleable in settings) Top 20 results are reranked for precision, next 10 appended from RRF for coverage, giving 30 total results across 3 pages. - New embeddings.py with ONNX Runtime inference, text chunking, HNSW index management, RRF fusion, and cross-encoder reranking - Meta description extraction for authentic page snippets with centroid extractive fallback - Stopword filtering in FTS5 queries to avoid overly strict matching - /reindex page for batch embedding of existing pages - Semantic embedding of remote pages during subscription sync - ~125MB dependency footprint (onnxruntime, tokenizers, hnswlib, numpy) - Models: 34MB bi-encoder + 22MB cross-encoder (downloaded on first use)	2026-06-05 05:29:35 +00:00
lichenblankie	67084bbaed	enabled WAL mode, pooling, pagination WAL + pooling: - Enable WAL journal mode for concurrent read/write support - Add connection pool (size 4) with return_db() to reuse connections instead of opening/closing on every request Pagination: - Search results, /pages, and /tags/<name> now paginate at 50 per page - Prev/next navigation links appear when results exceed one page Delta sync: - Pages table gains last_modified timestamp, set on insert/update - /api/sites accepts ?since= param to return only changed pages - Subscription sync uses last_sync timestamp for incremental fetches - Remote pages upserted instead of delete-all/re-insert - Full sync includes all_urls list for detecting remote deletions	2026-06-05 05:29:35 +00:00
lichenblankie	b574c4b7f5	normalized URLs to prevent dupes clean_url() now canonicalizes: http→https, strips www., removes trailing slashes, drops default ports, and sorts query params. Prevents the same page from being indexed multiple times under different URL variations.	2026-06-05 05:29:35 +00:00
lichenblankie	6d649616ca	fixed index_url page_id mismatch lastrowid returns 0 when ON CONFLICT DO UPDATE fires on an existing row, causing links to not be cleaned up or associated correctly on re-index. Now fetches the actual row ID with a SELECT after upsert. Also adds try/finally for connection safety.	2026-06-05 05:29:35 +00:00
lichenblankie	449174b0ca	fixed SSRF bypass, tightened error handling - SSRF: disable automatic redirects, manually follow up to 5 hops with IP re-validation at each step to prevent redirect-to-localhost bypass - Identity file: enforce 0600 permissions on tinyweb_identity at load and creation to prevent other users from reading the private key - Error messages: replace raw exception strings with generic messages to avoid leaking internal paths/hostnames to the UI - DB connections: wrap all get_db() usage in try/finally to guarantee close() even when handlers throw mid-operation	2026-06-05 05:29:35 +00:00
lichenblankie	4899819597	added bookmark auth, CSP, per-session CSRF - Bookmark endpoint now requires a secret token (stored in settings) - Style reset moved from GET to POST with CSRF protection - Open redirect prevention in _redirect() helper - Import capped at 100 URLs to prevent abuse - page_tags cleaned up on delete + PRAGMA foreign_keys enabled - CSP, X-Frame-Options, X-Content-Type-Options on all responses - CSRF tokens now per-session via double-submit cookie pattern - Tag names URL-decoded for special characters - Gateway forwards cookies in request data	2026-06-05 05:29:35 +00:00
lichenblankie	0981c2e0a9	hardened CSRF, SSRF, FTS5 - CSRF: Generate random token at startup, include as hidden field in all 11 POST forms, validate at top of POST dispatch (returns 403) - SSRF: Block private/internal IP ranges (127/8, 10/8, 172.16/12, 192.168/16, 169.254/16, ::1, fc00::/7) by resolving hostname before fetch. Remove verify=False from requests.get(). - DELETE: Change /delete/<id> from GET (instant delete) to GET (confirmation page) + POST (actual delete) to prevent accidental deletion from prefetchers/crawlers. - FTS5: Wrap search input in double quotes to neutralize FTS5 operators (AND, OR, NOT, *, column:). Add try/except fallback.	2026-06-05 05:29:35 +00:00
lichenblankie	acfa9f6d4f	stripped tracking params, added tags URLs are cleaned of tracking parameters (utm_*, fbclid, gclid, etc.) before indexing. Tags can be added when saving or editing pages, browsed at /tags, and are included in search results. Tags are shared via /api/sites and preserved when syncing/importing from subscriptions.	2026-06-05 05:29:35 +00:00
lichenblankie	7ccaf93404	wired up mesh subscriptions + search - Subscriptions now use Reticulum destination hashes instead of HTTP URLs - All subscription syncing happens over encrypted RNS links (rns_client.py) - Add remote_pages table for synced content from subscriptions - Search results now include pages from synced subscriptions, grouped by source - Remove HTTP dependency from subscription handlers	2026-06-05 05:29:35 +00:00
lichenblankie	4b4e7e8081	ported everything to Reticulum mesh Replace HTTP server with Reticulum-native architecture. The server now speaks only Reticulum, with a client-side gateway providing browser access by translating HTTP to/from RNS requests. - Extract db layer (db.py), templates (templates.py), handlers (handlers.py) - app.py is now the RNS server with persistent identity and destination - gateway.py bridges HTTP on localhost:8080 to RNS link requests - Add rns dependency, add .gitignore	2026-06-05 05:29:35 +00:00
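
Commit b574c4b7f5 lists the canonicalization steps applied by clean_url(), and acfa9f6d4f adds tracking-parameter stripping. The log does not show the implementation, so the following is only a minimal sketch of those listed steps. The names canonicalize_url and TRACKING_KEYS are hypothetical, the tracking-key set is an assumed subset (the log says "etc."), and the sketch ignores userinfo and IPv6 literals.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed subset of tracking keys; the commit message only names utm_*, fbclid, gclid.
TRACKING_KEYS = {"fbclid", "gclid"}
DEFAULT_PORTS = {80, 443}


def canonicalize_url(url: str) -> str:
    """Hypothetical stand-in for clean_url(); not the project's actual code."""
    parts = urlsplit(url.strip())

    # http -> https (other schemes are left untouched)
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme

    # Lowercase the host and strip a leading "www."
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Drop default ports, keep explicit non-default ones
    port = parts.port
    netloc = host if port is None or port in DEFAULT_PORTS else f"{host}:{port}"

    # Remove tracking parameters, then sort what remains
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_KEYS
    ]
    query = urlencode(sorted(pairs))

    # Remove trailing slashes from the path
    path = parts.path.rstrip("/")

    return urlunsplit((scheme, netloc, path, query, parts.fragment))


variants = [
    "http://www.Example.com:80/a/?utm_source=x&b=2&a=1",
    "https://example.com/a?a=1&b=2&fbclid=zz",
]
canonical = {canonicalize_url(v) for v in variants}
```

Both variants collapse to a single canonical string, which is the property the commit relies on to avoid indexing the same page twice.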

19 commits
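
Commit 5ded9f1339 mentions RRF fusion of the BM25 and semantic result lists. The log does not show how embeddings.py implements it, so this is a generic sketch of reciprocal rank fusion. The function name rrf_fuse and the constant k=60 are assumptions (60 is a commonly used default, not a value taken from the log).

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], str(doc_id)))


bm25_hits = ["a", "b", "c"]      # illustrative keyword ranking
semantic_hits = ["b", "d", "a"]  # illustrative embedding ranking
fused = rrf_fuse([bm25_hits, semantic_hits])
```

Documents found by both retrievers ("a" and "b" here) rise above those found by only one, without needing the BM25 and cosine scores to be on a comparable scale.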