Commit graph

59 commits

Author SHA1 Message Date
lichenblankie
46cd28ba54 integrated the forum plugin 2026-06-05 05:29:36 +00:00
lichenblankie
4a0214f020 reworked to distribute via clone, not registry 2026-06-05 05:29:36 +00:00
lichenblankie
495968ae27 fixed Docker socket mount 2026-06-05 05:29:36 +00:00
lichenblankie
f904746b7e switched to host-mode Docker 2026-06-05 05:29:36 +00:00
lichenblankie
30d7b5719a fixed CI: Docker in container 2026-06-05 05:29:36 +00:00
lichenblankie
1e6bada5c8 fixed CI: install jq for release 2026-06-05 05:29:36 +00:00
lichenblankie
038c5a61d7 fixed CI: --break-system-packages 2026-06-05 05:29:36 +00:00
lichenblankie
4fb5f2021c fixed CI: use apt-get for Python 2026-06-05 05:29:36 +00:00
lichenblankie
4d522ce62c added pytest test suite (174 tests)
174 tests covering URL normalization, FTS5 query sanitization, SSRF/CSRF
guards, sharing-mode logic, DB schema and upsert paths, handler
end-to-end flows, and gateway body-size / mesh-whitelist guards. Each
recent bug-fix commit (6ffd38d, 1bc695f, 8dffd8c) has an explicit
regression test in test_regressions.py. One xfail documents a minor
latent bug in clean_url where port 80 is not stripped from upgraded
https URLs.
2026-06-05 05:29:36 +00:00
lichenblankie
55c6619ba3 added data-loss guards + first-run state
- Bulk delete now routes through a server-rendered confirmation page
  listing the selected titles; a `confirmed=1` form field is required
  before pages are actually deleted. Mirrors the single-delete flow.
- Reset-template button gains a JS confirm() so stray clicks don't wipe
  the custom template.
- Homepage shows a short, neutral empty-state block when the index has
  zero pages and no query — just names what tinyweb is and links to
  /add, /style, and /subscriptions as equal options.
- /about gains a "your data" section explaining what lives in
  ~/.tinyweb/ (identity file, index.db), what losing each costs, and
  how /export differs from a full backup.
- README gains a "Backups" subsection mirroring the /about copy.
2026-06-05 05:29:36 +00:00
lichenblankie
8205db9bc3 tightened network defaults, squashed bugs
Security:
- Bind HTTP gateway to 127.0.0.1 by default; add --bind for LAN opt-in
- Restrict Reticulum mesh surface to GET /api/sites only (CSRF cannot
  authenticate mesh callers, so gate by whitelist)
- Cap request body size at 16 MiB to prevent memory DoS
- Redact /bookmark query strings from request logs so the bookmark token
  and URLs do not land in stdout / docker / journal logs
- Tighten FTS5 sanitizer: strip colon, drop AND/OR/NOT/NEAR operator words
- Expand .dockerignore; document trust model in README
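
The FTS5 sanitizer change above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `sanitize_fts5_query` is hypothetical, and quoting each remaining term is an assumption borrowed from the earlier "wrap search input in double quotes" commit.

```python
FTS5_OPERATORS = {"AND", "OR", "NOT", "NEAR"}  # FTS5 operator keywords (uppercase)

def sanitize_fts5_query(raw: str) -> str:
    """Return a query string with colons and operator words removed."""
    # Colons would be read as column filters; stray quotes would unbalance quoting.
    cleaned = raw.replace(":", " ").replace('"', " ")
    terms = [t for t in cleaned.split() if t not in FTS5_OPERATORS]
    # Quote every surviving term so FTS5 treats it as a plain string.
    return " ".join(f'"{t}"' for t in terms)
```

With this sketch, `title:foo AND bar` becomes `"title" "foo" "bar"`.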

Features:
- Add sharing mode toggle ("share everything except private" vs. "share only
  public-tagged") with /share/preview so users can see what subscribers
  would receive before enabling sharing

Bugs:
- handle_export() crashed on every call (missing query kwarg)
- Dead float16 decompression branch in embeddings.py silently corrupted
  the HNSW index when compress_embeddings was on
- GATEWAY_PORT staleness: --port and find_available_port had no effect
  on the actual bind
- semantic_search default mismatched between db.py ("1") and the rest of
  the app ("0"), causing embeddings to be generated when the UI said off
- Connection pool returned connections with uncommitted transactions to
  the next consumer
- Gateway POST body decode 502'd on non-UTF-8 input
- ensure_rns_config clobbered user-edited ~/.reticulum/config; now only
  rewrites files it authored (sentinel-tagged)
2026-06-05 05:29:36 +00:00
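
The connection-pool fix in the commit above (connections handed back with uncommitted transactions) might look something like this. It is a sketch under assumptions: the class name `ConnectionPool` is hypothetical, and only `return_db()` and the pool size of 4 come from the commit log.

```python
import queue
import sqlite3

class ConnectionPool:
    """Fixed-size pool of SQLite connections (illustrative only)."""

    def __init__(self, path: str, size: int = 4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(path, check_same_thread=False))

    def get_db(self) -> sqlite3.Connection:
        # Blocks until a connection is free.
        return self._pool.get()

    def return_db(self, conn: sqlite3.Connection) -> None:
        # Roll back anything left open so the next consumer starts clean.
        if conn.in_transaction:
            conn.rollback()
        self._pool.put(conn)
```

Rolling back on return, rather than committing, is a design assumption here: a handler that forgot to commit most likely did not finish its work.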
lichenblankie
e3aadf3947 added LoRa sync with settings UI
- Progressive retry in rns_client.py: fast timeout (15s) then slow (60s+)
  for LoRa/multi-hop links, with automatic fallback
- Background sync threads so subscriptions page returns immediately
  with syncing/error status indicators per subscription
- LoRa RNode configuration in settings page with serial port and
  expandable advanced radio settings (frequency, bandwidth, etc.)
- Internet transport now toggleable alongside LoRa — users can
  enable one, the other, or both
- Reticulum config auto-generated from settings on startup
2026-06-05 05:29:36 +00:00
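
The progressive retry described in the commit above (fast timeout first, then a slow one for LoRa/multi-hop links) could be sketched like this. The `fetch` callable and the function name are hypothetical stand-ins; the real logic lives in rns_client.py and is not shown in this log.

```python
def fetch_with_progressive_timeout(fetch, timeouts=(15, 60)):
    """Call fetch(timeout) with each timeout in turn until one succeeds.

    `fetch` is a hypothetical callable that raises TimeoutError when the
    link does not answer within the given number of seconds.
    """
    if not timeouts:
        raise ValueError("at least one timeout is required")
    last_error = None
    for timeout in timeouts:
        try:
            return fetch(timeout)
        except TimeoutError as exc:
            # Remember the failure and fall back to the next, slower timeout.
            last_error = exc
    raise last_error
```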
lichenblankie
2dbbc5a538 fixed edge-case domains 2026-06-05 05:29:36 +00:00
lichenblankie
4064a46c8a added public/private toggle 2026-06-05 05:29:36 +00:00
lichenblankie
30bc61212f optimized storage, updated readme 2026-06-05 05:29:36 +00:00
lichenblankie
7946225030 added Docker setup docs 2026-06-05 05:29:36 +00:00
lichenblankie
468a286fee squashed a bunch of workflow build bugs 2026-06-05 05:29:36 +00:00
lichenblankie
7655748e8e added bulk ops + orphaned tag cleanup
- Bulk delete and retag from browse page with checkboxes
- Select all / deselect all toggle
- Delete confirmation shows count of selected pages
- Auto-cleanup orphaned tags on delete, edit, and bulk actions
2026-06-05 05:29:36 +00:00
lichenblankie
a9f426132e privacy pass: degoogle, CSP, referrer
- Replace Google Fonts with system font stacks across all themes
- Add Referrer-Policy, X-Content-Type-Options, X-Frame-Options, CSP headers
- Add rel="noreferrer noopener" on all outbound links
- Add no-referrer and dns-prefetch-control meta tags to all themes
- Clean tracking params on outbound links from trusted/remote sources
- Remove Google domains from CSP whitelists
2026-06-05 05:29:36 +00:00
lichenblankie
9738d28b60 added kodama2 theme
Adds pagination, meta, and success message styles, plus input
selectors for new form fields (edit page, manual entry, transport node).
2026-06-05 05:29:36 +00:00
lichenblankie
a1320ed4e4 disabled semantic search by default 2026-06-05 05:29:36 +00:00
lichenblankie
5b32d69863 added PyInstaller builds, AGPLv3, transport config
- Add pyinstaller.spec and GitHub/Forgejo CI workflows for cross-platform builds
- Add AGPLv3 license
- Move data storage to ~/.tinyweb/
- Add --version and --port CLI flags
- Add transport node selection in /style (smart regeneration preserves Reticulum config)
- Add "discover more nodes" link to rmap.world
2026-06-05 05:29:36 +00:00
lichenblankie
e6f77f0a55 tightened up the add form spacing 2026-06-05 05:29:36 +00:00
lichenblankie
7dbf6abf3b swapped to radio toggle for URL vs hash 2026-06-05 05:29:36 +00:00
lichenblankie
f1e7d7e26a added dropdown to switch add/subscribe 2026-06-05 05:29:36 +00:00
lichenblankie
b112ee3660 added reticulum hash option to add page 2026-06-05 05:29:36 +00:00
lichenblankie
a1358c1f3d added manual URL entry 2026-06-05 05:29:35 +00:00
lichenblankie
9bc5abd32f made semantic search optional, use meta snippets
- Add semantic_search setting to toggle AI-powered search on/off
- Skip embedding generation, hybrid search, and model preloading when disabled
- Use site owner's meta description as snippet instead of heuristic extraction
- Remove _generate_summary() and snippet() - no more generated snippets
- Show reranker/reindex controls grayed out when semantic search is off
- AI dependencies (onnxruntime, hnswlib, etc.) are now fully optional
2026-06-05 05:29:35 +00:00
lichenblankie
e72afbb22e improved snippet extraction (heuristic)
- Case-insensitive meta description extraction (fixes sites like Lemmy
  with capitalized "Description" meta name)
- Strip aside and noscript tags for cleaner body text
- Extract paragraph text separately for better sentence quality
- Prefer sentences mentioning the site name, then first quality
  paragraph, then title as fallback
- Skip meta descriptions under 20 chars (e.g. just "Lemmy")
- Remove embedding/centroid dependency from summary generation
2026-06-05 05:29:35 +00:00
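
A rough sketch of the case-insensitive meta description extraction from the commit above, using only the standard library. The class and function names are hypothetical; the 20-character minimum comes from the commit message.

```python
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Collect the first usable <meta name="description"> value."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr = dict(attrs)
        # HTMLParser lowercases attribute names but not values, so
        # name="Description" has to be compared case-insensitively.
        if (attr.get("name") or "").lower() == "description":
            content = (attr.get("content") or "").strip()
            if len(content) >= 20:  # skip stubs such as just "Lemmy"
                self.description = content

def extract_meta_description(html: str):
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description
```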
lichenblankie
e8915fa381 stripped noscript tags from pages
Lemmy and other JS-heavy sites include noscript fallback text like
"Javascript is disabled" that pollutes the stored body text and
generated snippets/summaries.
2026-06-05 05:29:35 +00:00
lichenblankie
3f8ebdab1d fixed reindex, preserved summaries
Previously reindex skipped pages that already had chunks, leaving stale
embeddings in place. It also overwrote good meta description summaries
with auto-generated ones. Now it clears all chunks first so everything
is re-embedded, and only generates summaries for pages missing one.
2026-06-05 05:29:35 +00:00
lichenblankie
cf536a860c added junimo theme, bumped browse to 50 2026-06-05 05:29:35 +00:00
lichenblankie
5ded9f1339 added hybrid semantic search with reranking
Implements a three-stage search pipeline:
1. BM25 keyword search via FTS5 with column weights
2. Semantic search via Snowflake arctic-embed-s bi-encoder + HNSW index
3. Optional cross-encoder reranking (on by default, toggleable in settings)

Top 20 results are reranked for precision, next 10 appended from RRF
for coverage, giving 30 total results across 3 pages.

- New embeddings.py with ONNX Runtime inference, text chunking, HNSW
  index management, RRF fusion, and cross-encoder reranking
- Meta description extraction for authentic page snippets with centroid
  extractive fallback
- Stopword filtering in FTS5 queries to avoid overly strict matching
- /reindex page for batch embedding of existing pages
- Semantic embedding of remote pages during subscription sync
- ~125MB dependency footprint (onnxruntime, tokenizers, hnswlib, numpy)
- Models: 34MB bi-encoder + 22MB cross-encoder (downloaded on first use)
2026-06-05 05:29:35 +00:00
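
The fusion step in the commit above can be illustrated with a small reciprocal rank fusion (RRF) sketch. The constant `k=60` is a common default and an assumption here, and `rerank` is a hypothetical callable standing in for the cross-encoder; neither is taken from embeddings.py.

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked lists of ids; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def assemble_results(fused, rerank, head=20, tail=10):
    """Rerank the top `head` ids, then append the next `tail` in RRF order."""
    return rerank(fused[:head]) + fused[head:head + tail]
```

For example, fusing a keyword ranking `["a", "b", "c"]` with a semantic ranking `["b", "c", "d"]` puts `b` first, because it scores well in both lists.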
lichenblankie
212e9a017d fixed navbar disappearing on save
Browser textarea submissions convert \n to \r\n, causing the template
comparison against DEFAULT_TEMPLATE to always fail. This saved the bare
skeleton as a custom template, overriding the default navbar.
2026-06-05 05:29:35 +00:00
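
The newline mismatch in the commit above can be avoided by normalizing both sides before comparing. A minimal sketch follows; the `DEFAULT_TEMPLATE` value is a placeholder and the helper names are hypothetical.

```python
# Placeholder value; the real default template is not shown in this log.
DEFAULT_TEMPLATE = "<html>\n<body>\n{{content}}\n</body>\n</html>"

def normalize_newlines(text: str) -> str:
    """Convert \\r\\n and bare \\r to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def is_default_template(submitted: str) -> bool:
    # Browsers submit textarea content with \r\n line endings.
    return normalize_newlines(submitted) == normalize_newlines(DEFAULT_TEMPLATE)
```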
lichenblankie
59d2088498 redesigned subscriptions with card layout
Replace cramped table layout with card-based design that works
better in narrow viewports and across different themes.
2026-06-05 05:29:35 +00:00
lichenblankie
bfb8acf946 disabled share_instance for reliable announces
With share_instance = Yes, announces weren't being sent over TCP
in Docker environments. Setting it to No ensures each TinyWeb
instance manages its own Reticulum interfaces directly.
2026-06-05 05:29:35 +00:00
lichenblankie
f8f04ce4f2 added delay before announce for TCP readiness
The announce was firing before the TCP transport connection was fully
established, causing Docker instances to never announce over the mesh.
2026-06-05 05:29:35 +00:00
lichenblankie
d4d869312e added default transport node
New TinyWeb instances now auto-connect to reticulum.derickphan.com:4242
so users get internet mesh connectivity out of the box without any
manual Reticulum configuration. Env var overrides still supported.
2026-06-05 05:29:35 +00:00
lichenblankie
14aafad337 added entrypoint for Reticulum in Docker
Replaces static CMD with an entrypoint that generates RNS config from
environment variables (RNS_TCP_HOST/PORT), enabling TCP transport for
environments without LAN auto-discovery (e.g. Docker on macOS).
2026-06-05 05:29:35 +00:00
lichenblankie
e802ed4fe3 added Dockerfile + compose setup 2026-06-05 05:29:35 +00:00
lichenblankie
67084bbaed enabled WAL mode, pooling, pagination
WAL + pooling:
- Enable WAL journal mode for concurrent read/write support
- Add connection pool (size 4) with return_db() to reuse connections
  instead of opening/closing on every request

Pagination:
- Search results, /pages, and /tags/<name> now paginate at 50 per page
- Prev/next navigation links appear when results exceed one page

Delta sync:
- Pages table gains last_modified timestamp, set on insert/update
- /api/sites accepts ?since= param to return only changed pages
- Subscription sync uses last_sync timestamp for incremental fetches
- Remote pages upserted instead of delete-all/re-insert
- Full sync includes all_urls list for detecting remote deletions
2026-06-05 05:29:35 +00:00
lichenblankie
b574c4b7f5 normalized URLs to prevent dupes
clean_url() now canonicalizes: http→https, strips www., removes
trailing slashes, drops default ports, and sorts query params.
Prevents the same page from being indexed multiple times under
different URL variations.
2026-06-05 05:29:35 +00:00
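
The canonicalization steps listed in the commit above might be sketched like this. It is not the project's clean_url(): the name `clean_url_sketch` is hypothetical, and stripping both port 80 and port 443 after the https upgrade is an assumption. Fragments are left untouched because the commit message does not mention them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def clean_url_sketch(url: str) -> str:
    """Canonicalize a URL so trivial variations map to one string."""
    parts = urlsplit(url.strip())
    # Upgrade http to https; leave other schemes alone.
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop default ports, keep anything non-standard.
    port = parts.port
    netloc = host if port in (None, 80, 443) else f"{host}:{port}"
    path = parts.path.rstrip("/")
    # Sort query parameters so their order does not create duplicates.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, parts.fragment))
```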
lichenblankie
5d9b81db95 wrote README with setup + architecture 2026-06-05 05:29:35 +00:00
lichenblankie
6d649616ca fixed index_url page_id mismatch
lastrowid returns 0 when ON CONFLICT DO UPDATE fires on an existing
row, causing links to not be cleaned up or associated correctly on
re-index. Now fetches the actual row ID with a SELECT after upsert.
Also adds try/finally for connection safety.
2026-06-05 05:29:35 +00:00
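
The fix in the commit above (look up the row ID after the upsert instead of trusting lastrowid) can be sketched as follows. The table layout and the function name `upsert_page` are assumptions made for the example.

```python
import sqlite3

def upsert_page(conn: sqlite3.Connection, url: str, title: str) -> int:
    """Insert or update a page and return its actual row id."""
    conn.execute(
        "INSERT INTO pages (url, title) VALUES (?, ?) "
        "ON CONFLICT(url) DO UPDATE SET title = excluded.title",
        (url, title),
    )
    # Do not rely on cursor.lastrowid here; fetch the id explicitly.
    row = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0]
```

Calling it twice with the same URL returns the same id both times, which is what the link cleanup on re-index depends on.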
lichenblankie
449174b0ca fixed SSRF bypass, tightened error handling
- SSRF: disable automatic redirects, manually follow up to 5 hops with
  IP re-validation at each step to prevent redirect-to-localhost bypass
- Identity file: enforce 0600 permissions on tinyweb_identity at load
  and creation to prevent other users from reading the private key
- Error messages: replace raw exception strings with generic messages
  to avoid leaking internal paths/hostnames to the UI
- DB connections: wrap all get_db() usage in try/finally to guarantee
  close() even when handlers throw mid-operation
2026-06-05 05:29:35 +00:00
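
The identity-file permission check from the commit above could look roughly like this on a POSIX system. The function name is hypothetical; only the 0600 mode comes from the commit message.

```python
import os
import stat

def enforce_private_permissions(path: str) -> None:
    """Make sure only the owner can read or write the file at `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o600:
        # Tighten permissions so other users cannot read the private key.
        os.chmod(path, 0o600)
```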
lichenblankie
4899819597 added bookmark auth, CSP, per-session CSRF
- Bookmark endpoint now requires a secret token (stored in settings)
- Style reset moved from GET to POST with CSRF protection
- Open redirect prevention in _redirect() helper
- Import capped at 100 URLs to prevent abuse
- page_tags cleaned up on delete + PRAGMA foreign_keys enabled
- CSP, X-Frame-Options, X-Content-Type-Options on all responses
- CSRF tokens now per-session via double-submit cookie pattern
- Tag names URL-decoded for special characters
- Gateway forwards cookies in request data
2026-06-05 05:29:35 +00:00
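
The double-submit cookie pattern mentioned in the commit above boils down to comparing the token in the cookie with the token in the form. A minimal sketch with hypothetical function names:

```python
import hmac
import secrets

def new_csrf_token() -> str:
    """Generate a per-session token to set as a cookie and a hidden field."""
    return secrets.token_urlsafe(32)

def csrf_ok(cookie_token, form_token) -> bool:
    """Accept the request only if both tokens are present and identical."""
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    return hmac.compare_digest(cookie_token.encode(), form_token.encode())
```

A cross-site form can make the browser send the cookie, but it cannot read the cookie's value to copy it into the form field, so the two only match for same-origin pages.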
lichenblankie
0981c2e0a9 hardened CSRF, SSRF, FTS5
- CSRF: Generate random token at startup, include as hidden field in
  all 11 POST forms, validate at top of POST dispatch (returns 403)
- SSRF: Block private/internal IP ranges (127/8, 10/8, 172.16/12,
  192.168/16, 169.254/16, ::1, fc00::/7) by resolving hostname before
  fetch. Remove verify=False from requests.get().
- DELETE: Change /delete/<id> from GET (instant delete) to GET
  (confirmation page) + POST (actual delete) to prevent accidental
  deletion from prefetchers/crawlers.
- FTS5: Wrap search input in double quotes to neutralize FTS5
  operators (AND, OR, NOT, *, column:). Add try/except fallback.
2026-06-05 05:29:35 +00:00
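
The SSRF guard in the commit above (resolve the hostname, then refuse private or internal addresses) can be sketched with the standard library. The function names are hypothetical; the blocked ranges are the ones listed in the commit message.

```python
import ipaddress
import socket

BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "::1/128", "fc00::/7",
)]

def is_blocked_ip(addr: str) -> bool:
    """True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 scope id
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)

def host_is_safe(hostname: str) -> bool:
    """Resolve the hostname and require every resolved address to be public."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return all(not is_blocked_ip(info[4][0]) for info in infos)
```

Checking every resolved address, not just the first, matters because a hostname can return both a public and a private record.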
lichenblankie
104bb7ba2d created themes folder with kodama template
Save the custom kodama template to themes/kodama.html so it's
version-controlled as a file rather than only living in the database.
Stop tracking index.db since it's runtime data, not source code.
2026-06-05 05:29:35 +00:00
lichenblankie
17e804cc17 threw in the kodama tree spirit overlay
Add animated kodama (tree spirits from Princess Mononoke) to the
custom template as a canvas overlay. Each spirit has unique organic
proportions: rock-like blob head shapes, varied eye spacing/size,
optional mouths and arms, and a soft luminous glow. They fade in/out,
bob gently, and occasionally rattle their heads.

Also removed 3 orphaned remote_pages rows from deleted subscriptions.
2026-06-05 05:29:35 +00:00
lichenblankie
02450b0865 added custom template editor, cleaned up UI
- Replace CSS-only customization with full HTML template editing
- Users edit the entire page wrapper with {{content}} placeholder
- Add /style?reset escape hatch to recover from broken templates
- Move nav links to template, remove redundant nav from search page
- Delete remote pages when unsubscribing from an instance
2026-06-05 05:29:35 +00:00