Commit graph

8 commits

Author SHA1 Message Date
Derick Phan
6981d39ddd
Normalize URLs to prevent duplicate indexing
clean_url() now canonicalizes URLs: upgrades http to https, strips a
leading www., removes trailing slashes, drops default ports, and sorts
query params.
Prevents the same page from being indexed multiple times under
different URL variations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:34:15 -07:00
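
The canonicalization steps listed in 6981d39ddd can be sketched roughly as below. This is a minimal illustration, not the project's actual clean_url(): the function name canonicalize_url and the DEFAULT_PORTS table are assumptions, and details such as fragment handling are guesses (the fragment is simply kept here).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical table of default ports per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    """Return one canonical spelling for equivalent URL variations."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port

    # Drop the port when it is the default for the scheme as written.
    if port is not None and DEFAULT_PORTS.get(scheme) == port:
        port = None
    # Upgrade http to https.
    if scheme == "http":
        scheme = "https"
    # Strip a leading "www.".
    if host.startswith("www."):
        host = host[4:]
    # urlsplit() removes IPv6 brackets from hostname; restore them.
    if ":" in host:
        host = f"[{host}]"

    netloc = host if port is None else f"{host}:{port}"
    # Remove trailing slashes from the path.
    path = parts.path.rstrip("/")
    # Sort query parameters so ordering differences do not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, parts.fragment))
```

With this sketch, http://www.example.com and https://example.com/ map to the same string, so a uniqueness constraint on the stored URL would treat them as one page.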
Derick Phan
86e4c1f151
Fix index_url using wrong page_id after upsert
lastrowid is not reliable when ON CONFLICT DO UPDATE fires on an
existing row (it returned 0 here), so links were not cleaned up or
associated correctly on re-index. index_url now fetches the actual row
ID with a SELECT after the upsert.
Also adds try/finally for connection safety.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:24:01 -07:00
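
The upsert-then-SELECT pattern from 86e4c1f151 might look like the sketch below. The pages table, its columns, and the helper name upsert_page are assumptions made for illustration; the real schema is not shown in this log. The ON CONFLICT clause needs SQLite 3.24 or newer.

```python
import sqlite3


def upsert_page(conn: sqlite3.Connection, url: str, title: str) -> int:
    """Insert or update a page, then return its real row ID."""
    conn.execute(
        "INSERT INTO pages (url, title) VALUES (?, ?) "
        "ON CONFLICT(url) DO UPDATE SET title = excluded.title",
        (url, title),
    )
    # Do not trust cursor.lastrowid here: when the UPDATE branch runs,
    # no row is inserted, so look the ID up explicitly instead.
    row = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0]
```

A caller would typically wrap the connection in try/finally so that close() runs even if the upsert fails, which matches the connection-safety note in the commit.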
Derick Phan
c10aa7955c
Fix SSRF redirect bypass, identity permissions, error leakage, and DB connection leaks
- SSRF: disable automatic redirects, manually follow up to 5 hops with
  IP re-validation at each step to prevent redirect-to-localhost bypass
- Identity file: enforce 0600 permissions on tinyweb_identity at load
  and creation to prevent other users from reading the private key
- Error messages: replace raw exception strings with generic messages
  to avoid leaking internal paths/hostnames to the UI
- DB connections: wrap all get_db() usage in try/finally to guarantee
  close() even when a handler raises mid-operation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:18:47 -07:00
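
The redirect handling described in c10aa7955c can be illustrated with the sketch below. It is not the project's code: fetch_once and resolve are hypothetical injected callables (fetch_once is assumed to perform a single request with automatic redirects disabled and return status, headers, and body), which keeps the example free of network access. Note that validating an address and then fetching by hostname still leaves a DNS rebinding window; this sketch does not address that.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 5
REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_public_ip(addr: str) -> bool:
    """True only for addresses outside private/internal ranges."""
    ip = ipaddress.ip_address(addr)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def resolve_host(host: str) -> list:
    """Default resolver: every address the hostname maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def assert_safe_url(url: str, resolve=resolve_host) -> None:
    """Raise ValueError unless the URL is http(s) and resolves publicly."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("unsupported URL")
    addrs = resolve(parts.hostname)
    if not addrs or not all(is_public_ip(a) for a in addrs):
        raise ValueError("blocked address")


def fetch_following_redirects(url, fetch_once, resolve=resolve_host):
    """Follow up to MAX_REDIRECTS hops, re-validating the target each time."""
    for _ in range(MAX_REDIRECTS + 1):
        assert_safe_url(url, resolve)  # checked on every hop, not just the first
        status, headers, body = fetch_once(url)
        if status in REDIRECT_CODES and "Location" in headers:
            url = urljoin(url, headers["Location"])
            continue
        return status, body
    raise ValueError("too many redirects")
```

Because the check runs before every request, a public page that redirects to a loopback or private address is rejected at the hop where the redirect lands, before any request is sent to it.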
Derick Phan
d5f2d01651
Harden security: bookmark auth, CSP headers, per-session CSRF, and more
- Bookmark endpoint now requires a secret token (stored in settings)
- Style reset moved from GET to POST with CSRF protection
- Open redirect prevention in _redirect() helper
- Import capped at 100 URLs to prevent abuse
- page_tags cleaned up on delete + PRAGMA foreign_keys enabled
- CSP, X-Frame-Options, X-Content-Type-Options on all responses
- CSRF tokens now per-session via double-submit cookie pattern
- Tag names URL-decoded for special characters
- Gateway forwards cookies in request data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 11:10:37 -07:00
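
Two of the checks in d5f2d01651, the double-submit cookie comparison and the open redirect guard, can be sketched as follows. The function names and the choice to fall back to "/" are assumptions for illustration; the project's _redirect() helper may behave differently.

```python
import hmac
import secrets
from urllib.parse import urlsplit


def new_csrf_token() -> str:
    """Random per-session token, set as a cookie and echoed in each form."""
    return secrets.token_urlsafe(32)


def csrf_ok(cookie_token, form_token) -> bool:
    """Double-submit check: the form field must match the cookie value."""
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison to avoid leaking the token via timing.
    return hmac.compare_digest(cookie_token.encode(), form_token.encode())


def safe_redirect_target(target: str, default: str = "/") -> str:
    """Allow only same-site absolute paths as redirect targets."""
    parts = urlsplit(target)
    if parts.scheme or parts.netloc:
        return default  # absolute or scheme-relative URL to another host
    if not target.startswith("/") or target.startswith("//") or "\\" in target:
        return default  # relative path, or a form some browsers treat as a host
    return target
```

The double-submit pattern relies on an attacker's page being unable to read or set the victim's cookie for this origin, so a forged form cannot supply a matching hidden field.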
Derick Phan
9ddecf71db
Add security hardening: CSRF, SSRF, FTS5, and DELETE via POST
- CSRF: Generate random token at startup, include as hidden field in
  all 11 POST forms, validate at top of POST dispatch (returns 403)
- SSRF: Block private/internal IP ranges (127.0.0.0/8, 10.0.0.0/8,
  172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, ::1, fc00::/7) by
  resolving hostname before fetch. Remove verify=False from
  requests.get().
- DELETE: Change /delete/<id> from GET (instant delete) to GET
  (confirmation page) + POST (actual delete) to prevent accidental
  deletion from prefetchers/crawlers.
- FTS5: Wrap search input in double quotes to neutralize FTS5
  operators (AND, OR, NOT, *, column:). Add try/except fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 10:54:22 -07:00
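
The FTS5 quoting from 9ddecf71db can be sketched as below. The table name pages_fts and the helper names are assumptions, and returning an empty list is only one possible fallback; the commit does not say what its try/except falls back to.

```python
import sqlite3


def fts5_quote(user_input: str) -> str:
    """Wrap input in double quotes so FTS5 reads it as a plain string."""
    # Inside an FTS5 string, a literal double quote is written as "".
    return '"' + user_input.replace('"', '""') + '"'


def safe_search(conn: sqlite3.Connection, user_input: str) -> list:
    """Run a MATCH query on quoted input; return [] if the query fails."""
    try:
        rows = conn.execute(
            "SELECT rowid FROM pages_fts WHERE pages_fts MATCH ?",
            (fts5_quote(user_input),),
        ).fetchall()
    except sqlite3.OperationalError:
        # Assumed fallback: treat a failed query as "no results".
        return []
    return [row[0] for row in rows]
```

Quoting turns operators such as AND, OR, NOT, * and column: into ordinary search text, at the cost of disabling them for users who wanted them.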
Derick Phan
62055a578d
Strip tracking params from URLs and add tags/collections
URLs are cleaned of tracking parameters (utm_*, fbclid, gclid, etc.)
before indexing. Tags can be added when saving or editing pages,
browsed at /tags, and are included in search results. Tags are shared
via /api/sites and preserved when syncing/importing from subscriptions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 23:15:28 -07:00
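
The tracking-parameter cleanup in 62055a578d can be approximated as follows. Only utm_*, fbclid, and gclid are named in the commit; the "etc." is left unexpanded here, and the function and constant names are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters named in the commit message; the real list may be longer.
TRACKING_KEYS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url: str) -> str:
    """Remove known tracking parameters, keeping all others in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_KEYS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Running this before indexing means the same article shared through different campaigns is stored under a single URL.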
Derick Phan
9a9b5e0617
Add Reticulum-native subscriptions and sync-based distributed search
- Subscriptions now use Reticulum destination hashes instead of HTTP URLs
- All subscription syncing happens over encrypted RNS links (rns_client.py)
- Add remote_pages table for synced content from subscriptions
- Search results now include pages from synced subscriptions, grouped by source
- Remove HTTP dependency from subscription handlers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 22:51:22 -07:00
Derick Phan
f609f867ef
Migrate TinyWeb to Reticulum mesh network
Replace HTTP server with Reticulum-native architecture. The server
now speaks only Reticulum, with a client-side gateway providing
browser access by translating HTTP to/from RNS requests.

- Extract db layer (db.py), templates (templates.py), handlers (handlers.py)
- app.py is now the RNS server with persistent identity and destination
- gateway.py bridges HTTP on localhost:8080 to RNS link requests
- Add rns dependency, add .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 22:18:24 -07:00