# TinyWeb

A personal, decentralized search engine built on the [Reticulum](https://reticulum.network/) mesh network. Curate your own index of web pages, search it locally, and share collections with friends over an encrypted mesh. No algorithms, no ads, no tracking.

## Features

- **Personal search index**: Save pages you find valuable and search them with full-text search (SQLite FTS5)
- **Tagging**: Organize saved pages with comma-separated tags
- **Bookmarklet**: One-click indexing from any browser tab
- **Subscriptions**: Subscribe to friends' TinyWeb instances over Reticulum and search their indexes alongside yours
- **Custom templates**: Full HTML/CSS/JS template editor to personalize your instance
- **Import/export**: JSON-based backup and restore
- **Mesh-native**: Works over Reticulum without the internet; encrypted and decentralized by default
- **Forum plugin**: Optional link-sharing discussion board over the mesh (see the Forum plugin section below)

## Performance & Scale

### Search Speed

| Pages indexed | Search time | Notes |
|--------------|-------------|-------|
| 1,000 | ~50ms | Fast local FTS5 |
| 10,000 | ~50-100ms | Full-text search |
| 100,000 | ~100-200ms | Combined BM25 + semantic |
| 500,000 | ~200-400ms | With semantic enabled |
| 1,000,000 | ~300-500ms | Hybrid search |

*Times are estimates for combined BM25 + semantic search. Actual performance varies by hardware, storage type (SSD/HDD), and search complexity.*

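
To make the BM25 half of the table concrete, here is a minimal sketch of a ranked FTS5 query using Python's built-in `sqlite3` module. The `pages` table and its columns are hypothetical and are not TinyWeb's actual schema, and the sketch does not cover the semantic half of hybrid search. It assumes your Python build ships SQLite with FTS5 enabled, which most do.

```python
import sqlite3

# Hypothetical schema for illustration only; TinyWeb's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(url, title, body)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("https://reticulum.network/", "Reticulum",
         "A cryptography-based mesh networking stack"),
        ("https://sqlite.org/fts5.html", "SQLite FTS5",
         "Full-text search extension for SQLite"),
    ],
)


def search(conn, query, limit=10):
    """Return (url, title, score) rows, best match first.

    FTS5's bm25() yields lower (more negative) values for better matches,
    so ascending order puts the most relevant rows at the top.
    """
    rows = conn.execute(
        "SELECT url, title, bm25(pages) AS score "
        "FROM pages WHERE pages MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Calling `search(conn, "mesh")` returns only the Reticulum row, since that is the only document containing the term.
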
### Concurrent Connections

- Database pool: 16 simultaneous connections
- Suitable for a single user plus a few subscriptions

### Export

- Paginated at 10,000 pages per request
- Use `?batch=N` to export in chunks: `/export?batch=0`, `/export?batch=1`, etc.

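
The batch parameter lends itself to a simple loop. The sketch below is one way to collect every batch; it assumes each `/export?batch=N` response is a JSON array of pages and that a batch shorter than 10,000 entries marks the end. Neither assumption is confirmed by this README, and `export_all` and `http_fetcher` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def export_all(fetch_batch, batch_size=10_000):
    """Collect pages batch by batch until a short (or empty) batch is returned."""
    pages = []
    batch = 0
    while True:
        chunk = fetch_batch(batch)
        pages.extend(chunk)
        if len(chunk) < batch_size:
            break  # assumed end-of-data signal
        batch += 1
    return pages


def http_fetcher(base_url="http://127.0.0.1:8080"):
    """Build a fetch function for a local instance (response shape is assumed)."""
    def fetch(batch):
        with urlopen(f"{base_url}/export?batch={batch}") as resp:
            return json.load(resp)
    return fetch
```

With a running instance, `export_all(http_fetcher())` would walk the batches in order. Passing the fetch function in as an argument keeps the paging logic testable without a network connection.
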
## Download (pre-built binaries)

Download the latest release for your platform from the [Releases](https://git.derickphan.com/lichenblankie/tinyweb/releases) page:

| Platform | File |
|----------|------|
| Windows | `TinyWeb-windows-x64.exe` |
| macOS | `TinyWeb-macos-arm64` |
| Linux | `TinyWeb-linux-x64` |

Run the downloaded file; no installation is required.

## Docker

The Docker setup builds TinyWeb from source. Clone the repo, then build and run with Docker Compose:

```bash
git clone https://git.derickphan.com/lichenblankie/tinyweb.git
cd tinyweb
docker compose up -d
```

The bundled `docker-compose.yml` builds the image from source and persists your data in a named volume:

```yaml
services:
  tinyweb:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - tinyweb-data:/data
    restart: unless-stopped

volumes:
  tinyweb-data:
```

After the first build, the image is cached locally, so subsequent `docker compose up -d` calls skip the build and start quickly. To update to the latest source:

```bash
git pull && docker compose up -d --build
```

If you're on macOS or need to reach a Reticulum node over TCP, uncomment the `RNS_TCP_HOST` / `RNS_TCP_PORT` block in `docker-compose.yml` and point it at a host running Reticulum. On Linux with LAN auto-discovery, leave it as-is (or switch to `network_mode: host`).

### Storage Estimates

These estimates assume an average of ~15KB of stored content per page:

| Pages | Database | Embeddings\* | Total |
|-------|----------|------------|-------|
| 10,000 | 150MB | 80MB | ~250MB |
| 100,000 | 1.5GB | 800MB | ~2.5GB |
| 500,000 | 7.5GB | 4GB | ~12GB |
| 1,000,000 | 15GB | 8GB | ~25GB |

\*Embeddings are stored only when semantic search is enabled. With compression enabled (Settings > Search > AI), embeddings use ~50% less storage.

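
The arithmetic behind the table can be reproduced in a few lines. This sketch uses the stated ~15KB per page for the database and ~8KB per page for embeddings, a figure derived from the table (80MB / 10,000 pages) rather than stated directly. The function name and return shape are illustrative.

```python
DB_KB_PER_PAGE = 15         # average stated above
EMBEDDING_KB_PER_PAGE = 8   # derived from the table: 80MB / 10,000 pages


def estimate_storage_mb(pages, semantic=True, compressed=False):
    """Estimate storage in MB (decimal units) for a given number of pages."""
    database_mb = pages * DB_KB_PER_PAGE / 1000
    embeddings_mb = pages * EMBEDDING_KB_PER_PAGE / 1000 if semantic else 0.0
    if compressed:
        embeddings_mb *= 0.5  # the ~50% saving quoted for compression
    return {
        "database_mb": database_mb,
        "embeddings_mb": embeddings_mb,
        "total_mb": database_mb + embeddings_mb,
    }
```

For 10,000 pages this gives 150MB + 80MB = 230MB, which suggests the table's Total column is rounded up to leave some headroom.
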
## Data storage

### Local (Python/binary)

Your data is stored in `~/.tinyweb/`:

| File | Description |
|------|-------------|
| `index.db` | SQLite database with your indexed pages |
| `tinyweb_identity` | Your Reticulum identity (keep safe!) |
| `forum.db` | Forum plugin database (only if forum is enabled) |
| `models/` | Downloaded AI models for semantic search |
| `index.hnsw` | Semantic search index |

Keeping data in this directory lets it persist between upgrades and stay separate from the application.

### Backups

Back up the whole `~/.tinyweb/` directory periodically. The files that matter:

- **`tinyweb_identity`** is your permanent mesh identity. If you lose it, your destination hash changes and every subscriber has to re-subscribe to the new one. Keep it somewhere you trust; the file is `0600` by default.
- **`index.db`** is your full reading history: every page, note, tag, and synced remote page. Losing it loses everything you've curated.
- **`forum.db`** (if the forum plugin is enabled) holds all threads, posts, upvotes, and moderation settings. Losing it loses your forum data.

`models/` and `index.hnsw` are re-derivable (the model will re-download, and the HNSW index rebuilds from the database on next startup with semantic search enabled), so they don't need to be backed up.

The `/export` page produces a JSON dump of your pages. It's a migration aid: it doesn't preserve your identity file, your custom template, or subscription state. A full restore needs a copy of `~/.tinyweb/`.

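
If you prefer a targeted backup over copying the whole directory, the following sketch copies the identity file and snapshots the SQLite databases with the standard library's online backup API, which is safe to use while a database is open elsewhere. `backup_tinyweb` is a hypothetical helper, not part of TinyWeb, and it skips the re-derivable `models/` and `index.hnsw`.

```python
import shutil
import sqlite3
from pathlib import Path

IDENTITY = "tinyweb_identity"
DATABASES = ("index.db", "forum.db")


def backup_tinyweb(data_dir, dest_dir):
    """Copy the identity file and snapshot each database; return the names saved."""
    data_dir, dest_dir = Path(data_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved = []

    identity = data_dir / IDENTITY
    if identity.exists():
        target = dest_dir / IDENTITY
        shutil.copy2(identity, target)
        target.chmod(0o600)  # keep the key owner-only in the backup as well
        saved.append(IDENTITY)

    for name in DATABASES:
        source = data_dir / name
        if not source.exists():
            continue  # forum.db only exists when the forum plugin is enabled
        src = sqlite3.connect(source)
        dst = sqlite3.connect(dest_dir / name)
        try:
            src.backup(dst)  # consistent snapshot via SQLite's backup API
        finally:
            dst.close()
            src.close()
        saved.append(name)
    return saved
```

A typical call would be `backup_tinyweb(Path.home() / ".tinyweb", "/path/to/backup")`. Note that this does not capture anything else stored in the directory, so a plain directory copy remains the simplest complete backup.
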
### Docker

When you run via `docker compose up` (above), data is stored in the `tinyweb-data` named volume and persists across rebuilds. To inspect the volume, or to stop the container while keeping it:

```bash
docker compose exec tinyweb ls -la /data
docker compose down   # stop without removing the volume
```

To reset everything (this destroys your index and identity, so back up first):

```bash
docker compose down -v
```

### Command line options

```bash
./TinyWeb --version        # Show version
./TinyWeb -p 9000          # Use port 9000 instead of default 8080
./TinyWeb --bind 0.0.0.0   # Expose the web UI to your LAN (see warning below)
```

By default, the web UI binds to `127.0.0.1` and is only reachable from the machine running TinyWeb. **The UI has no authentication**: anyone who can reach the port can read, add, and delete entries, and change settings. Only pass `--bind 0.0.0.0` if you fully trust your network, or put TinyWeb behind an authenticating reverse proxy.

## Getting started

```bash
pip install -r requirements.txt
python app.py
```

This starts the Reticulum server and an HTTP gateway on `http://127.0.0.1:8080`. Open it in your browser. The UI is localhost-only by default; see `--bind` under *Command line options* if you want to reach it from another machine.

Your destination hash is printed on startup; share it with friends so they can subscribe to your index.

## Remote gateway

To browse a remote TinyWeb instance without running your own index:

```bash
python gateway.py <destination_hash>
```

This connects over Reticulum and serves the remote instance at `http://localhost:8080`.

## How it works

1. **Save pages**: Use the `/add` form or the bookmarklet (found on `/style`) to index any URL
2. **Search**: Full-text search across your saved pages, linked pages from trusted sites, and synced subscriptions
3. **Subscribe**: Add a friend's destination hash on `/subscriptions` to sync their shared index
4. **Customize**: Edit your site name, HTML template, and sharing settings on `/style`

## Forum plugin

TinyWeb supports an optional [tinyweb-forum](https://git.derickphan.com/lichenblankie/tinyweb-forum) plugin: a decentralized link-sharing discussion board that runs in-process alongside TinyWeb.

### Install

```bash
pip install tinyweb-forum
```

Enable it on the `/style` page under "Forum". A "Forum" link will appear in the navigation bar.

### How it works

- Threads and posts are stored in `~/.tinyweb/forum.db` (separate from your search index)
- Instances are discovered automatically via mesh announces; no manual setup is needed
- Sync is manual by default: click "sync now" on the forum page. Auto-sync every 5 minutes is optional (toggle it on the moderation page)
- At scale, sync uses epidemic gossip: 20 random peers per cycle, converging globally within ~O(log N) cycles
- Authors are identified by a short pseudonymous identity hash (no accounts, no sign-up)
- Auto-discovery can be disabled on the moderation page
- Threads are auto-pruned after 30 days (configurable, or set to 0 to keep everything)
- Moderation is local: block authors, mute threads, keyword filters, and gossip block lists with peers (auto-block after 3 peer reports)

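
The ~O(log N) convergence claim for epidemic gossip can be illustrated with a small simulation. This is a simplified push-only model, not the plugin's actual protocol: every peer that already has a post forwards it to 20 peers chosen uniformly at random each cycle (a peer may occasionally pick itself or an already-informed peer). The function name and parameters are illustrative.

```python
import random


def cycles_to_converge(n_peers, fanout=20, seed=0, max_cycles=100):
    """Count gossip cycles until every peer has received a post from peer 0."""
    rng = random.Random(seed)
    informed = {0}  # peer 0 starts with the new post
    cycles = 0
    while len(informed) < n_peers and cycles < max_cycles:
        newly_reached = set()
        for _ in informed:
            # Each informed peer pushes to `fanout` random peers this cycle.
            newly_reached.update(rng.sample(range(n_peers), min(fanout, n_peers)))
        informed |= newly_reached
        cycles += 1
    return cycles


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        print(size, cycles_to_converge(size))
```

Because the informed set can grow by up to a factor of 20 per cycle, the number of cycles grows roughly with the logarithm of the network size rather than with the size itself.
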
For full feature docs, see the [tinyweb-forum README](https://git.derickphan.com/lichenblankie/tinyweb-forum).

## Project structure

```
app.py         Entry point: boots Reticulum, starts HTTP gateway
gateway.py     HTTP-to-RNS bridge (local or remote dispatch)
handlers.py    Route dispatcher and all request handlers
db.py          SQLite database, FTS5, URL fetching, SSRF protection
templates.py   HTML template rendering and escaping
rns_client.py  Reticulum client for fetching remote site lists
themes/        Saved HTML templates (e.g. kodama.html)
```

## Security

**The web UI has no authentication.** It is bound to `127.0.0.1` by default, so only processes on the local machine can reach it. If you pass `--bind 0.0.0.0` (or run inside a container with a published port), anyone who can reach that address can fully control your instance: reading private entries, changing settings, and modifying the HTML template (which runs in your browser). Put TinyWeb behind a reverse proxy with auth before exposing it beyond localhost.

Other hardening measures:

- **CSRF protection**: All POST forms use per-session tokens via double-submit cookies
- **SSRF prevention**: URL fetching validates hostnames against private IP ranges, with redirect re-validation
- **FTS5 injection prevention**: Search queries are sanitized before passing to SQLite MATCH
- **Content Security Policy**: CSP headers on all HTML responses restrict script/style/frame sources
- **XSS escaping**: All user-supplied content is HTML-escaped before rendering
- **Bookmarklet authentication**: The bookmarklet endpoint requires a secret token
- **Identity file protection**: The Reticulum identity key is restricted to owner-only permissions (0600)
- **Forum caveats**: See [tinyweb-forum Security](https://git.derickphan.com/lichenblankie/tinyweb-forum#security) for forum-specific risks (voluntary retractions, block gossip manipulation, no rate limiting)

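
Two of the measures above, SSRF prevention and FTS5 query sanitization, are easy to sketch with the standard library. The code below shows the general technique under stated assumptions and is not TinyWeb's implementation in `db.py`; all function names are hypothetical. The `resolve` parameter exists so the check can be exercised without real DNS lookups.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_address(ip_text):
    """Return True only for addresses outside private and special-use ranges."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # unparseable addresses are rejected
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def is_safe_url(url, resolve=socket.getaddrinfo):
    """Validate scheme and every resolved address.

    Call this again for each redirect target before following it.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    return bool(infos) and all(is_public_address(info[4][0]) for info in infos)


def sanitize_fts_query(text):
    """Quote each term so FTS5 treats operators and punctuation as plain text.

    Returns an empty string for blank input; callers should skip MATCH then.
    """
    terms = text.split()
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)
```

Quoting every term trades away FTS5's query operators (`OR`, `NEAR`, prefix `*`) in exchange for predictable behavior on arbitrary input, which is usually the right default for a search box.
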
## Maintenance

### Database Vacuum

Over time, deleted pages leave empty space in the database. Run the vacuum tool periodically to reclaim space:

1. Go to `/style` in your browser
2. Click "vacuum database" at the bottom of the page

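
For reference, the underlying SQLite operation looks like the sketch below. It assumes the button runs something equivalent to a plain `VACUUM`, which this README does not confirm; the `vacuum` helper is hypothetical. Stop TinyWeb before running anything like this against `~/.tinyweb/index.db` directly.

```python
import os
import sqlite3


def vacuum(db_path):
    """Rebuild the database file and return (size_before, size_after) in bytes."""
    before = os.path.getsize(db_path)
    # Autocommit mode: VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)
```

`VACUUM` rewrites the whole file, so it temporarily needs free disk space of roughly the database's size.
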
### Optional Compression

To reduce storage for semantic search embeddings (~50% savings):

1. Go to `/style` > Search > AI
2. Enable "compress embeddings"
3. Re-index your existing pages so that compression also applies to embeddings created before the setting was enabled

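
This README does not say how compression works. One common way to halve embedding storage is to store each value as a 16-bit float instead of a 32-bit float, and the sketch below illustrates that technique purely as an assumption; TinyWeb may use a different scheme. The helper names are hypothetical.

```python
import struct


def pack_embedding(vector, compressed=False):
    """Serialize a vector as little-endian float32, or float16 when compressed."""
    fmt = "e" if compressed else "f"  # 'e' = 2 bytes per value, 'f' = 4 bytes
    return struct.pack(f"<{len(vector)}{fmt}", *vector)


def unpack_embedding(blob, compressed=False):
    """Inverse of pack_embedding; the caller must know which format was used."""
    fmt, size = ("e", 2) if compressed else ("f", 4)
    return list(struct.unpack(f"<{len(blob) // size}{fmt}", blob))
```

Under this assumption, the re-index step makes sense: embeddings written before the setting was enabled stay in the larger format until they are regenerated. Half precision also loses some accuracy, which is typically acceptable for similarity search.
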
## Dependencies

- [requests](https://docs.python-requests.org/): HTTP fetching
- [beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/): HTML parsing and link extraction
- [rns](https://reticulum.network/): Reticulum mesh networking

## Philosophy

TinyWeb is built for the slow web: intentionality over speed, human curation over algorithmic feeds, privacy over surveillance, and community over corporations. Every page in your index was saved because you found it valuable, not because an algorithm told you to click.