lichenblankie/tinyweb

Optimized storage and updated readme

2026-04-11 21:59:55 +00:00

TinyWeb

A personal, decentralized search engine built on the Reticulum mesh network. Curate your own index of web pages, search it locally, and share collections with friends over an encrypted mesh. No algorithms, no ads, no tracking.

Features

Personal search index — Save pages you find valuable, search them with full-text search (SQLite FTS5)
Tagging — Organize saved pages with comma-separated tags
Bookmarklet — One-click indexing from any browser tab
Subscriptions — Subscribe to friends' TinyWeb instances over Reticulum and search their indexes alongside yours
Custom templates — Full HTML/CSS/JS template editor to personalize your instance
Import/export — JSON-based backup and restore
Mesh-native — Works over Reticulum without the internet; encrypted and decentralized by default

Performance & Scale

Search Speed

Pages indexed	Search speed	Notes
1,000	~50ms	Fast local FTS5
10,000	~50-100ms	Full-text search
100,000	~100-200ms	Combined BM25 + semantic
500,000	~200-400ms	With semantic enabled
1,000,000	~300-500ms	Hybrid search

Times are estimates for combined BM25 + semantic search. Actual performance varies by hardware, storage type (SSD/HDD), and search complexity.

Concurrent Connections

Database pool: 16 simultaneous connections
Suited to a single user plus a few subscriptions
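
To illustrate what a fixed-size database pool means in practice, here is a minimal sketch of a bounded SQLite connection pool. It is not TinyWeb's actual implementation; the `ConnectionPool` class and its `connection()` method are hypothetical names, and only the pool size of 16 comes from this README.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical bounded pool: at most `size` connections are ever open."""

    def __init__(self, path, size=16):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection be handed to another thread.
            self._pool.put(sqlite3.connect(path, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty if `timeout`
        # seconds pass with every connection still checked out.
        conn = self._pool.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._pool.put(conn)
```

With a pool like this, a 17th concurrent request waits for a free connection rather than opening a new one, which is why the limit matters mainly when several subscribers sync at once.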

Export

Exports are paginated at 10,000 pages per request
Use ?batch=N to export in chunks, counting from 0: /export?batch=0, /export?batch=1, etc.
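
A full export can be scripted by requesting batches until one comes back empty. The sketch below assumes that each batch is returned as a JSON array and that an empty array marks the end; neither is confirmed by this README, so check the real response format first. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def batch_url(base_url, batch):
    """Build the export URL for one batch, e.g. .../export?batch=0."""
    return f"{base_url.rstrip('/')}/export?batch={batch}"


def fetch_batch(base_url, batch):
    """Fetch one batch. Assumes the endpoint returns a JSON array."""
    with urlopen(batch_url(base_url, batch)) as resp:
        return json.load(resp)


def export_all(base_url, fetch=fetch_batch):
    """Collect batches 0, 1, 2, ... until an empty batch is returned (assumed end marker)."""
    pages = []
    batch = 0
    while True:
        chunk = fetch(base_url, batch)
        if not chunk:
            break
        pages.extend(chunk)
        batch += 1
    return pages
```

Usage would look like `export_all("http://localhost:8080")`. The `fetch` parameter exists so the loop can be exercised without a running instance.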

Download (pre-built binaries)

Download the latest release for your platform from the Releases page:

Platform	File
Windows	`TinyWeb-windows-x64.exe`
macOS	`TinyWeb-macos-arm64`
Linux	`TinyWeb-linux-x64`

Run the downloaded file — no installation required.

Docker

Pull and run TinyWeb from the container registry:

docker run -p 8080:8080 registry.derickphan.com/tinyweb:latest

Or with a specific version:

docker run -p 8080:8080 registry.derickphan.com/tinyweb:v0.1.0

Docker Compose

services:
  tinyweb:
    image: registry.derickphan.com/tinyweb:latest
    ports:
      - "8080:8080"
    volumes:
      - tinyweb-data:/data

volumes:
  tinyweb-data:

Run with docker compose up -d.

Storage Estimates

These estimates assume an average of ~15KB of stored content per page:

Pages	Database	Embeddings*	Total
10,000	150MB	80MB	~250MB
100,000	1.5GB	800MB	~2.5GB
500,000	7.5GB	4GB	~12GB
1,000,000	15GB	8GB	~25GB

*Embeddings are only stored when semantic search is enabled. Enabling compression (Settings > Search > AI) reduces embedding storage by ~50%.
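
The arithmetic behind the table can be reproduced with a small calculator. The 15KB-per-page figure is stated above; the 8KB-per-page embedding figure is inferred from the table (80MB for 10,000 pages) rather than stated by the project, and the function name is hypothetical.

```python
PAGE_KB = 15       # average stored content per page (from this README)
EMBEDDING_KB = 8   # inferred from the table: 80MB / 10,000 pages


def estimate_storage_mb(pages, semantic=True, compressed=False):
    """Estimate storage in MB (1MB = 1000KB, matching the table's rounding)."""
    database_mb = pages * PAGE_KB / 1000
    embeddings_mb = pages * EMBEDDING_KB / 1000 if semantic else 0.0
    if compressed:
        embeddings_mb *= 0.5  # compression is described as ~50% savings
    return {
        "database_mb": database_mb,
        "embeddings_mb": embeddings_mb,
        "total_mb": database_mb + embeddings_mb,
    }
```

For 10,000 pages this gives 150MB + 80MB = 230MB, which the table rounds up to ~250MB.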

Data storage

Local (Python/binary)

Your data is stored in ~/.tinyweb/:

File	Description
`index.db`	SQLite database with your indexed pages
`tinyweb_identity`	Your Reticulum identity (keep safe!)
`models/`	Downloaded AI models for semantic search
`index.hnsw`	Semantic search index

This allows your data to persist between upgrades and stay separate from the application.

Docker

Data is stored in the /data volume inside the container. Use a volume mount to persist data:

docker run -p 8080:8080 -v tinyweb-data:/data registry.derickphan.com/tinyweb:latest

With Docker Compose (see above), data persists in the tinyweb-data named volume.

Command line options

./TinyWeb --version    # Show version
./TinyWeb -p 9000      # Use port 9000 instead of default 8080

Getting started

To run TinyWeb from source, install the dependencies and start the app:

pip install -r requirements.txt
python app.py

This starts the Reticulum server and an HTTP gateway on http://localhost:8080. Open it in your browser.

Your destination hash is printed on startup — share it with friends so they can subscribe to your index.

Remote gateway

To browse a remote TinyWeb instance without running your own index:

python gateway.py <destination_hash>

This connects over Reticulum and serves the remote instance at http://localhost:8080.

How it works

Save pages — Use the /add form or the bookmarklet (found on /style) to index any URL
Search — Full-text search across your saved pages, linked pages from trusted sites, and synced subscriptions
Subscribe — Add a friend's destination hash on /subscriptions to sync their shared index
Customize — Edit your site name, HTML template, and sharing settings on /style
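
The search step can be illustrated with SQLite's FTS5 extension, which the feature list names as the full-text engine. This is a minimal sketch, not the schema in db.py: the `pages` table, its columns, and both function names are assumptions made for the example. It also requires a Python build whose SQLite includes FTS5.

```python
import sqlite3


def quote_fts5_query(user_input):
    """Wrap each term in double quotes so FTS5 reads it as literal text,
    not as query syntax such as OR, NEAR, or *."""
    terms = user_input.split()
    return " ".join('"' + t.replace('"', '""') + '"' for t in terms)


def search(conn, user_input, limit=10):
    """Return (url, title) rows matching every term, best BM25 match first."""
    query = quote_fts5_query(user_input)
    if not query:
        return []
    rows = conn.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? "
        "ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# Hypothetical schema: url is stored but not searched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(url UNINDEXED, title, body, tags)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        ("https://example.org/mesh", "Mesh notes", "Notes on mesh networking stacks", "mesh,networking"),
        ("https://example.org/fts", "Search notes", "Notes on full-text search in SQLite", "database,search"),
    ],
)
```

Calling `search(conn, "mesh networking")` matches only the first row, because space-separated quoted terms are combined with an implicit AND.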

Project structure

app.py          — Entry point: boots Reticulum, starts HTTP gateway
gateway.py      — HTTP-to-RNS bridge (local or remote dispatch)
handlers.py     — Route dispatcher and all request handlers
db.py           — SQLite database, FTS5, URL fetching, SSRF protection
templates.py    — HTML template rendering and escaping
rns_client.py   — Reticulum client for fetching remote site lists
themes/         — Saved HTML templates (e.g. kodama.html)

Security

TinyWeb includes several hardening measures:

CSRF protection — All POST forms use per-session tokens via double-submit cookies
SSRF prevention — URL fetching validates hostnames against private IP ranges, with redirect re-validation
FTS5 injection prevention — Search queries are sanitized before passing to SQLite MATCH
Content Security Policy — CSP headers on all HTML responses restrict script/style/frame sources
XSS escaping — All user-supplied content is HTML-escaped before rendering
Bookmark authentication — The bookmarklet endpoint requires a secret token
Identity file protection — The Reticulum identity key is restricted to owner-only permissions (0600)
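
As an illustration of the SSRF check described above, the sketch below resolves a URL's hostname and rejects it if any resolved address is private, loopback, or otherwise non-public. It is a hedged example using only the Python standard library, not the code in db.py; `is_public_ip` and `is_safe_url` are hypothetical names.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_ip(ip):
    """Return False for private, loopback, link-local, multicast,
    reserved, and unspecified addresses, and for unparseable input."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def is_safe_url(url, resolve=socket.getaddrinfo):
    """Allow only http(s) URLs whose host resolves solely to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, None)
    except (socket.gaierror, UnicodeError):
        return False
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(is_public_ip(a) for a in addresses)
```

Redirect re-validation means running the same check on every redirect target before following it, since a public URL can redirect to an internal address.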

Maintenance

Database Vacuum

Over time, deleted pages leave empty space in the database. Run the vacuum tool periodically to reclaim space:

Go to /style in your browser
Click "vacuum database" at the bottom of the page
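
For context, the underlying SQLite operation is the VACUUM statement, which rebuilds the database file and releases unused pages. The sketch below shows that operation on an arbitrary SQLite file; it is an assumption that the built-in button does something equivalent, and the `vacuum` function name is hypothetical. The button in the web interface remains the documented route.

```python
import os
import sqlite3


def vacuum(db_path):
    """Run VACUUM on a SQLite file and return (size_before, size_after) in bytes."""
    before = os.path.getsize(db_path)
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)
```

If run by hand against `~/.tinyweb/index.db`, it would be prudent to stop TinyWeb first and keep a backup, since VACUUM rewrites the whole file.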

Optional Compression

To reduce storage for semantic search embeddings (~50% savings):

Go to /style > Search > AI
Enable "compress embeddings"
Re-index your existing pages so that compression is also applied to embeddings that were stored before it was enabled

Dependencies

requests — HTTP fetching
beautifulsoup4 — HTML parsing and link extraction
rns — Reticulum mesh networking

Philosophy

TinyWeb is built for the slow web — intentionality over speed, human curation over algorithmic feeds, privacy over surveillance, and community over corporations. Every page in your index was saved because you found it valuable, not because an algorithm told you to click.
