Optimized storage and updated readme

lichenblankie 2026-04-11 21:59:55 +00:00
parent 552311b730
commit 8ecb963be4
4 changed files with 172 additions and 29 deletions


@ -12,6 +12,30 @@ A personal, decentralized search engine built on the [Reticulum](https://reticul
- **Import/export** — JSON-based backup and restore
- **Mesh-native** — Works over Reticulum without the internet; encrypted and decentralized by default
## Performance & Scale
### Search Speed
| Pages indexed | Search speed | Notes |
|--------------|-------------|-------|
| 1,000 | ~50ms | Fast local FTS5 |
| 10,000 | ~50-100ms | Full-text search |
| 100,000 | ~100-200ms | Combined BM25 + semantic |
| 500,000 | ~200-400ms | With semantic enabled |
| 1,000,000 | ~300-500ms | Hybrid search |
*Times are estimates for combined BM25 + semantic search. Actual performance varies by hardware, storage type (SSD/HDD), and search complexity.*
### Concurrent Connections
- Database pool: 16 simultaneous connections
- Suitable for a single user plus a few subscriptions
### Export
- Paginated at 10,000 pages per request
- Use `?batch=N` to export in chunks: `/export?batch=0`, `/export?batch=1`, etc.
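As a rough illustration of batched export, the sketch below walks `/export?batch=N` starting from 0 and stops at the first empty batch. The base URL, the stop-on-empty behaviour, and the idea that each batch decodes to a list of pages are assumptions for the example, not documented behaviour; the `export_url` and `export_all` helpers are hypothetical.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

# Assumption: adjust to wherever your TinyWeb instance is listening.
BASE_URL = "http://localhost:8080"


def export_url(batch: int, base_url: str = BASE_URL) -> str:
    """Build the URL for one export batch (batches are zero-indexed)."""
    return f"{base_url}/export?{urlencode({'batch': batch})}"


def export_all(fetch: Callable[[str], list], max_batches: int = 1000) -> Iterator[list]:
    """Yield one batch of pages at a time until an empty batch comes back.

    `fetch` takes a URL and returns the decoded pages for that batch.
    Treating an empty batch as "end of export" is an assumption.
    """
    for batch in range(max_batches):
        pages = fetch(export_url(batch))
        if not pages:
            break
        yield pages
```

With `requests` installed, `fetch` could be something like `lambda url: requests.get(url, timeout=60).json()`, assuming the endpoint returns a JSON list per batch. Passing the fetcher in keeps the loop easy to test without a running server.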
## Download (pre-built binaries)
Download the latest release for your platform from the [Releases](https://git.derickphan.com/lichenblankie/tinyweb/releases) page:
@ -55,6 +79,21 @@ volumes:
Run with `docker compose up -d`.
### Storage Estimates
These estimates assume an average of ~15KB of stored content per page:
| Pages | Database | Embeddings* | Total |
|-------|----------|------------|-------|
| 10,000 | 150MB | 80MB | ~250MB |
| 100,000 | 1.5GB | 800MB | ~2.5GB |
| 500,000 | 7.5GB | 4GB | ~12GB |
| 1,000,000 | 15GB | 8GB | ~25GB |
*Embeddings are stored only when semantic search is enabled. Enabling compression (Settings > Search > AI) reduces embedding storage by ~50%.
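The figures in the table can be reproduced with some simple arithmetic. This is a back-of-the-envelope sketch only: the ~8KB of embeddings per page is inferred from the table (80MB for 10,000 pages) rather than stated anywhere, and the function name is hypothetical.

```python
KB_PER_PAGE = 15            # from the README: ~15KB of content per page
EMBEDDING_KB_PER_PAGE = 8   # assumption, derived from the table: 80MB / 10,000 pages
COMPRESSION_SAVINGS = 0.5   # from the README: ~50% less embedding storage


def estimate_storage_mb(pages, semantic=True, compressed=False):
    """Return (database_mb, embeddings_mb, total_mb) in decimal megabytes."""
    database_mb = pages * KB_PER_PAGE / 1000
    # Embeddings only take space when semantic search is enabled.
    embeddings_mb = pages * EMBEDDING_KB_PER_PAGE / 1000 if semantic else 0.0
    if compressed:
        embeddings_mb *= 1 - COMPRESSION_SAVINGS
    return database_mb, embeddings_mb, database_mb + embeddings_mb
```

For 10,000 pages this gives 150MB of database plus 80MB of embeddings, a raw sum of 230MB; the table's totals are rounded up from that, presumably to leave some headroom.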
## Data storage
### Local (Python/binary)
@ -139,6 +178,23 @@ TinyWeb includes several hardening measures:
- **Bookmark authentication** — The bookmarklet endpoint requires a secret token
- **Identity file protection** — The Reticulum identity key is restricted to owner-only permissions (0600)
## Maintenance
### Database Vacuum
Over time, deleted pages leave empty space in the database. Run the vacuum tool periodically to reclaim space:
1. Go to `/style` in your browser
2. Click "vacuum database" at the bottom of the page
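To see why vacuuming matters, the sketch below shows the effect on a throwaway SQLite file. It assumes the vacuum tool ends up issuing SQLite's `VACUUM` statement, which the README does not state; the table name and row sizes are made up for the demo.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows: int = 2000) -> tuple[int, int]:
    """Return (size_before, size_after) in bytes around a VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body TEXT)")
        conn.executemany(
            "INSERT INTO pages (body) VALUES (?)",
            (("x" * 1024,) for _ in range(rows)),
        )
        conn.commit()

        # Deleting rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM pages")
        conn.commit()  # VACUUM cannot run inside an open transaction
        size_before = os.path.getsize(path)

        # VACUUM rebuilds the file and drops the free pages.
        conn.execute("VACUUM")
        conn.close()
        size_after = os.path.getsize(path)
    return size_before, size_after
```

Calling `vacuum_demo()` returns the file size before and after; the second value should be noticeably smaller because the space left by the deleted rows has been reclaimed.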
### Optional Compression
To reduce storage for semantic search embeddings (~50% savings):
1. Go to `/style` > Search > AI
2. Enable "compress embeddings"
3. Re-index your existing pages so that compression is also applied to embeddings that were stored before you enabled it
## Dependencies
- [requests](https://docs.python-requests.org/) — HTTP fetching