self-hosted · daily ingest from a Galaxy S24

Albums

FastAPI + SQLite + CLIP + BM25 + Tailscale · React Native mobile

Built for:
People who want to delete their Google Photos and Google Drive — not migrate them, delete them — and keep everything on their own hardware with their own indexes.
Not built for:
Family-sharing or multi-user collaboration. Albums is single-user data sovereignty, full stop.

Photos that matter shouldn’t be hostages to a subscription, a scan-your-face-for-the-algorithm policy change, or a quarterly pricing rebrand. Albums is a self-hosted personal cloud — auto-ingest from the phone, full-text and semantic search, AI captioning, all running on hardware I already own.

§ I

The problem

Google Photos is excellent until it isn’t — until the next pricing change, the next policy update, the next time the “memories” feature surfaces something I don’t want surfaced. Drive is the same, with the added detail that I don’t actually know what its model trains on. Both are convenient because they’re cloud, and the cloud is convenient because it’s also someone else’s.

Albums replaces that pair with a small box in my closet. The phone uploads to it over Tailscale, the box runs the indexes, the mobile app browses everything, and I keep the originals — and the captions, and the embeddings, and the audit log of what was ingested when.

§ II

Decisions

  1. kept

    Tailscale, not a public IP. The whole reason I’m self-hosting is to take this off the open internet; punching a hole in my router would defeat that. Tailscale gives the phone a stable address into the LAN with zero exposure.

  2. kept

    CLIP for semantic search, BM25 for text. Two indexes, two strengths: CLIP nails “photos with sunsets,” BM25 nails “PDFs about that warranty letter.” Hybrid retrieval beats either alone; a fusion sketch follows this list.

  3. refused

    A cloud-hosted compute fallback for the AI captioning. The whole premise is that nothing leaves the box. If the box can’t caption fast enough, the answer is a faster box, not a remote API.
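
Decision 2, sketched: one minimal shape for the fusion, assuming min-max normalization onto a shared 0..1 scale and a fixed blend weight. search_clip, search_bm25, and the 0.6 weight are illustrative names and numbers, not the shipped implementation.

python · hybrid fusion · sketch
# Assumes search_clip() and search_bm25() each return {sha: raw_score}
# for their top-k candidates from their respective indexes.
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize so cosine and BM25 scores share a 0..1 scale."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {sha: (s - lo) / span for sha, s in scores.items()}

def hybrid_search(query: str, k: int = 30, alpha: float = 0.6) -> list[str]:
    """Blend semantic (CLIP) and lexical (BM25) rankings; alpha favors CLIP."""
    sem = normalize(search_clip(query, k))   # cosine over 512-d vectors
    lex = normalize(search_bm25(query, k))   # keyword match over captions + EXIF
    shas = set(sem) | set(lex)
    ranked = sorted(
        shas,
        key=lambda s: alpha * sem.get(s, 0.0) + (1 - alpha) * lex.get(s, 0.0),
        reverse=True,
    )
    return ranked[:k]

Reciprocal rank fusion would do just as well here; min-max keeps the sketch legible.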

§ III

System

[Diagram: PHONE (S24 · React Native · camera roll auto-sync) → Tailscale 100.x, LAN auth → BOX (self-hosted). On the box: INGEST (FastAPI) and CAPTION (local VLM) feed SQLite indexes + flat-file originals on disk; SEARCH runs CLIP (semantic vectors) and BM25 (keyword text). A read tunnel serves the APP (browse · read · search). No cloud egress · no public IP · originals never leave the box. IMPORTER (Google Photos / Drive): deferred · pull once · then delete.]
FIGURE 1. The phone, the box, and the Tailscale tunnel between them — the only path in or out is one the user owns end to end.
Stack — current pins.
Layer      Implementation           Purpose
Backend    FastAPI · Python 3.13    Ingest · indexes · search · audit
Storage    SQLite + flat-file       Originals on disk, indexes in SQLite
Search     CLIP + BM25              Hybrid: semantic + lexical
Caption    Local VLM                Captions and tags applied at ingest
Network    Tailscale                Phone ↔ box without an open port
Mobile     React Native             Browse · search · upload, single-user
albums/api/ingest.py · python · ingest endpoint
from datetime import datetime

from fastapi import APIRouter, Depends, Form, UploadFile

# stream_to_disk, exists, read_exif, caption_local, embed_clip,
# tailscale_only, Asset, db, and ORIGINALS_DIR are project-local
# helpers; ORIGINALS_DIR is a pathlib.Path.
router = APIRouter()

# Single-tenant by design — phone authenticates via Tailscale
# device identity, not a token in a database. Caption + embed
# happen in-process at ingest; nothing is queued for the cloud.
@router.post("/ingest", dependencies=[Depends(tailscale_only)])
async def ingest(file: UploadFile, captured_at: datetime = Form(...)):
    sha = await stream_to_disk(file, ORIGINALS_DIR)   # SHA-256 of the bytes
    if exists(sha):
        return {"sha": sha, "status": "duplicate"}

    exif = read_exif(ORIGINALS_DIR / sha)
    caption = await caption_local(ORIGINALS_DIR / sha)   # local VLM
    clip_vec = embed_clip(ORIGINALS_DIR / sha)           # 512-d
    bm25_doc = caption + " " + (exif.get("description") or "")

    db.insert(Asset(
        sha=sha, captured_at=captured_at, exif=exif,
        caption=caption, clip=clip_vec, bm25=bm25_doc,
    ))
    return {"sha": sha, "status": "ingested", "caption": caption}
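
The tailscale_only dependency above is project-local. One plausible shape for it, as a sketch: trust the peer address only if it falls inside Tailscale's CGNAT block (100.64.0.0/10). Per-device identity, the stronger check, would go through tailscale whois or the local API; everything below is an assumption, not the shipped code.

python · tailnet gate · sketch
from ipaddress import ip_address, ip_network

from fastapi import HTTPException, Request

# Every Tailscale device gets an address in the CGNAT block.
TAILNET = ip_network("100.64.0.0/10")

async def tailscale_only(request: Request) -> None:
    """Reject any request whose peer address is not a tailnet address.

    Assumes no other interface on the box routes 100.x traffic;
    per-device checks would shell out to `tailscale whois` instead.
    """
    client = request.client.host if request.client else None
    if client is None or ip_address(client) not in TAILNET:
        raise HTTPException(status_code=403, detail="tailnet only")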
assets · sqlite row · json
{
  "sha": "9c4a83e2f7…",
  "captured_at": "<iso8601-from-exif>",
  "exif": {
    "Make": "samsung",
    "Model": "SM-S921U",
    "GPSLatitude": 45.7341,
    "GPSLongitude": -122.6741
  },
  "caption": "Two children running on a wet beach at golden hour, evergreen forest in the background.",
  "clip_dims": 512,
  "bm25_terms": 23,
  "indexed_at": "<iso8601-utc>",
  "_origin": "tailscale://galaxy-s24"
}
FIGURE 2. Each ingest writes the original to disk, captions locally, embeds, and lands two indexes in one row. Nothing leaves the box.
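
One plausible SQLite shape behind that row, assuming FTS5 on the text side (FTS5 ranks with BM25 natively) and the CLIP vector packed into a BLOB; table and column names are illustrative, not pulled from the repo.

python · sqlite schema · sketch
import sqlite3

db = sqlite3.connect("albums.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS assets (
  sha          TEXT PRIMARY KEY,  -- content hash; doubles as the filename on disk
  captured_at  TEXT NOT NULL,     -- ISO 8601, from EXIF
  exif         TEXT NOT NULL,     -- JSON
  caption      TEXT NOT NULL,     -- local VLM output
  clip         BLOB NOT NULL,     -- 512 float32s, packed
  indexed_at   TEXT NOT NULL
);
-- FTS5 exposes BM25 ranking via its built-in bm25() function.
CREATE VIRTUAL TABLE IF NOT EXISTS assets_fts USING fts5(sha UNINDEXED, doc);
""")

The lexical query is then SELECT sha FROM assets_fts WHERE assets_fts MATCH ? ORDER BY bm25(assets_fts); FTS5's bm25() returns smaller values for better matches, so ascending order ranks best-first.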
[Screenshot: Albums mobile app browse view — search bar with a local-only indicator, four category pills (all 8412, sunsets, family, docs 612), a 3-column grid of 12 photo thumbnails, bottom navigation with browse / search / upload / settings.]
FIGURE 3. The phone’s view of 8,412 photos and 612 documents, served from a closet over Tailscale. The “0 cloud calls” pin in the search bar is the whole product, restated.
[Screenshot: Albums single open photo with EXIF metadata — abstract warm interior composition, structured EXIF panel on the right with date / device / lens / aperture / ISO / geo / hash, and a caption block of clickable mono tags from the local VLM.]
FIGURE 4. One photo, opened. EXIF reads from the original; the caption and tags come from the local VLM pass, the embedding from CLIP. The geo coordinates are real lat/lon; nothing about this photo is in any cloud.
[Screenshot: Albums semantic search results — query of “sunsets over water”, a four-column grid of 12 chromatic thumbnails with descending relevance scores, filter pills showing BM25+CLIP and a local-only indicator.]
FIGURE 5. Semantic search across the local index. BM25 over text fields plus CLIP over image embeddings, ranked together. The chromatic spread is the search result, not the curation — “sunsets over water” is what the index thinks the words mean.
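
The semantic half of that query, sketched with open_clip. The ViT-B/32 checkpoint is an assumption (the post doesn't pin a model); it's chosen here because it emits the 512-d vectors the row above records.

python · query-side CLIP · sketch
import numpy as np
import open_clip
import torch

# Assumed checkpoint: any CLIP variant with 512-d embeddings fits the schema.
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def embed_query(text: str) -> np.ndarray:
    """Embed a text query into the same space as the stored image vectors."""
    with torch.no_grad():
        vec = model.encode_text(tokenizer([text]))
        vec = vec / vec.norm(dim=-1, keepdim=True)   # unit-normalize
    return vec.squeeze(0).numpy()

def rank(query: str, vectors: dict[str, np.ndarray]) -> list[str]:
    """vectors: sha -> unit-normalized 512-d image embedding from ingest."""
    q = embed_query(query)
    return sorted(vectors, key=lambda sha: float(vectors[sha] @ q), reverse=True)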
[Screenshot: Albums mobile ingest — vertical phone view of progress feed, twelve recent ingest rows with filename and SHA-256 fragments, a horizontal sepia progress bar at 25.5 percent, estimated completion time, Tailscale-connected indicator.]
FIGURE 6. The phone hands the photos in. Each row is one file, each flag is one stage — D, I, C — downloaded, indexed, captioned. The bar is wall-clock; the estimate is the network and the captioning model arguing about who’s slower.
[Screenshot: Albums year-and-month browse — vertical scroll with four month-section headers (April, March, February, January), each with a six-column grid of photo thumbnails, a year-jump rail on the right, and BROWSE active in the top nav.]
FIGURE 7. The browse register. Year and month section headers, six-column grids underneath, year-jump rail on the right. The same scroll-and-skim shape Google Photos uses, none of the pricing rebrand risk.
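
Those month sections fall out of one GROUP BY over captured_at; a sketch, assuming the ISO 8601 timestamps the row above shows.

python · month sections · sketch
import sqlite3

db = sqlite3.connect("albums.db")
# substr(captured_at, 1, 7) turns '2025-04-12T18:03:11' into '2025-04'.
months = db.execute("""
    SELECT substr(captured_at, 1, 7) AS month, COUNT(*) AS n
    FROM assets
    GROUP BY month
    ORDER BY month DESC
""").fetchall()
# Each (month, n) pair becomes a section header; a section's thumbnails
# page in with: WHERE captured_at LIKE month || '%' ORDER BY captured_at DESC.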
§ IV

What’s next

The Google Photos / Drive importer is on the next milestone — the goal is to pull everything down once, verify it landed, and then delete from Google. Albums is the destination; the importer is the bridge that lets the destination actually replace what came before.

Acknowledgments

Albums stands on FastAPI, SQLite, CLIP from OpenAI, the BM25 implementations in rank-bm25 and Tantivy, Tailscale, and the long lineage of self-hosting projects (Nextcloud, Immich, PhotoPrism) that made the case for owning your photos before I needed to make it for myself.
