Frankenstein Compute Pool — Contribute Your Idle Machine


The Frankenstein composer needs distributed compute to make inference on merged LLMs usable. If you have an idle Mac, Linux box, or NVIDIA GPU sitting around, you can plug it into a running pool with one command. Your machine becomes one segment of a pipeline-parallel inference chain: it holds a slice of the model's layers in RAM only, computes activations for incoming requests, and forwards results to the next node.


What you actually contribute


When you join the pool, your machine runs rpc-server from llama.cpp on port 50052. When an orchestrator (whoever runs the master llama-server) decides your machine should handle layers 16–24 of an 8B model, those layer weights are streamed to you over TCP at load time and held in your RAM. Per-token traffic is small (just activations, KB-scale).
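To get a feel for how much RAM a slice might need, here is a back-of-the-envelope sketch. It assumes weights are spread evenly across layers and ignores embeddings, the KV-cache, and runtime buffers; the function name `slice_mb`, the 32-layer count, and the bit widths are illustrative assumptions, not values taken from the pool software.

```shell
# Rough RAM estimate for one layer slice, in MB (decimal).
# Assumption: every layer holds the same share of the weights.
# Usage: slice_mb PARAMS_IN_MILLIONS BITS_PER_WEIGHT SLICE_LAYERS TOTAL_LAYERS
slice_mb() {
    params_m=$1; bits=$2; slice=$3; total=$4
    # millions of params * bits / 8 = megabytes for the whole model,
    # then scale by the fraction of layers held on this node
    echo $(( params_m * bits / 8 * slice / total ))
}

# Hypothetical example: 8B parameters, 32 layers, an 8-layer slice.
slice_mb 8000 16 8 32   # 16-bit weights: prints 4000
slice_mb 8000 4 8 32    # 4-bit quantized weights: prints 1000
```

Real usage will be higher than these figures because of the buffers and cache the estimate leaves out.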


Things this means concretely:


  • **No model files are written to your disk.** Weights live only in RAM while a model is loaded. When the orchestrator stops or swaps the model, your RAM is freed.
  • **You don't see anyone's prompts or completions.** Your node only sees the intermediate activations for your layer range, which are dense tensors with no recoverable content for any non-trivial layer.
  • **You don't need to be online 24/7.** If you go offline, the supervised loop restarts `rpc-server` when you come back. The orchestrator will route around you while you're down.
  • **You can stop at any time** by killing `supervised_rpc.sh` and `rpc-server`. Removal instructions are in the "Removing yourself" section below.
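The auto-restart behaviour mentioned above can be pictured as a small loop. This is only a guess at the general shape of such a supervisor, not the contents of the real `supervised_rpc.sh`; the function name `supervise` and its run limit are hypothetical and exist so the sketch terminates when you try it.

```shell
# supervise MAX_RUNS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly, sleeping DELAY_SECONDS between runs.
# A real supervisor would loop forever; MAX_RUNS is an assumption
# added so this sketch stops on its own.
supervise() {
    max_runs=$1; delay=$2; shift 2
    runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        "$@"
        status=$?
        runs=$(( runs + 1 ))
        echo "run $runs exited with status $status"
        sleep "$delay"
    done
}

# Stand-in for a crashing worker: `false` exits immediately with status 1.
supervise 3 0 false
```

In the real setup the supervised command would be the `rpc-server` binary rather than `false`, with a non-zero delay so a persistent failure does not spin the CPU.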

What you need


  • Linux or macOS (Apple Silicon recommended; Intel works but slower)
  • `git`, `make`, `cmake` (Linux); Xcode Command Line Tools (macOS — `cmake` is auto-downloaded by the install script if absent)
  • 4–8 GB of free RAM per 8B-class model slice
  • A Tailscale account ([tailscale.com](https://tailscale.com)) so the orchestrator can reach you, or a public IP on port 50052 (Tailscale strongly preferred for security)
  • About 2 GB of disk for the llama.cpp build itself
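If you want to check the build tools before running the installer, a minimal preflight sketch could look like this. The helper name `missing_tools` is made up for illustration; the install script may perform its own, different checks.

```shell
# missing_tools TOOL... : print each tool that is not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Build tools named in the list above.
missing=$(missing_tools git make cmake)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "missing build tools:" $missing
else
    echo "all build tools found"
fi
```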

Install in one line


    
    curl -fsSL https://charenix.com/Frankenstein/join_pool.sh | bash
    

    The script:


    1. Detects your OS and accelerator (CUDA / Metal / CPU only)

    2. Clones llama.cpp and builds rpc-server with the right backend (-DGGML_RPC=ON -DGGML_CUDA=ON or -DGGML_METAL=ON)

    3. Writes a supervisor loop (/tmp/supervised_rpc.sh) that auto-restarts rpc-server if it crashes

    4. On macOS, installs a launchd agent so the supervisor survives reboot

    5. On Linux, runs the supervisor in a detached setsid session (add it to systemd or a crontab @reboot entry if you want it to survive reboots)

    6. Prints your Tailscale IP for you to share
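Steps 1 and 2 can be sketched as a small decision function. This is an illustration of the idea, not the code in `join_pool.sh`: the function name `backend_flags` and the use of `nvidia-smi` as the GPU test are assumptions, while the cmake flags are the ones listed in step 2.

```shell
# backend_flags OS_NAME HAS_NVIDIA
# OS_NAME is the output of `uname -s`; HAS_NVIDIA is "yes" or "no".
# Prints an accelerator label followed by the cmake flags to use.
backend_flags() {
    os=$1; has_nvidia=$2
    if [ "$os" = "Darwin" ]; then
        echo "Metal -DGGML_RPC=ON -DGGML_METAL=ON"
    elif [ "$has_nvidia" = "yes" ]; then
        echo "CUDA -DGGML_RPC=ON -DGGML_CUDA=ON"
    else
        echo "CPU -DGGML_RPC=ON"
    fi
}

# Assumption: an NVIDIA GPU is usable if nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then gpu=yes; else gpu=no; fi
backend_flags "$(uname -s)" "$gpu"
```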


    The build step takes 5–10 minutes on a Mac mini and 3–5 minutes on a workstation. After that, your machine is a worker.


    Sharing your address


    After install finishes, the script prints something like:


    
    ================================================================
     You are now a Frankenstein compute pool worker.
      accelerator : Metal
      rpc address : 100.121.29.3:50052
      logs        : /tmp/rpc_supervised.log
    ================================================================
    

    Send that 100.x.x.x:50052 line to the pool orchestrator. They add it to the master's --rpc list and your machine starts receiving work on the next model load.
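On the orchestrator side, the collected worker addresses end up as one comma-separated value for `--rpc`. A minimal sketch of that joining step follows; the helper name `rpc_list` is hypothetical, and the second address is invented for the example.

```shell
# rpc_list ADDR... : join worker addresses into one comma-separated string.
rpc_list() {
    out=""
    for addr in "$@"; do
        if [ -z "$out" ]; then out=$addr; else out="$out,$addr"; fi
    done
    echo "$out"
}

# 100.121.29.3 is the example printed above; 100.64.0.2 is made up.
rpc_list 100.121.29.3:50052 100.64.0.2:50052
# prints 100.121.29.3:50052,100.64.0.2:50052
```

The resulting string is what the orchestrator would pass after `--rpc` when starting the master llama-server.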


    Why this exists


    The Frankenstein composer (charenix.com/Frankenstein) lets anyone compose custom LLMs by merging two existing models. But composed models still cost real GPU/CPU time to actually serve. A 70B-class merge is interesting in theory but useless if it takes 30 seconds per token on a single CPU.


    Pipeline-parallel inference splits the model's layers across N machines, and each machine runs only its slice. The latency cost is one TCP round-trip at each boundary between machines; in exchange, throughput scales with the number of nodes. On a 4-node Tailscale pool, an 8B merge runs ~10x faster than the same model on the strongest individual node in the pool.
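As an illustration of splitting layers across machines, the sketch below divides a layer count evenly over N nodes. This document does not say how the orchestrator actually chooses ranges (it may weight by available memory, for instance), so treat the even split and the name `split_layers` as assumptions.

```shell
# split_layers TOTAL_LAYERS NODES
# Prints one inclusive "first-last" layer range per node, giving any
# remainder to the first nodes one layer at a time.
split_layers() {
    total=$1; nodes=$2
    base=$(( total / nodes ))
    extra=$(( total % nodes ))
    start=0
    i=0
    while [ "$i" -lt "$nodes" ]; do
        count=$base
        if [ "$i" -lt "$extra" ]; then count=$(( count + 1 )); fi
        echo "$start-$(( start + count - 1 ))"
        start=$(( start + count ))
        i=$(( i + 1 ))
    done
}

# Hypothetical 32-layer model on a 4-node pool:
split_layers 32 4
# prints 0-7, 8-15, 16-23, 24-31 on separate lines
```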


    The interesting structural observation: building this on top of llama.cpp's rpc-server means contributors don't need ML expertise. They run a binary. The orchestrator handles layer placement, batching, and model swaps. This is the same separation as BOINC / SETI@home / Folding@home twenty years ago — workers contribute cycles, the project lead defines the problem.


    Anatomy of a request (for the curious)


    When the orchestrator's llama-server receives a prompt:


    1. Tokenizer runs on the orchestrator (fast, local)

    2. Embedding layer runs on whichever worker owns layer 0

    3. Each transformer block runs on whichever worker owns that block

    4. Between blocks, activations are sent over TCP to the next worker

    5. Final lm_head runs on whichever worker owns the last block

    6. Logits come back to the orchestrator, the next token is sampled, and the loop repeats


    The KV-cache for each worker's layers stays on that worker. This means warm-cache requests are nearly as fast as if the model were local — only the per-token activation round-trips are added.
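The per-token overhead can be estimated with simple arithmetic. The numbers below (hidden size 4096, 4-byte values, 5 ms per hop) are hypothetical and chosen only to show the scale; the helper names are made up, and the model treats each machine boundary as one hop while ignoring the hops to and from the orchestrator.

```shell
# activation_bytes HIDDEN_SIZE BYTES_PER_VALUE
# Size of one token's activation vector crossing a machine boundary.
activation_bytes() {
    echo $(( $1 * $2 ))
}

# added_latency_ms BOUNDARIES MS_PER_HOP
# Network time added per token if each boundary costs one hop.
added_latency_ms() {
    echo $(( $1 * $2 ))
}

# Hypothetical 4-node pool: 3 boundaries between workers.
activation_bytes 4096 4    # prints 16384 (16 KiB per token per boundary)
added_latency_ms 3 5       # prints 15
```

Under these assumptions the payload is small enough that latency per hop, not bandwidth, dominates the added cost.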


    Removing yourself


    To leave the pool:


    
    # macOS
    launchctl unload ~/Library/LaunchAgents/com.frankenstein.rpc-worker.plist
    rm ~/Library/LaunchAgents/com.frankenstein.rpc-worker.plist
    
    # Linux, and macOS after the steps above
    pkill -f supervised_rpc.sh
    pkill -f rpc-server
    

    Your machine's RAM is freed and the orchestrator's next health-check will see you're gone.


    Privacy and security


  • The `rpc-server` binary is built from upstream llama.cpp source and is not code-signed by anyone. Audit the source before running it in production. The `join_pool.sh` install script is the only Frankenstein-side code on your machine.
  • Tailscale provides authentication and encryption for the worker–orchestrator link. If you bypass Tailscale and expose `:50052` to the public internet, you have made your hardware a remote-code-execution surface for anyone who finds your IP. Don't.
  • Neither side is meant to see the other's secrets, but a malicious orchestrator could theoretically reconstruct partial information from intermediate activations. Only join pools you trust.
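One quick sanity check before sharing an address is to confirm it falls in 100.64.0.0/10, the range Tailscale assigns node addresses from. The sketch below is an illustration with a hypothetical helper name, `is_tailscale_ip`; note that this range is shared carrier-grade NAT space, so a match suggests a tailnet address but does not prove it.

```shell
# is_tailscale_ip ADDRESS
# Succeeds if ADDRESS (optionally with a :port suffix) is a dotted quad
# whose first two octets place it in 100.64.0.0/10.
is_tailscale_ip() {
    ip=${1%%:*}                 # drop an optional :port suffix
    case "$ip" in
        *.*.*.*) ;;             # require at least four dot-separated parts
        *) return 1 ;;
    esac
    first=${ip%%.*}             # first octet
    rest=${ip#*.}
    second=${rest%%.*}          # second octet
    case "$first.$second" in
        *[!0-9.]*|.*|*.) return 1 ;;   # reject non-numeric or empty octets
    esac
    [ "$first" -eq 100 ] && [ "$second" -ge 64 ] && [ "$second" -le 127 ]
}

# 203.0.113.7 is a documentation-only address used as a public-IP stand-in.
for addr in 100.121.29.3:50052 203.0.113.7:50052; do
    if is_tailscale_ip "$addr"; then
        echo "$addr: in the tailnet range"
    else
        echo "$addr: NOT in the tailnet range"
    fi
done
```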

Roadmap


  • [x] One-line install script (done)
  • [x] Supervised auto-restart (done)
  • [ ] Node registry: list yourself once and the orchestrator finds you
  • [ ] Credit system: contributed-cycles → inference-cycles ledger
  • [ ] Multi-orchestrator support: route the same node into multiple pools
  • [ ] Encrypted activation tunnel (defense against malicious orchestrator)

Open issues and PRs are welcome at github.com/norika1207-lab/frankenstein-skeleton.