Service 01 · Live · Custom quote

Frankenstein

Domain-expert LLMs composed from open-source bases, tuned per vertical, deployed on hardware we control. You upload the data Claude won't see. You get back a model that knows your domain, with a full benchmark report and downloadable weights.

Open live composer · See pricing

Claude is brilliant. Your data isn't on it.

General-purpose LLMs are trained on the public internet. They've never seen your campaign performance logs, your clients' privileged documents, your patient records, your internal ticket history, or your government policy drafts. They can give you generic marketing advice; they cannot tell you why your Q3 conversion dropped 12%.

The standard fix is "RAG over your docs into ChatGPT". That works until your compliance officer notices the data goes to OpenAI servers. Then it stops being an option.

The buyer of our service isn't choosing between us and Claude. They're choosing between us and not having AI at all.

Composition, not training.

Training a model from scratch costs $20M+. Fine-tuning costs days of GPU time. We use weight-level composition (mergekit) to fuse two existing open-source models into a hybrid optimized for your task, in about three minutes per merge, with the best recipe in our benchmark (DARE-TIES) scoring above either parent.

STEP 1 · 3 min

Compose

Drag two base models onto our skeleton UI, pick a recipe (SLERP, DARE-TIES, TIES, Linear, Passthrough), tune blend params. Real mergekit runs on our pool. We've benchmarked 5 recipes head-to-head.
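For readers who want a feel for what a weight-level merge does, here is a minimal sketch of the SLERP recipe on plain NumPy arrays. It is an illustration under stated assumptions, not mergekit's implementation: the function names `slerp` and `merge_state_dicts`, the dict-of-arrays "state dict", and the single blend parameter `t` are all hypothetical simplifications.

```python
import numpy as np


def slerp(t, w_a, w_b, eps=1e-8):
    """Spherically interpolate between two same-shaped weight tensors.

    t=0 returns w_a, t=1 returns w_b. Illustrative only.
    """
    a = w_a.ravel().astype(np.float64)
    b = w_b.ravel().astype(np.float64)

    # Angle between the two weight vectors, computed on unit-length copies.
    a_unit = a / (np.linalg.norm(a) + eps)
    b_unit = b / (np.linalg.norm(b) + eps)
    dot = float(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))

    if abs(dot) > 1.0 - 1e-6:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        merged = (1.0 - t) * a + t * b
    else:
        omega = np.arccos(dot)
        sin_omega = np.sin(omega)
        merged = (np.sin((1.0 - t) * omega) / sin_omega) * a \
               + (np.sin(t * omega) / sin_omega) * b

    return merged.reshape(w_a.shape)


def merge_state_dicts(sd_a, sd_b, t=0.5):
    """Apply SLERP tensor by tensor to two dicts with identical keys and shapes."""
    return {name: slerp(t, sd_a[name], sd_b[name]) for name in sd_a}


# Toy usage: two "models" with a single 2x2 weight matrix each.
parent_a = {"layer0.weight": np.array([[1.0, 0.0], [0.0, 1.0]])}
parent_b = {"layer0.weight": np.array([[0.0, 1.0], [1.0, 0.0]])}
hybrid = merge_state_dicts(parent_a, parent_b, t=0.5)
```

The blend parameter `t` plays the role of the "blend params" in the composer: closer to 0 keeps more of the first parent, closer to 1 keeps more of the second.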

STEP 2 · 24h

Specialize

Upload your proprietary corpus (CSV, PDF, JSON). We build a private RAG index on a node dedicated to your tenant. Optional QLoRA fine-tune if you have labeled examples.
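To make "private RAG index" concrete, the sketch below shows the retrieval idea with nothing but the Python standard library: documents are stored per tenant, scored against a query, and the best matches are pasted into the prompt. It is a toy under loud assumptions. The `TenantIndex` class, the bag-of-words scoring, and the prompt template are hypothetical stand-ins, not a description of our production index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TenantIndex:
    """Hypothetical in-memory index holding one tenant's documents."""

    def __init__(self):
        self.docs = []  # list of (doc_id, text, token counts)

    def add(self, doc_id, text):
        self.docs.append((doc_id, text, Counter(tokenize(text))))

    def search(self, query, k=2):
        """Return up to k (doc_id, text) pairs with non-zero similarity."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self.docs]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Prepend retrieved passages to the user question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


index = TenantIndex()
index.add("q3-report", "Q3 conversion dropped after the checkout redesign")
index.add("tickets", "Customer tickets about login failures")
hits = index.search("why did Q3 conversion drop")
prompt = build_prompt("why did Q3 conversion drop", hits)
```

A real index would use embedding vectors rather than word counts, but the shape is the same: the corpus stays on the tenant's node and only the retrieved passages reach the model.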

STEP 3 · ongoing

Maintain

Monthly: upstream base-model updates re-merged, security patches applied, your RAG index refreshed as your data drifts, benchmark report delivered. Without a subscription, the model degrades. We don't fake it.

Built for industries where Big Cloud isn't an option.

Vertical | Customer pain | Compliance pressure | Price range
Marketing | Ad performance data, CRM, A/B logs that competitors mustn't see | Soft (competitive IP) | Ask me →
Customer Service | Ticket transcripts, product defect log, escalation patterns | Soft (brand risk if leaked) | Ask me →
Finance / Accounting | GL entries, vendor invoices, treasury, AR/AP reconciliation | Hard (SOX, internal audit) | Ask me →
Legal / DD | Privileged client docs, contracts, M&A files | Hard (privilege) | Ask me →
Medical / Imaging | X-ray, CT, MRI, patient records, claims | Absolute (HIPAA / Taiwan Personal Data Protection Act, Art. 6) | Ask me →
Government / Defense | Classified docs, policy drafts, cross-agency comms | Absolute (national security) | Ask me →

Marketing and CS go live first because the sales cycle is shorter. Finance, Legal, Medical, Government move on a 6–24 month cycle and require dedicated SOC 2 / ISO 27001 paperwork — we'll have those by Q4.

5 mergekit recipes, head-to-head on GSM8K-10.

To our knowledge, the first systematic comparison of mergekit recipes on the same base pair (Llama-3.1-8B-Instruct + DeepSeek-R1-Distill-Llama-8B), with the same 350-token generation budget, the same answer extractor, and the same eval set. Run on our 4-node federated compute pool.
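"Same answer extractor" matters because GSM8K answers are free-form text. The sketch below shows one plausible way to score generations: take the last number in the output, compare it to the gold answer, and count reasoning markers per generation. It is an assumption-laden illustration, not our exact harness. In particular, the `REASONING_MARKERS` list is hypothetical, and the "last number wins" rule is just one common convention.

```python
import re

# Hypothetical marker list; the phrases actually counted in the benchmark may differ.
REASONING_MARKERS = ("wait", "therefore", "let me", "step")


def extract_final_number(generation):
    """Return the last number in the text as a float, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", generation)
    if not matches:
        return None
    # Strip thousands separators such as "1,250" before converting.
    return float(matches[-1].replace(",", ""))


def accuracy(generations, gold_answers):
    """Fraction of generations whose extracted number matches the gold answer."""
    correct = 0
    for text, gold in zip(generations, gold_answers):
        predicted = extract_final_number(text)
        if predicted is not None and abs(predicted - gold) < 1e-6:
            correct += 1
    return correct / len(gold_answers)


def count_reasoning_markers(generation):
    """Count occurrences of the marker phrases in one generation."""
    lowered = generation.lower()
    return sum(lowered.count(marker) for marker in REASONING_MARKERS)


generations = [
    "She sells 9 eggs at $2 each, so she earns $18 per day.",
    "The total is 1,250",
    "I am not sure",
]
gold = [18, 1250, 7]
score = accuracy(generations, gold)
```

Applying one extractor to every model keeps formatting quirks from favoring one recipe over another. A model whose reasoning is cut off by the token budget never states a final number, so it is scored wrong.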

Model | GSM8K-10 Accuracy | Reasoning markers / gen | Note
DARE-TIES merge | 70% | 3.60 | 🥇 Only recipe to lift over either parent
Hermes-3 baseline | 60% | 0.40 | Extremely terse style
SLERP merge | 60% | 4.30 | Preserves Llama; published in our DOI 20404139
Linear merge | 50% | 4.50 | Naive averaging dilutes capability
Passthrough merge | 50% | 10.50 | 🔥 3× verbose; layer-stacking induces reasoning chatter
TIES merge | 30% | 3.10 | Trim+vote degrades on this pair
DeepSeek-R1-Distill baseline | 10% | 3.50 | Token budget caveat; reasoning chains don't fit in 350 tokens

n=10, single eval set. Larger n + multi-domain eval suite in progress.
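How much can n=10 tell you? One way to check is a Wilson score interval around each accuracy. The sketch below is a standard textbook formula, not something we ran as part of the benchmark. The function name `wilson_interval` and the 95% level (z = 1.96) are choices made for this illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 7 of 10 correct (the 70% row) versus 6 of 10 correct (the 60% rows).
top_low, top_high = wilson_interval(7, 10)
next_low, next_high = wilson_interval(6, 10)
intervals_overlap = next_high > top_low
```

With ten questions, the interval for 70% runs from roughly 40% to 89% and overlaps heavily with the one for 60%. Read the ranking above as a first signal rather than a settled result.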

Common challenges from prospective buyers.

Is your model better than Claude?

No, on general intelligence we lose by a wide margin. We're better at one specific thing: serving a domain your data can't leave. If "use Claude" is a viable option for your team, our service isn't for you.

How is my data protected?

Paid tenants run on dedicated machines isolated from the federated pool. Each customer's RAG index lives on a single physical node we own and audit. We never train on customer data unless explicitly contracted. See our security statement for specifics.

Can I self-host the model?

Pro tier and above include downloadable GGUF weights. You can run them on any llama.cpp-compatible runtime. No vendor lock-in.

What happens if I cancel?

Your service enters graceful degradation over 60 days: no more base-model upgrades, no security patches, no RAG refresh. After day 91 the endpoint stops serving. Data retained for 60 more days for re-activation, then permanently deleted per our privacy policy.

Why no per-token pricing?

Per-token pricing requires counting what's in your queries. We'd rather not. Flat monthly subscription with rate limits, unmetered within the limit.

Can you handle Mandarin / Japanese / Korean better than OpenAI?

On localized open-source bases (Llama + TAIDE for Traditional Chinese, Llama + Swallow for Japanese), yes, often by a large margin. We can deliver a Traditional Chinese-native merge that doesn't sound translated.

Try Frankenstein without an account.

The public composer at /Frankenstein/ lets you drag any two of our pre-loaded base models onto the skeleton, pick a recipe, and chat with the result via our 4-node fast inference pool. No signup, no card.

Open public composer

free tier: 100 generations / day · 4-node pool · phase1 SLERP loaded by default