Service 01 · Live · Custom quote

Frankenstein

Domain-expert LLMs composed from open-source bases, tuned per vertical, deployed on hardware we control. You upload the data Claude won't see. You get back a model that knows your domain, with a full benchmark report and downloadable weights.


01 · The problem · why generic LLMs fail your vertical

Claude is brilliant. Your data isn't in it.

General-purpose LLMs are trained on the public internet. They've never seen your campaign performance logs, your privileged client documents, your patient records, your internal ticket history, or your government policy drafts. They can give you generic marketing advice; they cannot tell you why your Q3 conversion dropped 12%.

The standard fix is RAG: retrieve from your documents and pass the results to ChatGPT. That works until your compliance officer notices your data is going to OpenAI's servers. Then it stops being an option.

The buyer of our service isn't choosing between us and Claude. They're choosing between us and not having AI at all.

02 · How it works · composition + isolation

Composition, not training.

Training a model from scratch costs $20M+. Fine-tuning takes days of GPU time. We use weight-level composition (mergekit) to fuse two existing open-source models into a hybrid optimized for your task, in three minutes per merge; with the right recipe, the hybrid measurably outperforms both parents.
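Weight-level composition works directly on the parents' parameter tensors rather than on training data. As an illustration only, here is a minimal pure-Python sketch of SLERP, one of the recipes mergekit offers, applied to a single flattened tensor. mergekit applies its interpolation tensor by tensor on real checkpoints; the function name, the `dot_threshold` fallback value, and the toy vectors below are our own assumptions.

```python
import math
from typing import List, Sequence


def slerp(t: float, a: Sequence[float], b: Sequence[float],
          dot_threshold: float = 0.9995) -> List[float]:
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns `a`, t=1 returns `b`. When the two directions are nearly
    (anti)parallel the angle is ill-conditioned, so we fall back to plain
    linear interpolation.
    """
    if len(a) != len(b):
        raise ValueError("parent tensors must have the same shape")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    # Cosine of the angle between the two weight directions.
    dot = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    if abs(dot) > dot_threshold:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    omega = math.acos(dot)
    sin_omega = math.sin(omega)
    # Interpolation weights that follow the arc between the two parents.
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


# Toy "parents": two orthogonal unit vectors.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)  # ≈ [0.7071, 0.7071], still unit length
```

Plain linear averaging of the same two vectors would give [0.5, 0.5], shrinking the norm to about 0.71; SLERP follows the arc between the parents instead, which is the usual motivation for preferring it over naive averaging.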

STEP 1 · 3 min

Compose

Drag two base models onto our skeleton UI, pick a recipe (SLERP, DARE-TIES, TIES, Linear, Passthrough), and tune the blend parameters. A real mergekit run executes on our compute pool. We've benchmarked all five recipes head-to-head.
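For intuition about what a recipe actually does, here is a deliberately simplified pure-Python sketch of the DARE-TIES idea on one flattened tensor: each parent's difference from a shared base is randomly sparsified and rescaled (DARE), then a TIES-style sign election keeps only the deltas that agree with the majority direction. The function names, density, and weights are illustrative assumptions; mergekit's own implementation works on real tensors and exposes more options.

```python
import random
from typing import List, Sequence


def dare(delta: Sequence[float], density: float, rng: random.Random) -> List[float]:
    """DARE: keep each entry with probability `density`, rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties(base: Sequence[float], tuned: Sequence[Sequence[float]],
              weights: Sequence[float], density: float = 0.5, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    # 1. Task vectors: how far each fine-tuned parent moved away from the base.
    deltas = [[t - b for t, b in zip(model, base)] for model in tuned]
    # 2. DARE: random drop plus rescale, so the expected delta is unchanged.
    deltas = [dare(delta, density, rng) for delta in deltas]
    merged = []
    for i, b in enumerate(base):
        column = [(w, delta[i]) for w, delta in zip(weights, deltas)]
        # 3. TIES sign election: the sign of the weighted sum picks the direction.
        elected_positive = sum(w * d for w, d in column) >= 0
        # 4. Disjoint merge: weighted mean over non-zero deltas that agree with it.
        agreeing = [(w, d) for w, d in column if d != 0.0 and (d > 0) == elected_positive]
        weight_sum = sum(w for w, _ in agreeing)
        merged.append(b + (sum(w * d for w, d in agreeing) / weight_sum if weight_sum else 0.0))
    return merged


# density=1.0 disables dropping, which makes the result easy to check by hand.
merged = dare_ties(base=[0.0, 0.0, 0.0],
                   tuned=[[1.0, -1.0, 2.0], [3.0, 2.0, -4.0]],
                   weights=[1.0, 1.0], density=1.0)
print(merged)  # [2.0, 2.0, -4.0]
```

In the last coordinate the parents pull in opposite directions; the negative sign wins the vote, so only the second parent's -4.0 survives rather than the two cancelling out as they would under plain averaging.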

STEP 2 · 24h

Specialize

Upload your proprietary corpus (CSV, PDF, JSON). We build a private RAG index on a node dedicated to your tenant. If you have labeled examples, we can also run an optional QLoRA fine-tune.
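At its core, a RAG index is your corpus split into chunks plus a similarity search over those chunks. The toy sketch below is dependency-free and uses bag-of-words cosine similarity purely for illustration; it does not describe our actual tenant pipeline (a production index would typically use learned embeddings and a vector store), and the class and helper names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 50) -> List[str]:
    """Split a document into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyIndex:
    """In-memory index; assumption: a real tenant index would persist on its dedicated node."""

    def __init__(self) -> None:
        self.chunks: List[Tuple[str, Counter]] = []

    def add(self, document: str, size: int = 50) -> None:
        for piece in chunk(document, size):
            self.chunks.append((piece, vectorize(piece)))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        """Return the k chunks most similar to the query, best first."""
        q = vectorize(query)
        scored = [(cosine(q, vec), piece) for piece, vec in self.chunks]
        return sorted(scored, reverse=True)[:k]


index = TinyIndex()
index.add("Q3 conversion dropped after the checkout redesign shipped in August.")
index.add("Ticket volume spiked when the firmware update bricked older routers.")
print(index.search("why did conversion drop in Q3", k=1))
```

The retrieved chunks are then placed in the model's prompt, so answers can draw on your data without that data ever being trained into the weights.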

STEP 3 · ongoing

Maintain

Every month we re-merge upstream base-model updates, apply security patches, refresh your RAG index to account for data drift, and deliver a benchmark report. Without a subscription, the model degrades. We don't pretend otherwise.

03 · Six verticals · choose your domain

Built for industries where Big Cloud isn't an option.

Vertical	Customer pain	Compliance pressure	Price range
Marketing	Ad performance data, CRM records, and A/B test logs that competitors mustn't see	Soft (competitive IP)	Custom quote
Customer Service	Ticket transcripts, product defect logs, escalation patterns	Soft (brand risk if leaked)	Custom quote
Finance / Accounting	GL entries, vendor invoices, treasury, AR/AP reconciliation	Hard (SOX, internal audit)	Custom quote
Legal / DD	Privileged client documents, contracts, M&A files	Hard (attorney-client privilege)	Custom quote
Medical / Imaging	X-ray, CT, MRI, patient records, claims	Absolute (HIPAA / Taiwan PDPA Art. 6)	Custom quote
Government / Defense	Classified documents, policy drafts, cross-agency communications	Absolute (national security)	Custom quote

Marketing and Customer Service go live first because their sales cycles are shorter. Finance, Legal, Medical, and Government buyers move on 6–24 month cycles and require dedicated SOC 2 / ISO 27001 documentation; we'll have those in place by Q4.

04 · Benchmark · measured, not claimed

Five mergekit recipes, head-to-head on GSM8K-10 (a 10-question GSM8K subset).

The first systematic comparison of mergekit recipes on a single base pair (Llama-3.1-8B-Instruct + DeepSeek-R1-Distill-Llama-8B) with the same token budget (350 tokens), the same answer extractor, and the same eval set, run on our 4-node federated compute pool.
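Holding the answer extractor fixed matters because a different extraction rule could score the same output differently. As an illustration (not necessarily the extractor we used), here is a minimal sketch of one common convention: treat the last number in a generation as the model's answer and compare it with the GSM8K reference, whose final line has the form `#### <answer>`. All names are hypothetical.

```python
import re
from typing import Optional, Sequence

# Integers or decimals, optionally negative, optionally with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_answer(generation: str) -> Optional[str]:
    """Return the last number in a generation, with thousands separators removed."""
    matches = _NUMBER.findall(generation)
    if not matches:
        return None  # no number at all: scored as wrong
    return matches[-1].replace(",", "")


def gold_answer(reference: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'."""
    return reference.split("####")[-1].strip().replace(",", "")


def accuracy(generations: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of generations whose extracted answer matches the reference."""
    correct = sum(extract_answer(g) == gold_answer(r) for g, r in zip(generations, references))
    return correct / len(references)
```

Under a rule like this, a reasoning chain that the 350-token budget cuts off mid-calculation would typically end on an intermediate value and be scored as wrong, which is one way a token-budget caveat like the DeepSeek-R1-Distill one can show up.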

Model	GSM8K-10 Accuracy	Reasoning markers per generation	Note
DARE-TIES merge	70%	3.60	🥇 Only recipe to beat both parents
Hermes-3 baseline	60%	0.40	Extremely terse style
SLERP merge	60%	4.30	Stays close to the Llama parent; published under our DOI 20404139
Linear merge	50%	4.50	Naive averaging dilutes capability
Passthrough merge	50%	10.50	🔥 3× more verbose; layer stacking induces reasoning chatter
TIES merge	30%	3.10	Trim-and-vote degrades on this pair
DeepSeek-R1-Distill baseline	10%	3.50	Token-budget caveat: its reasoning chains don't fit in 350 tokens

n = 10 questions, single eval set. A larger-n, multi-domain eval suite is in progress.
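To make the n = 10 caveat concrete: with ten questions, each question is worth 10 points of accuracy. A short sketch computing 95% Wilson score intervals for a few rows of the table (the Wilson interval is our choice of method for this illustration; the benchmark itself reports only point accuracies):

```python
import math
from typing import Tuple


def wilson_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


# Correct answers out of 10, taken from the table rows above.
for name, correct in [("DARE-TIES", 7), ("Hermes-3 baseline", 6), ("TIES", 3)]:
    lo, hi = wilson_interval(correct, 10)
    print(f"{name:18s} {correct}/10  95% CI [{lo:.2f}, {hi:.2f}]")
```

The intervals come out at roughly [0.40, 0.89] for DARE-TIES and [0.31, 0.83] for the Hermes-3 baseline. They overlap heavily: the 70% vs 60% gap is a single question, which is why the larger-n suite matters before drawing firm conclusions.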

05 · FAQ · honest answers

Common challenges from prospective buyers.

Is your model better than Claude?

No. On general intelligence we lose by a wide margin. We're better at one specific thing: serving a domain whose data can't leave your control. If "use Claude" is a viable option for your team, our service isn't for you.

How is my data protected?

Paid tenants run on dedicated machines isolated from the federated pool. Each customer's RAG index lives on a single physical node we own and audit. We never train on customer data unless you explicitly contract us to. See our security statement for specifics.

Can I self-host the model?

Yes. Pro tier and above include downloadable GGUF weights, which run on any llama.cpp-compatible runtime. No vendor lock-in.

What happens if I cancel?

Your service enters graceful degradation over 60 days: no more base-model upgrades, no security patches, no RAG refresh. After day 91 the endpoint stops serving. Your data is retained for another 60 days in case you reactivate, then permanently deleted per our privacy policy.

Why no per-token pricing?

Per-token pricing requires counting what's in your queries, and we'd rather not. Instead, we charge a flat monthly subscription with rate limits; usage within the limit is unmetered.
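A rate limit only needs to count requests, not read them. As an illustration (the class name, in-memory storage, and per-UTC-day window are assumptions; the default of 100 mirrors the free tier's 100 generations/day), a fixed-window counter keyed by tenant and date never touches the prompt text:

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


class DailyLimiter:
    """Hypothetical fixed-window limiter: N generations per tenant per UTC day.

    It inspects only the tenant ID and the clock, never the prompt text.
    Old windows are never evicted here; a real service would expire them.
    """

    def __init__(self, limit: int = 100) -> None:
        self.limit = limit
        self.counts: Dict[Tuple[str, str], int] = defaultdict(int)

    def allow(self, tenant_id: str, now: Optional[datetime] = None) -> bool:
        """Record one generation if the tenant is still under today's limit."""
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        window = (tenant_id, now.date().isoformat())
        if self.counts[window] >= self.limit:
            return False  # over the limit: reject; nothing about the query is measured
        self.counts[window] += 1
        return True


limiter = DailyLimiter(limit=100)
print(limiter.allow("tenant-demo"))  # True: first generation of the day
```

A fixed daily window is the simplest choice; a sliding window or token bucket would smooth bursts at the day boundary, at the cost of a little more state per tenant.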

Can you handle Mandarin / Japanese / Korean better than OpenAI?

On localized open-source bases (Llama + TAIDE for Traditional Chinese, Llama + Swallow for Japanese), yes, often by a large margin. We can deliver a Traditional Chinese-native merge that doesn't sound translated.

Try Frankenstein without an account.

The public composer at /Frankenstein/ lets you drag any two of our pre-loaded base models onto the skeleton, pick a recipe, and chat with the result via our 4-node fast inference pool. No signup, no card.


Free tier: 100 generations/day · 4-node pool · phase1 SLERP merge loaded by default