Domain-expert LLMs composed from open-source bases, tuned per vertical, deployed on hardware we control. You upload the data Claude won't see. You get back a model that knows your domain, with a full benchmark report and downloadable weights.
General-purpose LLMs are trained on the public internet. They've never seen your campaign performance logs, your client privileged docs, your patient records, your internal ticket history, or your government policy drafts. They can give you generic marketing advice; they cannot tell you why your Q3 conversion dropped 12%.
The standard fix is "RAG over your docs into ChatGPT". That works until your compliance officer notices the data goes to OpenAI servers. Then it stops being an option.
The buyer of our service isn't choosing between us and Claude. They're choosing between us and not having AI at all.
Training a model from scratch costs $20M+. Fine-tuning costs days of GPU time. We use weight-level composition (mergekit) to fuse two existing open-source models into a hybrid optimized for your task, in about three minutes per merge. With the right recipe the hybrid shows measurable lift over either parent; in our benchmark, DARE-TIES was the only recipe that did.
Drag two base models onto our skeleton UI, pick a recipe (SLERP, DARE-TIES, TIES, Linear, Passthrough), and tune the blend parameters. Each merge is a real mergekit run on our compute pool. We've benchmarked all 5 recipes head-to-head.
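For the curious, here is a minimal sketch of what a SLERP blend does to a pair of weight tensors. It is an illustration, not our pipeline and not mergekit's code: it assumes each tensor has been flattened to a plain list of floats, and the function name `slerp` and the `eps` fallback threshold are our own choices for the example.

```python
import math
from typing import List


def slerp(a: List[float], b: List[float], t: float, eps: float = 1e-8) -> List[float]:
    """Spherically interpolate between two flattened weight vectors.

    t=0 returns a, t=1 returns b. Illustrative only: real merges apply
    this tensor by tensor across the whole checkpoint.
    """
    if len(a) != len(b):
        raise ValueError("both parents must have the same shape")

    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))

    def linear() -> List[float]:
        # Plain weighted average, used when the angle is undefined or tiny.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    if norm_a < eps or norm_b < eps:
        return linear()

    # Angle between the two weight vectors.
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    if abs(sin_theta) < eps:
        # Vectors are (anti)parallel: the spherical formula degenerates.
        return linear()

    # Interpolate along the arc rather than the straight chord.
    w_a = math.sin((1 - t) * theta) / sin_theta
    w_b = math.sin(t * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    parent_a = [1.0, 0.0]
    parent_b = [0.0, 1.0]
    blended = slerp(parent_a, parent_b, t=0.5)
    averaged = [0.5 * x + 0.5 * y for x, y in zip(parent_a, parent_b)]
    print("slerp norm:  ", math.sqrt(sum(x * x for x in blended)))
    print("average norm:", math.sqrt(sum(x * x for x in averaged)))
```

On this toy pair the spherical blend keeps the vector at unit length, while the plain average shrinks it. That is the geometric difference between the SLERP and Linear recipes; whether it matters for a given model pair is an empirical question.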
Upload your proprietary corpus (CSV, PDF, JSON). We build a private RAG index on a node dedicated to your tenant. Optional QLoRA fine-tune if you have labeled examples.
Monthly: we re-merge against upstream base-model updates, apply security patches, refresh your RAG index to keep up with data drift, and deliver a benchmark report. Without a subscription, the model goes stale. We don't pretend otherwise.
| Vertical | Customer pain | Compliance pressure | Price range |
|---|---|---|---|
| Marketing | Ad performance data, CRM, A/B logs that competitors mustn't see | Soft (competitive IP) | Ask me → |
| Customer Service | Ticket transcripts, product defect log, escalation patterns | Soft (brand risk if leaked) | Ask me → |
| Finance / Accounting | GL entries, vendor invoices, treasury, AR/AP reconciliation | Hard (SOX, internal audit) | Ask me → |
| Legal / DD | Privileged client docs, contracts, M&A files | Hard (privilege) | Ask me → |
| Medical / Imaging | X-ray, CT, MRI, patient records, claims | Absolute (HIPAA / Taiwan Personal Data Protection Act §6) | Ask me → |
| Government / Defense | Classified docs, policy drafts, cross-agency comms | Absolute (national security) | Ask me → |
Marketing and CS go live first because the sales cycle is shorter. Finance, Legal, Medical, and Government move on a 6–24 month cycle and require dedicated SOC 2 / ISO 27001 paperwork, which we'll have by Q4.
First systematic comparison of mergekit recipes on the same base pair (Llama-3.1-8B-Instruct + DeepSeek-R1-Distill-Llama-8B), same token budget (350), same answer extractor, same eval set. Run on our 4-node federated compute pool.
| Model | GSM8K-10 Accuracy | Reasoning markers / gen | Note |
|---|---|---|---|
| DARE-TIES merge | 70% | 3.60 | 🥇 Only recipe to lift over either parent |
| Hermes-3 baseline | 60% | 0.40 | Extremely terse style |
| SLERP merge | 60% | 4.30 | Preserves Llama; published in our DOI 20404139 |
| Linear merge | 50% | 4.50 | Naive averaging dilutes capability |
| Passthrough merge | 50% | 10.50 | 🔥 3× verbose — layer-stacking induces reasoning chatter |
| TIES merge | 30% | 3.10 | Trim+vote degrades on this pair |
| DeepSeek-R1-Distill baseline | 10% | 3.50 | Token budget caveat; reasoning chains don't fit in 350 tokens |
n=10, single eval set. Larger n + multi-domain eval suite in progress.
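To put the n=10 caveat in numbers, here is a small sketch that computes a 95% Wilson score interval for a few rows of the table. It assumes each percentage maps to a whole number of correct answers out of 10 (70% = 7/10, and so on); the function name `wilson_interval` is ours, and this is not part of the benchmark harness.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Correct-answer counts inferred from the table's percentages (assumption).
    rows = [("DARE-TIES merge", 7), ("SLERP merge", 6), ("TIES merge", 3)]
    for name, correct in rows:
        low, high = wilson_interval(correct, 10)
        print(f"{name}: {correct}/10, 95% interval {low:.2f} to {high:.2f}")
```

At this sample size the intervals for 7/10 and 6/10 overlap heavily, so the ranking above is directional rather than conclusive. That is why the larger-n, multi-domain suite is in progress.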
No. On general intelligence we lose by a wide margin. We're better at one specific thing: serving a domain whose data can't leave your premises. If "use Claude" is a viable option for your team, our service isn't for you.
Paid tenants run on dedicated machines isolated from the federated pool. Each customer's RAG index lives on a single physical node we own and audit. We never train on customer data unless explicitly contracted. See our security statement for specifics.
Pro tier and above include downloadable GGUF weights. You can run them on any llama.cpp-compatible runtime. No vendor lock-in.
Your service enters graceful degradation over 60 days: no more base-model upgrades, no security patches, no RAG refresh. After day 91 the endpoint stops serving. Your data is retained for 60 more days in case you re-activate, then permanently deleted per our privacy policy.
Per-token pricing requires counting what's in your queries. We'd rather not. Flat monthly subscription with rate limits, unmetered within the limit.
On localized open-source bases (Llama + TAIDE for Traditional Chinese, Llama + Swallow for Japanese), yes, often by a large margin. We can deliver a native Traditional Chinese merge that doesn't sound translated.
The public composer at /Frankenstein/ lets you drag any two of our pre-loaded base models onto the skeleton, pick a recipe, and chat with the result via our 4-node fast inference pool. No signup, no card.
Open public composer →
Free tier: 100 generations / day · 4-node pool · phase1 SLERP loaded by default