We do not train on customer data. Here is what that actually means.
Every AI vendor in B2B software now claims they “do not train on customer data.” That claim has become so universal it has lost meaning. This post lays out exactly what Kerf does with the data customers send us, what we do not do, and the technical and contractual controls that make the distinction enforceable rather than aspirational.
1. Customer data does not leave the customer’s tenant boundary.
Every Kerf customer runs on dedicated infrastructure. The data plane — the database, the agent runtime, the model gateway, the integration connectors — is provisioned per-customer in a private VPC. There is no shared agent runtime that sees data from multiple customers. There is no shared vector store. The deployment topology is documented in our SSP and confirmed in pen-test reports, both available under mutual NDA.
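Because each runtime is provisioned for exactly one tenant, tenant isolation can be enforced as an invariant rather than a routing decision. The sketch below is illustrative only (the tenant ID, request shape, and handler are hypothetical, not Kerf's actual code): a single-tenant agent runtime rejects any request whose tenant does not match the one it was provisioned for.

```python
from dataclasses import dataclass

# Hypothetical: baked in at provision time; one value per private VPC.
TENANT_ID = "acme-aero"

@dataclass
class AgentRequest:
    tenant_id: str
    payload: dict

def handle(request: AgentRequest) -> dict:
    # A single-tenant runtime has exactly one legal tenant. A mismatch
    # is a deployment error to be rejected, never a case to route around.
    if request.tenant_id != TENANT_ID:
        raise PermissionError(f"runtime provisioned for {TENANT_ID!r} only")
    return {"tenant": TENANT_ID, "status": "accepted"}
```

The point of the pattern is that there is no code path that consults a second tenant's data: the runtime simply has no identifier for any other tenant.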
2. Foundation models we use are not fine-tuned on customer data.
Kerf uses foundation models from Anthropic and OpenAI for natural-language reasoning inside agent workflows. Both vendors operate under our master service agreements with explicit "no training on customer inputs" terms; we have validated those terms in writing with their respective security teams, and we review the agreements quarterly. No customer data is used to train, fine-tune, or evaluate any model shared across our customer base, and none is shared with model vendors for those purposes.
3. Per-customer fine-tuning, when used, stays in the customer’s tenant.
For customers whose vendor-tone calibration or disposition-matrix recognition benefits from a fine-tuned model adapter, the adapter is trained on that customer’s data alone, stored in their tenant, and deleted on contract termination. The adapter is never accessible to Kerf employees beyond the named deployment engineering team for that customer.
4. Aggregate operational metrics — yes; customer content — no.
We do collect aggregate operational telemetry — agent latency, error rates, action-acceptance rates by agent class. We use it to improve the platform. The telemetry is structured numerical data: it does not contain RFQ contents, drawing files, vendor names, NCR descriptions, ECO scopes, or any customer-identifying information. It is the equivalent of a stripe.com latency dashboard: timings and counts, never content.
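One way to make "structured numerical data only" concrete is to define the telemetry event as a closed schema with no free-text fields, so customer content cannot ride along even by accident. The schema and vocabulary below are hypothetical, not Kerf's actual telemetry format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TelemetryEvent:
    agent_class: str       # enumerated label only, never content
    latency_ms: float
    error: bool
    action_accepted: bool

# Hypothetical closed vocabulary: the only string field must be one of these.
ALLOWED_AGENT_CLASSES = {"rfq", "ncr", "eco"}

def validate(event: TelemetryEvent) -> TelemetryEvent:
    # Rejecting anything outside the vocabulary means free text
    # (vendor names, drawing contents, NCR descriptions) cannot pass.
    if event.agent_class not in ALLOWED_AGENT_CLASSES:
        raise ValueError("agent_class must be an enumerated label")
    return event
```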
5. ITAR-aware customers run a stricter deployment.
Customers handling ITAR-controlled technical data run on a single-tenant US-only deployment with US-person-only support. No foreign-national engineering access. No data exits a US-region cloud boundary. Foundation-model traffic for ITAR customers routes only to model vendors whose deployment region and operational personnel meet the ITAR exemption criteria: currently Anthropic's Claude on AWS GovCloud (US-East) and the Azure OpenAI Service in Azure Government.
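The routing rule for ITAR tenants is an allow-list with no fallback, which can be sketched like this. The endpoint names below are placeholders; the real allow-list would live in deployment configuration, not application code.

```python
# Hypothetical endpoint identifiers for illustration only.
GOV_ENDPOINTS = {
    "anthropic": "bedrock.us-gov-east-1",
    "openai": "azure-gov.openai",
}
COMMERCIAL_ENDPOINTS = {
    "anthropic": "api.anthropic.com",
    "openai": "api.openai.com",
}

def model_endpoint(vendor: str, itar: bool) -> str:
    # An ITAR tenant can only ever resolve to the government allow-list;
    # there is deliberately no fallback path to a commercial endpoint.
    table = GOV_ENDPOINTS if itar else COMMERCIAL_ENDPOINTS
    return table[vendor]
```

A vendor absent from the government table simply cannot be reached by an ITAR tenant: the lookup fails rather than degrading to the commercial endpoint.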
6. Auditable, not just trustworthy.
The above is enforceable. Customers can audit the data-flow diagram, the subprocessor list, the per-tenant deployment topology, and the foundation-model contractual terms. We do not ask any customer to trust us on a marketing claim. We will sit with your security team and walk through the production environment.
If your security team has a question that is not answered above, email security@kerf.com. We respond within one business day.
— Priya Sundaresan, CTO