01What fine-tuning actually does

A foundation model like GPT-4 or Claude is trained on vast quantities of general text. It develops broad capability but no specific expertise in any particular domain, style, or task. Fine-tuning takes this model and runs additional training on a curated dataset specific to your needs.

If you fine-tune on examples of your organisation's writing style, the model learns to write in that style. If you fine-tune on examples of successful customer service conversations in your industry, the model learns the patterns of effective responses in that context. If you fine-tune on medical literature and clinical case notes, the model develops more specialised medical knowledge.

The result is a model that performs better on the specific tasks represented in the fine-tuning dataset. The tradeoff is that fine-tuning is expensive (it requires GPU compute time and ML engineering expertise), the fine-tuned model requires maintenance as the underlying model is updated, and the improvements are specific to the fine-tuned domain rather than general.

02When fine-tuning is the right approach

Fine-tuning is justified in a limited set of circumstances. It makes sense when you have a high-volume, repetitive AI task where a small improvement in output quality has significant cumulative business value, when the task requires consistent adherence to a highly specific style, format, or domain convention that prompting alone cannot reliably achieve, and when you have a substantial, high-quality dataset of examples that represent the task well.

Good examples where fine-tuning is often justified: a financial institution fine-tuning a model on regulatory reporting documents to improve the quality and consistency of compliance reports; a law firm fine-tuning on examples of their successful legal drafting to produce house-style documents; a customer service platform fine-tuning on thousands of successful support interactions to improve first-response quality.

03When fine-tuning is not the right approach

Many situations that organisations think require fine-tuning are actually better addressed by RAG (Retrieval-Augmented Generation), better prompting, or a different AI design approach.

If the issue is that the AI does not know your specific policies or information, RAG is almost always better than fine-tuning. Fine-tuning embeds information in model weights that are difficult to update; RAG puts information in a knowledge base that is easy to update and audit.

If the issue is that the AI produces inconsistent output, structured prompting and output format instructions are often sufficient without the cost and complexity of fine-tuning.

If you are considering fine-tuning primarily because it sounds like a sophisticated AI approach, that is not a good reason to do it. The most appropriate AI approach is the one that solves your business problem at the lowest cost and complexity, not the most technically advanced one.

04What boards should understand about fine-tuning decisions

Fine-tuning decisions involve several considerations that boards should be aware of when they arise in AI investment discussions.

Cost: fine-tuning at enterprise scale involves significant compute cost and ML engineering time. It should be justified by a business case that shows the improvement in output quality is worth that investment relative to alternatives.

Data: fine-tuning requires a high-quality dataset, which means someone needs to curate, label, and review the training examples. This is often underestimated in cost and time.

Maintenance: a fine-tuned model requires ongoing maintenance. When the underlying foundation model is updated (which happens regularly), the fine-tuning may need to be repeated.

Vendor terms: fine-tuning through Azure OpenAI Service and similar platforms involves specific data handling terms that legal and data protection teams should review. Understanding who has access to the training data and how it is used is a board-level data governance responsibility.

Key Takeaways

1.Fine-tuning trains an existing AI model on a curated domain-specific dataset to improve performance on specific tasks or to establish consistent style.
2.Fine-tuning is justified for high-volume repetitive tasks where consistent quality is critical and you have substantial high-quality training data.
3.RAG (grounding in a knowledge base) is usually better than fine-tuning for making AI knowledgeable about your organisation's specific information.
4.Better prompting often addresses apparent fine-tuning needs at a fraction of the cost; fine-tuning should not be the default response to AI performance issues.
5.Fine-tuning involves significant cost, data requirements, and maintenance overhead; the business case should justify this relative to simpler alternatives.

References & Further Reading

[1]
Azure OpenAI: Fine-Tuning GuideMicrosoft
[2]
Anthropic: Fine-Tuning ClaudeAnthropic

Want to discuss this with an expert?

Book a strategy call to explore how these insights apply to your organisation.

Book a Strategy Call