LLMs 101: Choosing between open-source and closed-source models



Large-language-model APIs from OpenAI, Anthropic, and newcomers such as DeepSeek are now only a click away. Yet each month a “GPT-killer” seems to launch, and the obvious question for any decision-maker is:
“Do we adopt the shiniest new closed-source API or run an open-source model ourselves?”
To cut through the noise, we spoke with Yoann Veny, Data-Science Manager at Agilytic. Below is a condensed playbook that skips the generic AI primer and goes straight to what matters: open vs. closed source, plus when a hybrid stack makes sense.
1. What exactly counts as an “open-source” LLM?
Level of openness | What you get |
---|---|
Fully Open (Rarest) | Weights + code + detailed dataset list or raw data. Rare for modern, frontier-scale LLMs due to copyright. |
Open Model | Weights and training recipes. Data often scrubbed but documented. E.g., Mistral |
Open Weights (Most common) | Downloadable model weights + inference code. Training data may be redacted or partially released. E.g., DeepSeek R1, Llama |
For this article, “open source” means at least open weights plus a license that permits commercial self-hosting. Anything that hides the weights (e.g., GPT-4o, Claude 3) we label closed source.
2. Why teams pick open-source LLMs
“If the vendor decides one day to tweak price or performance, you find out once your bill comes in. Self-hosting avoids unexpected fluctuations.” – Yoann Veny
Key advantages
Full Customization & Bias Control: Fine-tune on internal data and inspect layers when results look odd.
Data Sovereignty: No text leaves your firewall, easing GDPR, HIPAA, or bank-secrecy audits.
Predictable Costs: Once the hardware is amortized, high-volume usage is often cheaper than per-token APIs.
Case in point: (hypothetical but commonplace)
A boutique law firm must summarize thousands of highly confidential PDFs. They can fine-tune an open-source model on-prem, add a retrieval layer, and achieve sufficiently accurate clause extraction without a single document touching a provider’s cloud.
Why open wins: Data never leaves site · Legal jargon fine-tuning · Stable hardware costs
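The retrieval layer in a scenario like this can start very simply: score document chunks by keyword overlap with the question and stuff the best matches into the prompt for the locally hosted model. A minimal sketch under those assumptions (the tokenizer and scoring heuristic are illustrative, not a production retriever):

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude normalization: lowercase and strip trailing punctuation.
    return [w.lower().strip(".,;:?!") for w in text.split()]

def score(query: str, chunk: str) -> int:
    # Count how many distinct query tokens appear in the chunk.
    chunk_tokens = Counter(tokenize(chunk))
    return sum(chunk_tokens[t] for t in set(tokenize(query)))

def build_prompt(query: str, chunks: list[str], top_k: int = 2) -> str:
    # Rank chunks by keyword overlap and keep the top_k as context.
    best = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]
    context = "\n---\n".join(best)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "The termination clause allows either party to exit with 30 days notice.",
    "Payment is due within 45 days of invoice receipt.",
    "Confidential information must not be disclosed to third parties.",
]
prompt = build_prompt("What is the notice period in the termination clause?", chunks)
```

In practice you would swap the keyword overlap for embedding similarity, but the shape stays the same: retrieve on-prem, assemble the prompt on-prem, and only the self-hosted model ever sees the documents.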
3. Why teams pick closed-source LLMs
Closed APIs shine when speed and staff capacity outweigh control concerns.
Key advantages
Days-to-Prod: No hardware or model training required.
Lower R&D Cost: Vendors iterate frequently; you inherit the gains.
Elastic Pricing: Pay only for what you call (handy for spikes, iterating or testing).
→ Case in point: Automated Invoice Processing
A logistics company suffered late fees and compliance issues from manual invoice handling. By adopting Azure’s closed-source LLMs, we automated their data extraction and validation. The solution reduced labor costs by €100k per year and cut processing time by 70%.
Why Closed Source Won Here:
Fast Deployment: Existing APIs integrated directly with the company’s systems.
Enterprise Support & SLAs: Microsoft’s reliability and security provisions were critical to the client.
Scalability: Azure cloud handled traffic spikes without needing on-premise hardware upgrades.
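Whichever model extracts the fields, the validation step downstream is plain code. A hypothetical sketch of what such checks might look like (field names and the rounding tolerance are illustrative, not the client’s actual schema):

```python
def validate_invoice(invoice: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty = pass)."""
    errors = []
    line_total = sum(item["qty"] * item["unit_price"] for item in invoice["lines"])
    # Allow a small rounding tolerance between line items and the stated total.
    if abs(line_total - invoice["total"]) > 0.01:
        errors.append(
            f"total mismatch: lines sum to {line_total:.2f}, header says {invoice['total']:.2f}"
        )
    if not invoice.get("vat_number"):
        errors.append("missing VAT number")
    return errors

ok = validate_invoice({
    "total": 120.00,
    "vat_number": "BE0123456789",
    "lines": [{"qty": 2, "unit_price": 60.00}],
})  # []
bad = validate_invoice({
    "total": 100.00,
    "vat_number": "",
    "lines": [{"qty": 1, "unit_price": 90.00}],
})  # two errors
```

Keeping validation outside the model means a vendor swap later (closed to open, or between APIs) only touches the extraction call, not the business rules.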
4. Side-by-side cheat-sheet
Factor | Open-Source | Closed-Source |
---|---|---|
Customization | Full fine-tune, weight surgery | Prompt-only or limited adapters |
Up-front effort | GPU infra, training, and dev | Minimal
Ongoing cost | Flat (hardware + power) | Variable (per-token) |
Compliance | You control locality & logs | Rely on vendor attestations |
Road-map risk | DIY upgrades | Vendor lock-in / price shifts |
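The “flat vs. per-token” row lends itself to a quick break-even check: estimate monthly token volume, then compare amortized hardware cost against API spend. A back-of-the-envelope sketch (all prices and volumes below are placeholder assumptions, not vendor quotes):

```python
def monthly_api_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    # Per-token pricing scales linearly with usage.
    return tokens_per_month / 1_000 * price_per_1k_tokens

def monthly_selfhost_cost(hardware_cost: float, amortization_months: int,
                          power_and_ops: float) -> float:
    # Flat cost: amortized hardware plus a fixed power/ops line.
    return hardware_cost / amortization_months + power_and_ops

# Placeholder numbers: 2B tokens/month at $0.002 per 1k tokens,
# vs. a $20k GPU server amortized over 36 months plus $300/month power + ops.
api = monthly_api_cost(2e9, 0.002)                  # 4000.0
selfhost = monthly_selfhost_cost(20_000, 36, 300)   # ~855.56
cheaper = "api" if api < selfhost else "self-host"
```

At low volumes the inequality flips, which is exactly why the cheat-sheet has no single winner: the break-even point depends on your own token volume and hardware.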
5. Making the call
As Yoann neatly sums it up: “It’s always case by case.”
Go open if data can never leave, you have data science muscle, and workloads are steady.
Go closed for rapid pilots, spiky demand, or when best-in-class accuracy trumps transparency. And many teams land on a hybrid stack: prototype against a closed API, then move steady, high-volume, or sensitive workloads to a self-hosted open model.
Ready to reach your goals with data?
If you want to reach your goals through the smarter use of data and A.I., you're in the right place.