LLMs 101: Choosing between open-source and closed-source models

Large-language-model APIs from OpenAI, Anthropic, or newcomers such as DeepSeek are now only a click away. Yet each month a “GPT-killer” seems to launch, and the obvious question for any decision-maker is:

“Do we adopt the shiniest new closed-source API or run an open-source model ourselves?”

To cut through the noise, we spoke with Yoann Veny, Data-Science Manager at Agilytic. Below is a condensed playbook that skips the generic AI primer and goes straight to what matters: open vs. closed source, plus when a hybrid stack makes sense.

1. What exactly counts as an “open-source” LLM?

Level of openness, and what you get at each level:

Fully Open (rarest): weights + code + a detailed dataset list or the raw data. Rare for modern, frontier-scale LLMs due to copyright.

Open Model: weights and training recipes; data is often scrubbed but documented (e.g., Mistral).

Open Weights (most common): downloadable model weights + inference code; training data may be redacted or only partially released (e.g., DeepSeek R1, Llama).



For this article, “open source” means at least open weights plus a license that permits commercial self-hosting. Anything that hides the weights (e.g., GPT-4o, Claude 3) we label closed source.
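To make “open weights plus a commercial license” concrete, here is a minimal sketch of self-hosting such a model with the Hugging Face transformers library. The model ID, hardware setup, and generation settings are illustrative assumptions, not recommendations.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Any open-weights model whose license permits commercial self-hosting (assumed example).
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" spreads the weights over local GPU(s); requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Summarize the confidentiality clause below in two sentences:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The point is not the specific model but that everything above runs on hardware you control.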

2. Why teams pick open-source LLMs

“If the vendor decides one day to tweak price or performance, you find out once your bill comes in. Self-hosting avoids unexpected fluctuations.” – Yoann

Key advantages

  1. Full Customization & Bias Control: Fine-tune on internal data and inspect layers when results look odd.

  2. Data Sovereignty: No text leaves your firewall, easing GDPR, HIPAA, or bank-secrecy audits.

  3. Predictable costs: Once the hardware is amortized, high-volume usage is often cheaper than per-token APIs (a rough break-even sketch follows this list).
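To ground that third point, here is a back-of-the-envelope break-even sketch. Every figure in it is an illustrative assumption, not a real vendor price or hardware quote; the only takeaway is the shape of the comparison.

# Back-of-the-envelope comparison: self-hosted vs. per-token API.
# All numbers are illustrative assumptions, not real prices.
monthly_tokens = 2_000_000_000            # expected monthly volume (input + output)

# Closed-source API: pay per token.
api_price_per_1k_tokens = 0.002           # assumed blended EUR price per 1,000 tokens
api_monthly_cost = monthly_tokens / 1_000 * api_price_per_1k_tokens

# Self-hosted: hardware amortized over its useful life, plus running costs.
gpu_server_cost = 30_000                  # assumed purchase price in EUR
amortization_months = 36
power_and_hosting = 400                   # assumed monthly running cost in EUR
self_hosted_monthly_cost = gpu_server_cost / amortization_months + power_and_hosting

print(f"API:         {api_monthly_cost:,.0f} EUR/month")         # 4,000 EUR/month here
print(f"Self-hosted: {self_hosted_monthly_cost:,.0f} EUR/month")  # ~1,233 EUR/month here

At low volumes the per-token API usually wins; the crossover point depends entirely on your traffic, hardware, and staffing assumptions.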

Case in point (hypothetical but commonplace):

A boutique law firm must summarize thousands of highly confidential PDFs. They can fine-tune an open-source model on-prem, add a retrieval layer, and achieve sufficiently accurate clause extraction without a single document touching a provider’s cloud.

Why open wins: Data never leaves site · Legal jargon fine-tuning · Stable hardware costs
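As a rough illustration of the retrieval layer mentioned above, here is a minimal on-prem sketch using the sentence-transformers library for embeddings and a plain dot-product search. The model name, sample clauses, and prompt wording are assumptions for illustration only.

import numpy as np
from sentence_transformers import SentenceTransformer

# Embedding model runs entirely on local hardware (assumed choice).
embedder = SentenceTransformer("all-MiniLM-L6-v2")

clauses = [
    "The receiving party shall keep all disclosed information strictly confidential...",
    "Either party may terminate this agreement with 30 days' written notice...",
    # ...clauses extracted from the firm's PDFs...
]
clause_vectors = embedder.encode(clauses, normalize_embeddings=True)

query = "confidentiality obligations"
query_vector = embedder.encode([query], normalize_embeddings=True)[0]

# Vectors are normalized, so a dot product gives the cosine similarity.
scores = clause_vectors @ query_vector
top_clauses = [clauses[i] for i in np.argsort(scores)[::-1][:3]]

prompt = "Summarize the following clauses for a lawyer:\n" + "\n".join(top_clauses)
# `prompt` is then passed to the self-hosted model from the earlier sketch.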

3. Why teams pick closed-source LLMs

Closed APIs shine when speed and staff capacity outweigh control concerns.

Key advantages

  1. Days-to-Prod: No hardware or model training required.

  2. Lower-cost R&D: Vendors iterate frequently; you inherit the gains.

  3. Elastic Pricing: Pay only for what you call (handy for spikes, iteration, or testing).

Case in point: Automated Invoice Processing

A logistics company suffered late fees and compliance issues from manual invoice handling. By adopting Azure’s closed-source LLMs, we automated their invoice data extraction and validation. The solution reduced labor costs by €100k/year and cut processing time by 70%.

Why Closed Source Won Here:

  • Fast Deployment: Existing APIs integrated directly with the company’s systems.

  • Enterprise Support & SLAs: Microsoft’s reliability and security provisions were critical to the client.

  • Scalability: Azure cloud handled traffic spikes without needing on-premise hardware upgrades.
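For a sense of how little code such a closed-source call requires, here is a minimal sketch against an Azure OpenAI deployment using the official openai Python SDK. The endpoint, deployment name, API version, and field list are placeholders, not the client's actual setup.

import os
from openai import AzureOpenAI

# Credentials and endpoint are read from the environment (placeholder names).
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # assumed API version
)

invoice_text = "...OCR output of a single invoice..."

response = client.chat.completions.create(
    model="gpt-4o",  # name of the Azure deployment (placeholder)
    messages=[
        {"role": "system",
         "content": "Extract invoice_number, supplier, total_amount and due_date as JSON."},
        {"role": "user", "content": invoice_text},
    ],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)

No hardware, fine-tuning, or model hosting is involved; the trade-off is that the invoice text leaves your infrastructure and the per-token bill scales with volume.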

4. Side-by-side cheat-sheet

Factor          | Open-Source                    | Closed-Source
Customization   | Full fine-tune, weight surgery | Prompt-only or limited adapters
Up-front effort | GPU infra, training, and dev   | Minimal
Ongoing cost    | Flat (hardware + power)        | Variable (per-token)
Compliance      | You control locality & logs    | Rely on vendor attestations
Road-map risk   | DIY upgrades                   | Vendor lock-in / price shifts

5. Making the call

As Yoann neatly sums it up: “It’s always case by case”.

  • Go open if data can never leave your infrastructure, you have data-science muscle, and workloads are steady.

  • Go closed for rapid pilots, spiky demand, or when best-in-class accuracy trumps transparency.

Ready to reach your goals with data?

If you want to reach your goals through the smarter use of data and A.I., you're in the right place.

© 2025 Agilytic
