
Enabling Private LLM Execution: Trusted Execution Environments and Encrypted Containers

Running LLM inference and fine-tuning on private datasets requires bridging theoretical cryptography with practical high-throughput systems. Learn how TEEs and encrypted containers create compliance-ready, hardware-isolated execution environments for confidential AI workloads.

Introduction

The deployment of Large Language Models in production environments confronts a fundamental tension: models derive value from data, yet regulatory frameworks, competitive considerations, and ethical obligations demand that sensitive data never leave controlled security boundaries.

This is not an academic problem. Healthcare organizations cannot send patient records to external APIs for diagnosis assistance. Financial institutions cannot expose transaction patterns to third-party model providers. Government agencies cannot process classified documents through commercial LLM services.

The conventional response - "keep models and data on-premises" - only pushes the problem inward. Even within organizational boundaries, who has access to plaintext data during inference? System administrators? Cloud providers managing the virtualization layer? Third-party vendors running the ML infrastructure?

The hard question is this: how do we allow models to compute over private data without ever exposing that data outside a cryptographically verified, hardware-isolated secure domain?

Two complementary technologies converge into a deployable solution: Trusted Execution Environments (TEEs) and encrypted container runtimes. Together, they create a security architecture where data remains encrypted everywhere except within hardware-attested secure enclaves during computation.

This article explains how this architecture works, why it is more practical than alternative privacy-preserving approaches for real-world AI systems, and where hybrid techniques further strengthen its guarantees.


The Trust Boundary Problem in AI Systems

Traditional computing security models assume a hierarchy of trust: applications trust the operating system, the OS trusts the hypervisor, the hypervisor trusts the hardware. At each layer, code at higher privilege levels has unrestricted visibility into lower-privilege execution.

This model breaks down for confidential AI workloads. Consider a typical cloud-hosted LLM inference pipeline:

  1. Application layer: LLM inference service receives encrypted query.
  2. Operating system: Manages memory, schedules processes, has full visibility into application memory.
  3. Hypervisor: Orchestrates virtual machines, can inspect VM memory contents.
  4. Cloud provider infrastructure: Operators with physical access can potentially extract data from running systems.

At every layer above the application, there exist privileged actors who can observe plaintext data during computation. Even if data is encrypted at rest and in transit, it must be decrypted to perform inference - and once decrypted in traditional environments, it becomes visible to the entire software stack.

What We Need

An execution environment where:

  • Data is decrypted only within a hardware-isolated boundary that even privileged software (OS, hypervisor, cloud operators) cannot penetrate.
  • The integrity of code running inside that boundary can be cryptographically verified before sensitive data is provisioned.
  • Performance overhead is practical for real-time inference and large-scale fine-tuning workloads.

This is precisely what Trusted Execution Environments provide.


Trusted Execution Environments: Hardware-Enforced Isolation

Trusted Execution Environments are hardware-backed security primitives that create isolated execution contexts, called enclaves, where code and data are protected from all other software - including the operating system, hypervisor, and firmware.

How TEEs Work: Core Mechanisms

1. Memory Encryption

TEE-enabled processors encrypt memory regions associated with secure enclaves. Encryption keys are managed by the CPU itself and are never exposed to software. This means:

  • Data written to RAM by enclave code is automatically encrypted.
  • Data read from RAM into the enclave is automatically decrypted.
  • Any attempt to read enclave memory from outside the enclave yields only ciphertext.

2. Attestation

Before provisioning sensitive data to an enclave, a client can request remote attestation - a cryptographic proof that:

  • The enclave is running on genuine TEE-enabled hardware.
  • The code loaded into the enclave matches an expected cryptographic hash.
  • The platform has not been compromised.

This attestation is signed by hardware-rooted keys that cannot be forged, creating a chain of trust from silicon to application.
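To make this concrete, here is a minimal sketch of the client-side check, assuming a deliberately simplified report format. Real attestation flows (Intel DCAP quotes, AMD SEV-SNP reports, NVIDIA GPU attestation) use vendor-specific structures and certificate chains; the AttestationReport fields and verify_attestation helper below are illustrative only.

```python
import hashlib
from dataclasses import dataclass

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


@dataclass
class AttestationReport:
    measurement: bytes   # hash of the code loaded into the enclave
    report_body: bytes   # serialized report containing the measurement
    signature: bytes     # signed by a hardware-rooted attestation key


def verify_attestation(report: AttestationReport,
                       vendor_root_key: Ed25519PublicKey,
                       expected_enclave_binary: bytes) -> bool:
    """Accept the enclave only if the report is genuine and runs the expected code."""
    # 1. Signature check: the report must chain back to the hardware vendor.
    try:
        vendor_root_key.verify(report.signature, report.report_body)
    except InvalidSignature:
        return False
    # 2. Code-integrity check: the measurement must match the binary we expect.
    expected = hashlib.sha256(expected_enclave_binary).digest()
    return report.measurement == expected
```

Only after a check of this kind succeeds does the client (or a key management service acting on its behalf) release any secrets to the enclave.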

3. Isolation Enforcement

Enclave code executes in a protected address space. Even if an attacker compromises the OS or hypervisor, they cannot:

  • Read enclave memory directly.
  • Modify enclave code or data.
  • Observe intermediate computation results.

TEE Implementations for AI Workloads

Intel SGX (Software Guard Extensions)

  • Enclave memory (the Enclave Page Cache) has historically been limited to a few hundred megabytes; paging and newer server processors extend this, at some performance cost.
  • Strong isolation but limited enclave memory size can constrain large model execution.
  • Best suited for inference on smaller models or parameter-efficient fine-tuning workflows.

AMD SEV-SNP (Secure Encrypted Virtualization - Secure Nested Paging)

  • Encrypts entire virtual machines, not just application enclaves.
  • Supports full system memory, enabling large-scale LLM inference and fine-tuning.
  • Attestation covers the entire VM, including OS and application stack.

Confidential GPU Computing (NVIDIA H100 with Confidential Computing)

  • Extends TEE concepts to GPU memory and computation.
  • Critical for transformer-based models where GPU acceleration is non-negotiable.
  • Encrypts GPU memory and provides attestation for GPU-executed code.

ARM TrustZone

  • Widely deployed in mobile and edge devices.
  • Provides secure world / normal world partitioning.
  • Relevant for on-device LLM inference in privacy-sensitive mobile applications.

Why TEEs Matter for LLMs

Unlike purely cryptographic techniques (which we'll discuss later), TEEs operate at near-native performance. Memory encryption introduces overhead - typically 5-15% depending on workload - but this is vastly more practical than orders-of-magnitude slowdowns from fully homomorphic encryption.

For LLM inference, this means:

  • Real-time response latencies remain feasible.
  • Batch processing throughput is preserved.
  • Fine-tuning on large datasets does not become prohibitively expensive.

Encrypted Containers: Defense in Depth

TEEs protect data during execution, but what about data at rest? Container images often include:

  • Pre-trained model weights (which may themselves be proprietary).
  • Dataset samples for validation or bootstrapping.
  • Configuration secrets (API keys, encryption keys).

If container images are stored unencrypted, an attacker with filesystem access can extract these artifacts even if runtime execution is protected.

Apptainer Encrypted Container Support

Apptainer (formerly Singularity) supports encrypted container images where:

  • The container filesystem is encrypted using symmetric keys.
  • Decryption occurs only inside the TEE enclave at runtime.
  • The encryption key is provisioned via attestation - the enclave proves its identity before receiving the key.
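As a rough illustration of the build and run steps, the snippet below drives Apptainer's PEM-based image encryption from Python. The --pem-path flag follows the Apptainer/Singularity documentation for encrypted SIF images but should be verified against your installed version; the key files and definition file are placeholders.

```python
import subprocess

# Build phase (trusted build machine): encrypt the image against an RSA public
# key so only holders of the matching private key can mount it at runtime.
subprocess.run(
    ["apptainer", "build", "--pem-path", "enclave_rsa_pub.pem",
     "llm-encrypted.sif", "llm.def"],
    check=True,
)

# Runtime phase (inside the attested TEE): the private key is provisioned only
# after attestation succeeds, then used to decrypt and run the image.
subprocess.run(
    ["apptainer", "run", "--pem-path", "enclave_rsa_priv.pem",
     "llm-encrypted.sif"],
    check=True,
)
```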

How the Pipeline Works

1. Build Phase

  • Create a container image with the LLM, dependencies, and application logic.
  • Encrypt the container image using Apptainer's encryption tools.
  • Store encrypted image in a container registry (public or private).

2. Provisioning Phase

  • Orchestrator (Kubernetes, cloud scheduler) requests deployment.
  • TEE enclave initializes and generates an attestation report.
  • Key management service verifies attestation and provisions decryption key to enclave.

3. Runtime Phase

  • Enclave decrypts container image in protected memory.
  • Application loads LLM and begins inference/fine-tuning.
  • Input data is decrypted inside the enclave, processed, and outputs are re-encrypted before leaving.

4. Cleanup Phase

  • On completion, enclave memory is wiped.
  • No plaintext artifacts persist outside the secure boundary.
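The provisioning phase above hinges on the key management service refusing to release keys to anything but a verified enclave. Here is a minimal sketch of that gate, assuming a generic verifier callable and a wrapping key the enclave established over an attested channel; neither reflects a specific KMS API.

```python
import os
from typing import Callable

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class MiniKMS:
    """Toy KMS that releases the container key only after attestation passes."""

    def __init__(self, verifier: Callable[[bytes], bool]) -> None:
        self._verifier = verifier
        self._container_key = AESGCM.generate_key(bit_length=256)

    def release_key(self, attestation_report: bytes,
                    enclave_wrapping_key: bytes) -> bytes:
        if not self._verifier(attestation_report):
            raise PermissionError("attestation verification failed")
        # Wrap the container key under a key known only to the enclave so it
        # never crosses the network in plaintext.
        nonce = os.urandom(12)
        wrapped = AESGCM(enclave_wrapping_key).encrypt(
            nonce, self._container_key, None)
        return nonce + wrapped
```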

Why This Matters

Even if an attacker gains access to:

  • Container registry (they get encrypted blobs, unusable without keys).
  • Host filesystem (container image remains encrypted).
  • Hypervisor or OS (enclave memory is encrypted, keys are hardware-protected).

They cannot reconstruct the model, data, or intermediate computation results.


Architecture Pattern: End-to-End Confidential LLM Pipeline

Here's how the components integrate into a production-ready system:

Data Flow

1. Data Ingestion

  • Client encrypts dataset using a public key associated with the TEE enclave.
  • Encrypted dataset is uploaded to cloud storage or edge device.

2. Enclave Initialization

  • TEE enclave starts and requests attestation from the platform.
  • Attestation report is sent to a key management service (KMS).

3. Key Provisioning

  • KMS verifies attestation (code hash, platform integrity).
  • If valid, KMS provisions:
    • Decryption key for the dataset.
    • Decryption key for the encrypted container.

4. Model Execution

  • Enclave decrypts container and loads LLM into protected memory.
  • Enclave decrypts dataset batches as needed for inference/fine-tuning.
  • All computation occurs within the encrypted memory boundary.

5. Output Handling

  • Results are re-encrypted using client's public key before exiting the enclave.
  • Encrypted outputs are returned to the client or stored in encrypted form.
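Steps 1 and 5 are both instances of hybrid encryption: a fresh symmetric data key protects the payload, and an asymmetric key wraps the data key for the intended recipient. Below is a minimal sketch of the output path; the dataset-ingestion path is the mirror image, targeting the enclave's public key. Key sizes and the payload are illustrative.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_for_client(plaintext: bytes,
                       client_public_key: rsa.RSAPublicKey) -> dict:
    # Encrypt the payload under a one-time AES-GCM data key.
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    # Wrap the data key with the client's RSA public key (OAEP padding).
    wrapped_key = client_public_key.encrypt(
        data_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


# Illustrative usage with a throwaway key pair.
client_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
bundle = encrypt_for_client(b"inference result ...", client_key.public_key())
```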

Trust Anchors

  • Hardware root of trust: CPU-generated attestation keys.
  • Code integrity: Cryptographic hash of enclave binaries.
  • Key management: Separate KMS that only provisions keys to verified enclaves.

No single actor - cloud provider, system administrator, or application developer - can unilaterally access plaintext data.


Performance Considerations: TEEs vs Alternatives

Why not use other privacy-preserving techniques like Secure Multi-Party Computation (MPC) or Fully Homomorphic Encryption (FHE)?

Secure Multi-Party Computation (MPC)

Concept: Split computation across multiple parties such that no single party sees the complete data.

Strengths:

  • No single point of failure for data exposure.
  • Well-suited for aggregation tasks (e.g., federated learning parameter averaging).

Limitations for LLMs:

  • Latency: MPC protocols involve multiple rounds of network communication. For transformer inference with billions of parameters, this introduces seconds to minutes of overhead per query.
  • Throughput: Cannot leverage GPU acceleration effectively due to coordination overhead.
  • Complexity: Requires multiple non-colluding parties, which may not be available or practical in many deployment scenarios.

Fully Homomorphic Encryption (FHE)

Concept: Perform computations directly on encrypted data without ever decrypting it.

Strengths:

  • Strongest theoretical privacy guarantee: data never exists in plaintext during computation.

Limitations for LLMs:

  • Performance: Current FHE schemes introduce 1,000x to 1,000,000x slowdowns for arithmetic operations. This makes real-time LLM inference computationally infeasible.
  • Memory overhead: Encrypted representations are orders of magnitude larger than plaintext, exceeding memory capacity for large models.
  • Maturity: While FHE research is advancing rapidly, production-ready libraries for deep learning workloads remain limited.

TEE Advantages

Near-native performance: 5-15% overhead vs 100-100,000x for MPC/FHE.

GPU compatibility: Confidential computing extensions support accelerator offloading.

Simpler deployment: Single-party model - no need to coordinate multiple non-colluding nodes.

Immediate practicality: Production-grade hardware (AMD EPYC, Intel Xeon with SGX, NVIDIA H100) is available today.


Hybrid Approaches: Strengthening TEE Security

TEEs are not a panacea. They face known threats:

Side-Channel Attacks

Attackers with physical access or co-located VMs can potentially infer information by observing:

  • Cache timing: Patterns in memory access can leak information about data-dependent branches.
  • Power consumption: Differential power analysis on enclave execution.
  • RowHammer: Memory corruption techniques that exploit DRAM behavior.

While modern TEEs include mitigations (cache partitioning, memory scrambling), residual risks remain in high-threat environments.

Hybrid TEE + Cryptographic Techniques

1. TEE + Differential Privacy

Apply differential privacy noise to outputs before they leave the enclave. This ensures that even if side-channels leak partial information, formal privacy guarantees are preserved.

Use case: Medical diagnosis LLM where outputs must be statistically indistinguishable from outputs on neighboring datasets.
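For numeric outputs, the mechanism can be as simple as calibrated Laplace noise added inside the enclave before release. The epsilon, sensitivity, and scores below are illustrative; free-text LLM outputs require different mechanisms and remain an active research area.

```python
import numpy as np


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float) -> float:
    """Release true_value with epsilon-DP by adding calibrated Laplace noise."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)


# Example: mean risk score over a batch; clipping bounds each record's
# contribution, so the mean has sensitivity 1 / batch_size.
scores = np.clip(np.array([0.82, 0.40, 0.91, 0.13]), 0.0, 1.0)
noisy_mean = laplace_mechanism(scores.mean(),
                               sensitivity=1.0 / len(scores), epsilon=0.5)
```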

2. TEE + MPC for Key Management

Use MPC to split decryption keys across multiple key servers. The enclave must request key shares from multiple parties and reconstruct the key inside the enclave. This prevents a single compromised KMS from exposing data.

Use case: Multi-jurisdictional deployments where no single entity should have unilateral key access.
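The simplest form is n-of-n additive (XOR) splitting, sketched below: each key server holds one share, and the enclave can reconstruct the key only after collecting every share, so no single compromised server learns anything on its own. Threshold (k-of-n) splitting would use a scheme such as Shamir's instead (see the escrow sketch in the operational section).

```python
import os
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n_servers: int) -> list[bytes]:
    """Split key into n XOR shares; all n are required to reconstruct."""
    shares = [os.urandom(len(key)) for _ in range(n_servers - 1)]
    shares.append(reduce(xor_bytes, shares, key))
    return shares


def reconstruct_key(shares: list[bytes]) -> bytes:
    return reduce(xor_bytes, shares)


key = os.urandom(32)
shares = split_key(key, n_servers=3)
assert reconstruct_key(shares) == key   # reconstruction happens inside the enclave
```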

3. TEE + Federated Learning

Combine TEEs with federated learning: each data silo runs inference/fine-tuning in a local TEE, then aggregates model updates using secure aggregation protocols. This limits data movement while maintaining local privacy.

Use case: Cross-hospital clinical research where patient data cannot leave institutional boundaries.
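At its core, the aggregation step is a sample-weighted average of per-silo updates, computed inside an aggregation enclave. The sketch below shows only that FedAvg-style step; real secure-aggregation protocols add pairwise masking so individual updates stay hidden even from the aggregator.

```python
import numpy as np


def aggregate_updates(updates: list[np.ndarray],
                      sample_counts: list[int]) -> np.ndarray:
    """FedAvg-style weighted mean of per-silo weight deltas."""
    total = sum(sample_counts)
    return np.sum([u * (n / total) for u, n in zip(updates, sample_counts)],
                  axis=0)


# Example: three hospital silos contribute deltas for a tiny parameter vector.
deltas = [np.array([0.10, -0.20]), np.array([0.05, 0.00]), np.array([-0.10, 0.30])]
global_delta = aggregate_updates(deltas, sample_counts=[1200, 800, 2000])
```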


Real-World Deployment Scenarios

Scenario 1: Healthcare AI Diagnostics

Challenge: Hospital needs to run diagnostic LLM on patient records without exposing records to cloud providers or model vendors.

Architecture:

  • Deploy AMD SEV-SNP confidential virtual machines in a confidential cloud (e.g., Azure Confidential Computing); AWS Nitro Enclaves offer an alternative, non-SEV isolation model.
  • Encrypt container images containing diagnostic LLM using Apptainer.
  • Hospital provisions encrypted patient records to enclave after verifying attestation.
  • Diagnostic results are re-encrypted with hospital's key before returning.

Compliance: Supports HIPAA and GDPR requirements for data confidentiality and access control.

Scenario 2: Financial Fraud Detection

Challenge: Bank wants to use transaction data for fraud detection LLM fine-tuning without exposing transaction patterns to third-party ML platform.

Architecture:

  • Intel SGX enclaves running on bank's on-premises infrastructure.
  • Encrypted containers hold fraud detection model and fine-tuning logic.
  • Transaction data decrypted only inside enclave during training.
  • Updated model weights encrypted and stored in bank's secure storage.

Compliance: Supports PCI DSS data protection requirements.

Scenario 3: Edge AI for IoT

Challenge: Smart devices need on-device LLM inference for voice assistants without sending audio to cloud.

Architecture:

  • ARM TrustZone on mobile processors.
  • Encrypted container with lightweight LLM.
  • Audio processed entirely in secure world, no plaintext leaves device.

Compliance: Supports the GDPR data-minimization principle by keeping audio processing on-device.


Operational Considerations

Key Management

Secure key provisioning is the linchpin of TEE-based systems. Best practices include:

Separation of Duties: KMS operators cannot access enclave keys directly; provisioning is automated based on attestation verification.

Key Rotation: Periodically re-encrypt datasets and rotate enclave encryption keys to limit exposure windows.

Revocation: Maintain attestation revocation lists to block compromised platform configurations from receiving keys.

Monitoring and Logging

TEE enclaves are opaque by design, which complicates observability. Solutions:

Structured Logging: Enclaves emit encrypted logs that are decrypted only by authorized audit systems.
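A minimal sketch of this idea uses Fernet symmetric encryption from the cryptography package; in practice the audit key would be provisioned to the enclave alongside its other attestation-gated secrets rather than generated in place.

```python
import json
import time

from cryptography.fernet import Fernet

audit_key = Fernet.generate_key()   # in practice, held by the audit system
log_cipher = Fernet(audit_key)


def emit_log(event: str, **fields) -> bytes:
    """Serialize a structured log record and encrypt it for the audit system."""
    record = {"ts": time.time(), "event": event, **fields}
    return log_cipher.encrypt(json.dumps(record).encode())


token = emit_log("inference_complete", batch=42, latency_ms=183)
# Only the holder of audit_key can read the record:
print(json.loads(log_cipher.decrypt(token)))
```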

Attestation Audit Trails: Record all attestation requests and key provisioning events for compliance audits.

Performance Metrics: Non-sensitive telemetry (latency, throughput, error rates) can be exported without breaking confidentiality.

Disaster Recovery

Encrypted datasets and containers must be recoverable if keys are lost. Strategies:

Escrow Keys: Split recovery keys using Shamir's Secret Sharing and distribute the shares to multiple custodians, as sketched below.
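Here is a toy sketch of Shamir's scheme over a prime field, treating the recovery key as a single large integer: any k of the n custodian shares reconstruct it, and fewer reveal nothing. Production systems should rely on an audited secret-sharing library rather than hand-rolled field arithmetic.

```python
import secrets

PRIME = 2**521 - 1  # Mersenne prime, comfortably larger than a 256-bit key


def split_secret(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Evaluate a random degree-(k-1) polynomial with f(0) = secret at n points."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 using any k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


recovery_key = int.from_bytes(secrets.token_bytes(32), "big")
custodian_shares = split_secret(recovery_key, k=3, n=5)
assert reconstruct_secret(custodian_shares[:3]) == recovery_key
```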

Redundant KMS: Deploy KMS across multiple availability zones with synchronized attestation policies.


Limitations and When to Use Alternatives

When TEEs Are Ideal

  • Real-time inference where latency matters.
  • Large-scale fine-tuning where throughput is critical.
  • Cloud deployments where data owners do not trust infrastructure providers.
  • Regulated industries (healthcare, finance) requiring hardware-backed confidentiality.

When to Consider Alternatives

Use Federated Learning when:

  • Data cannot be centralized at all, even in encrypted form.
  • Multiple independent data silos require collaborative model training.

Use Differential Privacy when:

  • Formal statistical privacy guarantees are required.
  • Outputs must provably reveal no information about individual data points.

Use FHE when (emerging use case):

  • Extreme threat models where even TEE hardware is untrusted.
  • Compute-intensive preprocessing steps where slowdown is acceptable.

The Path Forward: Maturity and Adoption

Current State

  • Hardware availability: TEE-enabled CPUs and GPUs are shipping in volume (AMD EPYC, Intel Xeon Scalable, NVIDIA H100).
  • Cloud support: Major providers offer confidential computing services (Azure Confidential VMs, AWS Nitro Enclaves, Google Confidential GKE).
  • Tooling maturity: Apptainer (encrypted container images) and SGX library OSes such as Gramine and Occlum provide runtime support for TEE workloads.

Emerging Developments

1. Confidential Kubernetes

Extending Kubernetes to orchestrate TEE workloads, with automated attestation and key provisioning integrated into pod lifecycle management.

2. TEE-Aware LLM Frameworks

Libraries that optimize model execution for enclave memory constraints (parameter offloading, gradient checkpointing in encrypted storage).

3. Standardization

Industry groups (Confidential Computing Consortium) are working toward interoperable attestation protocols and portability across TEE implementations.

Barriers to Adoption

Complexity: Setting up attestation infrastructure, encrypted key management, and monitoring requires specialized expertise.

Performance tuning: Optimizing memory access patterns to minimize encryption overhead is non-trivial.

Cost: TEE-enabled hardware and confidential cloud instances carry premium pricing.


Key Takeaways

  • TEEs provide hardware-isolated execution environments where data is decrypted only within cryptographically attested secure enclaves, invisible to OS, hypervisor, and cloud providers.

  • Encrypted containers extend protection to data at rest, ensuring model weights and datasets remain confidential even when stored in untrusted registries or filesystems.

  • Remote attestation creates a chain of trust from hardware to application, allowing clients to verify code integrity before provisioning sensitive data.

  • Performance is practical for production AI workloads - 5-15% overhead compared to 100-100,000x for alternatives like MPC or FHE.

  • Hybrid architectures combining TEEs with differential privacy, MPC, or federated learning strengthen security in high-threat environments while preserving usability.

  • Real-world deployments span healthcare, finance, and edge AI, enabling compliant, confidential LLM inference and fine-tuning where traditional cloud APIs are prohibited.

TEEs and encrypted containers bridge the gap between theoretical privacy-preserving cryptography and practical, high-throughput AI systems. They transform "keep data on-premises" from a limiting constraint into an architectural pattern that enables confidential computation anywhere - cloud, edge, or hybrid environments - without sacrificing performance or scalability.

The future of private AI is not about choosing between utility and confidentiality. It's about building systems where both are guaranteed by hardware, cryptography, and verifiable trust.

Frederico Vicente

AI Research Engineer