Oracle Cloud Infrastructure (OCI) provides a very generous free tier that is capable of running a highly available Kubernetes cluster.
Preface
Back in August, I wanted to try out Tailscale, a mesh VPN built on WireGuard. Unfortunately, it didn't use traditional username/password logins. Instead, it required an identity provider, such as Google, Microsoft, GitHub, Apple, or OIDC. My first instinct was to use my own OIDC provider (Authelia), but that created a paradox: if I'm not home and the server fails, my OIDC provider fails too, locking me out of the VPN I would need to fix it.
Free Tier Resources
The search for a solution to this problem is what led me to discover the generous free tier from Oracle Cloud. OCI accounts, whether free or paid, have a set of resources that are free of charge in the tenancy's home region, for the life of the account. Using the Always Free resources, you can provision virtual machine (VM) instances, plus the networking, load balancing, and storage resources needed to support the applications that you want to build. With these resources, you can run a small-scale Kubernetes cluster.
OCI Kubernetes Engine (OKE)
OCI Kubernetes Engine (OKE) is the managed Kubernetes service provided by Oracle, similar to GKE/AKS/EKS provided by the other major cloud providers. With a managed Kubernetes service, the control plane is handled by the cloud provider while the user manages the nodes and pods.
OKE provides basic and enhanced clusters. From the pricing page, a basic cluster is free of charge, while enhanced clusters are $0.10 per hour. The differences between them are covered in the OCI docs.
Compute
Two compute instance shapes are listed as Always Free:
- 2 x `VM.Standard.E2.1.Micro`: AMD, 1/8 OCPU and 1 GB of memory
- 1 to 4 x `VM.Standard.A1.Flex`: ARM Ampere A1 cores and 24 GB of memory
The AMD instances are too small and not supported by OKE anyway. The ARM instances, however, provide a total of 4 ARM vCPUs and 24 GB of RAM that can be split across 1 to 4 instances. It makes the most sense to run either two nodes (2 CPUs/12 GB each) or four nodes (1 CPU/6 GB each) within Kubernetes for high availability and an even distribution of resources.
Storage
Up to two block volumes totalling 200 GB are included in the Always Free resources. Furthermore, 20 GB of object storage is also included.
The 200 GB total applies to both boot volumes (50 GB minimum) and block volumes combined.
The "up to 2 block volumes" limit refers to additional block volumes, not including the boot volumes that are required for the compute instances. With a managed Kubernetes service, each PersistentVolume
creates its own cloud-backed block volume, which could easily take it over the limit. This can be overcome by maximising the boot volume (i.e. 100 GB for two instances) and using Ceph/Rook, OpenEBS or Longhorn (see below) to use it as a PersistentVolume
.
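As a rough sketch of that workaround: the `longhorn` StorageClass comes from the Longhorn setup described later, and `oci-bv` is, as far as I know, the name of the default block-volume CSI class on OKE. A claim only needs to point at the local storage class to avoid provisioning a new cloud block volume.

```yaml
---
# Illustrative only: using the Longhorn class keeps the data on the nodes'
# existing boot volumes instead of creating a new OCI block volume through
# the default "oci-bv" StorageClass (which would count towards the
# 200 GB / two-volume limit).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
```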
Networking
Two Virtual Cloud Networks (VCNs) can be created in the Always Free tier. This includes all the associated resources, such as Subnets (unlimited), NAT Gateways (1), Service Gateways (1), Internet Gateways (1) and Security Lists (5).
Data ingress is free, as it is with every other cloud provider. Data egress is free for the first 10 TB each month. This is practically unlimited for the vast majority of users: it's equivalent to ~333 GB per day, or over 800 hours of 4K video each month.
Unfortunately, there is only a single Availability Domain for most OCI regions. The entire region operates as a single large data centre, but with multiple Fault Domains.
Load Balancer
One Flexible Network Load Balancer (L3/L4) and one regular Load Balancer (L4/L7) are provided in the Always Free tier.
Managed Kubernetes services often create a new load balancer for each ingress/gateway. Using a controller like ingress-nginx or Traefik (see below) routes everything through a single `Service` of type `LoadBalancer`, which only requires a single resource.
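As a sketch of what that looks like in practice: the annotation below is the one OCI's cloud controller manager documents for requesting a Network Load Balancer, while the name, selector and ports are illustrative Traefik-style defaults rather than my actual manifests.

```yaml
---
# The single Service an ingress/gateway controller sits behind. Every route
# in the cluster is funnelled through this one cloud load balancer.
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik
  annotations:
    # Ask OKE for the always-free Network Load Balancer instead of the
    # default (flexible) Load Balancer.
    oci.oraclecloud.com/load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: web
      port: 80
      targetPort: 8000
    - name: websecure
      port: 443
      targetPort: 8443
```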
Terraform/OpenTofu
The Terraform/OpenTofu code for the examples below can be found in my GitHub repo calvinbui/infra.
Budget Alerts
Launching the ARM instances with a free account is a hassle as the compute is in high demand, leading to an "Out of Host Capacity" error. I upgraded my account to Pay As You Go to get the instances to launch, but this meant I could be charged if resources went over the Always Free limits.
The code for this can be found in my GitHub repo calvinbui/infra.
To be cautious, I created a $1 budget and two 1% alert rules to be notified whenever the account begins accruing costs.
- Budget: oci_budget_budget
- Budget Alert Rule: oci_budget_alert_rule
This has already saved me once: when I recreated my ingress-nginx controller, for some reason it didn't delete the orphaned Network Load Balancer, and the alert caught it.
OKE
Here's my attempt at a draw.io diagram to show all the free resources used to run a Kubernetes cluster in OCI:
These are the resources required (in order):
- Virtual Cloud Network (VCN) (oci_core_vcn): software-defined network.
- Route Tables (oci_core_route_table): specify how traffic should be routed.
- Security Lists (oci_core_security_list): allow ingress/egress to and from the cluster and nodes.
- Subnets (oci_core_subnet): subdivisions of the VCN, ideally divided into public and private.
- NAT Gateway (oci_core_nat_gateway): internet access for resources in the private subnet (i.e. nodes).
- Service Gateway (oci_core_service_gateway): allows the nodes and control plane to interact privately.
- Internet Gateway (oci_core_internet_gateway): allows access from the internet to resources in the public subnet (i.e. control plane and load balancers).
- Cluster (oci_containerengine_cluster): OKE cluster.
- Node Pool (oci_containerengine_node_pool): OKE node pool.
- Bastion (oci_bastion_bastion): I've included a Bastion for secure access to the node pools from the internet. It is also a free-tier resource.
The code for this can be found in my GitHub repo calvinbui/infra.
Applications
After setting up the free cluster, my priorities changed. I no longer needed it for Tailscale. Instead, I identified two new, better uses for it.
- Host a status page to keep users of my Plex, Immich and Bitwarden instances informed about uptime and maintenance.
- The clusters I use at work are stable and robust, but that also means they're less dynamic. This cluster is a place for experimentation and pushing beyond my daily work.
Beyond the usual deployments of external-dns, cert-manager, metrics-server and descheduler, these are the applications I've experimented with and will continue using.
The code for my Kubernetes deployments can be found in my GitHub repo calvinbui/k8s.
Flux, Kustomize & SOPS
I experimented with a couple of deployment tools (like Helmfile), and landed on a combination of Flux, Kustomize and SOPS.
- Flux is a continuous delivery tool that automates deployments. It's primarily a GitOps tool, but I'm using it in Gitless mode, mainly for its Helm Controller.
- Kustomize is a Kubernetes-native configuration management tool. It's made for layering environment-specific changes on top of a common set of base YAML files. I use it to group all the manifest files for a deployment together (e.g. namespace, Helm release, CRDs, etc.).
- SOPS (Secrets OPerationS) is an editor that encrypts values directly in YAML files, so they can be safely stored in Git. Sensitive values are encrypted with age, a modern alternative to PGP/GPG. With SOPS, I encrypt secret tokens and domain names.
This combination creates a streamlined and secure deployment workflow with these benefits:
- Gitless and CLI-driven: No GitOps required. Everything is initiated from the command line, and the feedback loop is almost instant.
- One command (`kustomize build --enable-alpha-plugins --enable-exec . | kubectl apply -f -`): Not the prettiest command, but it can be made into an alias. All applications are deployed in the same consistent way, instead of using different tools. I wanted to approach this like only having to use `ansible-playbook` for running Ansible Playbooks.
- Deploy Helm charts and regular manifests: Uses regular YAML files to deploy resources as well as Helm charts using the Flux `HelmRelease` CRD. All deployments are consistent and packaged using Kustomize.
- Standard CLI tools (`kubectl` and `kustomize`): Kubernetes native and cross-platform. `kubectl` is the universal language of Kubernetes, and `kustomize` is now built into `kubectl` as well. This ensures long-term compatibility and avoids vendor-specific tools or complex plugins.
- Code is public and open source: With SOPS handling the encryption, my code can be made public like all my other repos, such as this website, Ansible playbooks and Terraform/OpenTofu infrastructure.
To deploy a new application, these are the steps I perform, using `cert-manager` as an example. The example below can be found in my GitHub repo calvinbui/k8s.
Start by creating a new folder with the name of the application and a namespace file if required.
namespace.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager
Create a Flux `OCIRepository` or `HelmRepository` resource depending on how the Helm chart is distributed. Flux's Source Controller will fetch the chart to be used for the Helm release.
oci-repository.yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: cert-manager
spec:
  interval: 24h
  url: oci://quay.io/jetstack/charts/cert-manager
  ref:
    tag: v1.18.2
  layerSelector:
    mediaType: application/vnd.cncf.helm.chart.content.v1.tar+gzip
Create the `HelmRelease` to install the Helm chart. It references the previously created `OCIRepository` as its source. The chart version and values can also be provided.
helm-release.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cert-manager
spec:
  interval: 5s
  chartRef:
    kind: OCIRepository
    name: cert-manager
  values:
    crds:
      enabled: true
    prometheus:
      enabled: false
    config:
      enableGatewayAPI: true
To use cert-manager, a `ClusterIssuer` is required. It is a CRD resource that can be written in YAML and similarly deployed using this method.
clusterissuer.yaml
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: cert-manager-letsencrypt
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cert-manager-cloudflare-token
              key: apiToken
The `ClusterIssuer` references a Kubernetes Secret, and this is where SOPS comes in. At the top level of my repo, I have a `.sops.yaml` file which contains the public key and which fields to encrypt.
.sops.yaml
---
creation_rules:
  - path_regex: '.*\.ya?ml$'
    age: "age1abcdefghijklmnopqrstuvwxyz"
    encrypted_regex: "^(data)$"
Next, create the secret file. I like to suffix it with `.sops.yaml` to indicate that it's an encrypted file.
secret-cloudflare-token.sops.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: cert-manager-cloudflare-token
type: Opaque
data:
  apiToken: c2VjcmV0Cg==
I then encrypt the file in place by running `sops -e -i secret-cloudflare-token.sops.yaml`. If there are any changes I want to make later, I use `sops secret-cloudflare-token.sops.yaml` to edit the file directly instead of decrypting it first and encrypting it again afterwards.
encrypted secret-cloudflare-token.sops.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: cert-manager-cloudflare-token
type: Opaque
data:
  apiToken: ENC[AES256_GCM,data:jbQMtl0H+55Tt12fFvYHvppn5I0QFyiSQqbmPDgWECwRoI8T8eNoqU9IxyuZ7EwB70/Rl48S2=,iv:FVdDMIpi4a+8I6IIgnwbkUHLYB+Gp+ZH70kwLu6MVcU=,tag:BWC12irGO3zuVfhKUeNHRw==,type:str]
sops:
  age:
    - recipient: age1abcdefghijklmnopqrstuvwxyz
      enc: |
        -----BEGIN AGE ENCRYPTED FILE-----
        321SlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB2M1BHc3dEaW1WQnR1dVQ5
        TjVDV1Q4bGNWRm92b05oMUFGa0M3YsasSVCmxtV0asdtZnBIL2ljcGNMM3M2cC9Y
        NjcyU3VtUmFGRnl1NWtvWkIwMmJpMWMKlqCm0GPTVeqnkZ28zmUzF58iEVMrWECw
        UOI/t0NNpO9G9HwuXynt/b2fjmTeA/dTlyLEGZi7NEZ5jRbEYMyUsQ==
        -----END AGE ENCRYPTED FILE-----
  lastmodified: "2025-10-09T12:26:30Z"
  mac: ENC[AES256_GCM,data:Q76lczr/T1jM137TRwvCqYUZwRG8dG7lhieCgoUO0ez3d4QTVEi2/Te52s1Q94e2/FG1MR8QL2/lcenpJ6+3YckXsnGpLDIbbiX58ZxyZjBHg6CQ7f/fm+cKEKAvv9cV1e0bQROC7PRBEvYDY/MK9MM+vEliFmAJ/pwbX4U3irE=,iv:hjSncvBg4BhAT5tBHByNW7rlAYYxQ1qsHjxNXl82rQc=,tag:P14057IZpECFojUteImfqA==,type:str]
  encrypted_regex: ^(data|hostname)$
  version: 3.10.2
Kubernetes cannot decode SOPS files and won't let you apply an encrypted file either. To decrypt files for deployment, I use KSOPS, a Kustomize plugin by Viaduct. It acts as a generator to decrypt SOPS-encrypted files when running `kustomize build`.
secret-generator.yaml
---
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: secret-generator
  annotations:
    config.kubernetes.io/function: |
      exec:
        path: ksops
files:
  - secret-cloudflare-token.sops.yaml
The final file is the kustomization file, which packages all our other files together.
kustomization.yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: cert-manager
generators:
  - secret-generator.yaml
resources:
  - namespace.yaml
  - oci-repository.yaml
  - helm-release.yaml
  - clusterissuer.yaml
To test everything works, I run `kustomize build --enable-alpha-plugins --enable-exec .` and check its output. To apply my changes, I pipe it to `kubectl`, which looks like:
$ kustomize build --enable-alpha-plugins --enable-exec . | kubectl apply -f -
namespace/cert-manager configured
secret/cert-manager-cloudflare-token configured
clusterissuer.cert-manager.io/letsencrypt configured
helmrelease.helm.toolkit.fluxcd.io/cert-manager configured
ocirepository.source.toolkit.fluxcd.io/cert-manager configured
Sometimes, the command will fail as CRDs may not exist yet. Wait a little bit and run the command again.
Longhorn
Longhorn is cloud-native distributed block storage for Kubernetes. It turns the boot volume of each node into highly available persistent storage. Data is synchronously replicated between nodes automatically, so if one node fails, data is still immediately available from replicas on other healthy nodes, with no data loss.
As previously mentioned, the free tier is limited to 200 GB, and Longhorn automatically reserves disk space for the operating system and container images. Because data is replicated, the usable capacity is cut in half again, leaving around 60 GB of total available space.
Total data loss is possible if both my nodes go down at the same time. The most critical rule is to never perform maintenance on multiple nodes simultaneously. Always upgrade nodes one by one, and before starting the next, wait for Longhorn to fully sync and rebuild the data replicas from other healthy nodes in the cluster.
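For reference, a StorageClass along these lines (a sketch using Longhorn's documented parameters, not my exact manifest) is what keeps every volume replicated across both nodes:

```yaml
---
# Two replicas on a two-node cluster means each volume has a copy on each node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
```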
Gateway API
Gateway API is the upcoming replacement for `Ingress`. It's been generally available since October 2023, and there's a full list of controllers that support it.
I started using ingress-nginx, but it will soon be deprecated once InGate is available.
I chose to go with Traefik as my Gateway controller, as I was familiar with it from my own home server. The only issue I haven't resolved is getting HTTP/3 working, as it is not possible to listen on TCP and UDP on the same Kubernetes `Service` with type `LoadBalancer`.
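To show the shape of the Gateway API resources involved, here's a minimal sketch. The names, namespaces, hostnames and ports are illustrative, and it assumes Traefik's Gateway API provider is enabled and a TLS certificate Secret already exists (e.g. issued via the cert-manager `ClusterIssuer` from earlier).

```yaml
---
# One Gateway handled by Traefik; all HTTPRoutes attach to it, so only a single
# LoadBalancer Service (and one free load balancer) is ever provisioned.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: traefik-gateway
  namespace: traefik
spec:
  gatewayClassName: traefik
  listeners:
    - name: websecure
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls   # hypothetical Secret, e.g. issued by cert-manager
      allowedRoutes:
        namespaces:
          from: All
---
# An application then exposes itself with an HTTPRoute instead of an Ingress.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: status-page
  namespace: oneuptime
spec:
  parentRefs:
    - name: traefik-gateway
      namespace: traefik
  hostnames:
    - status.example.com         # illustrative hostname
  rules:
    - backendRefs:
        - name: oneuptime        # illustrative backend Service
          port: 80
```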
OneUptime
OneUptime is the status page application I currently run on the cluster.
I wanted a service that could support a 'degraded' status, which only OneUptime and Kener supported. Kener required writing JavaScript to evaluate API/website responses, while OneUptime was a few clicks. Other tools I looked into include Uptime Kuma, gatus, Checkmate, Peekaping and CheckCle.
OneUptime is a very resource-heavy application; its system requirements recommend a machine with 8 CPU cores, 16 GB of RAM, and 400 GB of disk space. The first time I deployed it, without knowing this, both nodes reached 100% memory usage. Obviously, this is beyond the free tier's limits, but I've made a few pull requests and changes that keep it much more usable:
- Use the `Recreate` strategy (PR #2023), so a second full installation of OneUptime doesn't kill the cluster.
- Disable the auto-generated date label (PR #2024), so that each deployment doesn't cause all of its pods to restart too.
- Don't enable every service. Even services that aren't used chew up a lot of resources. These are the services I have safely disabled:
- API Reference
- Docs
- Fluent Ingest
- Home (only Dashboard is needed when self-hosting)
- Incoming Request Ingest
- Isolated VM
- Open Telemetry Collector
- Open Telemetry Ingest
- Server Monitor Ingest
- Worker
- Workflow
- Use CloudNativePG to run the PostgreSQL pod (see the sketch below). The one provided with the chart is not as thorough and uses more resources.
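A minimal sketch of what that CloudNativePG `Cluster` could look like is below. The names, sizes and resource requests are illustrative, and it assumes the CNPG operator is installed and the Longhorn StorageClass from earlier exists.

```yaml
---
# A small PostgreSQL cluster managed by CloudNativePG for OneUptime.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: oneuptime-postgres
  namespace: oneuptime
spec:
  instances: 1               # a single instance; Longhorn already replicates the volume across nodes
  storage:
    size: 10Gi
    storageClass: longhorn   # carve the data out of the Longhorn pool above
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      memory: 1Gi
```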