Automating Proxmox & Ceph with Ansible

2026/01/05

Setting up a high-availability Proxmox VE (PVE) cluster with Ceph storage is a rite of passage for many homelab enthusiasts. While the GUI is fantastic, automating the process ensures reproducibility and saves time on rebuilds.
In this post, I’ll walk through an Ansible playbook I wrote to configure a 3-node PVE cluster, deploy Ceph (including a specific partition setup for NVMe), and prepare the system for Terraform/OpenTofu provisioning.
You can find the full repository here: https://github.com/richardpct/pve-homelab.

Prerequisites

Before running the playbook, I assume a few things about the environment: three freshly installed Proxmox VE nodes, root SSH access to each of them from the machine running Ansible, and the nodes reachable at the addresses defined in the inventory below.

1. Inventory and Secrets

First, we define our cluster in inventory/cluster.ini:

pve-01 ansible_host=192.168.1.11
pve-02 ansible_host=192.168.1.12
pve-03 ansible_host=192.168.1.13

[cluster]
pve-01
pve-02
pve-03

To handle sensitive data like the PVE root password and the Terraform user password, I use Ansible Vault.
First, create a directory for your secrets (outside the repo is safer) and create the vault password file:

mkdir -p ~/ansible_secrets
echo "MY_VAULT_PASSWORD" > ~/ansible_secrets/vault
chmod 600 ~/ansible_secrets/vault

(Replace MY_VAULT_PASSWORD with your actual secure password)
Next, generate the encrypted strings for your variables.

  1. Generate the encrypted Proxmox VE password (Use the root password you set during installation):

ansible-vault encrypt_string 'MY_PVE_PASS' --name pvepass --vault-password-file ~/ansible_secrets/vault

  2. Generate the encrypted Terraform provisioner password (This user will be used later for managing VMs via Terraform/OpenTofu):

ansible-vault encrypt_string 'MY_TOFU_PASS' --name terraform_prov_user --vault-password-file ~/ansible_secrets/vault

Finally, edit inventory/group_vars/all.yml and paste the encrypted strings for pvepass and terraform_prov_user. You should also adjust the other variables to match your environment.

pvepass: !vault |
          $ANSIBLE_VAULT;1.1;AES256...
public_network: 192.168.1.0/24
ceph_osd_disk: /dev/nvme0n1p4
ceph_part_start: 537GB
pool_name: mypool
terraform_prov_user: !vault |
          $ANSIBLE_VAULT;1.1;AES256...

ceph_part_start: This variable is crucial for my setup. It defines exactly where the Ceph partition begins on the disk, allowing me to share the NVMe drive between the OS and Ceph.
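
To pick a sensible value for ceph_part_start, you can check where the free space on your drive begins, for example with parted (adjust the device name to your own disk):

parted /dev/nvme0n1 unit GB print free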

2. System Preparation

The first role of the playbook, pve, prepares the OS. Since this is a homelab, we typically don’t have an Enterprise subscription.
The playbook automatically disables the enterprise repositories, enables the no-subscription repository, and refreshes the package cache so that the rest of the installation can proceed.
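
Here is a simplified sketch of what the repository switch can look like (the file paths and task names are illustrative; see the pve role in the repo for the real tasks):

# Sketch only: switch from the enterprise repo to the no-subscription repo.
- name: disable the pve enterprise repository
  ansible.builtin.file:
    path: /etc/apt/sources.list.d/pve-enterprise.list
    state: absent

- name: enable the pve no-subscription repository
  ansible.builtin.apt_repository:
    repo: "deb http://download.proxmox.com/debian/pve {{ ansible_distribution_release }} pve-no-subscription"
    filename: pve-no-subscription
    state: present
    update_cache: true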

3. Creating the PVE Cluster

Automation gets tricky when dealing with cluster joins because the command usually requires interactive password input.

The First Node

On the first node (pve-01), we initialize the cluster named “mycluster”:

- name: create cluster on master node
  ansible.builtin.command: pvecm create mycluster
  args:
    creates: /etc/pve/corosync.conf
  when: inventory_hostname == groups['cluster'][0]

Joining Nodes

For the other nodes, I use a shell task with a heredoc to pass the password (decrypted from Ansible Vault) to pvecm add. Here is the task for the second node; the third node is handled the same way.

- name: join cluster on node 2
  ansible.builtin.shell: |
    pvecm add "{{ hostvars['pve-01'].ansible_host }}" << EOF
    {{ pvepass }}
    yes
    EOF
  args:
    executable: /usr/bin/bash
    creates: /etc/pve/corosync.conf
  when: inventory_hostname == groups['cluster'][1]

The playbook also includes checks (until: ... retries: 10) to ensure the nodes have successfully joined and are visible in pvecm nodes before proceeding.
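
For illustration, such a check might look roughly like this (a simplified sketch, not the exact task from the repo):

# sketch: retry until every inventory hostname appears in the pvecm output
- name: wait until every node is visible in the cluster
  ansible.builtin.command: pvecm nodes
  register: pvecm_nodes
  until: groups['cluster'] | difference(pvecm_nodes.stdout.split()) | length == 0
  retries: 10
  delay: 10
  changed_when: false
  when: inventory_hostname == groups['cluster'][0]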

4. Deploying Ceph

This is the most complex part of the setup. The playbook handles everything from network initialization to OSD creation.

Network and Packages

We install the ceph and ceph-common packages and initialize the Ceph network on the public_network defined in our variables.
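
The package installation itself can be a plain apt task, roughly like this (a sketch; the repo may do it differently, for instance via pveceph install):

# sketch of the package installation step
- name: install ceph packages
  ansible.builtin.apt:
    name:
      - ceph
      - ceph-common
    state: present
    update_cache: true

The network initialization is then done once, on the first node: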

- name: configure ceph
  ansible.builtin.command: pveceph init --network "{{ public_network }}"
  args:
    creates: /etc/pve/priv/ceph.client.admin.keyring
  when: inventory_hostname == groups['cluster'][0]

Monitors and Managers

The playbook iterates through the nodes to create Monitors (pveceph mon create) on all three nodes and Managers (pveceph mgr create) on nodes 2 and 3.
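
Sketched as tasks, that part can look something like this (the creates paths are illustrative idempotency guards; the actual role may differ):

- name: create ceph monitor on every node
  ansible.builtin.command: pveceph mon create
  args:
    creates: "/var/lib/ceph/mon/ceph-{{ inventory_hostname }}"

- name: create ceph manager on the non-master nodes
  ansible.builtin.command: pveceph mgr create
  args:
    creates: "/var/lib/ceph/mgr/ceph-{{ inventory_hostname }}"
  when: inventory_hostname != groups['cluster'][0]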

Custom Disk Partitioning

Instead of giving Ceph the whole disk, I use the parted module to create a specific partition (partition 4) starting at 537GB. This allows me to use the beginning of the NVMe drive for the Proxmox OS and other data, while dedicating the rest to Ceph.

- name: create partition disk for ceph
  community.general.parted:
    device: /dev/nvme0n1
    number: 4
    part_start: "{{ ceph_part_start }}"
    part_end: "100%"
    label: gpt
    state: present
  when: not ansible_check_mode

We then tell Proxmox to use this specific partition (/dev/nvme0n1p4) as an OSD:

- name: create ceph osd
  ansible.builtin.command: pveceph osd create "{{ ceph_osd_disk }}"
  args:
    creates: /run/systemd/system/ceph-osd.target.wants/ceph-osd@*.service

Pools and Filesystems

Finally, the playbook creates a Ceph pool (mypool), enables placement-group autoscaling on it, and creates a CephFS filesystem. It even creates a subvolume group named kubernetes, making the storage ready for a Kubernetes CSI driver immediately.
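
A rough sketch of those steps as tasks (the command arguments and the cephfs name are illustrative; idempotency guards are omitted for brevity):

- name: create the ceph pool
  ansible.builtin.command: pveceph pool create {{ pool_name }}
  when: inventory_hostname == groups['cluster'][0]

- name: enable pg autoscaling on the pool
  ansible.builtin.command: ceph osd pool set {{ pool_name }} pg_autoscale_mode on
  when: inventory_hostname == groups['cluster'][0]

- name: create the cephfs filesystem
  ansible.builtin.command: pveceph fs create
  when: inventory_hostname == groups['cluster'][0]

- name: create a subvolume group for kubernetes
  # "cephfs" is the default filesystem name; adjust if yours differs
  ansible.builtin.command: ceph fs subvolumegroup create cephfs kubernetes
  when: inventory_hostname == groups['cluster'][0]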

5. Preparing for Terraform/OpenTofu (IaC)

To manage the virtual machines inside the cluster using Terraform/OpenTofu later, we need a dedicated user with specific privileges.
The ansible/roles/pve/tasks/telmate.yml task file handles this: it creates a dedicated role carrying the privileges the provider needs, a provisioning user whose password comes from the vaulted terraform_prov_user variable, and the ACL that ties the two together.
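
The gist of it looks roughly like this (the user name, realm, and privilege list are illustrative assumptions; check the repo and the Terraform provider documentation for the exact values):

# Sketch only: role, user and ACL for the Terraform/OpenTofu provider.
- name: create the terraform provisioning role
  ansible.builtin.command: >
    pveum role add TerraformProv
    -privs "Datastore.AllocateSpace Datastore.Audit Pool.Allocate Sys.Audit
    Sys.Console Sys.Modify VM.Allocate VM.Audit VM.Clone VM.Config.CDROM
    VM.Config.CPU VM.Config.Disk VM.Config.Memory VM.Config.Network
    VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt"
  when: inventory_hostname == groups['cluster'][0]

- name: create the terraform provisioning user
  ansible.builtin.command: pveum user add terraform-prov@pve --password "{{ terraform_prov_user }}"
  no_log: true
  when: inventory_hostname == groups['cluster'][0]

- name: grant the role to the user on the root path
  ansible.builtin.command: pveum aclmod / -user terraform-prov@pve -role TerraformProv
  when: inventory_hostname == groups['cluster'][0]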

6. Usage

Now that the configuration is ready, we can deploy the cluster.

  1. Clone the Repository

Clone this repository to your local machine:

git clone https://github.com/richardpct/pve-homelab.git
cd pve-homelab

  2. Set Up the Environment

It is recommended to use a Python virtual environment to manage dependencies and avoid conflicts with your system packages.
Navigate to the ansible directory:

cd ansible

Create and activate the virtual environment:

python3 -m venv venv
source venv/bin/activate

  3. Install Dependencies

Install the dependencies listed in the requirements file:

pip install -r requirements.txt

  4. Run the Playbook

Execute the main playbook using your inventory file and the vault password file you created earlier:

ansible-playbook -i inventory/cluster.ini pve_cluster.yml --vault-password-file ~/ansible_secrets/vault

  5. Access the GUI

Once the playbook finishes successfully, your cluster is ready. You can log in to the Proxmox web interface on any node at https://<node-ip>:8006 (for example https://192.168.1.11:8006) and check that the three nodes and the Ceph storage show up.

Conclusion

By automating this process, I can rebuild my entire homelab cluster in minutes rather than hours. This playbook takes three fresh Proxmox installations on consumer hardware (like the Beelink EQR7) and turns them into a production-grade hyperconverged infrastructure.

The cluster is fully prepped for the next phase: Infrastructure as Code. With the dedicated TerraformProv user and role in place, I am now ready to start deploying Virtual Machines with Terraform/OpenTofu to build my Kubernetes cluster.
