Resolving Proxmox Web Interface Issues Due to Corrupted Cluster Database
Recently, the web interface on one of my Proxmox nodes became inaccessible. I connected to the affected node over SSH and started diagnosing the problem.
Symptoms and Initial Troubleshooting
Upon checking the status of the pve-firewall service, I saw multiple errors:
Jun 27 12:44:15 hv02 pve-firewall[503543]: ipcc_send_rec[1] failed: Connection refused
Jun 27 12:44:15 hv02 pve-firewall[503543]: ipcc_send_rec[2] failed: Connection refused
Jun 27 12:44:15 hv02 pve-firewall[503543]: ipcc_send_rec[3] failed: Connection refused
Additionally, the pveproxy service reported issues with SSL certificates:
Jun 29 03:52:52 hv03 pveproxy[3462962]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)
Jun 29 03:52:52 hv03 pveproxy[3462963]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)
Both errors pointed to the Proxmox Cluster File System (pmxcfs): pve-firewall could not connect to it, and pveproxy could not read its key from /etc/pve, the directory that pmxcfs provides.
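A quick way to confirm that suspicion is to check whether /etc/pve, where pmxcfs mounts its FUSE file system, is actually mounted. The following is only a minimal sketch: the helper name check_pmxcfs_mount is made up for this post, and it assumes the mountpoint utility from util-linux is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): report whether a directory
# is currently a mount point. Defaults to /etc/pve, where pmxcfs mounts.
# Note: a missing mountpoint binary is also reported as "not mounted".
check_pmxcfs_mount() {
    dir="${1:-/etc/pve}"
    if mountpoint -q "$dir" 2>/dev/null; then
        echo "mounted"
    else
        echo "not mounted"
    fi
}

check_pmxcfs_mount
```

If this prints "not mounted", the output of systemctl status pve-cluster and journalctl -u pve-cluster -b is a reasonable next place to look.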
Identifying the Root Cause
Starting pmxcfs by hand with /usr/bin/pmxcfs revealed the underlying failure:
[database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x000000000303AD52...) vs. B:(inode = 0x000000000306F1A7...)
[database] crit: DB load failed
[main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
This indicated corruption in the pmxcfs configuration database (config.db): two entries with different inodes shared the name lxc, so the database failed to load.
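pmxcfs reports the inodes in hexadecimal, whereas the DELETE statements in this post use decimal values, so it helps to convert them up front. A small sketch using the shell's printf; the helper name hex_to_dec is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a hex inode as printed by pmxcfs
# into the decimal form used in the SQL statements.
hex_to_dec() {
    printf '%d\n' "$1"
}

hex_to_dec 0x000000000303AD52   # entry A -> 50572626
hex_to_dec 0x000000000306F1A7   # entry B -> 50786727
```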
Resolving the Issue
To inspect the problematic entries, I executed:
sqlite3 /var/lib/pve-cluster/config.db 'SELECT inode,mtime,name FROM tree WHERE parent = 0x000000000303AD50'
This confirmed two entries named lxc under the same parent. Before changing anything, I made a backup of the database:
cp /var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db.bk
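A slightly more defensive variant of this backup can be sketched as follows. It is an illustration, not what I ran: the helper name backup_db is hypothetical, and it assumes that any SQLite -wal or -shm sidecar files sitting next to the database should be copied as well when they exist.

```shell
#!/bin/sh
# Hypothetical helper: copy a database file to <file>.bk, include any
# SQLite -wal/-shm sidecar files that exist, and verify the main copy.
backup_db() {
    db="$1"
    cp -p "$db" "$db.bk" || return 1
    for suffix in -wal -shm; do
        if [ -f "$db$suffix" ]; then
            cp -p "$db$suffix" "$db.bk$suffix" || return 1
        fi
    done
    # cmp exits non-zero if the copy differs from the original
    cmp -s "$db" "$db.bk" && echo "backup verified: $db.bk"
}

# Example (assumes pmxcfs is not running, so the file is not changing):
# backup_db /var/lib/pve-cluster/config.db
```

Sidecar copies end up as config.db.bk-wal and config.db.bk-shm, so a restore would need to copy them back under their original names.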
My first attempt deleted entry B (50786727 is the decimal form of inode 0x306F1A7) together with its children:
sqlite3 /var/lib/pve-cluster/config.db 'DELETE FROM tree WHERE parent = 50786727 OR inode = 50786727'
This restored access to the web interface, but all VMs and containers were now missing from pct list. Since I had clearly removed the wrong entry, I restored the database from the backup:
cp /var/lib/pve-cluster/config.db.bk /var/lib/pve-cluster/config.db
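A read-only check along the following lines might help decide which duplicate to remove before deleting anything. This is a sketch under assumptions: the helper name count_children is hypothetical, it uses only the tree columns that appear in the queries in this post, and the idea that the duplicate without children is the stray one is my guess, although it matches what happened here.

```shell
#!/bin/sh
# Hypothetical helper: for each candidate inode, count how many rows in
# the tree table name it as their parent. Read-only; changes nothing.
count_children() {
    db="$1"
    shift
    for inode in "$@"; do
        n=$(sqlite3 "$db" "SELECT COUNT(*) FROM tree WHERE parent = $inode;")
        echo "$inode $n"
    done
}

# Example:
# count_children /var/lib/pve-cluster/config.db 50572626 50786727
```

Each output line is an inode followed by its number of children.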
I then deleted the other duplicate instead, entry A (50572626 is the decimal form of inode 0x303AD52):
sqlite3 /var/lib/pve-cluster/config.db 'DELETE FROM tree WHERE parent = 50572626 OR inode = 50572626'
After executing this, I restarted the essential Proxmox services:
systemctl restart pve-cluster pveproxy pvestatd
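To confirm that the restart worked, the state of each service can be printed in one go. A minimal sketch; the helper name report_services is hypothetical, and it relies only on systemctl is-active.

```shell
#!/bin/sh
# Hypothetical helper: print the systemd state of each named unit.
# Falls back to "unknown" if systemctl produces no output at all.
report_services() {
    for svc in "$@"; do
        state=$(systemctl is-active "$svc" 2>/dev/null) || state="${state:-unknown}"
        echo "$svc: $state"
    done
}

report_services pve-cluster pveproxy pvestatd
```

Once all three report active, pct list (containers) and qm list (VMs) should show the guests again.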
Successful Outcome
This second attempt resolved the issue: the Proxmox web interface became accessible again, and all VMs and containers reappeared.
Resources used:
https://forum.proxmox.com/threads/vm-status-unknown-grey-question-mark.92359/
https://forum.proxmox.com/threads/pve-cluster-fails-to-start.82861/
https://nramkumar.org/tech/blog/2023/07/08/proxmox-fixing-your-database-after-a-host-name-change/
https://forum.proxmox.com/threads/the-etc-pve-directory-disappeared.128117/
https://forum.proxmox.com/threads/unable-to-load-access-control-list-connection-refused.72245/page-2
https://www.reddit.com/r/Proxmox/comments/1bx92wv/web_admin_not_loading_after_reboot/

