Disk Migration Retro

Brain Dump From Migrations

- [Recording](MDClone tech retro-20241119_123541-Meeting Recording.mp4)

Iteration 1
- Could not boot VMs
- /var/log not on root partition
- cannot use mixed disk types
Iteration 2
- Original plan was to recreate all the nodes on managed disks, using copies of the existing OS disks
- Kubernetes did not like the OS disk swap
- Spawning additional namespaces
  - Root cause could not be determined
Iteration 3
- add a node
- repair a node or replace it
Note: 09/25 is deadline for all managed disks
CW4 -
Must move entire MDClone infrastructure off of unmanaged disks
If iteration 3 is successful
- We can detach CF nodes, rebuild and reattach them
Rebuilding CM nodes on current version is potentially a full week outage
Upgrade will require a complete rebuild
MDClone v7 to v10
MAJOR LESSON: add nodes instead of upgrading nodes
- Cannot mix disk types managed/unmanaged
/mdclone - NFS share; if it goes over 95% usage, it impacts the app
- add monitoring to this storage location
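
A minimal sketch of what a usage check on this share could look like, assuming a Python script run on a node where /mdclone is mounted. The 95% critical threshold comes from the note above; the 85% warning threshold and the function names are hypothetical, and the alerting hook (email, Splunk, Cloudera) is left out because it has not been decided.

```python
import shutil

# 95% is the point at which the notes say the app is impacted.
CRITICAL_PERCENT = 95.0
# Hypothetical early-warning margin; not from the retro.
WARNING_PERCENT = 85.0


def usage_percent(path: str) -> float:
    """Return the used percentage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100.0


def classify(percent: float) -> str:
    """Map a usage percentage to 'ok', 'warning', or 'critical'."""
    if percent >= CRITICAL_PERCENT:
        return "critical"
    if percent >= WARNING_PERCENT:
        return "warning"
    return "ok"
```

Calling `classify(usage_percent("/mdclone"))` from a scheduled job would give a status that can be forwarded to whatever alerting is chosen. The percentage here is used/total, so it can differ slightly from the Use% column in `df`.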

Future

New version will require a complete rebuild

Improvements

Add documentation and diagrams
Add monitoring for storage
Add splunk logging in next version
Critical - Current OS alerts for CM3 NFS share
- Might be able to monitor via cloudera

Questions

How are we notified of major version updates?

Updated on August 7, 2025