Disk Migration Retro
Brain Dump From Migrations
- [Recording](MDClone tech retro-20241119_123541-Meeting Recording.mp4)
-
Iteration 1
- Could not boot VMs
- /var/log not on root partition
- cannot use mixed disk types
-
Iteration 2
- Original plan was to recreate all the nodes on managed disks using a copy of the OS disks
- Kubernetes did not like the OS disk swap
- Spawning additional namespaces
- Root cause could not be determined
-
Iteration 3
- add a node
- repair a node or replace it
-
Note: 09/25 is deadline for all managed disks
-
CW4 -
-
Must move entire MDClone infrastructure off of unmanaged disks
-
If iteration 3 is successful
- We can detach CF nodes, rebuild and reattach them
-
Rebuilding CM nodes on current version is potentially a full week outage
-
Upgrade will require a complete rebuild
-
MDClone v7 to v10
-
MAJOR LESSON: add nodes instead of upgrading nodes
- Cannot mix disk types managed/unmanaged
-
/mdclone (NFS share) - if usage goes over 95%, it impacts the app
- Add monitoring to this storage location
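
A minimal sketch of the kind of check this monitoring could run on a node that mounts the share. The 95% critical level comes from the note above. The 85% warning level, the function names, and the exit codes are assumptions for illustration only, not anything agreed in the retro.

```python
import shutil
import sys
from typing import Tuple

CRITICAL_PERCENT = 95.0  # from the retro notes: over this, the app is impacted
WARNING_PERCENT = 85.0   # assumed early-warning level, not from the notes


def usage_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total capacity."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def classify(percent: float) -> str:
    """Map a usage percentage to 'ok', 'warning', or 'critical'."""
    if percent >= CRITICAL_PERCENT:
        return "critical"
    if percent >= WARNING_PERCENT:
        return "warning"
    return "ok"


def check_share(path: str = "/mdclone") -> Tuple[float, str]:
    """Read filesystem usage for the mount at `path` and classify it."""
    usage = shutil.disk_usage(path)
    percent = usage_percent(usage.total, usage.used)
    return percent, classify(percent)


if __name__ == "__main__":
    # Optional first argument overrides the mount point to check.
    mount = sys.argv[1] if len(sys.argv) > 1 else "/mdclone"
    percent, level = check_share(mount)
    print(f"{mount}: {percent:.1f}% used ({level})")
    # Non-zero exit codes let a scheduler or alerting wrapper react.
    sys.exit({"ok": 0, "warning": 1, "critical": 2}[level])
```

Run from cron or a similar scheduler, the exit code could feed whatever alerting ends up being chosen (OS alerts, Cloudera, or Splunk, per the Improvements list).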
Future
- New version will require a complete rebuild
Improvements
- Add documentation and diagrams
- Add monitoring for storage
- Add Splunk logging in next version
- Critical - Current OS alerts for CM3 NFS share
- Might be able to monitor via Cloudera
Questions
- How are we notified of major version updates?