Disk Migration Retro

Brain Dump From Migrations

[Recording](MDClone tech retro-20241119_123541-Meeting Recording.mp4)

  • Iteration 1

    • Could not boot VMs
    • /var/log not on root partition
    • Cannot use mixed disk types
  • Iteration 2

    • Original plan was to recreate all the nodes on managed disks, using copies of the OS disks
    • Kubernetes did not like the OS disk swap
    • Spawning additional namespaces
      • Root cause could not be determined
  • Iteration 3

    • Add a node
    • Repair a node or replace it
  • Note: 09/25 is the deadline for moving all disks to managed disks

  • CW4 -

  • Must move the entire MDClone infrastructure off of unmanaged disks

  • If iteration 3 is successful

    • We can detach CF nodes, rebuild and reattach them
  • Rebuilding CM nodes on the current version is potentially a full-week outage

  • Upgrade will require a complete rebuild

  • MDClone v7 to v10

  • MAJOR LESSON: add nodes instead of upgrading nodes

    • Cannot mix disk types managed/unmanaged
  • /mdclone (NFS share): when usage goes over 95%, it impacts the app

    • Add monitoring to this storage location
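
The /mdclone storage check noted above could start as something like the sketch below. It is only an illustration: the function names are hypothetical, the 95% threshold is the figure from the retro, and it assumes the NFS share is mounted at /mdclone on the host where the script runs. It uses Python's standard `shutil.disk_usage`, whose used/total ratio can differ slightly from the Use% column reported by `df`.

```python
import shutil

# Threshold from the retro: above 95% usage the app is impacted.
USAGE_THRESHOLD = 0.95


def usage_fraction(total: int, used: int) -> float:
    """Return used/total as a fraction; 0.0 if the reported total is zero."""
    return used / total if total > 0 else 0.0


def is_over_threshold(total: int, used: int,
                      threshold: float = USAGE_THRESHOLD) -> bool:
    """True when usage is strictly above the threshold."""
    return usage_fraction(total, used) > threshold


def check_path(path: str = "/mdclone",
               threshold: float = USAGE_THRESHOLD) -> bool:
    """Check a mounted path (assumed here to be the NFS share mount point).

    Raises FileNotFoundError if the path does not exist, which is itself
    worth alerting on because it may mean the share is not mounted.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_over_threshold(usage.total, usage.used, threshold)
```

A scheduled job could call `check_path("/mdclone")` and raise an alert when it returns `True`. Alerting at a lower value (for example 85%) would leave time to react before the 95% mark, but the right value is a team decision.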

Future

  • New version will require a complete rebuild

Improvements

  • Add documentation and diagrams
  • Add monitoring for storage
  • Add Splunk logging in the next version
  • Critical - Current OS alerts for CM3 NFS share
    • Might be able to monitor via Cloudera

Questions

  • How are we notified of major version updates?

Updated on August 7, 2025