Others' opinions about best/safest method to migrate LUNs while online




Howdy all,

We are planning to migrate several LUNs on an Oracle box to a new NetApp all-flash storage backend. We've run a few tests ourselves to make sure we don't impact the box, and everything has been successful so far. The server will remain up during the migration and we are not planning on bringing down any services. I just wanted to see if others have had similar experience and wouldn't mind sharing. In particular, does anyone see any steps that might cause impact, halt the box, or cause paths to fail such that the storage itself becomes unavailable? These are our current high-level steps:

  1. Zone the host to include the new HA NetApp pair [no impact, no server changes, only SAN fabric additions (very safe)]
  2. Create a volume on the destination HA NetApp pair [no impact, no server changes (very safe)]
  3. Validate the portset on NetApp to include the destination HA pair [no impact, no changes, verification only (very safe)]
  4. Add reporting nodes to LUN [no impact, no server changes, NetApp additions only (very safe)]
  5. LUN scan each HBA individually ($echo "- - -" > /sys/class/scsi_host/host3/scan && sleep 5 && echo "- - -" > /sys/class/scsi_host/host4/scan) [Should not cause impact (generally safe)]
  6. Validate that 8 new non-optimized paths now appear on the server ($multipath -ll) [no impact, command does not make changes (very safe)]
  7. Validate the new paths are secondary ($sanlun lun show -p -v) [no impact, command does not make changes (very safe)]
  8. Perform the NetApp LUN move [no impact, no server changes (very safe)]
  9. Remove the reporting nodes from LUN [no impact, no server changes, NetApp deletion only (generally safe)]
  10. Validate the 8 original paths are now failed ($multipath -ll) [no impact, command does not make changes (very safe)]
  11. Validate that Linux automatically sees 4 optimized paths among the 8 new paths ($sanlun lun show -p -v) [no impact, command does not make changes (very safe)]
  12. Delete the failed paths (echo 1 > /sys/block/sdX/device/delete) [Should not cause impact (generally safe)]
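
For step 12, here is a minimal dry-run sketch that turns `multipath -ll` output into the delete commands instead of typing each sdX by hand. The function name `print_failed_path_deletes` is hypothetical, and it assumes each path line carries an H:C:T:L field immediately followed by the sd device name, with the word "failed" or "faulty" somewhere on the same line. The layout differs between multipath-tools versions, so check it against real output first. It only prints commands; nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: reads `multipath -ll` output on stdin and prints one
# delete command per path line that contains "failed" or "faulty".
# Assumption: the device name is the field right after the H:C:T:L field.
print_failed_path_deletes() {
    awk '
        /failed|faulty/ {
            for (i = 1; i < NF; i++)
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {   # H:C:T:L field
                    printf "echo 1 > /sys/block/%s/device/delete\n", $(i + 1)
                    break
                }
        }'
}
```

Usage would be `multipath -ll | print_failed_path_deletes`, then reviewing the printed lines against step 10 before running any of them as root.
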

My only concern is related to part of some Red Hat documentation I came across [1] that states the following:

    "interconnect scanning is not recommended when the system is under memory pressure. To determine the level of memory pressure, run the command vmstat 1 100; interconnect scanning is not recommended if free memory is less than 5% of the total memory in more than 10 samples per 100. It is also not recommended if swapping is active (non-zero si and so columns in the vmstat output). The command free can also display the total memory."

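The thresholds in that quote can be checked mechanically. Below is a rough sketch, assuming the default `vmstat` column order (free in column 4, si in column 7, so in column 8) and that total memory is supplied in the same unit vmstat reports (kB by default). The function name `scan_pressure_ok` is hypothetical. It reads the `free` column literally, so page cache is not counted as free; whether it should be is exactly the open question here.

```shell
#!/bin/sh
# Hypothetical helper: reads `vmstat 1 100` output on stdin; $1 is total
# memory in kB. Exits 0 if neither quoted threshold is exceeded, 1 otherwise.
scan_pressure_ok() {
    awk -v total="$1" '
        $1 ~ /^[0-9]+$/ {                      # data rows only, skips both header lines
            n++
            if ($4 * 100 < total * 5) low++    # free column below 5% of total
            if ($7 > 0 || $8 > 0) swapping++   # si or so column non-zero
        }
        END {
            printf "samples=%d low_free=%d swapping=%d\n", n, low, swapping
            exit ((low > 10 || swapping > 0) ? 1 : 0)
        }'
}
```

Usage would be something like `vmstat 1 100 | scan_pressure_ok "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"`, run shortly before the scan in step 5.
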
These Oracle boxes typically have all their memory in use (note the 39G cached).

    [root@oraspace01 ~]# free -g
                 total       used       free     shared    buffers     cached
    Mem:           188        187          1          0          0         39
    -/+ buffers/cache:        147         41
    Swap:           79          0         79

I'm not an Oracle DBA, so I don't know many specifics about its inner workings, but from what I understand some Oracle systems/processes can use all the memory a machine has, no matter how much you give them. I've seen ZFS and VMware do this as well: they claim a large amount of memory but don't use it until they actually need it. It's more efficient and allows for higher throughput and processing. So the fact that free thinks the machine is low on memory isn't really an issue for me; I'm just concerned about the documentation quoted earlier.
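
To put numbers on the two readings of "free memory", here is a small sketch that parses the old-style `free` output shown above (the format with a "-/+ buffers/cache:" line; newer procps versions print a different layout). The function name `free_percentages` is hypothetical. It assumes everything counted under buffers/cache is reclaimable, which may not hold for all of it.

```shell
#!/bin/sh
# Hypothetical helper: reads old-style `free` output on stdin and prints
# free memory as a percentage of total, with and without buffers/cache.
free_percentages() {
    awk '
        $1 == "Mem:"                   { total = $2; strict = $4 }
        $1 == "-/+" && $2 ~ /^buffers/ { reclaimable = $4 }
        END {
            printf "strictly free: %.1f%%\n", strict * 100 / total
            printf "free plus buffers/cache: %.1f%%\n", reclaimable * 100 / total
        }'
}
```

Fed the output above (`free -g | free_percentages`), this gives 0.5% strictly free versus 21.8% once buffers and cache are counted, so the box is below the documented 5% threshold only under the strict reading.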

Does anyone know if running a scan on the SCSI bus while the system thinks there isn't much available memory would cause issues? Has anyone done similar types of migrations (doesn't have to be with NetApp)? In essence, all we are doing is presenting additional paths temporarily, moving the storage, then deleting the old paths. Is there a better way to delete paths? A rescan of the SCSI bus only adds paths (at least from what I found and read). Does anybody have some nifty or clever step to add that makes things easier/safer/better/faster/etc.?

Thanks,
Joshua Schaeffer

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/scanning-storage-interconnects.html