SAN Testing

A few years ago, when I started working for IBM as a co-op (intern), I discovered a whole new world of enterprise storage. All I saw on my acceptance letter was “SAN TEST.” My first reaction was “What is SAN?” My next reaction was “What does test involve?” Being a college student, I honestly had no idea that companies tested their products so thoroughly, as crazy as that may sound. But as a consumer, do you really think about those kinds of things? No. Most of us just buy a product and never think about the work that went into getting it into the state it’s in when it hits the market. Because believe it or not, programmers aren’t perfect either. 🙂

So, what does SAN testing include? Entirely too much to describe in full. The core problem is that any little tweak can cause a number of other things to break. At IBM, I was testing Fibre Channel switches, so if a vendor released a new firmware or driver update for a switch, we would test it with pretty much ALL of the IBM storage and server products to make sure the change didn’t affect the full operation of the SAN. Just take a second and think about the number of combinations involved in that. And I wasn’t the only one testing: there were many, many groups doing the same, so things really do get tested thoroughly. Testing eliminates a lot of problems, but there are some scenarios that we just can’t simulate, and bugs do appear. The idea is to keep those to an absolute minimum and get them resolved ASAP, again without breaking anything else.

Now, my job is strictly with Windows. Instead of the switch firmware and drivers, we focus more on the HBA (Host Bus Adapter) firmware and the Windows driver.
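To put a rough number on the “combinations” point from the switch testing days, here is a small Python sketch. Every name in it is a made-up placeholder (these are not real firmware levels or product names), and the list sizes are arbitrary; it only shows how quickly a test matrix grows once every item in one list has to be tried against every item in the others.

```python
from itertools import product

# All names below are made-up placeholders, not real firmware or product names.
switch_firmware = ["fw-old", "fw-new"]
storage_products = ["storage-1", "storage-2", "storage-3"]
server_platforms = ["server-1", "server-2", "server-3", "server-4"]
host_operating_systems = ["os-1", "os-2", "os-3"]


def build_test_matrix(*dimensions):
    """Return every combination of one item from each dimension."""
    return list(product(*dimensions))


matrix = build_test_matrix(
    switch_firmware,
    storage_products,
    server_platforms,
    host_operating_systems,
)

# 2 firmware levels * 3 storage products * 4 servers * 3 operating systems
print(len(matrix))  # 72
```

Four short lists already give 72 configurations, and adding one more firmware level bumps that to 108. A real lab would presumably prune this to the supported configurations rather than run the full cross product.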
With our storage, I do driver installations, HBA firmware upgrades and downgrades, link speed negotiation with the switch, basic LUN provisioning, basic I/O (making sure we can read and write data to the disk without ANY errors), fault injection (cluster failovers and failbacks, panics, reboots, high I/O load, and so on, to make sure the host is not affected AT ALL), SAN booting, and various timeout value tests. The timeout value tests are important because they determine how everything behaves in the event of a failure.

If you made it to this point and have any questions or comments about SAN testing, feel free to post a reply or a message in the forum. I’d be happy to try to answer.
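For anyone curious what a “basic I/O” check looks like in spirit, here is a minimal Python sketch. It is not the tooling we actually use; the function names, block size, and file name are all my own assumptions for illustration. It writes a known pattern to a file, reads it back, and reports any block that doesn’t match. In a real test the target directory would sit on the LUN being tested; here it is just a temporary directory.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


def make_block(index):
    """Build a deterministic 4096-byte block from its index."""
    digest = hashlib.sha256(str(index).encode()).digest()  # 32 bytes
    return digest * (BLOCK_SIZE // len(digest))


def write_blocks(path, count):
    """Write `count` pattern blocks and ask the OS to flush them to the device."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(make_block(i))
        f.flush()
        os.fsync(f.fileno())


def verify_blocks(path, count):
    """Return the indexes of blocks whose contents differ from the pattern."""
    bad = []
    with open(path, "rb") as f:
        for i in range(count):
            if f.read(BLOCK_SIZE) != make_block(i):
                bad.append(i)
    return bad


def run_io_check(directory, count=64):
    """Write then verify a scratch file (hypothetical name) in `directory`."""
    path = os.path.join(directory, "io_check.bin")
    write_blocks(path, count)
    return verify_blocks(path, count)


with tempfile.TemporaryDirectory() as scratch:
    print(run_io_check(scratch))  # an empty list means every block matched
```

One caveat: a read that immediately follows a write is often served from the operating system’s cache rather than from the disk, so a serious test would bypass or clear the cache, and would keep the I/O running while faults are being injected.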

[tags]netapp, ibm, san, test, fibre, lun, fcp[/tags]

James Burke
Frisco, TX