Dell PowerEdge 2850 RAID Failure

About three weeks ago one of my newest servers had a major failure. The server runs a very critical business web application so uptime is very important. For this reason we configured a very reliable server.

Dell PowerEdge 2850
Dual Xeon 2.8GHz CPU
2GB ECC DDR-SDRAM
RAID 5 LSI RAID Controller
6 x 73GB Maxtor SCSI Disks

Below is the complete saga with Dell broken down.

On 4/27/2006 around 10AM we attempted to log into our client’s production web server, which hosts their critical business application. Our logins were successful, but we were immediately bumped back to the login screen. We checked the server’s drive state and noticed it had rejected a hard drive from the array. To resolve the problem we attempted to restart the server, but it was unable to fully boot into Windows. At this point the server was completely unresponsive and would not bring up a login screen.

After some research we found the RAID controller was corrupting the data being written to the disks. This had been going on for some time, and appears to have corrupted winlogon.exe.

Within a short amount of time we were able to bring their corporate website back online on the backup server. Their business application, however, took significantly longer to bring back up because of the frequently changing data. To retrieve the live data off the server we booted using a restore tool and copied the files and database onto a spare hard drive. This was to ensure we had the most recent version of the data, including any transactions from that morning. We were able to bring everything back online by 3:30PM on the backup server.

We immediately called Dell, who recommended we upgrade the firmware on the server’s RAID controller. Dell pointed out that this specific machine had shipped with a firmware version which had known problems. First thing in the morning on 4/28/2006 we upgraded the firmware to the recommended version. We also upgraded the motherboard BIOS firmware as recommended by Dell. We had to completely reinstall Windows 2003 Server and MS SQL Server 2000 to bring the server back online. After letting the server run we scheduled a turn-up for Tuesday, May 2nd. This was intended to give the server time to burn in and confirm the firmware fixed the problem.

On 05/02/2006 at 8PM we attempted to bring all the data back over to the production machine. We immediately checked the server’s drive state and noticed it had rejected another hard drive from the array. This was a sign that the firmware update had not fixed the problem.

The next morning I called Dell back and they shipped overnight a new RAID “Key” chip, controller card memory, and a new backplane for the drives to mount into. We replaced all of this equipment and let the server “burn-in” to ensure this fix would work. We scheduled another launch date for 05/09/2006 after letting the server run over the weekend. At 7PM we met at the office and moved the site files and database back over. At approximately 9PM, after a final reboot, we noticed the server had rejected another hard drive. To be safe we immediately moved the site back to the backup server.

On 05/16/2006, after heavy lobbying, Dell shipped a new server, which seems to have resolved the problem. After further inspection I noticed they had changed SCSI hard disk vendors. My theory is that there is something wrong between the Maxtor disks and the LSI RAID controller, but at this point I cannot prove anything. The replacement Seagates seem to resolve the problem.

This is intended to be a heads up for anyone dealing with the same issue. Level 1 Dell server support seems to have failed us here; however, once the problem was escalated they took action quickly to ship a new server.


OpenTTD – My Addiction

You may be wondering why I haven’t posted anything to this blog recently. Although I am busy getting married and taking care of a puppy, I must admit something. I am completely addicted to OpenTTD. OpenTTD is based on the original computer game Transport Tycoon Deluxe. It’s a complete rewrite and has major improvements over the original. Although not widely played in the United States, it has a large following in Europe. There are always servers available to play on. For anyone who played the original, this is something you should definitely check out. I’ve wasted a massive amount of my time on this game, so I thought it’s high time I pass this addiction on to others.


Palm Treo 700w Review

I must admit, I have been a bit reluctant to jump on the smart phone bandwagon. The idea of checking my e-mail anytime and anywhere just didn’t seem like the best idea. At work I was offered a Treo 700w so I could learn how they work before they were issued to all our sales staff. It didn’t take me long to fall in love. The Treo 700w is only available from Verizon Wireless; however, I’m sure it will eventually be a very common device. The biggest improvement over older smart phones is the ability to connect to an Exchange server. This functionality, however, is only available if you’re running Microsoft Exchange SP2.

My first handheld computer was the Compaq iPaq 3850. I was happy with the device, but it didn’t offer many connectivity options back to home base. The device was also a tad bulky to carry around in a pocket, and it always seemed silly to carry around both a cell phone and an iPaq. The Treo 700w solves that problem by combining an amazing cell phone with all the Windows Mobile features you need. Although there is less screen real estate, the device seems much faster than my iPaq. They have definitely made major improvements in mobile computing. This Palm device uses Microsoft Windows Mobile (scary). I’d prefer the traditional Palm interface, but the ability to synchronize with Exchange without any special configuration is too nice to pass up. I am willing to deal with some interface shortcomings for this extra functionality.

Some of the most important features are:

  • ActiveSync with Exchange server
  • Slimmer profile than the Treo 650. More like a ’90s cell phone than a handheld computer
  • Clear bright screen
  • Improved layout on desktop page

After using the Treo 700w for just a few days I would highly recommend it to anyone who wants to access all of their Exchange contacts, calendar, and e-mail on the road. This is not only the best solution, but also one of the only solutions which works directly with Exchange. I am sure more smart phones will be released using this new technology, but for now this is by far the best handheld and phone I’ve ever owned.


Dell Expands In India

“NEW DELHI – Computer maker Dell Inc. said Monday it planned to add 5,000 jobs in India over the next two years, bringing its work force in the country to 15,000. Dell is also looking to set up a manufacturing center in India, a move that could help boost the sale of Dell computers here, President and CEO Kevin Rollins told reporters after a meeting with Indian Prime Minister Manmohan Singh.”

As if it wasn’t hard enough to understand their support staff. Although the article points out this is just a manufacturing center, it will take Dell one step closer to finally outsourcing all of their technical / support staff. There is no doubt this will have an impact on their quality of service. I believe a company as big as Dell should have support staff in all different regions of the country. This would help do a number of things:

  • Give employees a normal shift, and stretch support centers across the country to allow for more hours. How excited can a support rep. be at 2am fixing grandma’s printer? Even worse at the end of a 10 hour shift!
  • Support for different dialects. If you’ve ever had to sit on the phone with someone from the Deep South you’ll understand it can be just as bad as trying to understand someone in a foreign country. This subject matter is hard enough; don’t make language the barrier.
  • Local support helps everyone. Many communities desperately need technical jobs. Just because something is cheaper doesn’t make it better. If you played a role in building something you’ll be significantly more likely to buy it and recommend it to other people.

For more information about Dell, check out my article – Dell Laptops Suck


    Getting Married

    I have started a new website called Wright Family to document the upcoming wedding. Sorry for not posting as much information lately. If you’d like to follow the progress, visit our photo journal website.


    NTLDR Missing – Fix

    Last night I had a Windows 2003 web server become compromised. It appears the attacker deleted the boot.ini / NTLDR files to prevent the system from starting up. The problem was a little tricky to troubleshoot, but with the right tools I was able to resolve the issue relatively quickly. In case anyone runs into the same problem, below is a good set of steps to troubleshoot.

  • Test the hard drive. Often the source of an NTLDR error is simply that the files have been corrupted by a dead / dying hard drive. If this is the cause *pound head on desk*
  • To test the drive I recommend using Hiren’s BootCD. This is like the “killer app” for any PC tech. It has a tool which will allow you to test any type of hard disk, and will also allow you to browse the NTFS partitions.
  • This would be a good time to copy any mission critical data off the server. In case we’re unsuccessful in completely restoring the system, you should at least be able to get your files off.
  • In this situation having a backup server is ideal. On a backup server running Windows 2003, search for ntldr / ntdetect.com. Copy these to a floppy disk and move them to the root partition of your server using the Hiren’s BootCD.
  • Now create a file called boot.ini with the following information in it, and move it to the root partition of the down server.

    [boot loader]
    timeout=30
    default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows Server 2003, Standard" /fastdetect

  • You should be able to reboot and the system will come back online as if nothing happened. At least that is how things went for me. Feel free to add comments about your experiences.
  • I believe I dodged a bullet here. The web server still had all the site files and all the data. The individual behind this could easily have done more damage than they did.
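
Before rebooting, it may be worth checking that the boot.ini from the steps above is consistent, meaning the default= entry matches a line under [operating systems]. Below is a minimal Python sketch of that check. The helper names (parse_boot_ini, default_entry_exists) are made up for illustration, and the script only compares text; it does not confirm that the ARC path matches your actual disk layout.

```python
# Sketch: confirm the "default" entry in a boot.ini also appears under
# [operating systems]. Helper names are hypothetical, for illustration only.

BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows Server 2003, Standard" /fastdetect
"""


def parse_boot_ini(text: str) -> dict:
    """Return {section: {key: value}} with section names and keys lowercased."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].lower()
            sections[current] = {}
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            sections[current][key.strip().lower()] = value.strip()
    return sections


def default_entry_exists(text: str) -> bool:
    """True if the boot loader default path is listed under [operating systems]."""
    sections = parse_boot_ini(text)
    default = sections.get("boot loader", {}).get("default")
    if default is None:
        return False
    return default.lower() in sections.get("operating systems", {})


print(default_entry_exists(BOOT_INI))  # True
```

If the check prints False, the most likely cause is a typo in one of the two ARC paths.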


    Webtrends Small Business Reviewed (cont.)

    Sometimes you just can’t get enough of a bad thing. Here is another rant about page view licenses, specifically those implemented in Webtrends 7.5b. After continually fighting with their support staff I came to the realization that framed pages are counted as individual pages. The site we’re attempting to process contains 9 pages inside of a framed page. This means for every single page view, we’re getting 9 page views against our licensing. The site should be well under our 2,400,000 yearly page view licenses, but with this calculation we’re using over a million per month!
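
To put rough numbers on this, here is a small Python sketch of the arithmetic. The 9 frames per page and the 2,400,000 yearly license figure come from the paragraph above; the monthly traffic of 120,000 real page views is an assumed number, picked only because it lands just over a million billed views per month. The function names are made up for the example.

```python
# Sketch of the license math. FRAMES_PER_VIEW and YEARLY_LICENSE are taken
# from the post; monthly_real_views is an assumed figure for illustration.

FRAMES_PER_VIEW = 9
YEARLY_LICENSE = 2_400_000


def billed_page_views(real_views: int, frames_per_view: int = FRAMES_PER_VIEW) -> int:
    """Page views counted against the license when every frame is billed."""
    return real_views * frames_per_view


def months_until_exhausted(billed_per_month: int, yearly_license: int = YEARLY_LICENSE) -> float:
    """How many months the yearly license lasts at a given billed rate."""
    return yearly_license / billed_per_month


monthly_real_views = 120_000  # assumption, not a measured number
monthly_billed = billed_page_views(monthly_real_views)

print(monthly_billed)                                    # 1080000
print(round(months_until_exhausted(monthly_billed), 1))  # 2.2
```

Under that assumption a license meant to last a year would be gone in a little over two months, while the same traffic counted once per page would use 1,440,000 views per year and stay under the cap.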

    Once I found the source of my frustration I contacted Webtrends for a solution to this problem. They referred me to their professional development department to pay to have an application written to remove the pages from my log files. It sounds like this has been a problem with Webtrends in the past. This custom programming from Webtrends was going to be very expensive. Why not develop this small “frame stripper” application and integrate it into Webtrends?

    Why should I spend thousands more dollars and tons more man hours when Webtrends has already written this application before? Why not simply include their completed code with the final product? I seriously doubt it would be hard to add on to Webtrends. My opinion is they are attempting to create another revenue stream. Customers either need to upgrade to WAY more page view licenses than they need, OR pay for custom programming to fix the application. It’s a lose – lose situation.
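
For what it’s worth, the basic idea of a “frame stripper” does not look complicated. The Python sketch below is only my guess at what such a tool would do, not Webtrends’ code: it drops log lines whose requested path is one of the framed sub-pages, so only the outer page counts. The function name, the sample log lines, and the /frames/ paths are all made up for the example.

```python
# Sketch of a "frame stripper": filter out log lines that request framed
# sub-pages. Function name, paths, and sample lines are hypothetical.
from typing import Iterable, Iterator, Set


def strip_framed_requests(lines: Iterable[str], framed_paths: Set[str]) -> Iterator[str]:
    """Yield only the log lines that do not request a framed sub-page."""
    for line in lines:
        if line.startswith("#"):
            # W3C-style header lines pass through untouched.
            yield line
            continue
        # Compare each whitespace-separated field, minus any query string,
        # against the known framed paths.
        fields = (field.split("?", 1)[0] for field in line.split())
        if any(field in framed_paths for field in fields):
            continue
        yield line


sample = [
    '10.0.0.1 - - [27/Apr/2006:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [27/Apr/2006:10:00:01 -0500] "GET /frames/nav.html HTTP/1.1" 200 128',
    '10.0.0.1 - - [27/Apr/2006:10:00:01 -0500] "GET /frames/footer.html?x=1 HTTP/1.1" 200 64',
]
framed = {"/frames/nav.html", "/frames/footer.html"}

kept = list(strip_framed_requests(sample, framed))
print(len(kept))  # 1
```

The cleaned log would then be fed to the analytics package in place of the raw one. A real tool would need to know the site’s actual frame layout and log format.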

    I completely understand their desire to make money. I also believe if you have a giant website you should pay more for analytics. I feel as a customer I’ve fallen through the cracks and been overlooked. The situation which I am in is not that out of the ordinary. The same way a website needs to be accessible using different browsers, analytics programs need to be functional on different style websites, including frames.


    Webtrends Small Business Reviewed

    Project Overview
    For one of our larger clients we’ve decided to move their website / database / analytics to a dedicated group of servers. While this project was fairly straightforward on the surface, it became a real pain when it came time to set up their analytics software. For this project the poison of choice was Webtrends 7.5b. Working for a marketing driven company I’ve had the chance to use nearly all analytics packages at one point or another; however, this was my first experience using the newest version of Webtrends. It didn’t take long before I learned to hate this application, just like its parent programs. There is one word which all computer people have learned to hate. Licensing. Nothing can drive a person insane more than having all the tools to fix something, but not having the “permissions” to use them. The worst part is we’ve spent over $1000 to be back at square one. The problem is how Webtrends handles licensing.

    Annual Page Views (and why they suck)
    For our client we’ve ordered WebTrends 7 Small Business with an add-on of 1,000,000 annual page views to get them started. This should be plenty of page views per year, right? WRONG. According to WebTrends you need to have licenses for all the page views you’re going to process in a given year, which means if I process logs from 2003 in 2005, those count against my 2,000,000 cap. Their solution is to call their sales staff and have them give me unlimited licenses while we import all the old data. Sure this might resolve the problem, but imagine how many people are out there ripping their hair out trying to import old data. This is the only application I know of which handles licensing in such an odd way. I believe WebTrends is more concerned about making money on their larger clients than making their software user-friendly.

    Interface Reviewed
    Once I got the software working correctly, things started to look better. They have definitely created a very user friendly analytics solution. The user management seems to be top notch. I can tell they’ve pulled some ideas from their competition, however. I see some similarities to both NetTracker & Urchin. The reporting layout is easy to follow, and makes better sense than many of the applications on the market. You can definitely tell WebTrends has been doing this for a while. I can give them no fault on the application itself. They have definitely got their act together since WebTrends 6.0 log analyzer (which I still use for some websites).

    Conclusion
    Although the interface is nice, the licensing is very poorly done. With that said, I cannot recommend this software solution. There are a number of other applications on the market which are more capable for significantly less investment. If you’re looking to buy WebTrends 7 Small Business I recommend you try NetTracker. If you still want to buy WebTrends, be ready for a fight over licensing.


    Top 10 System Administrator Truths

    Vo0k writes “What are your top ten system administrator truths? We all know them already, but it’s still fun re-telling them. Stuff like “90% of all hardware-related problems come from loose connectors”, even though you already know it’s true, may save you from replacing the “faulty” motherboard if you recall it at the right time.”

    A few others from my experience:
    1.) Computers don’t delete files, people do.
    2.) It’s always your fault.
    3.) Money can fix problems.
    4.) Working on computers is only fun while they are working.
    5.) Marketing people will install spyware.
    6.) Sales people will install viruses.
    7.) Production people will delete important files.
    8.) E-Mail is the most important thing in the world, to everyone except you.
    9.) No matter how hard you work it will be there waiting for you in the morning.
    10.) Hard drives will die.


    AWStats Review

    After months of evaluating commercial analytics programs we found no better value for our client than AWStats. This is a fantastic open-source application written in Perl. Most of the major players are moving away from bulk hosting analytics. For example, Webtrends now requires you pay licensing per visitor in your reports. Basically the more successful your site, the more money they want from you. Sounds like a scam, doesn’t it?

    AWStats is completely free, and you’re welcome to make changes to fit your needs. For example, we can generate lists of bandwidth users without having to go into each profile. This summary page has been an invaluable asset. Although not the prettiest interface in the world, AWStats offers most of the information my clients want in an organized fashion. Below is a list of the best features over our old solution, Webtrends 6.0.

    Daily Incremental Updates
    This is probably the biggest change for our clients. Now rather than waiting for a report every month, clients will be able to check statistics every day. Also we can process logs right on the server since AWStats only reads the part of the log it needs.
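
The general idea behind an incremental update can be sketched in a few lines of Python. This is not how AWStats itself is implemented (AWStats is Perl and keeps its own history files). It just shows the technique of remembering how far into the log the last run got and reading only what was appended since. The function name and file names are made up for the example.

```python
# Sketch of incremental log reading: remember a byte offset between runs and
# only read lines appended after it. Not AWStats' actual implementation.
import os
import tempfile


def read_new_lines(log_path: str, offset: int):
    """Return (new complete lines, new offset) for data appended after offset."""
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Only consume up to the last newline so a half-written line is left
    # for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "access.log")
    with open(path, "w") as fh:
        fh.write("hit 1\nhit 2\n")
    first_run, position = read_new_lines(path, 0)

    with open(path, "a") as fh:
        fh.write("hit 3\n")
    second_run, position = read_new_lines(path, position)

print(first_run)   # ['hit 1', 'hit 2']
print(second_run)  # ['hit 3']
```

The second run never touches the two lines it already processed, which is why a nightly update stays cheap even as the log grows.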

    Efficiency / Speed
    This application is really just a huge Perl script and has very little overhead. We’re able to process logs for 300 popular websites in a few minutes every night.

    3rd Party Plug-ins
    There are tons of extra features you can add to AWStats, including tracking users by city and state using geo plug-ins. There are also tooltip options to make things easier for clients to understand.

    Management
    Once we create a profile for a specific website, everything is automatic. Using Webtrends 6.0 we’d have to make sure the FTP worked so it could download / upload logs and reports. We also don’t need a dedicated server to run reports. Everything can be done on the web server.
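
As an illustration of what “automatic” can look like, here is a Python sketch of a nightly job that updates every profile in turn. It relies on the standard awstats.pl -config=NAME -update command line; the install path, the profile names, and the function names are assumptions for the example, and by default the script only prints the commands rather than running them.

```python
# Sketch of a nightly update loop over AWStats profiles. The awstats.pl path
# and profile names are assumptions; dry_run=True only prints the commands.
import subprocess
from typing import Iterable, List

AWSTATS = "/usr/local/awstats/wwwroot/cgi-bin/awstats.pl"  # assumed install path


def update_command(profile: str, awstats: str = AWSTATS) -> List[str]:
    """Build the command line that updates statistics for one profile."""
    return ["perl", awstats, f"-config={profile}", "-update"]


def update_all(profiles: Iterable[str], dry_run: bool = True) -> List[str]:
    """Update each profile in turn and return the ones that failed."""
    failed = []
    for profile in profiles:
        cmd = update_command(profile)
        if dry_run:
            print(" ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(profile)
    return failed


# Hypothetical profile names, for illustration only.
failures = update_all(["example.com", "example.org"])
```

Scheduling a script like this once a night with dry_run=False would cover every site without touching individual profiles.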

    In closing, I recommend AWStats to anyone looking for a stats application. There are other free solutions available, but none come close to the functionality, usability, and efficiency of AWStats.
