Tuesday, March 17, 2009

A thousand and one reasons for a generic error message.

I've spent a couple of hours trying to troubleshoot a clustered SQL Server 2008 installation. All I know is that it throws this error message after the installation process and does not give me any clue at all

The cluster resource ‘SQL Server (MSSQLSERVER)’ could not be brought online. Error: The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0×8007139F)

Now, this might look like a dependency issue not working correctly but when I checked the Failover Cluster Management console on Windows Server 2008, all the dependencies are online and working as expected. As always, I started searching the Internet for related errors and couldn't find anything really specific except for the same thing - dependency issue. Now, here's what I found out. Since all of the dependencies - disks, MSDTC, IP and virtual server name - are all online, maybe it doesn't have anything to do with them after all. So the first thing I did was to do a PING test to the virtual server name for my clustered SQL Server instance and guess what I found out - there is another IP registered on the DNS server with the same FQDN (maybe a previous installation that wasn't cleaned up properly). I logged in to the DNS server and updated the IP address of my clustered SQL Server instance, ran ipconfig /flushdns on the node on which I am logged in and started the service in Failover Cluster Management. It worked! It just tells you that you should think outside of the box every now and then. It really pays to have that background in network and systems infrastructure every once in a while.

No comments:

Google