Monday, July 13, 2009

WARNING:the CID values for both test machines are the same

Virtualization can cause any number of problems when it is not done properly. One of them is ending up with machines that share the same machine SID, which matters once you decide to join them to a domain. This happens a lot, especially when a virtual machine is created from a template or cloned from another virtual machine without using an appropriate tool such as Microsoft's Sysprep.

One unusual incident happened to me while I was troubleshooting an MS DTC communication issue between SQL Server instances. A client of ours requested assistance in fixing a distributed query that uses MS DTC. Apparently, the two SQL Server instances were not communicating. My usual round of troubleshooting started with a series of network connectivity tests, ranging from PING to TELNET to NETSTAT and whatever else was necessary to make sure that communication between the servers was working fine. That led me to look for ways to check connectivity for MS DTC specifically. One tool from Microsoft is DTCPing, a utility that helps troubleshoot MS DTC firewall issues. Although I knew for a fact that the firewall was not an issue in this particular case, I decided to give it a shot. Running the DTCPing utility on both servers gave me this message in the log:

WARNING:the CID values for both test machines are the same

A quick Google search led me to this blog post and made me think that the servers might have been cloned. Sure enough, when I asked the customer about the history of the servers, they turned out to be cloned VMware images. Sysprep had not been used to prepare the images after cloning, which is why both machines had the same CID values. There's nothing wrong with VMware here. It's the process that was pretty screwed up. What are the chances of two machines having the same GUID values, which are supposed to be globally unique? Very slim, unless the machines were improperly cloned.
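
To confirm a suspected clone without relying on DTCPing alone, you can compare the CID values of the two machines directly. The following is a minimal sketch, not a definitive tool. It assumes the CIDs are the GUID-named subkeys under HKEY_CLASSES_ROOT\CID on each server; the function names (`normalize_cid`, `shared_cids`, `read_local_cids`) are my own, and `read_local_cids` only works on Windows because it uses the standard `winreg` module.

```python
import uuid


def normalize_cid(raw):
    """Return a canonical lowercase GUID string.

    uuid.UUID accepts GUIDs with or without braces and in any letter case,
    and raises ValueError for anything that is not a GUID.
    """
    return str(uuid.UUID(raw.strip()))


def shared_cids(cids_a, cids_b):
    """Return the sorted list of GUIDs that appear on both machines."""
    set_a = {normalize_cid(c) for c in cids_a}
    set_b = {normalize_cid(c) for c in cids_b}
    return sorted(set_a & set_b)


def read_local_cids():
    """Windows only: list the subkey names under HKEY_CLASSES_ROOT\\CID.

    Assumption: each subkey name is one CID GUID.
    """
    import winreg  # imported here so the rest of the module loads anywhere

    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "CID") as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        return [winreg.EnumKey(key, i) for i in range(subkey_count)]
```

Run `read_local_cids()` on each server, save the output, and pass the two lists to `shared_cids`. Any GUID in the result is present on both machines, which should not happen unless one was cloned from the other.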

I followed the steps outlined in the blog post to fix the CID values:
  • Use Add/Remove Windows Components to remove Network DTC.
  • Run MSDTC -uninstall at the command line.
  • Delete the MSDTC keys in the registry:
      HKLM\Software\Microsoft\MSDTC
      HKLM\System\CurrentControlSet\Services\MSDTC
      HKEY_CLASSES_ROOT\CID
  • Reboot the server
  • Run MSDTC -install at the command line.
  • Use Add/Remove Windows Components to add the Network DTC back.
  • Restart the Distributed Transaction Coordinator service
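
The command-line parts of the steps above can be sketched as a small script. This is only an illustration under stated assumptions: it assumes the MSDTC software key lives at HKLM\SOFTWARE\Microsoft\MSDTC, it uses the standard Windows `msdtc`, `reg`, and `net` commands, and the phase names and function names are my own. The Add/Remove Windows Components steps and the reboot are not scripted and still have to be done by hand. The script defaults to a dry run that only prints the commands, because deleting these keys is destructive.

```python
import subprocess

# Registry keys removed in the procedure (HKCR is reg.exe's short name
# for HKEY_CLASSES_ROOT).
REGISTRY_KEYS = [
    r"HKLM\SOFTWARE\Microsoft\MSDTC",
    r"HKLM\SYSTEM\CurrentControlSet\Services\MSDTC",
    r"HKCR\CID",
]


def build_plan(phase):
    """Return the commands for one phase as a list of argument lists."""
    if phase == "before_reboot":
        plan = [["msdtc", "-uninstall"]]
        plan += [["reg", "delete", key, "/f"] for key in REGISTRY_KEYS]
        return plan
    if phase == "after_reboot":
        return [["msdtc", "-install"]]
    if phase == "restart_service":
        # Run this only after Network DTC has been added back by hand.
        return [["net", "stop", "msdtc"], ["net", "start", "msdtc"]]
    raise ValueError(f"unknown phase: {phase}")


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in plan:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_plan(build_plan("before_reboot"))  # dry run: prints, changes nothing
```

Splitting the plan into phases keeps the manual steps (the component changes and the reboot) visible between the scripted ones instead of pretending the whole procedure can run unattended.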

Following these steps solved the MSDTC issue but, sure enough, another issue surfaced. Since SQL Server relies on MSDTC for some of its work, such as executing distributed queries, the SQL Server installation was left badly broken. When we used the server to test a disaster recovery process for the entire SQL Server instance, restoring the master database became a real pain. I spent hours trying to restore it, to no avail. The resolution was simply to uninstall and reinstall SQL Server; only then was I able to restore the master database successfully. Lesson learned: if the foundation is screwed up, anything built on top of it will be too. That applies to just about anything, whether you're building a server or developing a character.