Monday, 25 July 2011

Debugging Hudson Git plugin hanging during fetch on Windows slave.

Ran into a tricky problem today after switching a couple of our Hudson jobs from Subversion to Git. The git plugin was able to clone the repository into a clean workspace, but after that it simply hung in the fetch step.

The job log did not provide much information:
Started by user hlh005
Building remotely on selenium-xp2
Checkout:TestVerification / C:\hudson\workspace\TestVerification- hudson.remoting.Channel@6e7a8d98:selenium-xp2
Using strategy: Default
Last Built Revision: Revision b6d046ffede89c4bcdd5b71b7c6b703583cb6e18 (origin/master)
Checkout:TestVerification / C:\hudson\workspace\TestVerification - hudson.remoting.LocalChannel@7835ec
Fetching changes from the remote Git repository
Fetching upstream changes from git@forge.example.com:my-repo

Looking at the slave workspace, it was clear that it had actually managed to clone the repository, as there was both a .git folder and the working directory content. So in some sense it must have been able to contact the server.

A Google search for the problem returned some results, but in the majority of them people got an error message and not a hanging job. It did however reveal that an incorrect HOME environment variable could be the cause.

In order to find out the current environment variables for the node I went to "Manage Hudson" -> "Manage Nodes" -> (name of the node) -> "System information". Here you can find the system properties, the environment variables and a thread dump.

Besides confirming that we did not have a HOME environment variable, the thread dump revealed something very interesting. One of the threads had a name showing the exact git command line being executed, and this thread was waiting for input while reading on a socket.

Trying this command on the slave in a normal command prompt confirmed that the git command could not find the key and thus failed to talk to the remote repository. Don't use the Git Bash you get with msysgit; Hudson does not use it. Use a standard "cmd" prompt opened from the Start menu.
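When reproducing a hanging command like this by hand, it can help to run it with stdin closed and a timeout, so that a silent hang turns into a visible result. Below is a minimal Python sketch of that idea. The helper name run_with_timeout and the 60 second default are my own choices for illustration, not something Hudson or the git plugin provides, and processes that git spawns itself (such as ssh) may need extra handling on Windows.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=60):
    """Run cmd with stdin closed and return (status, output).

    status is the process exit code, or None if the command
    did not finish within timeout_s seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # a prompt sees end-of-file instead of blocking
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge error messages into the output
            timeout=timeout_s,
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        out = exc.output or ""
        if isinstance(out, bytes):      # some Python versions return bytes here
            out = out.decode(errors="replace")
        return None, out
    return result.returncode, result.stdout


# Hypothetical usage; substitute the command line shown in the thread dump:
#   status, output = run_with_timeout(["git", "fetch", "origin"])
#   print("timed out" if status is None else "exit code %d" % status)
#   print(output)
```

A status of None means the command was still running when the timeout expired, which is the situation the job was stuck in.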

The solution was to set the HOME environment variable and restart the Hudson slave, but at least one question remains....
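A quick check like the following could be run on the slave, in the same environment as the Hudson slave process, to spot this kind of problem before a job hangs. It is only a sketch under some assumptions: that ssh looks for the key in a .ssh directory under HOME, and that the private key file is called id_rsa (adjust key_name if yours differs). The function name check_ssh_home is made up for this example.

```python
import os


def check_ssh_home(env, key_name="id_rsa"):
    """Return a list of human-readable problems; empty if none were found.

    env is a mapping of environment variables, e.g. os.environ.
    key_name is an assumed private key file name.
    """
    problems = []
    home = env.get("HOME")
    if not home:
        problems.append("HOME is not set")
        return problems
    ssh_dir = os.path.join(home, ".ssh")
    if not os.path.isdir(ssh_dir):
        problems.append("no .ssh directory under HOME: " + ssh_dir)
    elif not os.path.isfile(os.path.join(ssh_dir, key_name)):
        problems.append("no private key named %s in %s" % (key_name, ssh_dir))
    return problems


# Usage: check the environment of the current process.
#   for problem in check_ssh_home(os.environ):
#       print(problem)
```

Note that a command prompt opened by a logged-in user may see different environment variables than the slave process does, so the "System information" page remains the authoritative place to look.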

How was Hudson able to clone the repository when the keys could not be found? Is this another git plugin weirdness, or does our gitolite setup have an error?

I have created a Hudson issue asking it to provide better feedback, see: Git plugin hangs instead of providing error message when it cannot find the ssh keys (windows slave)
