Wednesday 3 June 2009

Distributing builds using Hudson PXE

I attended a really good talk (TS-5301) today at JavaOne 2009 by Kohsuke and Jesse about running Hudson in the cloud. See their wiki for upcoming meet-ups.

We've recently started using Hudson at our company (quick thanks to Aleksey Maksimov for setting up our project quick-start!), so this is my first look at Hudson, coming from a CruiseControl background - but suffice to say, I'm really impressed!

Hudson PXE enables environments to be configured on a master and deployed to slaves. You can essentially set up an Ubuntu Server as one slave - with the JDK version you want and a container such as GlassFish (or whatever) - and another slave running Red Hat with Apache HTTP Server, etc. The only requirement on a slave is a 170 KB JAR; you can even run Windows slaves, and all the master needs is a hostname. The key thing is that this is all automated for you: configure it once and then let the slaves be provisioned according to your policy.

They have really thought about network segregation. What I mean is that the problem with a master-slave setup when there are firewalls between the machines is that it now matters which side initiates the connection. Either initiation model is possible with Hudson, as the slave can use JNLP to initiate the connection back to the master - and having 'clicked once', Hudson will let you set that up as a service on Windows slaves.

Having established a cluster, we get a new set of challenges: the state of those nodes, and how to manage the distribution of jobs to slaves. The latter is solved in Hudson using labels, meaning we can use friendly terms like 'ubuntu' to refer to any number of slave configurations/distributions. For the former, Hudson actively monitors a number of slave parameters such as free disk space, memory, free /tmp space and the network. One of the issues we frequently hit with remote builds, even with just a couple of machines, is whether we've filled up /var with old source. Hudson will keep a watch on these things and report in the Hudson console. Hudson will also help clean up any runaway processes after a build and show you load statistics...
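
To make the label idea concrete, here is a minimal sketch of label-based dispatch in Java. This is not Hudson's actual scheduler - the class and method names are my own invention - it just shows the mapping from a friendly label to whichever slaves advertise it:

```java
import java.util.*;

// Minimal sketch of label-based dispatch, assuming each slave advertises
// a set of labels and a job requests a single label. NOT Hudson's real
// scheduler; names here are illustrative only.
class LabelDispatcher {
    private final Map<String, Set<String>> slaveLabels = new LinkedHashMap<>();

    void register(String slave, String... labels) {
        slaveLabels.put(slave, new HashSet<>(Arrays.asList(labels)));
    }

    // Return the first registered slave carrying the requested label, or null.
    String pick(String label) {
        for (Map.Entry<String, Set<String>> e : slaveLabels.entrySet()) {
            if (e.getValue().contains(label)) return e.getKey();
        }
        return null;
    }
}
```

The point is that a job only ever names the label ('ubuntu'), so slaves can come and go behind it without any job reconfiguration.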

By intercepting those load statistics, we can let Hudson spin up additional slaves in the cloud. The Hudson GUI supports AWS/EC2, and at 10c/hour that's a really cost-effective way of dealing with the 5.30pm spikes we frequently get for the last builds of the day. There was a great demo showing how this was done, so check out the slides/MP4 for more details.
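
The provisioning decision being demoed can be caricatured as a simple threshold rule. This is a sketch under my own assumptions, not the EC2 support's real algorithm: start extra cloud slaves when queued builds outstrip idle executors, up to a cost cap:

```java
// Sketch of a load-based provisioning rule: start extra cloud slaves when
// queued builds outstrip idle executors, up to a cap. Not the real EC2
// plug-in logic; the names and the rule itself are assumptions.
class CloudProvisioner {
    private final int maxCloudSlaves;
    private int running = 0;

    CloudProvisioner(int maxCloudSlaves) {
        this.maxCloudSlaves = maxCloudSlaves;
    }

    // How many extra slaves to start for the current load snapshot.
    int slavesToStart(int queuedBuilds, int idleExecutors) {
        int shortfall = queuedBuilds - idleExecutors;
        if (shortfall <= 0) return 0;               // enough capacity already
        int start = Math.min(shortfall, maxCloudSlaves - running);
        if (start < 0) start = 0;
        running += start;
        return start;
    }
}
```

The cap is what keeps the 10c/hour instances from running away with your bill during a spike.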

Whilst there are great advantages to this (cheap, deals with spikes, no need to tear down instances properly), the authors point to problems such as the check-out time, tests that may require behind-the-firewall connections to other systems, and (most importantly, the presenter states) the lack of a multi-threaded, cross-VM unit-testing framework.

You can get round some of the problems by perhaps running your master in the cloud too, but you'll then be paying for EBS as well (Amazon's storage: Elastic Block Store).

Final thoughts (ones that I'll be looking at again) that they talked about:
1) Using Hudson Hadoop (there are already AMIs for Hadoop on Amazon), which lets you turn every Hudson slave into a Hadoop node - thereby provisioning on the fly based on load requirements. This gives options in terms of future directions for Hudson, such as analysis of old code artifacts (are we getting better over time?).
2) The Hudson Selenium plug-in. You can now have a grid of Selenium slaves, each running a different platform/browser config, all of which talk to the Selenium Hub and report their tests. Sounds like a great opportunity for dev shops to save money here!
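
The grid idea boils down to fanning one suite out across every registered platform/browser pair. A toy sketch (not Selenium Grid's real API - the classes here are purely illustrative) of that fan-out:

```java
import java.util.*;

// Sketch of fanning one test suite out across a Selenium-style grid,
// assuming each slave registers a (platform, browser) pair and the hub
// runs the suite once per configuration. Not Selenium Grid's real API.
class GridFanOut {
    private final List<String> configs = new ArrayList<>();

    void registerSlave(String platform, String browser) {
        configs.add(platform + "/" + browser);
    }

    // One run of the named suite per registered configuration.
    List<String> plan(String suite) {
        List<String> runs = new ArrayList<>();
        for (String config : configs) runs.add(suite + " on " + config);
        return runs;
    }
}
```

Each extra browser config costs one more slave registration rather than another manual test pass - which is where the money saving comes from.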

I have to thank the guys here; my article is a reasonable (if rather newbie) summary of the great work that's been put in. See the slides for more details or go along to BOF-5105.
