Long time no blog!
So, what have I been up to over the last seven or eight months? Well, shortly after blogging about reconstructing the Swiss watch of Sire (and coincidentally having been given a Swiss watch as a birthday present!) I spent 3-4 months in job limbo while I was applying for, and waiting for news about, a position in Germany. During that time I travelled to Germany twice (and also travelled to Paris to talk about Sire), and got very distracted. Long story short, while I was very committed to going for the job in Germany, a position that I had not previously known about, and that I really wanted, opened up at Bristol just before I heard back about the German position. So at the beginning of July I finally gained some security and now have a stable position at Bristol, at least for the next 3-4 years (which means that Sire will be finished, and I will have the time to support it - so good news!).
Then, after a much-needed month's holiday in August, I got back to some serious work in September. Sire has now been pieced back together and can support full protein simulations (with a little encouragement!), and the core code is now complete (save for a little debugging). The multi-processor code is now in place (both for SSE vectorisation and OpenMP multicore parallelisation) and the multi-node MPI message passing code has also been written. What's cool now is that I can load up a simulation on the head node, beam the data for the simulation across to a worker node in an MPI message, have the worker node run the simulation, and then beam back the result in another MPI message. This means that replica exchange simulations can be run directly in Sire via MPI, rather than requiring messing around with a dedicated Python script and playing with SSH and NFS or network sockets. This should make replica exchange simulations more accessible, particularly to industrial users, as adapting the replica exchange Python scripts to a new cluster queueing system is quite specialised work (as I know a friend of mine currently working on an industrial placement is experiencing!), while MPI cluster jobs are standard.
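As a rough illustration of that round trip (pack a simulation into a message, run it on a worker, send the result back), here is a minimal sketch in Python. It is not Sire's code: `ToySimulation`, `worker` and `run_on_worker` are hypothetical names invented for the example, and plain thread-safe queues stand in for the MPI send and receive calls so that the sketch runs without an MPI installation.

```python
import json
import queue
import threading


class ToySimulation:
    """Hypothetical stand-in for a simulation object; not a Sire class."""

    def __init__(self, name, nmoves, moves_done=0):
        self.name = name
        self.nmoves = nmoves
        self.moves_done = moves_done

    def run(self):
        # Placeholder for the real work: just record that the moves were run.
        self.moves_done += self.nmoves
        return self

    def to_bytes(self):
        # Pack the whole state into a single message body.
        return json.dumps(self.__dict__).encode("utf-8")

    @classmethod
    def from_bytes(cls, data):
        return cls(**json.loads(data.decode("utf-8")))


def worker(inbox, outbox):
    """Receive packed simulations, run them, and send the results back."""
    while True:
        message = inbox.get()              # stands in for a blocking MPI receive
        if message is None:                # sentinel value: shut down
            break
        sim = ToySimulation.from_bytes(message)
        outbox.put(sim.run().to_bytes())   # stands in for an MPI send


def run_on_worker(sim, to_worker, from_worker):
    """Head-node side: send the simulation, then wait for the result."""
    to_worker.put(sim.to_bytes())
    return ToySimulation.from_bytes(from_worker.get())


to_worker, from_worker = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(to_worker, from_worker))
thread.start()

result = run_on_worker(ToySimulation("replica-0", 1000), to_worker, from_worker)

to_worker.put(None)                        # tell the worker to stop
thread.join()
print(result.name, result.moves_done)
```

The point of the sketch is only the shape of the exchange: the head node never needs a shared filesystem or an SSH session, because everything the worker needs arrives inside the message itself.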
There are, however, problems. MPI, while being standard, does seem to be a little gray. What I mean by this is that different MPI implementations can behave in different ways, and don't always do what I expect. For example, for QM/MM jobs I run Molpro using a system call executed via a QProcess object. However, when I use the Open MPI implementation, I find that my system calls block if I am waiting on an MPI receive in another thread. This then locks up the simulation. This doesn't happen with MPICH, but in that case I am finding that some MPI sends are not finishing their send operation, despite the corresponding MPI receive completing. Those processes then never enter the MPI receive needed to complement the other process's MPI send, and so I am sometimes deadlocking. Oh, it's complicated! Part of my problem is that I want to use multithreaded MPI, with different send and receive threads, and to effectively use active messages (e.g. sending a message that contains a simulation, and then running that simulation and returning a response). I just need to do a lot more debugging to get this working properly...
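To make the "different send and receive threads" and "active messages" idea a little more concrete, here is a small sketch of the general pattern in Python. Again, this is an assumption-laden illustration rather than Sire's implementation: the `Endpoint` class, the message tags and the handlers are all made up, and in-process queues stand in for the MPI calls, so it cannot reproduce the implementation-specific blocking described above.

```python
import queue
import threading


class Endpoint:
    """One side of a link, with dedicated send and receive threads."""

    def __init__(self, incoming, outgoing, handlers):
        self.incoming = incoming       # stands in for the MPI receive side
        self.outgoing = outgoing       # stands in for the MPI send side
        self.handlers = handlers       # message tag -> function run on arrival
        self.outbox = queue.Queue()    # messages waiting for the send thread
        self.threads = [threading.Thread(target=self._send_loop),
                        threading.Thread(target=self._receive_loop)]
        for t in self.threads:
            t.start()

    def post(self, tag, payload):
        """Queue a message; the caller never waits inside a send."""
        self.outbox.put((tag, payload))

    def _send_loop(self):
        while True:
            message = self.outbox.get()
            if message is None:        # sentinel value: shut down
                break
            self.outgoing.put(message)

    def _receive_loop(self):
        while True:
            message = self.incoming.get()
            if message is None:        # sentinel value: shut down
                break
            tag, payload = message
            # Active message: arrival of the message triggers the work.
            reply = self.handlers[tag](payload)
            if reply is not None:
                self.post(*reply)

    def close(self):
        self.outbox.put(None)
        self.incoming.put(None)
        for t in self.threads:
            t.join()


head_to_worker, worker_to_head = queue.Queue(), queue.Queue()
results, done = [], threading.Event()


def store_result(payload):
    results.append(payload)
    done.set()


# The worker "runs" a job by echoing the number of moves it was asked to do.
worker_end = Endpoint(head_to_worker, worker_to_head,
                      {"run": lambda nmoves: ("result", nmoves)})
head_end = Endpoint(worker_to_head, head_to_worker, {"result": store_result})

head_end.post("run", 1000)
done.wait()
head_end.close()
worker_end.close()
print(results)
```

Because the receive loop only ever queues its reply and hands the actual sending to the other thread, a slow or stuck send cannot stop that endpoint from accepting the next incoming message, which is the property the two-thread design is meant to provide.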
Something else I have to do is to update this website. I've been meaning for a while to upgrade to Drupal 6, and to completely rearrange the front page to better display the contents. It is all a matter of finding the time... I am now in a rushed phase trying to hammer out all of the bugs so that I can get all of my protein applications finished for Christmas (I've been given a deadline of submitting a paper on all of this before Christmas), and time is short. Fun though. But short.
I'll finish on a reminder - while I can sometimes be slow in updating this blog (very slow in the current case!), the development log of Sire is updated almost daily. The comments associated with every subversion commit are sent to a publicly visible mailing list, which is available here, or get the RSS feed here. I'm currently on commit 705, so there are plenty of messages there to read if you are interested. They are a bit boring, but they give quite a good view of the day-to-day development of the code.