I am now *so* close to having Sire working again. I've spent the last few months with quite broken code because I reworked the SireMol library (which provides all of the data structures that actually define a molecule). Well, SireMol is now nearly finished, and the update is definitely worth it! As I've blogged before, the updated data structure allows pretty much any piece of data in a molecule to be attached as a property. This includes such basic concepts as the element types of the atoms, their coordinates and their identifications. This means that a molecule has now become an incredibly flexible container for molecular information (and metadata). I'm just now finishing the last piece of this - the Editor - that allows this flexible data structure to be edited easily by the user. The editing classes work on the same principle as the other controller type classes (e.g. Mover, Evaluator, Selector) in that they work on a copy of a view of the molecule data (as MoleculeData contains all of the data in a molecule, while Molecule, Atom, Residue etc. are merely different views of that data).
Editor works in the same way - or at least, to the user, it appears to. Underneath, it has to use a different data structure, as the assumptions of MoleculeData (that the molecule structure is constant, which is what allows its efficiency and space optimisations) are no longer true when the molecule is being edited. So Editor, behind the scenes, splits into two parts. When the user changes things that don't change the structure of the molecule, e.g. names of atoms, or numbers of residues, then a simple Editor class performs the change directly and efficiently on the MoleculeData data structure (e.g. via Editor or Editor classes). However, when a structural change is requested, Editor converts the MoleculeData structure into the EditMolData structure, and returns a StructureEditor derived class that can then manipulate that structure (e.g. AtomStructureEditor, ResStructureEditor). This is all completely transparent to the user (at least in python), as all they will see is the same editing API - it just flips between the two types of editing class. To demonstrate how easy it is, here is how to create a water molecule:
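    # Assumes Molecule, ResName and AtomName are already imported from the Sire python wrappers
    water = Molecule()
    # The parentheses let the chained calls span several lines in python
    water = (
        water.edit().rename("water")
        .add( ResName("WTR") )
        .add( AtomName("O00") )
        .setProperty("coordinates", [0,0,0])
        .add( AtomName("H01") )
        .setProperty("coordinates", [1,1,0])
        .add( AtomName("H02") )
        .setProperty("coordinates", [0,1,1])
        .commit()
    )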
The above two (logical) lines of code involve several editing classes (Editor, ResStructureEditor and AtomStructureEditor), yet you wouldn't know it by looking :-)
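The flip between the two kinds of editor can be sketched in a few lines of plain python. This is only a toy model and not the real Sire code: the classes ToyMolData, ToyEditMolData, ToyEditor and ToyStructureEditor, and all of their methods, are made-up stand-ins. They just illustrate the idea of a fixed-layout data structure being swapped for an editable one the moment a structural change is requested, while the caller keeps chaining the same calls.

```python
# Toy sketch only: none of these classes exist in Sire. They mimic the idea
# of an editor that swaps to a different data structure for structural edits.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyMolData:
    """Fixed-layout molecule data: the atom names live in an immutable tuple."""
    name: str = ""
    atoms: tuple = ()


@dataclass
class ToyEditMolData:
    """Editable molecule data: the atom names live in a growable list."""
    name: str = ""
    atoms: list = field(default_factory=list)


class ToyEditor:
    """Handles non-structural edits on the fixed-layout data."""

    def __init__(self, data):
        self._data = data

    def rename(self, name):
        # Non-structural change: stay on the fixed layout, keep the same atoms
        return ToyEditor(ToyMolData(name, self._data.atoms))

    def add(self, atom_name):
        # Structural change: convert to the editable layout and hand over
        editable = ToyEditMolData(self._data.name, list(self._data.atoms))
        return ToyStructureEditor(editable).add(atom_name)

    def commit(self):
        return self._data


class ToyStructureEditor:
    """Handles structural edits on the editable data."""

    def __init__(self, data):
        self._data = data

    def rename(self, name):
        self._data.name = name
        return self

    def add(self, atom_name):
        self._data.atoms.append(atom_name)
        return self

    def commit(self):
        # Convert back to the fixed layout once editing is finished
        return ToyMolData(self._data.name, tuple(self._data.atoms))


editor = ToyEditor(ToyMolData()).rename("water")
structure_editor = editor.add("O00").add("H01").add("H02")
water = structure_editor.commit()
print(type(editor).__name__, type(structure_editor).__name__, water)
```

Both toy editors expose the same rename, add and commit calls, which is why a chained expression never needs to know which one it is currently holding.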
The editing functionality is the last piece in the puzzle, and with its completion I can put Sire back together again (i.e. merge this all in with the existing forcefield and simulation libraries). The work is a little like putting a Swiss watch back together again after all the pieces have been pulled out. Hopefully it will still be able to tell the correct time once I have finished...
Still, as I said, it has been worth it. The update to SireMol has fixed all of the major problems I've seen over the last few years, and fixes all of the problems that I anticipate running into as I start applying the code to new problems (in particular the multiscale, multiensemble methods that I'm keen to research). The major problems solved are:
(1) Fixed the grouping of molecules. Molecules, parts of molecules, and even several parts of the same molecule can now be grouped together under a friendly API via the Molecules, MoleculeGroup and MoleculeGroups classes. Even better, MoleculeGroups and MoleculeGroup now provide the foundation of the forcefield and system classes, so all of the code required to index, store and search for molecules is now in one place.
(2) Speaking of indexing, searching and indexing of molecules and bits of molecules is now possible through the friendly search and indexing API. This allows the user to find arbitrary molecules or bits of molecules using searches based on names, numbers, indices, parents, and any combination of these, e.g. atom = mol.select( ResNum("3") + AtomName("CA") ). This works for finding groups within a molecule and also, via the MoleculeGroup(s) interface, for finding molecules or parts of molecules in groups, forcefields or systems.
(3) Subdividing a molecule is now sorted. A Molecule can be broken down into Segments, Chains, Residues, CutGroups and Atoms, with the only requirements being that the molecule must contain at least one atom, and every atom must be in a CutGroup. This means that the concepts of residue, chain and segment are optional. This breaks Sire away from a biologically rigid concept of molecules being composed of residues, which are made from atoms. Also, as all subgroups use the same search API, the same underlying MoleculeData data structure and the same controller classes, there is complete consistency between the interfaces of each of these groups.
(4) Everything (pretty much!) is a property. Atomic coordinates are properties (meaning that my Atoms don't have to exist in boring 3D space - you can give the atom a coordinates property that has 2 or 4 or any number of dimensions - indeed you can give an atom several different coordinates properties). This means that the forcefield parameters, atom definitions, group definitions etc. are all arbitrary properties, complete with associated metadata, thereby allowing complete flexibility in how the user represents their problem. What's cool is that this is tied into everything above, so SireIO can use this flexibility, via the Editor classes, while loading and saving molecules in different file formats. The EditMolData data structure has also been designed with an XML layout in mind, so this should allow the full flexibility of this data structure to be written to and read from disk.
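To make points (2) and (4) a little more concrete, here is a toy sketch in plain python of search terms that combine with + and of atoms that carry arbitrary named properties. It is not Sire's implementation: ToyAtom, ToyID, toy_atom_name, toy_res_num and select are hypothetical names invented for this illustration, and the sketch assumes that adding two search terms means "both must match".

```python
# Toy sketch only: these classes and functions are illustrative stand-ins,
# not Sire's own search or property classes.
from dataclasses import dataclass, field


@dataclass
class ToyAtom:
    name: str
    resnum: int
    # Everything else about the atom is an arbitrary named property
    properties: dict = field(default_factory=dict)


class ToyID:
    """A search term; adding two terms gives a term that needs both to match."""

    def __init__(self, predicate):
        self._predicate = predicate

    def matches(self, atom):
        return self._predicate(atom)

    def __add__(self, other):
        # Assumption of this sketch: '+' means "both terms must match"
        return ToyID(lambda atom: self.matches(atom) and other.matches(atom))


def toy_atom_name(name):
    """Search term that matches atoms with the given name."""
    return ToyID(lambda atom: atom.name == name)


def toy_res_num(number):
    """Search term that matches atoms in the residue with the given number."""
    return ToyID(lambda atom: atom.resnum == number)


def select(atoms, query):
    """Return every atom that matches the (possibly combined) search term."""
    return [atom for atom in atoms if query.matches(atom)]


atoms = [
    ToyAtom("CA", 2, {"coordinates": (0.0, 0.0, 0.0)}),
    # One atom can carry several coordinates properties of different dimension
    ToyAtom("CA", 3, {"coordinates": (1.5, 0.0, 0.0), "coords_2d": (1.5, 0.0)}),
    ToyAtom("N", 3, {"coordinates": (2.5, 1.0, 0.0)}),
]

found = select(atoms, toy_res_num(3) + toy_atom_name("CA"))
print(found[0].properties["coords_2d"])
```

Because each search term is just an object, the same combined term could be reused against any collection of atoms, which is the spirit of using one search API for molecules, groups, forcefields and systems.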
There's more - but it is getting late and I'm going to have to stop...