Utimate Flexibility

Well, I've finally finished sorting out the new ID scheme that is used to identify objects in a simulation (e.g. atoms, residues, chains, molecules). The scheme is incredibly flexible, and is best understood by demonstration;

//select the residue ALA 39
res = mol.select( ResName("ALA") + ResNum(39) )

//what if there are two ALA 39 residues (maybe in two chains)
// Lets select the first one
res = mol.select( (ResName("ALA")+ResNum(39))[0] )

// Lets select the one that is the chain called "A"
res = mol.select( ChainName("A") + ResName("ALA") + ResNum(39) )

// Or, lets select both of these residues together
ala39s = mol.selectAll( ResName("ALA") + ResNum(39) )

//we can then do things like;
ala39s = ala39s.move(0).translate( (1,2,3) )  //moves the first ala39
ala39s = ala39s.move(1).translate( (5,6,7) )  //moves the second one

//how about translating all of the CA atoms in the molecule?
mol = mol.select( AtomName("CA") ).move().translate( (1,2,3) )

//now lets just move the fifth CA atom to the second to last CA atom
mol = mol.select( AtomName("CA")(5,-2) ).move().translate(1,2,3)

//how about calculating the mass of the "MA" segment of the molecule?
mass = mol.select( SegName("MA") ).evaluate().mass()

What makes this scheme clever is that it is relatively intuitive and consistent (e.g. (Selection)[0] *always* returns the first item that matches Selection, and (selection0 + selection1)(i,j) *always* matches the range of items from i to j (which can be negative indicies) that match both selection0 and selection1. *And* despite being powerful, the design has been created to simplify the coding of classes that use these IDs, and to minimise the number of overloaded functions (so I now only provide a .select(const AtomID &atomid) function, rather than functions that take lots of different types of AtomID).

Finally, while it is not yet implemented, there is scope in this design to allow negated matching (e.g. ResName("ALA") - ResNum(35) would mean all "ALA" residues that are not number 35), case-insensitive matching (e.g. ResName("ALA", CASE_INSENSITIVE) will match "ala", "AlA", "aLA" etc.) and also "or" matching (e.g. ResName("ALA") or (ResNum(23) or ResNum(24)). Finishing the implementation of these extensions is of course added to my large list of things that need to be done...

As well as getting the ID scheme sorted, I am also now cleaning up the molecule design. All views of a molecule now derive from MoleculeView, and all of the molecule classes now break down into three groups; view classes, e.g. Molecule, Segment, Chain, Residue, Atom, PartialMolecule, CutGroup, container view classes, e.g. SegmentsInMol, AtomsInMol, ResiduesInMol, and manipulator classes, which are built on top of the view classes, and are the only classes that can actually modify the views, e.g. Mover, Editor, Evaluator, Selector. The purpose for this decision is twofold; one is to now fully enforce the split between constant view classes (that are const, unchanging views) and editing classes (that are the only classes that can make changes), and the other is to sort the API into groups, e.g. groups of moving functions are now in Move, while groups of selecting functions are in Selector, while editing functions are in Edit. You can see the use of these classes above, e.g. mol.move() returns a Move, which has the .translate() function (as well as other moving functions), and Move is only able to move the atoms that are part of T, i.e. Move can only move the atoms in the Residue, while Move can move all of the arbitrarily selected atoms from AtomsInMolecule. This leads to code consolidation, as now I don't have separate "translate" functions for residues, atoms, molecules etc., and also it lends itself to a consistent interface (so x.move() always returns the same interface, regardless of the identity of x, which should help new users more quickly pick up this code).

As well as Move, there is Evaluate, which is used to evaluate properties that can be calculated from the molecule, e.g. mol.evaluate().mass(), res.evaluate().center(), chain.evaluate().dipole("charges", chain.evaluate().center())

(the argument "charges" is the name of the property to use to get the atomic charges on the atoms - more about properties later!)

There is also Selector which is used to play around with selections, e.g. select everything that is not the residue, res.selection().invert(), or combine selections, chain.selection().intersection(res).

Most importantly though, there is Editor, which is used to edit a molecule object. So we can do mol.edit().rename("calix") or res.edit().rename("ALA") (note how .rename() knows what it is renaming - this is because Editor is a template). Editor is the replacement for EditMol and EditRes, which were a good idea, but way to limited for my current use. Editor is still in the design phase, and my current plan is to use the same SQL database code as for the parameters to place the molecule info into a database, and to then have Editor work with this. The reason for using a db is that the definition of the different groups can change a lot through even simple changes (e.g. changing the residue of a single atom can significantly reorder things). A database can capture this automatically without me having to write lots of bug-filled code. Also, I think that editing can afford to be a little slower than the other code.

Anyway, on to the biggest change in the molecule design - and one I have alluded to before - I have finally changed molecule over so that *everything* is a property. Even the connectivity and coordinates of the atoms! The only constant guarantee in a molecule is its info() object that gives the names and layout of the atoms (e.g. how many residues, which atoms are in each residue, which residues are in each chain etc.). Everything else is optional, including the coordinates! This means that you can now associate as many sets of coordinates as you wish with a molecule, and even associate coordinates that are not three-dimensional! This is now built and supported right in the heart of MoleculeView, so can be used by the whole code. This means that you could have different sets of coordinates for a molecule and use different representations in different forcefields (perhaps there are two systems, and the molecule is being gradually scaled out of one simulation box and into another?). All you have to do is tell the forcefield the name of the property that holds the coordinates that you want it to use (or tell the forcefield nothing and it will assume the default name, "coordinates"). Equally you can pass different coordinates and connectivity to the move classes, so your MC move could change "mc_coords" while your MD move could move "md_coords". If these coordinates were used in different forcefields then you could imagine a simulation where a single molecule is split into two copies by the different types of move (or you could imagine both types of move occuring in parallel, and then perhaps an MC test choosing which of the two to carry forward to the next move - this would be useful in a parallel MC algorithm, which I think that I've blogged about before).

So there's been a lot of changes, a lot of development, but still so much more to do...!