Wednesday, November 30, 2011

Cut'n'Frag: Fragment smart using SMARTS


In our research group, part of the research we do is method development. To facilitate these new methods, tools are often required to setup various calculations or treat output files. Some of this research[1] is in fragment based methods[2] where a large system is divided into several smaller pieces and the total property of the whole system can be assembled from the individual fragments.

To help facilitate easier setup of fragment based calculations, we've made a tool (still under development) called FragIt which we use to fragment large molecules and prepare input files for GAMESS[3]. This post is about one key aspect of FragIt: How to use Open Babel[4] to figure out where to cut bonds in a protein.

For starters, one needs to load a molecule from a file, this is accomplished by the following lines of code[*]

import openbabel
filename="1UAO.pdb"
obmol = openbabel.OBMol()
obconv = openbabel.OBConversion()
obconv.SetInFormat(filename[-3:])
obconv.ReadFile(obmol, filename)

which basically tells OpenBabel that we like to open a file with a fileformat specified by the extension of the filename("pdb").

Open Babel includes functionality to do SMARTS[5] pattern matching[6]. Basically, you write a pattern to search for a substructure in a molecule. The way we want to use it is to identify an atom A on one side of a bond, and an atom B on the other side of that bond. Thus, using the following SMARTS pattern on a protein

[$(CN)][$(C=O)]

we can search for A: 'find carbon connected to nitrogen on one side' and B: 'find carbon connected to an oxygen on the other side'. Results are only returned if the above pattern mathes on both sides of a bond at the same time. The code to do this is

obpat.Init(pattern)
obpat.Match(obmol)
matches = [m for m in obpat.GetUMapList()]

where the last line converts some horrible <openbabel.vectorvInt; proxy ... > SWIG code into a list of tuples with matching atom ID's, i.e.

print matches

gives

[(2, 3), (11, 12), (32, 33), (44, 45), (58, 59), (73, 74), (87, 88), (94, 95), (108, 109), (132, 133)]

Each tuple has information about the found atoms as (A, B).

The real beauty of this is of course that you can make your own patterns and match whatever you want in whichever molecule you chose to play with.

You can find this example as well as others on my github page.

[*] We opted for the full fledged API of OpenBabel, thus it takes 5 lines of code to actually open a file. The more pythonic library pybel[7] which is also included in OpenBabel was not used.

update: Thanks to Anders Christensen for pointing me towards an easy syntax highlighter.