Python, Chemistry and Other Stuff

Tuesday, October 15, 2013

Howto figure out whether two lists have any elements in common

While working with FragIt I had to figure out whether two lists had some elements in common (it does not matter which!) and I came up with the following piece of code (with the help of Google and Stack Overflow of course)

It takes advantage of the highly underused Sets available in python on which we can do some neat mathematically sane operations. I guess people just go for Lists and work their magic on that.

Wednesday, August 21, 2013

One way to safe guard imports of external libraries

Say you have some code that is dependent on some 3rd party library, for instance users can download FragIt that requires Open Babel, but what happens if the user 'forgot' to install Open Babel or configure the paths correctly? Importing FragIt under these circumstances will crash your application because it has some explicit import statements to Open Babel. What should you do as a developer to safe guard this behavior?

The best way to deal with the issue (and this it not just the case of using FragIt, but any kind of library) which I learned from the very excellent "The Clean Coder" by Robert C. Martin is to write a wrapper for the FragIt API to suit your specific needs because this you can unit test and know instantly when the API has changed.

In header of this wrapper, you can provide code that looks like

where we have defined a wrapper to the API to return the fragmentation points we want. However, if Open Babel is not defined, we overwrite that function, letting the user know that something is not right.

There are many ways to deal with it, this is one.

By the way: happy 1 year anniversary for the last blog post!

Wednesday, August 8, 2012

Fun and a little disappointment with amino acids

So you have Open Babel, you have the Python bindings to Open Babel and you think: "I just want to have fun!".

I wrote a small script to generate the 3D structures of the amino acids because I wan't to investigate how well the MMFF94 force field can calculate the charge of them. I've done this before for entire proteins, but here I just want to do it one amino acid at a time. To challenge myself, I cannot rely on the excellent (o)babel tool of the Open Babel package when generating the structures. All code is on github, but have shared it on the blog too via select gists.

My SMILES for the amino acids are saved in a dictionary:

and to simply generate the structures I've updated my obutil library with the following utility functions

Finally, to generate the structures I simply run the following code

I have a separate script to calculate the charge of a molecule

The table below is the result of the above. The images are generated using the babel tool with the -opng option.

Amino Acid	SMILES	Charge	Depiction
GLY	NCC(=O)O	0
ALA	NC(C)C(=O)O	0
SER	NC(CO)C(=O)O	0
THR	NC(C(C)O)C(=O)O	0
CYS	NC(CS)C(=O)O	0
VAL	NC(C(C)C)C(=O)O	0
LEU	NC(CC(C)C)C(=O)O	0
ILE	NC(C(C)CC)C(=O)O	0
MET	NC(CCSC)C(=O)O	0
PRO	N1C(C(=O)O)CCC1	0
PHE	NC(Cc1ccccc1)C(=O)O	0
TYR	NC(Cc1cc(O)ccc1)C(=O)O	0
TRP	NC(Cc1c2ccccc2nc1)C(=O)O	0
ASP	NC(CC(=O)O)C(=O)O	0
GLU	NC(CCC(=O)O)C(=O)O	0
ASN	NC(CC(=O)N)C(=O)O	0
GLN	NC(CCC(=O)N)C(=O)O	0
HIS	NC(C(=O)O)Cc1cncn1	0
LYS	NC(CCCCN)C(=O)O	0
ARG	NC(CCCNC(=N)N)C(=O)O	0

In this case, all charges are zero on all amino acids. Notice that I did not make use of Open Babel trying to assign Hydrogen for a specific pH as that is not what I want to test. Instead, notice in the aminoacids.py file listed above there are specific patterns for obtaining (de)protonated states of a few select amino acids: ASP, GLU, HIS, LYS and ARG.

Running the (de)protonated molecules above gives the following table

Amino Acid	SMILES	Charge
ASP^-	NC(CC(=O)[O-])C(=O)O	-1
GLU^-	NC(CCC(=O)[O-])C(=O)O	-1
HIS⁺	NC(C(=O)O)Cc1c[nH+]cn1	+1
LYS⁺	NC(CCCC[NH3+])C(=O)O	+1
ARG⁺	NC(CCCNC(=[NH2+])N)C(=O)O	0

The disappointing thing here is that the protonated Arginine is incorrectly assigned a charge of 0 (zero). Edit: As noted by Jan, the depiction of the protonated ARG, ASP and GLU is also wrong.

Edit2: It appears the problem is also present in the SVG depicter.

Maybe it's time to submit a bug-report?

Thursday, June 28, 2012

A Threaded PyQt GUI and Particles

I wrote a post about adding a GUI to your command line interface program. Sometimes people just want to click stuff. If you tried it / implemented it yourself, you'd have seen that the program would have gone from responsive to non-responsive when you hit the run button. Why? Because the program can only do ONE think at a time and frankly, keeping the controls available for clicking at all times, updating the graphics and calculating particle movement do not get along nicely. This post is about making it behave like we expect.

I present to you threading. Specifically for PyQt there is the QThread class which is actually discussed rather nicely in this PyQtWiki and this page by Jo Plaete which I used for inspiration.

What complicates the whole scene is that we need to realize that we have two independant things we want to accomplish: 1) move particles and 2) update the graphics. Both independent of the GUI.

I started by making my own class called BaseThread which subclasses QThread. The code is

where I believe the magic for everything not just crashing and burning is the subtle use of the custom boolean flag exiting (which we will use later) and the use of self.wait() in the __del__ method. Again, for details you should read the blog posts mentioned above.

To come around many of the syncronization problems that arises with threads the Qt framework allows us to use custom signals to make everything talk together. Here is my version of a StepThread which defines the run method (you do NEVER call this explicitly!) and a convenience function called simulate to start a simulation for N steps. The run function emits a custom signal and sleeps for a little while before simulating again.

Very similarly we have the DrawThread which emits a signal so we can update the matplotlib surface.

Notice that it updates less frequently. I have yet to figure out something clever in this regard.

Finally, I had to make some tweaks to the Simulator class (nothing very fancy) to hook up our custom signals.

What I need now is a video of it and maybe to implement some different potentials instead of only the non-interacting particles. Stay tuned.

The entire code can be downloaded from the latest gist I made.

Sunday, June 24, 2012

Adding a PyQt GUI to your program

I am all in for command line interface (CLI) programs. I'll admit that. They work in terminals. You can run them without an X Server and usually option flags are easy to use and obvious*. But sometimes I'd like to just hit a button. Click and magic happens. No more 200 character command line statement to get the ball rolling. A good example of this is from my colleague Anders Christensen who made a graphical user interface (GUI) for the Phaistos program.

In this post, we'll take the first small steps towards actually making a GUI for a simulation to run the ideal gas in and not just watch some numbers printed in a terminal. I'll be using Qt4 and the PyQt4 bindings. If you have a fairly recent linux distribution, you can apt-get install yourself to success very fast. Remember to also install the Qt-Designer app so you visually can layout your controls.

We'll start out simple by constructing buttons, textedits and a frame. I use a regular QWidget for my main form. The end goal for a user is to push the 1) setup button to initialize particles and 2) push the run button and make the simulation run. The simulation (i.e. the particles) should be displayed in the frame via matplotlib. It looks like this

If you want the resulting simulator.ui code its listed as part the gist for this blog post.

To move on from here, there is a nifty little tool called pyuic4 which will convert your .ui file into a python class with the name Simulator_UI. I always just parse it down to a file called simulator_ui.py. The code is

$ pyuic4 simulator.ui > simulator_ui.py

The simulator_ui class contains the framework for the QWidget form we will be using. The most clever approach I could think of was to subclass the simulator_ui class to separate the "setting up the form" code with the "what happens when I push a button" code. The QWidget I'll use to represent the window the user will see and I call it Simulator. It imports the Simulator_UI class, subclasses it and sets up the GUI. The two buttons are hooked up to either initialize new particles or run for some steps

The bread and butter of this application is the ParticleCanvas. Its tailored to draw and move the particles around. Look at its code

You see that it subclasses the Canvas class which is my best bet on how one should make the code for fast blitting of a matplotlib canvas. A class which subclasses the Canvas class is only responsible for redrawing the actual plot (and do it fast). However, I would gladly appreciate comments of you have a better way to implement it

Finally, the code that makes it all run is just my main executable which is defined like this

The final result looks like this

We now have a finished simulation GUI. However, the main problem is that the GUI and the simulation code is intimately hooked up so the GUI will freeze at some point. Fortunately, it leaves a new post on the horizon.

*I never understood anything of the find command, however.

Thursday, May 3, 2012

Programming the Ideal Gas

You've heard about it before. Many times. It's a model system used a lot in chemistry. The Ideal Gas law: A system where the particles can be treated as non-interacting. How do you program such a system? What do you need to describe it? Given 1) the basic introduction covered by Prof. Jan Jensen in another lecture, and 2) the following assignment (notice it is part 2), here is what the students are supposed to do once this exercise is complete:

Take a box with dimensions $(-1<x<1$ and $-1<y<1)$.
Make npart particles placed randomly in the right side of a box ($0<x<1$) but evenly from the top through the bottom
Give the npart particles random velocities in both directions and make it so they can move in both the positive and negative direction.
Move the particles around via the equation: $X_i^{n+1} = X_i^{n}+dt*dX_i^{n}$
Keep them inside the box at all times.
Plot the result.

The students start out with 1, 2 and 6 presented to them, both for inspiration and because they contain some subtle hints. The code is

One of the tricky parts is knowing "what is the i'th particle" because we need this information to update the system according to the equation above in point 4. Once you realize that you can access this information in the X, Y, dX and dY arrays as X[i] under a loop, then there is really no problem with that point either, for instance, to update all particles according to the equation in 4 in one go, we could write

Assigning a loop to iterate over the particles is not really troublesome, but understanding why is hard - even to explain it to yourself. A simple for-loop might not be the smartest thing to do, but I think it is the easiest one to go about and more importantly I often feel that many things can get lost in the whole numpy magic, especially for newcomers.

The next part is, that we should keep the particles inside the box at all times. The exercise has a little discussion about which version is the better, but in my opinion the particles should always be inside at all times. To keep them in, we ask whether a step (equation in point 4) in X and Y will result in the particle being outside the box and if that is the case, we reverse velocity that potentially brought the particle outside. We could use the following code to do that

Inserting the above two snippets of code, under a loop over the number of steps we want to take, we finally end up with the code that simulates non-interacting particles in a container.

Try and code it for yourself or use it in a course you teach. The students do find it interesting, both in the form of "what do I need to tell the computer to make it do stuff?" and "oh - the particles do move!".

Disclaimer: This code is part of a course I help teach at the University of Copenhagen - just imagine you never saw python before.

Friday, April 20, 2012

The Wonders of Numpy

I originally envisioned this post to be somewhat of a wall of code which would show just how ugly you can actually write code that works (see the public gist for that) but without the checks to see that it does what I want it to do. I've been parsing text lately - a weird and wonderful combination of taking the molecular orbital (MO) coefficients of a calculation and transforming them into a new basis set to calculate spin-spin coupling constants. This involves reading the input file to get the basis set exponents of the uncontracted basis set and matching them with the corresponding MO coefficients and generate a new segmented contracted basis set.

In the table below we have a segmented basis set where the primitive gaussian type orbitals (PGTO) are combined into contracted gaussian type orbitals (CGTO). In this case, we have a basis set with three degrees of freedom (three CGTOs) down from five degrees of freedom (five PGTOs). Dark fields in the table shows coefficients whose value are larger than zero. Bright fields are zero.

	CGTO1	CGTO2	CGTO3
PGTO1
PGTO2
PGTO3
PGTO4
PGTO5

So the PGTOs are obtained from the .mol file (I'm using Dalton for this) using the read_mol.py script found in the gist since it is used anyways to generate the new basis set. The CGTOs are obtained from an RHF calculation (see the hf.dal file in the gist) using the read_log.py script.

The problem with the design of a new basis set is to choose how many of the PGTOs to use in a CGTO, for instance, why choose three PGTOs in CGTO1? Since the number of PGTOs is not too overwhelming, I've gone ahead and made a general solution so I have the possibility to test all cases. The main problem then boiled down to: given an N x N matrix with zeros (notice that there are many zeros in this matrix), how do I overwrite subarrays (3 x 1 in this case) in Numpy so I can insert a specific number of coefficients to make CGTO1 from PGTO1 to PGTO3? and how to I make CGTO2 and CGTO3 couple with PGTO4 and PGTO5 using identity matrix inserted at the correct spot?

The trick is to make use of Numpys excellent (and very difficult) features known as boolean masking where a matrix filled with True or False will extract the corresponding values and one is able to assign to them. In the end, we need to remove excessive columns and I've found my head can do that easily by doing a transpose of the matrix, remove the unneeded rows (using a simpler Numpy/python logic which I know by hand) and transpose it back.

This was one of the more difficult issues, but at least I can now generate a multitude of basis set calculations and examining my property of interest in a matter of seconds. Numpy rocks, but I think I could grow old trying to learn all the awesome features it has.