Wednesday, August 8, 2012

Fun and a little disappointment with amino acids

So you have Open Babel, you have the Python bindings to Open Babel and you think: "I just want to have fun!".

I wrote a small script to generate the 3D structures of the amino acids because I want to investigate how well the MMFF94 force field can calculate their charges. I've done this before for entire proteins, but here I just want to do it one amino acid at a time. To challenge myself, I cannot rely on the excellent (o)babel tool of the Open Babel package when generating the structures. All the code is on GitHub, but I have shared it on the blog too via select gists.

My SMILES for the amino acids are saved in a dictionary:
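A minimal sketch of what such a dictionary could look like, using the SMILES strings exactly as they appear in the results table in this post. The name `AMINO_ACIDS` and the three-letter keys are assumptions and not necessarily what aminoacids.py uses.

```python
# Neutral amino acids, keyed by three-letter code.
# SMILES are copied verbatim from the results table in this post;
# the variable name AMINO_ACIDS is an assumption.
AMINO_ACIDS = {
    "GLY": "NCC(=O)O",
    "ALA": "NC(C)C(=O)O",
    "SER": "NC(CO)C(=O)O",
    "THR": "NC(C(C)O)C(=O)O",
    "CYS": "NC(CS)C(=O)O",
    "VAL": "NC(C(C)C)C(=O)O",
    "LEU": "NC(CC(C)C)C(=O)O",
    "ILE": "NC(C(C)CC)C(=O)O",
    "MET": "NC(CCSC)C(=O)O",
    "PRO": "N1C(C(=O)O)CCC1",
    "PHE": "NC(Cc1ccccc1)C(=O)O",
    "TYR": "NC(Cc1cc(O)ccc1)C(=O)O",
    "TRP": "NC(Cc1c2ccccc2nc1)C(=O)O",
    "ASP": "NC(CC(=O)O)C(=O)O",
    "GLU": "NC(CCC(=O)O)C(=O)O",
    "ASN": "NC(CC(=O)N)C(=O)O",
    "GLN": "NC(CCC(=O)N)C(=O)O",
    "HIS": "NC(C(=O)O)Cc1cncn1",
    "LYS": "NC(CCCCN)C(=O)O",
    "ARG": "NC(CCCNC(=N)N)C(=O)O",
}
```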

To generate the structures, I've updated my obutil library with the following utility functions:
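A rough sketch of how such helpers might look with the plain openbabel Python bindings, without calling the (o)babel tool. This is not the actual obutil code: the function names `load_openbabel`, `structure_filename`, `smiles_to_3d` and `write_structure` are hypothetical, and the recipe (OBBuilder for rough coordinates followed by a short MMFF94 minimisation) is an assumption about one reasonable way to do it.

```python
def load_openbabel():
    """Return the low-level openbabel module (3.x or 2.x import layout)."""
    try:
        from openbabel import openbabel as ob  # Open Babel 3.x
    except ImportError:
        import openbabel as ob  # Open Babel 2.x
    return ob


def structure_filename(name, extension="xyz"):
    """Hypothetical naming scheme, e.g. 'GLY' becomes 'gly.xyz'."""
    return "{}.{}".format(name.lower(), extension)


def smiles_to_3d(smiles, steps=250):
    """Build an OBMol with 3D coordinates from a SMILES string."""
    ob = load_openbabel()
    conv = ob.OBConversion()
    conv.SetInFormat("smi")
    mol = ob.OBMol()
    if not conv.ReadString(mol, smiles):
        raise ValueError("could not parse SMILES: " + smiles)

    # Rough 3D coordinates first, then explicit hydrogens.
    # AddHydrogens() with default arguments does no pH correction.
    builder = ob.OBBuilder()
    builder.Build(mol)
    mol.AddHydrogens()

    # Short MMFF94 clean-up of the geometry.
    ff = ob.OBForceField.FindForceField("MMFF94")
    if ff is None or not ff.Setup(mol):
        raise RuntimeError("MMFF94 could not be set up for " + smiles)
    ff.ConjugateGradients(steps)
    ff.GetCoordinates(mol)  # copy optimised coordinates back into mol
    return mol


def write_structure(mol, filename, fmt="xyz"):
    """Write an OBMol to disk in the given Open Babel format."""
    ob = load_openbabel()
    conv = ob.OBConversion()
    conv.SetOutFormat(fmt)
    if not conv.WriteFile(mol, filename):
        raise IOError("could not write " + filename)
```

With Open Babel installed, `write_structure(smiles_to_3d("NCC(=O)O"), structure_filename("GLY"))` would write glycine to gly.xyz. The import is deferred to call time so the same helpers work with either the 2.x or the 3.x module layout.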

Finally, to generate the structures I simply run the following code

I have a separate script to calculate the charge of a molecule
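The idea can be sketched as follows, assuming the charge of a molecule is taken as the sum of the MMFF94 partial charges on its atoms, rounded to the nearest integer. That definition, and the names `mmff94_partial_charges` and `total_charge`, are assumptions; the actual script may do it differently.

```python
def load_openbabel():
    """Return the low-level openbabel module (3.x or 2.x import layout)."""
    try:
        from openbabel import openbabel as ob  # Open Babel 3.x
    except ImportError:
        import openbabel as ob  # Open Babel 2.x
    return ob


def mmff94_partial_charges(mol):
    """Return the MMFF94 partial charge of every atom in an OBMol."""
    ob = load_openbabel()
    model = ob.OBChargeModel.FindType("mmff94")
    if model is None or not model.ComputeCharges(mol):
        raise RuntimeError("MMFF94 charges could not be computed")
    return [atom.GetPartialCharge() for atom in ob.OBMolAtomIter(mol)]


def total_charge(partial_charges):
    """Sum partial charges and round to the nearest integer charge."""
    return int(round(sum(partial_charges)))
```

Given an OBMol `mol` with explicit hydrogens, `total_charge(mmff94_partial_charges(mol))` would give the integer charge reported in a table like the one below.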

The table below is the result of the above. The images are generated using the babel tool with the -opng option.


Amino Acid  SMILES                      Charge  Depiction
GLY         NCC(=O)O                    0       (image)
ALA         NC(C)C(=O)O                 0       (image)
SER         NC(CO)C(=O)O                0       (image)
THR         NC(C(C)O)C(=O)O             0       (image)
CYS         NC(CS)C(=O)O                0       (image)
VAL         NC(C(C)C)C(=O)O             0       (image)
LEU         NC(CC(C)C)C(=O)O            0       (image)
ILE         NC(C(C)CC)C(=O)O            0       (image)
MET         NC(CCSC)C(=O)O              0       (image)
PRO         N1C(C(=O)O)CCC1             0       (image)
PHE         NC(Cc1ccccc1)C(=O)O         0       (image)
TYR         NC(Cc1cc(O)ccc1)C(=O)O      0       (image)
TRP         NC(Cc1c2ccccc2nc1)C(=O)O    0       (image)
ASP         NC(CC(=O)O)C(=O)O           0       (image)
GLU         NC(CCC(=O)O)C(=O)O          0       (image)
ASN         NC(CC(=O)N)C(=O)O           0       (image)
GLN         NC(CCC(=O)N)C(=O)O          0       (image)
HIS         NC(C(=O)O)Cc1cncn1          0       (image)
LYS         NC(CCCCN)C(=O)O             0       (image)
ARG         NC(CCCNC(=N)N)C(=O)O        0       (image)

In this case, the charge is zero for every amino acid. Notice that I did not have Open Babel assign hydrogens for a specific pH, as that is not what I want to test. Instead, the aminoacids.py file listed above contains specific patterns for obtaining (de)protonated states of a few select amino acids: ASP, GLU, HIS, LYS and ARG.

Running the script on the (de)protonated molecules gives the following table:

Amino Acid  SMILES                      Charge  Depiction
ASP-        NC(CC(=O)[O-])C(=O)O        -1      (image)
GLU-        NC(CCC(=O)[O-])C(=O)O       -1      (image)
HIS+        NC(C(=O)O)Cc1c[nH+]cn1      +1      (image)
LYS+        NC(CCCC[NH3+])C(=O)O        +1      (image)
ARG+        NC(CCCNC(=[NH2+])N)C(=O)O   0       (image)

The disappointing thing here is that the protonated arginine is incorrectly assigned a charge of 0 (zero) instead of +1. Edit: As noted by Jan, the depictions of the charged ARG, ASP and GLU are also wrong.
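As an independent sanity check on what the charges should be, one can simply add up the formal charges written inside the square brackets of each SMILES string. This is not MMFF94 and does not use Open Babel at all; it is a deliberately naive reference value, and the names `CHARGED_STATES` and `formal_charge` are made up for this sketch.

```python
import re

# Charged states, copied from the table above.
CHARGED_STATES = {
    "ASP-": "NC(CC(=O)[O-])C(=O)O",
    "GLU-": "NC(CCC(=O)[O-])C(=O)O",
    "HIS+": "NC(C(=O)O)Cc1c[nH+]cn1",
    "LYS+": "NC(CCCC[NH3+])C(=O)O",
    "ARG+": "NC(CCCNC(=[NH2+])N)C(=O)O",
}

BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")  # contents of each [...] atom
CHARGE = re.compile(r"([+-])(\d*)")         # '+', '-', '+2', '-2', ...


def formal_charge(smiles):
    """Sum the formal charges of all bracket atoms in a SMILES string."""
    total = 0
    for body in BRACKET_ATOM.findall(smiles):
        for sign, digits in CHARGE.findall(body):
            magnitude = int(digits) if digits else 1
            total += magnitude if sign == "+" else -magnitude
    return total


for name, smiles in CHARGED_STATES.items():
    print("{:5s} {:+d}".format(name, formal_charge(smiles)))
```

By this count the protonated arginine carries +1, like lysine and histidine, which is the value the force field result should have matched.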

Edit2: It appears the problem is also present in the SVG depicter.

Maybe it's time to submit a bug report?