Saturday, November 29, 2014

Analyzing NBO Results with IPython Notebook


Introduction

Data processing, analysis, and visualization are routine tasks in sciences and humanities. The vast majority of data science today is conducted through R, Python, Java, Matlab, and SAS. With a plethora of programming languages suited for the data processing and visualization, Python seems the best general purpose functional programming language for such tasks. Indeed, being an interpreted, object-oriented language, Python became very attractive programming tool for application development as well as for use as scripting language for routine assignments in research and education. Python is open source software and it is freely available for a variety of platforms.

In the previous series of blogs related to Natural Bond Orbitals (NBO) (NBO Analysis, NBO Scripting), we have discussed several useful tools for parsing and analysis of text outputs from the NBO program [1-4]. For example, on the web pages dedicated to NBO Scripts and Handy Applications, we learned about pes_nbo3.py Python script. This standalone script was used to extract data from the potential energy scan (PES) output files generated by Gaussian09. In another example, Python Script nodes were included in KNIME workflow to process NBO Dipole Moment results. In general, NBO output consists of formatted tables and summaries describing results of NBO analysis. These results include Natural Population Analysis, NBO Summary, Natural Dipole Moment Analysis, Natural Steric analysis, energy effects based on second-order perturbation theory and other properties. Extracting text sections of such output files is a typical and repetitive task when examining molecular properties by methods of NBO analysis.


Python, IPython, and IPython Notebook

Until now we have worked with Python directly via the system shell by executing Python scripts. To make the work with Python and data easier for scientists, IPython interactive shell was introduced by Fernando Perez early in 2001. IPython provides tools for interactive and exploratory computing in Python making debugging, code optimization, and interactive plotting quite easy and straightforward. Effective use of IPython typically involves third-party specialized packages, such as NumPy, Pandas, and Matplotlib, which allow for analysis of large data sets. More importantly, IPython has a relatively new feature called the “Notebook”. IPython Notebook is a web-based interactive computing platform that combines live code, equations, narrative text, visualizations, images, interactive dashboards and other media. These documents provide a complete record of a computation that can be shared with others. Also, the IPython Notebook is not only a tool for scientific research and data analysis, but also a great tool for teaching. Technically, Notebook launches a web-based shell to an IPython session that has handy features, like the ability to save, edit, and delete lines of code. The code is organized into cells of Python, text, or Markdown. One can move the cells around, develop code interactively with documentation and notes, display objects that a browser can render (e.g., images, HTML, videos) and to share the whole notebook for a collaborative work.

To demonstrate the utility and benefits of using IPython Notebook, I will share such notebooks in the future blogs to explore molecular properties that are based on NBO analysis.


Installing IPython Notebook

Before we dive into the nuances of IPython Notebook, here are a few pre-requisites:

a) As indicated above, using IPython (and the Notebook) assumes familiarity with the Python language. We need to have a Python installation on our computer.

b) We will also need to install IPython and Notebook and a few required packages.

Continuum Analytics offers the Anaconda Python Distribution, which includes all the common scientific Python packages as well as IPython and Notebook. Anaconda itself is free and it comes with many pre-installed data analytics and processing Python packages and libraries. It comes with an excellent package manager named conda. It lets you install easily many modules on most platforms (Windows, Linux, Mac OS X), in 64-bit or 32-bit versions.
To get Anaconda installed under Windows OS, download Anaconda 1.9.1 and run the executable.


This particular version worked best for me under Window 7 system (Python 2.7.6 with IPython Notebook). Typical path to install into is: C:/Anaconda. Download of the latest installer is here.


When done, add the following string into your PATH:

C:\Anaconda;C:\Anaconda\Scripts

To access the PATH editor, right-click on Computer → Advanced system settings → Advanced tab → Environment Variables → System variables → Path.

To check that Anaconda is installed, launch the command shell (cmd.exe) and execute command:

conda info --all

Typical output follows:


Current conda install:


platform : win-32
conda version : 3.7.1
conda-build version : 1.2.0
python version : 2.7.6.final.0
requests version : 2.4.3
root environment : C:\Anaconda (writable)
default environment : C:\Anaconda
envs directories : C:\Anaconda\envs
package cache : C:\Anaconda\pkgs
channel URLs : http://repo.continuum.io/pkgs/gpl/win-32/
http://repo.continuum.io/pkgs/free/win-32/
config file : C:\Users\USER\.condarc
is foreign system : False


In order to simplify installation of other packages in the future, we will install a popular package manager pip;

In the command shell type:

conda install pip

After that, let’s install another Python visualization library seaborn:

pip install seaborn


A detailed description of Anaconda install and setup is described here.
If IPython has been installed correctly, you should be able to run it from a system shell with the IPython command. Launch the command shell (cmd.exe) again and execute the command:

C:\> ipython

We should see output similar to Figure 1 below.

Figure 1    The IPython console

The official IPython documentation webpage at http://ipython.org/documentation.html is the place to go to get some help. It contains links to the online manual and to unofficial tutorials and articles created by the community. The StackOverflow website at stackoverflow.com is also a great place to request help for IPython.


To check that everything is correctly installed, type ipython notebook in the shell. This will launch a local web server on the 8888 port (by default).
Go to http://127.0.0.1:8888/ in a browser and check if you can see the page shown in Figure 2.
Figure 2    The notebook dashboard


The dashboard will list all notebooks and notebook folders that you created or cloned from repositories.
An informative overview of the Notebook UI is here.


Launching IPython Notebooks on Windows

The easiest way to launch a Notebook is to create IPython Notebook shortcut at your Desktop. On the Windows system, right mouse-click the IPython (Py 2.7) Notebook item in Start → All Programs menu and select Send To → Desktop (create shortcut) (Figure 3).
Figure 3    IPython Notebook shortcut from menu 


Icon Properties for IPython Notebook 

For general customization of Notebook shortcuts, here are typical settings. Notebook location can be changed (path in orange)
Icon path: %SystemDrive%\Anaconda\Menu\IPython.ico
Target: C:\Anaconda\python.exe "C:\Anaconda\Scripts/ipython-script.py" notebook
Start in: "C:\Users\USER\Documents\IPython Notebooks\"


Running a standalone script:

One can still run a standalone Python scripts from either the IPython console or from the IPython Notebook cell. While the standard shell command requires syntax:

> python script.py input_file.txt

IPython console or Notebook cell require using the %magic function run:

> %run script.py input_file.txt


Markdown

To decorate plain text in the notebook cell, use the markup syntax published at:
http://daringfireball.net/projects/markdown/ (Figure 6).


IPython Notebook Example

We will conclude this blog article with the “real” example of IPython Notebook written on the topic of processing NBO outputs. In this particular example, we will extract DIPOLE MOMENT ANALYSIS summary from the NBO/GENNBO output file shown in Figure 4. For the sake of consistency, our example will use the molecule of formamide discussed in the KNIME blog. In its "DIPOLE MOMENT ANALYSIS:" section, the standard NBO output file contains the individual x,y,z-components and length of the total molecular dipole moment. Each of the entries is decomposed into the individual contributions of NLMO and NBO bond dipoles. We are going to extract those lines into a *.csv file for the later analysis.
Figure 4    Part of the Dipole moment summary output from the form.nbo file

Refer to the ReadNboDip.ipynb notebook for the parsing algorithm and steps at each cell. To run all steps in the notebook, click the “Cell” → “Run All” at the top navigation bar of the notebook page. Extracted NLMO part of the ‘DIPOLE MOMENT ANALYSIS:’ section is shown in Figure 5. The corresponding file form_dip.csv is saved for further analysis.
Figure 5    Formatted output of NLMO dipoles saved as form_dip.csv

The “pure” Python script ReadNboDip.py which is accomplishing the same task can be downloaded from the nbo-processing GitHub repository. For details on how to clone Git repository, see the next paragraph.

Figure 6    Top part of an exemplary Notebook

Complete IPython Notebook ReadNboDip.ipynb with the necessary files is available at Github. Html version of the notebook is available from here (or if the nbviewer website is unresponsive, use this link).


Working with GIT Repository

To download a local copy of examples and auxiliary files from this blog, you need git, a distributed versioning system. I recommend using one of the GUI applications, such as GitHub GUI for Windows or GUI for other platforms.
GitHub GUI for Windows comes with the icon shortcut to its own shell (Figure 7). To download the standalone Python script ReadNboDip.py and accompanying files, open Git Shell and type:

git clone https://github.com/marpat/nbo-processing.git

This will copy the repository of all processing scripts into a local folder named “nbo-processing”. While you can change the default GitHub directory from the GitHub GUI, path to the default location on Windows is:

Libraries/My Documents/GitHub 

To clone the notebook described in this blog, either:
a) Clone it as git clone https://github.com/marpat/blog.git
b) or go to its Git repository and download it as blog-master.zip file

Again, to view the same notebook as html file in a browser, go to: http://nbviewer.ipython.org/github/marpat/blog/blob/master/ReadNboDip.ipynb

References

1. NBO 6.0. E. D. Glendening, J. K. Badenhoop, A. E. Reed, J. E. Carpenter, J.Bohmann, C. M. Morales, C. R. Landis, and F. Weinhold, Theoretical Chemistry Institute, University of Wisconsin, Madison (2013).

2. E. D. Glendening, C. R. Landis and F. Weinhold, “Natural Bond Orbital
Methods,” WIREs Comp. Mol. Sci. 2, 1-42 (2012)

3. F. Weinhold, “Natural bond orbital analysis: A critical overview of
relationships to alternative bonding perspectives,” J. Comput. Chem. (2012).

4. Further background and bibliographic materials can be found on the NBO website: http://nbo6.chem.wisc.edu/biblio_css.htm

5. Learning IPython for Interactive Computing and Data Visualization, by Cyrille Rossant, PACKT Publishing, e-Book, 2013

6. IPython Interactive Computing and Visualization Cookbook, by Cyrille Rossant, PACKT Publishing, e-Book, 2014


Back to TOP

Saturday, August 30, 2014

Potential Energy Scan and NBO Analysis


Introduction
NBO analysis of orbital interactions is a powerful approach to understanding molecular properties. Subtle interactions can be often revealed by studying the NBO-derived properties at the potential energy surface (PES). By plotting the potential energy changes along the defined molecular coordinate (atom distance, angle, dihedral), one can get a better appreciation of different components of the total energy.


In this post, we will explore a typical workflow utilizing Gaussian09W ESS to generate multiple molecular structures at the PES. We will introduce a new Java application that processes the Gaussian PES output files (*.out) and creates formatted Gaussian input files to perform a single point NBO analysis (Figure 1). In the simplest case, Gaussian Scan keyword performs a series of single point calculations at various geometries. This, so called rigid PES scan performs evaluations over a rectangular grid involving selected internal coordinates. The molecular structure must be defined in Z-matrix coordinates. To avoid potential troubles with a poorly constructed Z-matrix, NboScan app can only read output files generated by the relaxed PES scan initiated with the keyword Opt=ModRedundant. In such case, the keyword requests that a geometry optimization (Opt) be performed in redundant internal coordinate that may include scan and constraint information [2]. The latter option requires a separate input section to activate, freeze, and scan PES. Initial geometry can be supplied as Cartesian coordinates or as Z-matrix. Results are recorded in the job output file. Each PES output file contains table of "Initial Parameters", which indicate the name and the value of Scanned and Frozen coordinates. Atom coordinates together with the corresponding energy at each coordinate increment are all part of the output file. Efficient retrieval of all requested values is often challenging task, certainly not error proof. It is where the NboScan app comes handy.

To avoid potential troubles with a poorly constructed Z-matrix, this app can only read output files generated by the the relaxed PES scan initiated with the keyword Opt=ModRedundant.
Fig. 1. A typical workflow for PES calculation using NboScan.


N-Methyl Formamide Example


We will start with an example of rigid rotation of the methyl group in cis-N-methylformamide (CH3NHCHO, cis-NMF). Results can be compared to the example discussed in the excellent book of Frank Weinhold and Clark R. Landis [1] on pages 141-142. The molecule of formamide was evaluated at HF B3LYP/6-311++G** level of theory. Gaussian09W was used for the relaxed PES evaluation with additional constraints. The input .gjf file is shown in Figure 2. All  discussed files can be downloaded from the Download section at the bottom of this post.

Fig. 2. Gaussian input in Cartesian coordinates with added redundant coordinates.

%chk=Formamide_rot_recoor.chk
# rhf/6-311++G(d,p) opt=modredundant nosymm

RHF/6-311++G(d,p) (MeHNCHO) Me rot 2-1-6-7= 0 to 60

0 1
N    0.00000000    0.00000000    -1.36100000
C    0.00000000    0.00000000     0.00000000
O    0.99400000    0.00000000     0.69340000
H   -0.85910000    0.00000000    -1.88640000
H   -1.02320000    0.00000000     0.42260000
C    1.28045663    0.00000000    -2.08303243
H    2.08663777    0.00000000    -1.37948999
H    1.34338205   -0.87365131   -2.69758412
H    1.34338270    0.87365135    -2.69758400

6 * F      Freeze bond lengths around the atom 6
1 * F      Freeze bond lengths around the atom 1
* 1 * F      Freeze bond angles around the atom 1
* 2 * F         Freeze bond angles around the atom 2
* 1 6 * R      Relax coordinates passing through atoms 1-6 (dihedral)
D 2 1 6 7 S 12 5.0      Scan dihedral 2-1-6-7 in 12 steps of 5 deg

Upon completion of the run, results of the nearly rigid methyl group rotation are written into the corresponding .out file.

  Extracting coordinates and energy data:


In the next step, we will launch the NboScan app by double-clicking the NboScan.jar file and load the above created .out file (Figure 3).
Fig. 3. Main window of the NboScan application.

Click on button 1. Browse Dir to locate the file. Set the output directory for newly created Gaussian input files (.gjf) using the 2. Set Dir button. Alternatively, check the Use input Dir box to direct output into the same directory (Figure 4).


Fig. 4. Loading files for processing and setting the output directory.


Set parameters for the following single point energy calculation in the section 3. Those include the level of theory, basis set (pull-down menus) and additional comments and keywords to the route card. Alternatively, type in the custom level of theory and basis set.

Fig. 5. Entering parameters for single point energy calculation at a given geometry.


Optionally, check the Plot profile box to output the energy profile along the scanned coordinate (Figure 6). This will plot the total electronic energies at different dihedral angles (0-60 deg).

Fig. 6. Energy profile along the scanned coordinate (dihedral 7-6-1-2).

Enable/disable additional options at the bottom of the window. Energy values at calculated points can be retrieved by pointing the mouse at the blue diamond. Axis values can be toggled between horizontal/vertical.

Section 4. involves entering NBO keywords or selecting the preset NBO properties (Figure 7). For more complex keyword lists, use the GennboHelper application described earlier and the Enter custom keywords. The latest version of the GennboHelper can be downloaded from the app’s website.

Fig. 7. Setting NBO keywords and Gaussian-NBO options.

Choosing the PLOT files only option will generate the corresponding NBO keywords necessary to create PLOT files for each scanned geometry. The last option, Set user 1, allows entering custom keywords that are retrieved next time when running the application.

Instructions to Gaussian program on how to perform NBO calculations are selected on line 5.

  • G03 option uses older NBO3 module compiled as part of Gaussian 03 package.
  • G09-link-NBO6 option adds keywords “EXTERNAL” and “pop=nbo6read” into the G09w route card and links NBO6 Windows binaries to G09w (revision D.01) [DEFAULT]. This option requires having gaunbo6.bat file (part of the NBO6 package) in the main Gaussian directory (e.g., C:\G09W\gaunbo6.bat).
  • G09-NBO option adds NBO6 specific keywords (pop=nbo6read) into the route card of G09w, which had NBO6 compiled together with G09w binaries.
  • .47 file only option instructs ESS to generate .47 archive file for later analysis by GENNBO5/6. This is equivalent to NBO keywords ARCHIVE FILE=xxx.

The message window at the bottom provides information related to the key scan parameters, such as, name of the scanned coordinate, its description in the output file, number of extracted geometries, and total number of atoms in the molecule. Path to files created by NboScan application is also part of the output.

To facilitate the next step, that is, evaluation of each geometry with specified NBO keywords, Gaussian batch file (*.bcf) is created at the end of the run. Just drop the batch file into the main Gaussian09w window and click the “run” triangle icon (Figure 8).


Fig. 8. Starting the batch process in main interface of Gaussian09w.


Files created after the run are shown in Figure 9. Each created file has a suffix indicating value of the scanned coordinate.
Fig. 9. List of input and output files.


A representative example of the newly created series of .gjf files is shown in Figure 10. Note the comment field referring to discrete geometry and the value of scanned coordinate (_10_, _20_, etc.).

Fig. 10. An example of created .gjf file.

References:


1. Weinhold, F. and Landis, C.R. in Discovering Chemistry with Natural Bond Orbitals
2. Rigid and Relaxed Potential Energy Surface Scans (PES Scan) in Gaussian 03 and Gaussian 09, J. Barroso.


Downloads:


NboScan java application: NboScan.zip v1.3
NboScan java app. (Linux-Ubuntu, SUSE): NboScan1.3Lin.zip v1.3
NboScan manual: NboScan_manual.pdf
Input and output files (.out, .gjf, .bcf): io_files.zip
Latest updates are at: NBO Apps and Scripts webpage

Back to TOP