Saturday, November 29, 2014

Analyzing NBO Results with IPython Notebook


Introduction

Data processing, analysis, and visualization are routine tasks in the sciences and humanities. Much of today's data science is conducted in R, Python, Java, Matlab, and SAS. Among the many programming languages suited to data processing and visualization, Python stands out as a strong general-purpose choice for such tasks. As an interpreted, object-oriented language, Python has become an attractive tool for application development as well as a scripting language for routine assignments in research and education. Python is open source software and is freely available for a variety of platforms.

In the previous series of blog posts related to Natural Bond Orbitals (NBO) (NBO Analysis, NBO Scripting), we discussed several useful tools for parsing and analyzing text outputs from the NBO program [1-4]. For example, on the web pages dedicated to NBO Scripts and Handy Applications, we learned about the pes_nbo3.py Python script. This standalone script was used to extract data from the potential energy scan (PES) output files generated by Gaussian09. In another example, Python Script nodes were included in a KNIME workflow to process NBO Dipole Moment results. In general, NBO output consists of formatted tables and summaries describing the results of NBO analysis. These results include Natural Population Analysis, NBO Summary, Natural Dipole Moment Analysis, Natural Steric Analysis, energy effects based on second-order perturbation theory, and other properties. Extracting text sections from such output files is a typical and repetitive task when examining molecular properties by the methods of NBO analysis.


Python, IPython, and IPython Notebook

Until now we have worked with Python directly via the system shell by executing Python scripts. To make working with Python and data easier for scientists, the IPython interactive shell was introduced by Fernando Perez in 2001. IPython provides tools for interactive and exploratory computing in Python, making debugging, code optimization, and interactive plotting easy and straightforward. Effective use of IPython typically involves specialized third-party packages, such as NumPy, Pandas, and Matplotlib, which allow for the analysis of large data sets. More importantly, IPython has a relatively new feature called the “Notebook”. The IPython Notebook is a web-based interactive computing platform that combines live code, equations, narrative text, visualizations, images, interactive dashboards, and other media. Notebook documents provide a complete record of a computation that can be shared with others. The IPython Notebook is not only a tool for scientific research and data analysis, but also a great tool for teaching. Technically, the Notebook launches a web-based shell to an IPython session that has handy features, like the ability to save, edit, and delete lines of code. The content is organized into cells of Python code, plain text, or Markdown. One can move the cells around, develop code interactively with documentation and notes, display objects that a browser can render (e.g., images, HTML, videos), and share the whole notebook for collaborative work.

To demonstrate the utility and benefits of the IPython Notebook, I will share such notebooks in future blog posts to explore molecular properties that are based on NBO analysis.


Installing IPython Notebook

Before we dive into the nuances of IPython Notebook, here are a few prerequisites:

a) As indicated above, using IPython (and the Notebook) assumes familiarity with the Python language. We need to have a Python installation on our computer.

b) We will also need to install IPython and Notebook and a few required packages.

Continuum Analytics offers the Anaconda Python Distribution, which includes all the common scientific Python packages as well as IPython and the Notebook. Anaconda itself is free and comes with many pre-installed Python packages and libraries for data analytics and processing. It also comes with an excellent package manager named conda, which lets you easily install many modules on most platforms (Windows, Linux, Mac OS X), in 64-bit or 32-bit versions.
To install Anaconda under Windows, download Anaconda 1.9.1 and run the installer.


This particular version worked best for me under Windows 7 (Python 2.7.6 with IPython Notebook). The typical installation path is C:\Anaconda. Download of the latest installer is here.


When done, append the following string to your PATH:

C:\Anaconda;C:\Anaconda\Scripts

To access the PATH editor, right-click on Computer → Advanced system settings → Advanced tab → Environment Variables → System variables → Path.
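Once Python itself starts, the PATH setting can also be checked from Python. The following is only a minimal sketch: the helper name missing_path_entries is made up for this illustration, and the two directories assume the default C:\Anaconda install location mentioned above. Remember that a command shell opened before the PATH was edited will not see the change.

```python
import os


def missing_path_entries(required, path_value, sep=";"):
    """Return the required directories that are absent from a PATH string."""
    def norm(path):
        # Windows paths are case-insensitive; ignore trailing slashes as well
        return path.strip().rstrip("\\/").lower()

    present = set(norm(p) for p in path_value.split(sep) if p.strip())
    return [d for d in required if norm(d) not in present]


if __name__ == "__main__":
    # Assumed default install location; adjust if Anaconda lives elsewhere
    required = [r"C:\Anaconda", r"C:\Anaconda\Scripts"]
    missing = missing_path_entries(required, os.environ.get("PATH", ""), os.pathsep)
    print("Missing from PATH: %s" % (missing if missing else "nothing"))
```

An empty result means both directories are already on the PATH.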

To check that Anaconda is installed, launch the command shell (cmd.exe) and execute command:

conda info --all

Typical output follows:


Current conda install:


platform : win-32
conda version : 3.7.1
conda-build version : 1.2.0
python version : 2.7.6.final.0
requests version : 2.4.3
root environment : C:\Anaconda (writable)
default environment : C:\Anaconda
envs directories : C:\Anaconda\envs
package cache : C:\Anaconda\pkgs
channel URLs : http://repo.continuum.io/pkgs/gpl/win-32/
http://repo.continuum.io/pkgs/free/win-32/
config file : C:\Users\USER\.condarc
is foreign system : False


To simplify the installation of other packages in the future, we will install pip, a popular package manager.

In the command shell type:

conda install pip

After that, let’s install seaborn, another Python visualization library:

pip install seaborn
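To confirm that the packages are visible to the interpreter, one can try importing them. This is a sketch rather than an official check: check_packages is a hypothetical helper, and the list of names simply repeats the packages mentioned in this post.

```python
import importlib


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure to import is treated as "not available"
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    wanted = ["IPython", "numpy", "pandas", "matplotlib", "seaborn"]
    for name, version in sorted(check_packages(wanted).items()):
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

A package reported as NOT INSTALLED can then be added with conda install or pip install as shown above.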


A detailed description of the Anaconda installation and setup is available here.
If IPython has been installed correctly, you should be able to run it from a system shell with the ipython command. Launch the command shell (cmd.exe) again and execute the command:

C:\> ipython

We should see output similar to Figure 1 below.

Figure 1    The IPython console

The official IPython documentation webpage at http://ipython.org/documentation.html is the place to go for help. It contains links to the online manual and to unofficial tutorials and articles created by the community. The Stack Overflow website at stackoverflow.com is also a great place to ask for help with IPython.


To check that everything is correctly installed, type ipython notebook in the shell. This will launch a local web server on port 8888 (by default).
Go to http://127.0.0.1:8888/ in a browser and check if you can see the page shown in Figure 2.
Figure 2    The notebook dashboard
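If the dashboard page does not load, it helps to know whether anything is listening on the port at all. The sketch below uses only the standard socket module; port_is_open is a hypothetical helper name, and the host and port defaults are the ones quoted above.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        connection = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: nothing usable is listening
        return False
    connection.close()
    return True


if __name__ == "__main__":
    if port_is_open():
        print("A server is listening on 127.0.0.1:8888")
    else:
        print("Nothing is listening on 127.0.0.1:8888")
```

This only shows that some server answers on the port. It does not prove that the server is the notebook.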


The dashboard will list all notebooks and notebook folders that you created or cloned from repositories.
An informative overview of the Notebook UI is here.


Launching IPython Notebooks on Windows

The easiest way to launch a Notebook is to create an IPython Notebook shortcut on your Desktop. On a Windows system, right-click the IPython (Py 2.7) Notebook item in the Start → All Programs menu and select Send To → Desktop (create shortcut) (Figure 3).
Figure 3    IPython Notebook shortcut from menu 


Icon Properties for IPython Notebook 

For general customization of Notebook shortcuts, here are typical settings. The notebook location (the "Start in" path) can be changed.
Icon path: %SystemDrive%\Anaconda\Menu\IPython.ico
Target: C:\Anaconda\python.exe "C:\Anaconda\Scripts/ipython-script.py" notebook
Start in: "C:\Users\USER\Documents\IPython Notebooks\"


Running a standalone script:

One can still run a standalone Python script from either the IPython console or an IPython Notebook cell. While the standard shell command requires the syntax:

> python script.py input_file.txt

the IPython console or a Notebook cell requires the %run magic function:

> %run script.py input_file.txt
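In both cases the script receives the file name through sys.argv. The file below is a hypothetical stand-in for script.py, not one of the NBO scripts: it only counts the lines of the input file, but it shows the argument handling that works the same way under python and %run.

```python
import sys


def main(argv):
    """Count the lines of the file named by the first command-line argument."""
    if len(argv) != 2:
        print("usage: script.py input_file.txt")
        return 1
    with open(argv[1]) as handle:
        n_lines = sum(1 for _ in handle)
    print("%s: %d lines" % (argv[1], n_lines))
    return 0


if __name__ == "__main__":
    # argv[0] is the script name, argv[1] the input file
    main(sys.argv)
```

After %run finishes, names defined at the top level of the script (here, main) remain available in the interactive session, which is convenient for further exploration.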


Markdown

To format plain text in a notebook cell, use the Markdown syntax published at:
http://daringfireball.net/projects/markdown/ (Figure 6).


IPython Notebook Example

We will conclude this blog article with a “real” example of an IPython Notebook written to process NBO outputs. In this particular example, we will extract the DIPOLE MOMENT ANALYSIS summary from the NBO/GENNBO output file shown in Figure 4. For the sake of consistency, our example will use the formamide molecule discussed in the KNIME blog. In its "DIPOLE MOMENT ANALYSIS:" section, the standard NBO output file contains the individual x, y, z components and the length of the total molecular dipole moment. Each of the entries is decomposed into the individual contributions of NLMO and NBO bond dipoles. We are going to extract those lines into a *.csv file for later analysis.
Figure 4    Part of the Dipole moment summary output from the form.nbo file

Refer to the ReadNboDip.ipynb notebook for the parsing algorithm and the steps in each cell. To run all steps in the notebook, click “Cell” → “Run All” in the top navigation bar of the notebook page. The extracted NLMO part of the ‘DIPOLE MOMENT ANALYSIS:’ section is shown in Figure 5. The corresponding file form_dip.csv is saved for further analysis.
Figure 5    Formatted output of NLMO dipoles saved as form_dip.csv
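The general idea of such an extraction can be sketched in a few lines of plain Python. This is not the code from ReadNboDip.ipynb. The row pattern, the stop marker "END OF SECTION", the column names, and the sample text are all assumptions made up for this illustration, so the pattern and markers would need adjusting to the real layout of form.nbo.

```python
import csv
import re
import sys

# Assumed row shape: "<index>. <label> <float> <float> ..." (illustration only)
ROW = re.compile(r"^\s*(\d+)\.\s+(.*?)\s+(-?\d+\.\d+(?:\s+-?\d+\.\d+)*)\s*$")


def extract_rows(lines, start_marker, stop_marker):
    """Parse the data rows found between start_marker and stop_marker."""
    rows = []
    inside = False
    for line in lines:
        if not inside:
            inside = start_marker in line  # skip everything before the section
            continue
        if stop_marker in line:
            break  # end of the section
        match = ROW.match(line)
        if match:  # rulers, column headers, and blank lines do not match
            index, label, numbers = match.groups()
            rows.append([int(index), label] + [float(x) for x in numbers.split()])
    return rows


def write_csv(rows, handle, header):
    """Write a header line and the parsed rows to an open file object."""
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)


# Made-up sample text, NOT real NBO output
SAMPLE = """\
 preamble 1. ignored 9.99
 DIPOLE MOMENT ANALYSIS:
   Orbital            x      y      z    Total
 ----------------------------------------------
   1. CR ( 1) C 1    0.00   0.01  -0.02   0.02
   2. LP ( 1) N 2    0.50  -1.25   0.00   1.35
 ----------------------------------------------
 END OF SECTION
   3. BD ( 1) C 1    1.00   1.00   1.00   1.73
"""

if __name__ == "__main__":
    rows = extract_rows(SAMPLE.splitlines(), "DIPOLE MOMENT ANALYSIS:", "END OF SECTION")
    write_csv(rows, sys.stdout, ["index", "orbital", "x", "y", "z", "total"])
```

Passing an open file object instead of sys.stdout to write_csv saves the rows to disk. Keeping the file handling outside the function makes the parser easy to try on a short string in a notebook cell before pointing it at a full output file.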

The “pure” Python script ReadNboDip.py, which accomplishes the same task, can be downloaded from the nbo-processing GitHub repository. For details on how to clone a Git repository, see the section Working with a Git Repository.

Figure 6    Top part of an exemplary Notebook

The complete IPython Notebook ReadNboDip.ipynb with the necessary files is available at GitHub. An HTML version of the notebook is available from here (or if the nbviewer website is unresponsive, use this link).


Working with a Git Repository

To download a local copy of the examples and auxiliary files from this blog, you need git, a distributed version control system. I recommend using one of the GUI applications, such as GitHub GUI for Windows or a GUI for other platforms.
GitHub GUI for Windows comes with an icon shortcut to its own shell (Figure 7). To download the standalone Python script ReadNboDip.py and the accompanying files, open Git Shell and type:

git clone https://github.com/marpat/nbo-processing.git

This will copy the repository of all processing scripts into a local folder named “nbo-processing”. While you can change the default GitHub directory from the GitHub GUI, the path to the default location on Windows is:

Libraries/My Documents/GitHub 

To get the notebook described in this blog, either:
a) clone it with git clone https://github.com/marpat/blog.git
b) or go to its Git repository and download it as the blog-master.zip file

Again, to view the same notebook as an HTML file in a browser, go to: http://nbviewer.ipython.org/github/marpat/blog/blob/master/ReadNboDip.ipynb
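The nbviewer address above follows a simple pattern built from the GitHub user, repository, branch, and notebook file name. The helper below is a hypothetical convenience that reproduces this pattern for other notebooks. It assumes the nbviewer URL scheme stays as shown in the link above.

```python
def nbviewer_url(user, repo, notebook, branch="master"):
    """Build an nbviewer address for a notebook hosted on GitHub."""
    # Pattern taken from the link quoted in this post
    template = "http://nbviewer.ipython.org/github/%s/%s/blob/%s/%s"
    return template % (user, repo, branch, notebook)


if __name__ == "__main__":
    print(nbviewer_url("marpat", "blog", "ReadNboDip.ipynb"))
```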

References

1. NBO 6.0. E. D. Glendening, J. K. Badenhoop, A. E. Reed, J. E. Carpenter, J. A. Bohmann, C. M. Morales, C. R. Landis, and F. Weinhold, Theoretical Chemistry Institute, University of Wisconsin, Madison (2013).

2. E. D. Glendening, C. R. Landis, and F. Weinhold, “Natural Bond Orbital Methods,” WIREs Comp. Mol. Sci. 2, 1-42 (2012).

3. F. Weinhold, “Natural bond orbital analysis: A critical overview of relationships to alternative bonding perspectives,” J. Comput. Chem. (2012).

4. Further background and bibliographic materials can be found on the NBO website: http://nbo6.chem.wisc.edu/biblio_css.htm

5. Learning IPython for Interactive Computing and Data Visualization, by Cyrille Rossant, PACKT Publishing, e-Book, 2013

6. IPython Interactive Computing and Visualization Cookbook, by Cyrille Rossant, PACKT Publishing, e-Book, 2014

