Source Code of Bibtex Graph Generator

I would now like to release the source code of the software described in an earlier post. The package is released under the GNU General Public License; any other software that includes the full source code or parts of it must also be released under the GPL.

The software runs with Python 2.7.9 (Python 3 is not supported) and requires PyGraphViz 1.3rc2 and BibtexParser 0.6.2.

The program uses three main data structures: the list of all publications from the BibTeX file, a list of authors, and a list of relations. The latter two are implemented as Python classes. First, the full BibTeX file is parsed into an array of dictionaries with the BibtexParser package. Second, the author list is created by iterating over all publications and their authors in the BibTeX database. Third, the relationship array is created with entries for all combinations of two authors A and B, by iterating over the publication list and checking whether both A and B are authors of a given publication. Then the graph is constructed by adding a node for each author and an edge for each relation. As a last step, the graph is reduced according to constraints given as command-line arguments. Two short examples below illustrate these constraints. The BibTeX file parsed in the examples is included in the published package.
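The counting steps described above can be sketched roughly as follows. This is not the released code: it uses plain `Counter` objects instead of the author and relation classes, the function names are made up for this illustration, and the input is a hand-written list of dictionaries in the shape a BibTeX parser typically returns (one dictionary per entry with an `author` field joined by "and").

```python
import re
from collections import Counter
from itertools import combinations


def split_authors(author_field):
    """Split a BibTeX author field ("A and B and C") into a list of names."""
    names = re.split(r"\s+and\s+", author_field.strip())
    return [name.strip() for name in names if name.strip()]


def count_authors_and_relations(publications):
    """Return (publications per author, shared publications per author pair)."""
    pub_count = Counter()
    rel_count = Counter()
    for entry in publications:
        # Sorting makes every pair appear in one canonical order, (A, B) with A < B.
        authors = sorted(set(split_authors(entry.get("author", ""))))
        pub_count.update(authors)
        rel_count.update(combinations(authors, 2))
    return pub_count, rel_count


# Hypothetical entries, standing in for the parsed BibTeX database.
publications = [
    {"title": "Paper 1", "year": "2000", "author": "A. One and B. Two"},
    {"title": "Paper 2", "year": "2001", "author": "A. One and B. Two and C. Three"},
    {"title": "Paper 3", "year": "2001", "author": "C. Three"},
]

pub_count, rel_count = count_authors_and_relations(publications)
for pair, shared in sorted(rel_count.items()):
    print(pair, shared)
```

Counting pairs per publication avoids the loop over all possible author combinations, but it gives the same relation counts for every pair that shares at least one publication.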

Starting the program without any arguments shows the license information and the help text.

A Graph Display Software for bibtex databases
 Copyright (C) 2015 Benjamin Laemmle, jdmorise a t
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation, either version 3 of the License, or
 (at your option) any later version.
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.
 You should have received a copy of the GNU General Public License 
along with this program. If not, see <>. 

usage: "Examples:

optional arguments:
 -h, --help show this help message and exit
-if INPUT_FILENAME, --input_filename INPUT_FILENAME 
          Filename of publication database in bibtex format
-gf GRAPH_FILENAME, --graph_filename GRAPH_FILENAME    
          Filename of graph output stored as png
-ma MAIN_AUTHOR_NAME, --main_author_name MAIN_AUTHOR_NAME
          Only add edges with ERT or more number of relations
          Only add authors with ART or more number of relations
          Only add authors with APT number of publications
-lvl LEVEL, --level LEVEL
-b BEFORE, --before BEFORE     
          Only use Publications before YEAR for the graph
-a AFTER, --after AFTER
          Only use Publications after YEAR for the graph
          Graph Programm for rendering the graph. one of the
          following: fdp,dot,sfdp,circo,twopi.



The first example was created without any filter, just specifying the input database and the output picture:

python -if Darabi.bib -gf graph_plain.png


In the second example, the filters are explained in more detail. First, we would like to see only the direct relations between one author and everybody else, that is, only direct collaborations. This is achieved by specifying the “main author” (-ma “A. Mirzaei”) and the collaboration “level” (-lvl 1). Then we remove all authors with fewer than three publications (e.g. P. Suri) by adding an “author publication threshold” (-apt 3). To make the graph more readable, we also remove all edges with fewer than three relations by specifying an “edge relation threshold” (-ert 3), which removes a couple of authors such as K. Juan.

python -if Darabi.bib -gf graph_red.png 
-ma "A. Mirzaei" -lvl 1 -apt 3 -ert 3

The reduced graph is shown below, with only a limited number of authors remaining.


Additional arguments include an “author relation threshold” (-art), which removes all authors with only a few relations, and the before and after filters (-b, -a), which restrict the graph to publications from a certain range of years. Finally, the graph layout program can be selected (-gp); fdp is the default. The GraphViz documentation gives more details about these layout programs.
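To make the three thresholds more concrete, here is a minimal sketch of how they could be applied to publication and relation counts. It is an illustration, not the released code. The function name, the order in which the filters run, and the reading of an author's "number of relations" as the number of remaining edges are all assumptions.

```python
def apply_thresholds(pub_count, rel_count, apt=0, ert=0, art=0):
    """Return (authors, edges) that survive the three thresholds.

    pub_count: dict author -> number of publications
    rel_count: dict (author_a, author_b) -> number of shared publications
    """
    # Author publication threshold: keep authors with at least `apt` publications.
    authors = {a for a, n in pub_count.items() if n >= apt}

    # Edge relation threshold: keep edges with at least `ert` shared
    # publications whose two end points are both still present.
    edges = {pair: n for pair, n in rel_count.items()
             if n >= ert and pair[0] in authors and pair[1] in authors}

    # Author relation threshold (assumption: counted as remaining edges).
    degree = {a: 0 for a in authors}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    authors = {a for a in authors if degree[a] >= art}
    edges = {pair: n for pair, n in edges.items()
             if pair[0] in authors and pair[1] in authors}
    return authors, edges


# Hypothetical counts for four authors.
pub_count = {"A": 5, "B": 4, "C": 3, "D": 1}
rel_count = {("A", "B"): 4, ("A", "C"): 3, ("A", "D"): 1, ("B", "C"): 1}

authors, edges = apply_thresholds(pub_count, rel_count, apt=3, ert=3)
print(sorted(authors))
print(sorted(edges.items()))
```

With these made-up numbers, author D drops out because of the publication threshold, and the weak B and C edge disappears because of the edge threshold.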

The Python code, together with the example BibTeX file and the license file, can be found in my DropBox. Have fun!


Studying Research Dynamics of an Author with Timed Filters

A very interesting topic in research about science is the chronological dynamics of publications. The question of interest is how a specific scientist develops over time with respect to his publications and his network. To study publication dynamics, the presented graph program can filter the drawn nodes and edges by publication date with a before and an after filter.
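A before/after filter of this kind could look roughly like the sketch below. The function name is hypothetical, and two details are assumptions: the bounds are treated as strict ("before 2002" excludes 2002), and entries without a usable year are skipped.

```python
def filter_by_year(publications, before=None, after=None):
    """Keep only entries published strictly before and/or after the given years."""
    kept = []
    for entry in publications:
        try:
            year = int(entry.get("year", ""))
        except (TypeError, ValueError):
            # Assumption: entries without a usable year are left out.
            continue
        if before is not None and year >= before:
            continue
        if after is not None and year <= after:
            continue
        kept.append(entry)
    return kept


# Hypothetical entries in the shape of a parsed BibTeX database.
publications = [
    {"title": "P1", "year": "2000"},
    {"title": "P2", "year": "2001"},
    {"title": "P3", "year": "2005"},
    {"title": "P4"},
]

early = filter_by_year(publications, before=2002)
print([entry["title"] for entry in early])
```

Running the remaining steps on the filtered list for a growing `before` year gives one graph per point in time.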

For presentation purposes, a BibTeX database available on the internet was chosen, from the Biomedical Simulations and Imaging Laboratory at NTUA, Athens, Greece. The relationships of George Loudos, now Assistant Professor at the Department of Biomedical Engineering, TEIATH, Greece, are studied and plotted in the following three graphs. His scientific career started in 2000, when he had two conference publications.

One year later, he already has 9 publications and a much broader network. On the one hand, we see a group of people who contributed to 7 of his 9 publications in this period. These seem to be the PhD colleagues in his group, who either collaborated closely on a shared topic or simply put their names on each other's publications. On the other hand, we see a new connection to authors with a high number of publications on the left side. These seem to be important people in this field; George Matsopoulos, for example, was a professor in the field of computer science.


Timed filters offer a powerful and intuitive tool that gives detailed information on how an author builds up his network and which people are involved in his research. As shown in the last figures, his closest collaborators can be identified as the scientists who share many or most of his publications. Important contributors and higher-ranking scientists appear as authors with a high number of publications but a low number of relations.

What kind of research would you like to conduct? Send me your BibTeX file and I can present the outcomes in this blog.

Graph Display Software for Author Relationships with Bibtex Files

Last week I introduced the idea of looking at author relationships as a graph, with a short example. In this post I want to explain further how I create such figures. The first requirement is a valid data basis in the form of bibliographic information.

As a bibliography in the form of a BibTeX database was readily available, the first version of my graph display software works with BibTeX files as its source. A later post will show how such databases can easily be created by automatically parsing Google Scholar or other sources.

The Python software reads the BibTeX file and creates a list of authors and a list of relationships. From these two lists, it creates a graph with nodes and edges and invokes the Graphviz software to draw the graph and export it as a PNG file. The Python library pygraphviz, which includes classes for creating and analysing graphs, nodes, and edges, is used as the interface to Graphviz.
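To show what is handed to Graphviz in the end, the sketch below writes the graph as DOT text by hand. This is a deliberate substitution so that the example runs without pygraphviz installed; the software itself goes through pygraphviz. The function names and the label format (author name with publication count, edge labelled with the number of shared publications) are assumptions for this illustration.

```python
def quote(name):
    """Quote a name so it can be used as a DOT identifier."""
    return '"%s"' % name.replace('"', '\\"')


def to_dot(pub_count, rel_count):
    """Return an undirected graph in DOT syntax.

    pub_count: dict author -> number of publications (one node per author)
    rel_count: dict (author_a, author_b) -> shared publications (one edge per pair)
    """
    lines = ["graph coauthors {"]
    for author in sorted(pub_count):
        label = "%s (%d)" % (author, pub_count[author])
        lines.append("    %s [label=%s];" % (quote(author), quote(label)))
    for a, b in sorted(rel_count):
        lines.append('    %s -- %s [label="%d"];'
                     % (quote(a), quote(b), rel_count[(a, b)]))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical counts for two authors with one shared publication.
dot = to_dot({"A. One": 2, "B. Two": 1}, {("A. One", "B. Two"): 1})
print(dot)
```

Saved to a file such as graph.dot, this text can be rendered with one of the Graphviz layout programs, for example `fdp -Tpng graph.dot -o graph.png`.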

An example graph for the BibTeX database from my PhD thesis is shown below. We can see many different authors and a very complex structure of relationships (along with some bugs, as author names are spelled differently across publications, e.g. Jager and Jaeger).


We see most authors are in a cloud and only some authors have no connection to others at all. Now, in order to get a better visibility of the graph, the following simple filters are available:

1. Authors with a number of publications below a specific threshold are removed

2. Authors with number of relations lower than a threshold are removed

3. Edges with lower weight than a threshold are removed.

These filters are very powerful for reducing the complexity of the figure. To find the main contributors to an area, authors with few publications, who are mostly PhD students with only a short time in research, can be removed.

Additionally, a specific author can be marked as the main author, and only authors within a specific neighbourhood level are drawn. In this way, only direct neighbours, or authors up to two edges away, are shown.
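One way to implement such a neighbourhood level is a breadth-first search that starts at the main author and stops after the given number of edges. The sketch below illustrates the idea under that assumption; it is not taken from the released code, and the function name is hypothetical.

```python
from collections import deque


def authors_within_level(edges, main_author, level):
    """Return all authors reachable from main_author over at most `level` edges."""
    # Build an adjacency map from the undirected edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {main_author: 0}
    queue = deque([main_author])
    while queue:
        current = queue.popleft()
        if distance[current] == level:
            continue  # do not expand beyond the requested level
        for other in neighbours.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return set(distance)


# Hypothetical co-author edges: a chain A-B-C-D and a separate pair E-F.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

print(sorted(authors_within_level(edges, "A", 1)))
print(sorted(authors_within_level(edges, "A", 2)))
```

Authors outside the returned set, such as the unconnected pair E and F here, would simply not be added to the graph.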

Here I wanted to study the direct network around Professor S. Voinigescu from the University of Toronto, and removed all authors with only one publication.

What kind of filters do you want to see for the graph display software? In what kind of research about research are you interested? I’m curious about your feedback.