Open Source Software for Scientists

Over the past two years, I have become familiar with a range of open source tools (which means free!) that have greatly helped me cope with my daily work. There are many of them, and each one deserves its own blog entry. However, we don’t have that much time, so I will describe the most important ones. If you think they might be useful, I suggest you take a look.

Writing: LaTeX, Vim, Pandoc

LaTeX is a typesetting system, very popular among mathematicians due to its excellent support for mathematical formulae. Unlike “what you see is what you get” (WYSIWYG) writing tools such as Word, LaTeX works with plain text files in which the formatting commands are embedded in the text. This means that content and style are separated, which saves me hours of tinkering with tables and whitespace. LaTeX also handles cross-references within a document and bibliographies very consistently, whereas Word usually sucks at this. Another advantage is that vector graphics can be created directly inside the document (for example with TikZ), and existing vector figures can be included as well (https://www.latex-project.org/).

Vim is arguably the most powerful text editor ever created. Available on virtually every Linux system, it has been around since 1991 (building on the even older vi) and is still the tool of choice for hard-core programmers. If your workflow involves heavy editing of text files (and this includes data), I suggest you give it a try. It does take a while to get used to, but once you do, you become one with the computer (http://www.vim.org/).

Pandoc is a simple command line application that makes it easy to convert text files between document formats: LaTeX, HTML, Markdown, Word and many more. It comes in handy for a thousand reasons, for example turning a Markdown draft into a Word document for a colleague who does not use LaTeX (http://pandoc.org/).
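If you want to call Pandoc from your own scripts, a minimal Python sketch could look like the one below. It assumes the "pandoc" executable is installed and on your PATH; the helper names "build_pandoc_command" and "convert" and the file names are hypothetical. By default Pandoc guesses the formats from the file extensions, and the "--from" and "--to" options override that guess.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pandoc_command(src: str, dst: str,
                         from_fmt: Optional[str] = None,
                         to_fmt: Optional[str] = None) -> List[str]:
    """Build a pandoc command line. Without from_fmt/to_fmt, pandoc
    infers the formats from the file extensions."""
    cmd = ["pandoc", src, "-o", dst]
    if from_fmt:
        cmd += ["--from", from_fmt]
    if to_fmt:
        cmd += ["--to", to_fmt]
    return cmd


def convert(src: str, dst: str,
            from_fmt: Optional[str] = None,
            to_fmt: Optional[str] = None) -> None:
    """Run the conversion; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(build_pandoc_command(src, dst, from_fmt, to_fmt), check=True)


if __name__ == "__main__":
    # Hypothetical file names: a Markdown draft turned into a Word document.
    print(" ".join(build_pandoc_command("notes.md", "notes.docx")))
```

Building the command as a list (rather than one string) avoids shell quoting problems with file names that contain spaces.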

File management: Git, Bash shell, sed & awk

Git is arguably the best version control software available. Also a command line tool, it keeps track of every change to your work, lets you push a copy to an external server (such as GitHub or GitLab) as a back-up, and lets you revert to a previous version if something goes wrong (https://git-scm.com/).

Bash is the most widely used shell (command line), available on Unix-like systems such as Linux and macOS. On Windows, it can be obtained through applications like Cmder (http://cmder.net/). Put simply, it removes the limitations of point-and-click interfaces and opens up a whole new world of automation. Commands like “sed” and “awk”, for example, let you search for words inside documents, modify them, perform calculations with them, export the results to external files, and much more.
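For readers who prefer Python, here is a rough stand-in for two classic one-liners: "sed 's/site/station/g' data.txt" (replace every match on every line) and "awk '{ total += $2 } END { print total }' data.txt" (sum the second column). The function names and sample data are made up for illustration, and it only approximates the real tools: Python regular expressions differ from sed's, and unlike awk, a non-numeric field raises an error instead of counting as zero.

```python
import re
from typing import Iterable, List


def substitute(lines: Iterable[str], pattern: str, replacement: str) -> List[str]:
    """Roughly like sed 's/pattern/replacement/g': replace every match on every line."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) for line in lines]


def sum_column(lines: Iterable[str], column: int) -> float:
    """Roughly like awk '{ total += $N } END { print total }': sum the N-th
    whitespace-separated field (1-based, as in awk). Lines with fewer fields
    are skipped, which gives the same total as awk treating them as zero."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) >= column:
            total += float(fields[column - 1])
    return total


if __name__ == "__main__":
    # Made-up sample data: a site name and a measured value per line.
    data = ["siteA 1.5", "siteB 2.5", "siteC 4.0"]
    print(substitute(data, r"site", "station"))
    print(sum_column(data, 2))
```

For quick edits on the command line the original sed and awk one-liners are shorter; the Python version pays off when the logic grows beyond a single line.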

Graphics: SVG, Inkscape, TikZ, Matplotlib, D3.js

Scalable Vector Graphics (SVG) is a graphics format that keeps perfect resolution no matter how much you zoom in, often at a smaller file size than an equivalent high-resolution bitmap. Unlike bitmaps, where every dot (pixel) in the picture has to be stored, an SVG file describes the drawing as a set of shapes (lines, curves, circles, text) defined by coordinates and equations. This should be your preferred format for plots and figures. Inkscape is a completely free vector graphics editor for SVG, roughly the open source equivalent of Adobe Illustrator (https://inkscape.org/es/). Another good option is TikZ, a LaTeX package for creating graphics programmatically in an intuitive manner (http://www.texample.net/tikz/).
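To see that an SVG file really is just a text description of shapes, here is a small Python sketch that writes one by hand using only the standard library. The function name "svg_scatter", the sizes and the colours are illustrative choices, not anything the SVG standard requires; the printed result can be saved with an .svg extension and opened in Inkscape or a web browser.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def svg_scatter(points: Iterable[Tuple[float, float]],
                width: int = 200, height: int = 100) -> str:
    """Return an SVG document (a plain string) with one circle per (x, y) point.

    SVG coordinates start at the top-left corner, with y growing downwards.
    """
    circles = "\n".join(
        f'  <circle cx="{x}" cy="{y}" r="3" fill="steelblue"/>' for x, y in points
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}" viewBox="0 0 {width} {height}">\n'
        # A diagonal line, described only by its two end points.
        f'  <line x1="0" y1="{height}" x2="{width}" y2="0" stroke="black"/>\n'
        f"{circles}\n"
        "</svg>\n"
    )


if __name__ == "__main__":
    svg = svg_scatter([(20, 80), (100, 50), (180, 20)])
    ET.fromstring(svg)  # raises ParseError if the output is not well-formed XML
    print(svg)
```

Whatever the zoom level, the viewer redraws the circles and the line from these numbers, which is why the image never pixelates.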

For publication-quality graphics, I recommend Matplotlib, the plotting library for Python (a popular programming language for scientific work). It allows you to create almost any kind of graphic, customization is borderline infinite, and figures can be saved directly as SVG (https://matplotlib.org/index.html).
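As a minimal sketch, assuming Matplotlib is installed, this is how a simple line plot can be exported as SVG. The helper name "plot_to_svg", the axis labels and the data are hypothetical; when saving to a file instead, fig.savefig("figure.svg") picks the SVG format from the extension.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt


def plot_to_svg(x, y, xlabel: str, ylabel: str) -> bytes:
    """Draw a simple line plot and return it as SVG bytes."""
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(x, y, marker="o")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")  # vector output instead of a bitmap
    plt.close(fig)  # free the figure's memory
    return buf.getvalue()


if __name__ == "__main__":
    svg_bytes = plot_to_svg([0, 1, 2, 3], [0, 1, 4, 9], "Time (s)", "Distance (m)")
    print(svg_bytes[:80])
```

The resulting file can be polished further in Inkscape or included in a LaTeX document.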

But if you are after hard-core data visualization, the state of the art is, without doubt, D3.js, a JavaScript library for creating breathtaking interactive data visualizations (https://d3js.org/). This, however, requires a lot of effort and knowledge of HTML, CSS and the JavaScript programming language.

Data crunching: Python, R, Octave

Excel is usually the “go-to” statistics tool for most people. I would need a book to explain all the reasons why a programming language with scientific libraries and a decent interface (such as Python, R or Octave) is far better for most tasks; the main one is reproducibility, since a script records every step of the analysis and can be rerun on new data in seconds. Now, which one to choose really depends on your needs. Python is better suited for general-purpose programming, data science and system modelling (https://www.python.org/). R excels at statistics (https://www.r-project.org/), and Octave draws its strength from using largely the same syntax as MATLAB (https://www.gnu.org/software/octave/). You will have to research which one is most appropriate for your needs. If any of them would do the job, I feel that Python is the one with the most potential and reusability.
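As an illustration of why a script beats a spreadsheet for repeated analyses, here is a small Python sketch using only the standard library (the csv and statistics modules) that summarizes one column of a table. The column name, the helper "column_summary" and the numbers are made up; with real data you would read the file with open() instead of an inline string, and rerunning the script on a new file gives the new numbers without any copying and pasting.

```python
import csv
import io
import statistics
from typing import Dict, List


def column_summary(csv_text: str, column: str) -> Dict[str, float]:
    """Count, mean, sample standard deviation and median of one numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values: List[float] = [float(row[column]) for row in reader]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "median": statistics.median(values),
    }


# Made-up example table; in practice this would come from a file.
SAMPLE = """sample,temperature
a,20.0
b,22.0
c,24.0
"""

if __name__ == "__main__":
    print(column_summary(SAMPLE, "temperature"))
```

For heavier work, libraries such as pandas (Python) or R's built-in data frames do the same thing at scale, but the principle of a rerunnable script is identical.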

Presentations: Beamer, Prezi, reveal.js

I admit that PowerPoint is an excellent tool when used properly. However, in cases where you really need to shine, having better alternatives may come in handy. Beamer is the LaTeX package for creating presentations. A Beamer presentation includes automatically updated navigation bars and section outlines, support for vector graphics, and all the magic and professionalism of LaTeX (https://www.ctan.org/pkg/beamer).

Another option is Prezi, an online platform for presentations that gives you a gigantic canvas and lets you fly through it as you present. Unlike the other tools in this post, it is proprietary rather than open source. It can look dazzling with the right topic, especially with touching stories, or when different parts of the presentation relate to a bigger, unifying idea. Otherwise it can be a distracting rollercoaster (https://prezi.com/login/).

Reveal.js is a framework for creating beautiful presentations using HTML, CSS and JavaScript (https://github.com/hakimel/reveal.js). Say you have built an impressive data exploration tool in JavaScript and would like to embed it into your presentation for maximum coolness. Or maybe you want an interactive presentation that feels more like a workshop. If this is the case, reveal.js is your tool. But be careful: this is extremely time-consuming compared to a normal presentation. Awesomeness comes at a price.

GIS: Quantum GIS
Finally, there is Quantum GIS (QGIS), an open source geographic information system that comes with a Python console for creating your own applications and automating your workflow. QGIS is part of the Atkins innovation programme and is state of the art in efficiency (https://www.qgis.org/es/site/index.html).

Summary

As you can see, there is a whole world of open source tools available. Apart from Prezi, all of the tools above are open source and free, which means that you can install them anywhere, anytime. They are constantly being developed by friendly communities of users and can be customized to your needs.

I also try to use tools that work well with each other. For example, Beamer and TikZ are LaTeX packages, so all three belong to the same system. Vim will speed up your programming in LaTeX, JavaScript and Python. Matplotlib and Python can be learnt together, just like D3.js and reveal.js, which are both JavaScript. Bash scripting builds on programming skills too, and is perfectly complemented by Vim. The shell is also where you run command line applications like Git and Pandoc, which in turn help with your scripts (backing up your data and converting documents to other formats). QGIS uses Python to automate workflows as well. Beamer and LaTeX work well with SVG, easily generated by Inkscape… You get the idea ;)

These tools require patience, time and dedication. But once you master them, trust me, you will never regret it.