Ten days of code

Open code in academic computational science should be the standard. After all, the scientific ideal is to share information toward progress. This is at least an idealistic view of why we publish so competitively, with standards that demand we share our findings with our peers and with the public. And although the computational methods section of my most recent joint experimental/computational paper was far longer than our experimental methods section, implementing our model as a numerical scheme remains a rather large, non-trivial step (at least I’d like to think so).

But at the moment, I’ve decided it’s not yet time for me to fall back on the simple solution of just “releasing the source code”, because there is a right way and a wrong way of doing this. The right way includes cleanup, documentation, and licensing, and investigating these is both time-consuming and difficult. In the world of commercial software, there is also a great deal of platform-dependent testing that is required, and I do not have the resources or time to do all of that. Will my MATLAB code run on another MATLAB installation? It might, but it might not unless several things, such as the writing of files, are done in a platform-aware way. I’ve noticed variations in MATLAB behavior on Mac OS X versus Linux, largely with figure handles and GUIs. Will these elements of my program translate appropriately? How can I ensure this?

While these are certainly not MATLAB-specific software issues, there is a greater issue at play regarding who can run this code. MATLAB is not free (gratis) software; in fact it is quite expensive. Its free (gratis and libre) and open source alternative, Octave, could not handle my early attempts at rewriting this project due to memory and stability issues. Additionally, its performance lags behind MATLAB’s.

But just as I prefer open-access journals because any human being on the planet with an Internet connection can reach them without a privileged gateway, I would also like my code to run on virtually any machine in the world, for free. Granted, the number of people who would be interested in my code may be limited to, at most, one of my officemates, but there’s something appealing about the idea that a poor but gifted child halfway across the world could use my code to understand my bit of science better than I do.

I tried to move from MATLAB to Octave once, and I failed. For so many reasons (openness, consistency, stability, performance), I’ve wanted to move away from proprietary software into the realm of free, open source software with a lower-level language. I’m finally making my first real steps toward doing that.

In the past two weeks, I’ve learned how to implement my network in C++ using many of its powerful constructs short of classes. I’ve learned how to implement a gamma-frequency-generating Hodgkin-Huxley network, for which I wrote a general numerical solver, learned how to optimize memory, used a pretty cool makefile, and passed around pointers like cigars while celebrating a newborn. I’ve stumbled upon segfaults with gigantic 2D arrays, I’ve cursed at the Mersenne Twister random number generator, and I’ve forgotten more than one ampersand or asterisk. Two weeks ago, I didn’t know how to use arrays of structures, header files, or a linker. Now, implementing features that get me closer to my mature MATLAB codebase is getting faster each day.
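For readers who hit the same wall: one common cause of segfaults with very large arrays is declaring them as local variables, which puts them on the (small) stack. I can't say for certain that this was the cause of mine, so treat the following as a minimal sketch under that assumption. It stores an n x m grid in a single heap-backed `std::vector` and indexes it in row-major order. The names `make_grid` and `idx` are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// A local array such as `double v[5000][5000];` needs about 200 MB of stack,
// far more than a typical default stack size, so it tends to crash at run time.

// Row-major index into a flat buffer that represents a rows x cols grid.
inline std::size_t idx(std::size_t r, std::size_t c, std::size_t cols) {
    return r * cols + c;
}

// Allocate the whole grid as one contiguous, zero-initialized block on the heap.
// std::vector releases the memory automatically when it goes out of scope.
std::vector<double> make_grid(std::size_t rows, std::size_t cols) {
    return std::vector<double>(rows * cols, 0.0);
}
```

With this layout, element (r, c) of a grid `v` with `cols` columns is `v[idx(r, c, cols)]`, and the only size limit is available memory rather than the stack.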

But unfortunately, I don’t have a lot of leisure time to spend on getting everything just right. I’m trying to graduate very soon, and I need to be job hunting, thesis writing, and most importantly, finishing my research. I believe this is an important and necessary detour for that, since my MATLAB code became memory-bound at 10 GB of RAM (while my machines have from 2 GB to 10 GB each). The best solution for my memory problems is not more silicon; it’s better code. My ideal simulation environment is going to be a combination of C++, Python, GNU tools such as make, and vim. Don’t worry; I already know my trusty vim.

I have given myself 10 days to finish the coding of this project. On day 1, I wasted most of the morning trying to figure out whether or not I could use regex to read my parameter file. I then learned about strings and figured out how to use the string class methods to do what I needed to do. I wrote and tested all of the functions necessary to read parameters and place them in the appropriate variables. I also figured out how to use dynamic arrays, which will be useful when I need to parameterize the numbers of elements in my arrays. It turns out that my obsession with using pointers paid off, as converting nearly all of my arrays to dynamic ones was a simple matter of slightly changing the syntax where they are created. I think there might be the added benefit of moving several large arrays from the stack to the heap, though I don’t fully understand what that means yet.
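To give a flavor of the regex-free approach, here is a minimal sketch of reading parameters with nothing but `std::string` methods. It is not my actual reader: the `name = value` line format, the `#` comment character, and the function names `trim` and `read_params` are all assumptions made for the example.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace using std::string's own search methods.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Read "name = value" lines into a map (assumed format; '#' starts a comment).
std::map<std::string, double> read_params(std::istream& in) {
    std::map<std::string, double> params;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // drop any trailing comment
        const std::string::size_type eq = line.find('=');
        if (line.empty() || eq == std::string::npos) continue;  // skip blank or malformed lines
        const std::string name = trim(line.substr(0, eq));
        std::istringstream value(line.substr(eq + 1));
        double v;
        if (!name.empty() && (value >> v)) params[name] = v;  // keep only numeric values
    }
    return params;
}
```

Because `read_params` takes any `std::istream`, it can be handed a `std::ifstream` opened on the parameter file or a `std::istringstream` for quick tests. Values come back as `double`, so counts such as the number of cells would need a cast before they size an array.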

It’s 1:37 am on Day 2. I have to figure out my priorities, but I need to finish my conductance implementation for n x m cells, get my output right (and flexible), find spike times, learn Python and matplotlib, write my analysis code, scale up my network, test everything, and get back to where I was with my research.

I have ten, no, nine days.
