Some thoughts on being organized
Writing code to complete a specific task is often fast and straightforward. sousvide is very much about data analysis tasks that you can perform quickly by exploiting a set of existing tools. Each of the Python files downloadable from sousvide represents one to two hours of coding on my part. However, organizing and documenting the code takes me another one to two hours per module. The points below are sort of like vegetables; I promise they are good for you!
- Factor your code well. Especially during exploratory data analysis, you might not know how your code will be organized when you first sit down. You might even begin at an interactive shell like IPython before you start writing code in a separate file. This is all totally fine and normal. Once you have a rough draft, take a few minutes to evaluate whether it could be better factored, and then refactor it if needed.
- Writing documentation for your code is worth the effort. It will help you remember what you did and reuse your code months later when you realize you want to do something similar, or when you want to share your code with someone else. Comments buried within your code are great, but take a step back to write function-level and module-level docstrings. Use docstrings to describe each function’s input parameters and return values.
- Sphinx makes writing documentation easy and, in fact, a pleasure. Sphinx was originally developed for Python’s own documentation. This website and many others in the Python community are built with Sphinx, which gives you a simple framework for combining tutorial-style documentation with documentation generated automatically from your docstrings. Sphinx uses reStructuredText; you can see what this looks like by clicking the “Show Source” link on any page of the sousvide website.
- Learn what good code and documentation look like. I turn to NumPy as a guiding example. I have also read PEP 8, the Style Guide for Python Code, and make an effort to follow its guidelines in my everyday code.
- To manage your code, use a version control system like Mercurial (hg), Git, or Apache Subversion (svn). I use all of these and others in my work. When I have a choice, I use distributed version control, and my system of choice is Mercurial. It is straightforward to use; look at the “Quick Start” on Mercurial’s homepage. Version control is not just for collaborating on code with other people; it is incredibly useful when you are simply collaborating with yourself! I use Bitbucket to host my hg repositories.
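- I am guilty of not building test suites nearly often enough. Learn about Python’s unittest module and nose (note that nose is no longer maintained; its documentation points new projects to nose2, pytest, or plain unittest).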
- Once you start organizing, documenting, and reusing your code, you might find yourself building libraries that are really useful to you. Even if they are small, consider sharing them with your friends, colleagues, and the rest of the world. :)
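As a rough illustration of the “Factor your code well” advice in the list above, here is what a small analysis script might look like once an exploratory draft has been split into functions. This is a minimal sketch, not code from sousvide: the inline CSV data and the names `load_values`, `summarize`, and `main` are hypothetical, and only the standard library is used.

```python
"""Sketch of a refactored analysis script (hypothetical example)."""
import csv
import io
import statistics

# Hypothetical inline data standing in for a file you would normally read.
RAW_DATA = """name,value
a,1.0
b,2.5
c,4.0
"""


def load_values(text):
    """Parse CSV text and return the 'value' column as a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["value"]) for row in reader]


def summarize(values):
    """Return a dict with the count, mean, and maximum of *values*."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def main():
    # The top-level workflow reads as a short sequence of named steps.
    values = load_values(RAW_DATA)
    print(summarize(values))


if __name__ == "__main__":
    main()
```

Keeping the top-level work inside `main()` behind the `if __name__ == "__main__":` guard means the functions can be imported into IPython or another module without the analysis running on import.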
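To make the docstring advice above concrete, the sketch below shows a module-level docstring and a function docstring written in the NumPy docstring style, which uses `Parameters`, `Returns`, and `Raises` sections to describe inputs and outputs. The module and its `moving_average` function are hypothetical examples, written in pure Python so they do not depend on NumPy itself.

```python
"""Small helpers for smoothing numeric data (hypothetical example module)."""


def moving_average(values, window):
    """Compute the simple moving average of a sequence.

    Parameters
    ----------
    values : sequence of float
        The input data.
    window : int
        Number of consecutive points to average. Must be between 1 and
        ``len(values)``.

    Returns
    -------
    list of float
        The averages, with ``len(values) - window + 1`` entries.

    Raises
    ------
    ValueError
        If `window` is out of range.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4], 2)
    [1.5, 2.5, 3.5]
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Average each run of `window` consecutive values.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

The `Examples` section doubles as a check: running `python -m doctest smoothing.py` (assuming the module is saved under that hypothetical filename) verifies that the shown output still matches.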
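For the Sphinx workflow described above, the `sphinx-quickstart` command generates a `conf.py` file for a new project. A minimal sketch of the settings relevant to building API docs from docstrings might look like the following; the project name and the `..` path (which assumes the docs directory sits one level below your code) are assumptions to adapt to your own layout.

```python
# conf.py: a minimal Sphinx configuration sketch (project details are hypothetical).
import os
import sys

# Make the code importable so autodoc can read its docstrings.
# Assumes the docs live in a subdirectory of the project root.
sys.path.insert(0, os.path.abspath(".."))

project = "myproject"  # hypothetical project name

extensions = [
    "sphinx.ext.autodoc",   # pull documentation from docstrings
    "sphinx.ext.napoleon",  # parse NumPy- and Google-style docstring sections
]
```

With `sphinx.ext.autodoc` enabled, a reStructuredText page can pull in a module's docstrings with a directive such as `.. automodule:: smoothing` followed by the `:members:` option, where `smoothing` stands in for your own module name.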
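The `unittest` module mentioned in the list above organizes tests as methods on a `unittest.TestCase` subclass. The sketch below tests a small hypothetical `normalize` function; in a real project the function would live in your own module and the tests in a separate file.

```python
import unittest


def normalize(values):
    """Scale *values* so that they sum to 1 (hypothetical function under test)."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        # Floating-point sums may be off by rounding, so compare approximately.
        self.assertAlmostEqual(sum(normalize([1, 2, 3, 4])), 1.0)

    def test_preserves_proportions(self):
        self.assertEqual(normalize([1, 3]), [0.25, 0.75])

    def test_zero_total_raises(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])
```

Saved as a file such as `test_normalize.py` (a hypothetical name), these tests can be run with `python -m unittest test_normalize`, or picked up along with other `test*.py` files by `python -m unittest discover`.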