rensa.co | writing

The Shell Pitch

There are a million and one tutorials and introductions to *nix shells out there. This one starts with a pitch aimed at those people who have no idea what I just said.

If you already code in three languages, run scientific analyses on a remote computer or hack power stations or something, you probably don’t need The Pitch—you’re already well aware of what working with command line tools can offer you.

But there are plenty of researchers for whom this isn’t the case. When I started honours, I avoided command lines like the plague. Instead of using a proper data analysis tool, I tried to open eight gigabyte sets of health records in Excel. Instead of using our department’s awesome servers and NCI’s awesome supercomputer to crunch numbers, I tried to do analysis work on my own computer, even when it wasn’t up to the job (especially on top of twenty browser tabs).

That’s a shame, because as soon as you start to get comfy on the command line, a lot of workflow possibilites start to open up. Aside from making you feel like a super awesome hacker (and making you look really smart if someone ever interviews you in front of your computer), mastering the command line will make you a better researcher in a number of ways.

“This is a great figure! Can we see it with these three parameters spread over a range of values?”

Even if your work concerns samples or other Real Things, you’ll eventually want to do some sort of analysis. And that analysis is going to spit out figures, or numbers, or lots of figures and lots of numbers. It might just be one figure at first, but then you realise you want to tweak another variable, so you run off a couple of other versions. Then you tweak another, and four figures becomes 16 figures. Prety soon, you’re trying to rename files in an output folder that looks like this:

Renaming a bunch of files using Finder. You value your time more than this, right?
Renaming a bunch of files using Finder. You value your time more than this, right?

If If you’re doing things at a computer ‘one at a time’, there’s probably a better way to use your time. To wit:

Renaming thouse same thousand files in bash in about ten seconds (because, shamefully, I never learned to touch type). Better, huh?
Renaming thouse same thousand files in bash in about ten seconds (because, shamefully, I never learned to touch type). Better, huh?

You can see how even this basic kind of automation makes more ambitious analysis possible and manageable (though sometimes it isn’t actually worth it).

Final_2.docx

Papers are rarely single author affairs, even if your other authors are primarily giving you feedback on the manuscript. Which means that, if you’re writing in Word, you’ll sooner or later end up with a folder that looks like this:

File revisions in my Dropbox. This is totally the final version, though.
File revisions in my Dropbox. This is totally the final version, though.

Sure, Word now has track change, and you can work online with others now. But sooner or later you’re going to want a record of who changed what, when they made the change and why they made it.

Programmers have been thinking about this problem for a long time in the context of big software projects, where a semi-colon in the wrong place screws everything up. Now I use git, a command line tool, to track changes to my code and deploy it to other computers. If I ever need to figure why my code stopped working, I can see what has changed and fix things:

History of changes to my project, including what I was thinking when I made changes and where they were. Phew.
History of changes to my project, including what I was thinking when I made changes and where they were. Phew.

I also don’t have to worry about mixing up code from different projects anymore (which happens disappointingly often when you have /phd-code/master.r and /resp/master.r).

Power level over 9000

Sooner or later your awesome laptop, which has served you well since forever, isn’t going to cut it. No matter how efficiently you work, one day you’re going to want to crunch gigabytes—or hundreds of gigabytes—of data, or you’re going to want to do some tricky maths that the li’l laptop that could just couldn’t.

Universities and other organisations have their own computers for this reason. Just as you access Google’s computers through a browser to get cat videos and the university’s computers to get your email, you can log in to a departmental computer to do analysis tasks with their rather more powerful computers. Hell, you can even rent your own computers for those times when you need extra grunt.

Separating the brains from the keyboard and screen is a Big Deal. Working wherever you are becomes a lot easier, provided you have a good Internet connection (and rather more dangerous as far as work-life balance is concerned). You can set a supercomputer onto a task that might take days and get an email on your phone when it’s done. Eventually it becomes so convenient to work remotely that you stop crunching on that laptop altogether.

I know this sounds scary. When I got started, all I knew of shells came from when they were illuminating Chris Hemsworth’s lovely face. But this isn’t black magick; you don’t need to be a computerey type to pick it up.

Plus, a lot of academic software only runs on the command line, so let’s face it: you’re getting pushed into this pool one way or another. Let’s get started, shall we?

Update: I did Software Carpentry’s beginner shell course this week and was blown away by the quality of the materials. If you’re a researcher or other non-coder considering the shell, I’d recommend them without hesitation. Have a look at their course and see what you think!

Software Carpentry’s Shell Tutorial

James Goldie

PhD student (climate change + health). I also code and build things for fun and for social change.

Read Next

Temperature and humidity effects on hospital morbidity in Darwin, Australia