Monday, 22 February 2016

Weekend Experiments with #golang

So for a while I've been dabbling with google's go. For many years I was enamoured with the transputer as it appealed to many things in the hardware engineer in me (that is most of me) and just looked like the right solution. It was just a shame that no-one used it. Many of the reasons it wasn't used could be placed at the door of software language support.
Fast forward to me discovering go late last year and the fact that it seems to be a very popular language in many communities meant I had to try it. At last a concurrent language with some momentum behind it.. Initial experiments were very positive but also surprisingly negative too. To frame my answers let's do some backgound on my software philosophy.

Of course I grew up learning Basic. What else does a child of the 80s who was a computer monitor at school learn? Then at college I was exposed to Pascal and Visual Basic for GUI design and I was happy. I quickly gained the philosophy that if you were hacking an app together then ease of development was what mattered. For anything where you cared about performance you'd profile your problem and hand code those in assembler. So between VB, Pascal and Assembler I was happy
Then at Uni we had C forced down our neck. I hated it. It seemed a massive step backwards. To this day I still say it is not a high level language, it is a medium level language. Sure it's somewhat portable between architectures and were I to write a compiler for a new CPU architecture then C would be my first step.
But I was aghast that people wrote whole systems in C, or as I discovered when I started work, pretty much everything that mattered was written in C. This seemed crazy, it was the wrong tool. It was premature optimisation. So I spent the next decade or so convinced that the whole world of software was crazy and gave up on it and dug into hardware.

Now as a hardware engineer you spend a lot of time fighting tools. You can automate your tools with TCL (Which of course we did) and interpret the stupid log files you end up with with Perl. Using TCL was enough to convince me that hardware engineers should not be allowed to get involved with software but Perl I always had a soft spot for especially after learning how to use it in an object oriented manner. Finally all those university lectures made sense as to why people would want to do this. I think in the past I had been thinking about it from an assembler point of view - I wanted to understand the execution and the data structures but there was a certain freedom in not worrying about how things would appear in memory and just describing the problem. I even write GUIs in Perl by playing with TCL/TK and I could solve any problem I wanted to. The fact that I wrote a Verilog Parser and initial state evaluation tool in Perl (i.e. I had to elaborate and simulate time slice 0) should be proof that I had a way to solve any problem

But then I started to hit some severe performance problems. As a hobby project I wrote a countdown solver - you know the numbers game on countdown where you are given 6 numbers and have to use them to calculate a 7th - well I wrote a solver for it by brute forcing every combination of numbers added subtracted multiplied and divided in every valid combination. The problem was the run time, to do 5 numbers took about 5 minutes. 6 Numbers wasn't even worth contemplating.
And then at that time I discovered Go. Re-wrote the program in go and it was solving 6 numbers in under a second. It could exhaustively do all combinations of 6 numbers in under a minute. I was very impressed but of course wanted to improve the speed still further. This was a trivially parallelisable problem, yet as far as I could see with a quick check of top it was only running on a single CPU. It appeared the limit was in the memory allocation/garbage collection. The nature of the algorithm was that I was acquiring lots of data structures from the heap and relying on the garbage collection to tidy up after me. I could have manually reference counted and therefore worked around this issue, but then the program would have lost most of its elegance and so I decided to leave the problem alone and call this a lesson learnt in the sort of trivially parallel problems Go could trivially help with.
(As an aside I do plan to go back to this at some point because it's an itchy feeling knowing that it could be improved)

This weekend's hobby project was to re-write one of my file management scripts. Originally written in perl it's quite simple; calculate a hash for each file and if two files have the same hash assume they are the same file and therefore make a decision as to which of them to delete based on their location in a file tree and delete the duplicate. It also does some automatic renaming if it finds cruft in the filename (depending on the file type). It would then also work with the backup tool (unison) to make sure the correct things were backed up.
This script was great it was the perfect problem for perl to solve except that on my main picture folder it could take the best part of a day to run. This was because it was calculating the MD5 hash of each file; so my first attempt to speed things up had it saving an xml file in each directory with the hash in there together with file size and inode number. That way it was easy to check if the file had changed and needed it re-calculating. This has so far proved a fine solution.
However there was still that nagging problem of 100% single CPU usage and here was me with the hammer that is golang parallelism wondering if I could get an easy factor of 8 increase (or more if golang did file IO better). For all I knew Perl's MD5 algo was laughably inefficient and I had lots of faith that Go's performance would be natively much better here.
I didn't appreciate how easy Perl made these kinds of things. After a weekend's work I have a program that does about 70% of what the perl script did (no backup integration, no auto-renaming) but is about 450 LOC to perl's 280. Part of this is probably me still becoming fluent with the language and trying to write better less hacky code the second time around. Part of it is certainly a function of the language. Because for example you have to declare everything properly in Go there is a lot of declarative code for data structures that just isn't needed in Perl. Regular Expressions are much more cumbersome to work with and for my purposes the xml handling is more bulky than the Perl equivalent.
Performance wise this is a lesson in correctly understanding the problem in the first place before you start writing code. The code is no faster in any measurable way. Let's make this very clear, I run a Windows desktop (8 core 3.5Ghz machine 16G ram, Gigabit ethernet etc) and a Raspberry Pi2 for Linux, which for all my love of 2836 is not the fastest data engine on the planet. I normally run all these things on Linux but I've started doing Go development work under Windows but then in production things always run on the Pi(easy to remote access and more power efficient). Go is very attractive here because it lets me be cross-platform with ease.
Now how on earth were they showing the same runtime? Go's Pprof on the problem was really enlightening 100% of the CPU time is spent waiting in the runtime for an external program. That is to say the MD5 and all the regex-ing was so fast it didn't even register. Slightly more digging showed this was the network file reading.
SHIT!
Going back the the Perl script it was definitely showing in top as using 100% of the CPU. (well reported as 25% but it's a quad core machine). So I wrote a test script that took a (large) file and read it in and wrote it straight to /dev/null. Also uses 100% of the CPU.

So after all it turned out I was not CPU limited by network limited. It's just that Perl was using 100% CPU regardless whereas Go was releaseing the CPU and ended up consuming about 7% for the same workload). Looking at the data transfer rates regardless of who reads it, it is my NAS box that was setting the limit. If I had just done the calculation at the start of this project at how long it theoretically took to read all that data I would have seen the 200Mbit/Sec that I have previously established as the upper limit for my NAS. But I neglected to do the research so spent bout 12 hours working on it.
Ah well you know what they say 6 months in the lab can save you 6 hours in the Library.

So to sum up, of the two CPU limited problems I've thrown at Go so far they have both come back as running in a none concurrent fashion (well using a single CPU thread) limited not by the CPU usage of the algorithm but by some other factor (GC or IO). There is no doubt that Go has produced a solution that is superior in terms of resource usage and maximising throughput, but it also took more work to get to the same eventual performance.
My next planned investigation is to leverage my Imaging knowledge and produce my own version of an ISP. I was thinking of using  DCRAW  as a starting point and porting it into golang and maybe into OpenCL depending on how it goes. If that doesn't give me a proper CPU bound problem to optimise then I give up!