Monday 22 February 2016

Weekend Experiments with #golang

So for a while I've been dabbling with Google's Go. For many years I was enamoured with the transputer, as it appealed to the hardware engineer in me (that is most of me) and just looked like the right solution. It was just a shame that no-one used it. Many of the reasons it wasn't used could be placed at the door of software language support.
Fast forward to me discovering Go late last year, and the fact that it seems to be a very popular language in many communities meant I had to try it. At last, a concurrent language with some momentum behind it. Initial experiments were very positive but also surprisingly negative too. To frame my reactions, let's do some background on my software philosophy.

Of course I grew up learning Basic. What else does a child of the 80s who was a computer monitor at school learn? Then at college I was exposed to Pascal and Visual Basic for GUI design and I was happy. I quickly gained the philosophy that if you were hacking an app together then ease of development was what mattered. For anything where you cared about performance you'd profile your problem and hand-code the hot spots in assembler. So between VB, Pascal and assembler I was content.
Then at uni we had C forced down our necks. I hated it. It seemed a massive step backwards. To this day I still say it is not a high-level language, it is a medium-level language. Sure, it's somewhat portable between architectures, and were I to write a compiler for a new CPU architecture then C would be my first step.
But I was aghast that people wrote whole systems in C - or, as I discovered when I started work, that pretty much everything that mattered was written in C. This seemed crazy; it was the wrong tool. It was premature optimisation. So I spent the next decade or so convinced that the whole world of software was crazy, gave up on it and dug into hardware.

Now as a hardware engineer you spend a lot of time fighting tools. You can automate your tools with TCL (which of course we did) and interpret the stupid log files you end up with using Perl. Using TCL was enough to convince me that hardware engineers should not be allowed to get involved with software, but Perl I always had a soft spot for, especially after learning how to use it in an object-oriented manner. Finally all those university lectures made sense as to why people would want to do this. I think in the past I had been thinking about it from an assembler point of view - I wanted to understand the execution and the data structures - but there was a certain freedom in not worrying about how things would appear in memory and just describing the problem. I even wrote GUIs in Perl by playing with Tcl/Tk, and I could solve any problem I wanted to. The fact that I wrote a Verilog parser and initial-state evaluation tool in Perl (i.e. I had to elaborate and simulate time slice 0) should be proof that I had a way to solve any problem.

But then I started to hit some severe performance problems. As a hobby project I wrote a Countdown solver - you know, the numbers game on Countdown where you are given 6 numbers and have to use them to calculate a 7th - by brute forcing every combination of numbers added, subtracted, multiplied and divided in every valid order. The problem was the run time: to do 5 numbers took about 5 minutes. 6 numbers wasn't even worth contemplating.
And then at that time I discovered Go. I rewrote the program in Go and it was solving 6 numbers in under a second. It could exhaustively do all combinations of 6 numbers in under a minute. I was very impressed, but of course wanted to improve the speed still further. This was a trivially parallelisable problem, yet as far as I could see with a quick check of top it was only running on a single CPU. It appeared the limit was in the memory allocation/garbage collection. The nature of the algorithm was that I was acquiring lots of data structures from the heap and relying on the garbage collector to tidy up after me. I could have manually reference counted and therefore worked around this issue, but then the program would have lost most of its elegance, so I decided to leave the problem alone and call it a lesson learnt about the sort of trivially parallel problems Go can trivially help with.
(As an aside I do plan to go back to this at some point because it's an itchy feeling knowing that it could be improved)
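For flavour, here's a minimal sketch of that brute-force approach (not my original code): pick any two numbers from the pool, combine them with each legal operation, put the result back and recurse. Note how freely it allocates new slices on every step - exactly the behaviour that keeps the garbage collector busy.

    // countdown.go: brute-force sketch of the Countdown numbers game.
    package main

    import "fmt"

    // solve reports whether target can be reached from nums using +, -, * and
    // exact division, combining any two numbers at a time.
    func solve(nums []int, target int) bool {
        for _, n := range nums {
            if n == target {
                return true
            }
        }
        if len(nums) < 2 {
            return false
        }
        for i := 0; i < len(nums); i++ {
            for j := 0; j < len(nums); j++ {
                if i == j {
                    continue
                }
                a, b := nums[i], nums[j]
                // Build the pool with a and b removed (a fresh allocation every
                // time, which is what hammers the garbage collector).
                rest := make([]int, 0, len(nums)-1)
                for k, n := range nums {
                    if k != i && k != j {
                        rest = append(rest, n)
                    }
                }
                candidates := []int{a + b, a * b}
                if a > b {
                    candidates = append(candidates, a-b)
                }
                if b != 0 && a%b == 0 {
                    candidates = append(candidates, a/b)
                }
                for _, c := range candidates {
                    if solve(append(rest, c), target) {
                        return true
                    }
                }
            }
        }
        return false
    }

    func main() {
        // The classic 952 puzzle: true, though this naive search can take a while.
        fmt.Println(solve([]int{25, 50, 75, 100, 3, 6}, 952))
    }
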

This weekend's hobby project was to rewrite one of my file management scripts. Originally written in Perl, it's quite simple: calculate a hash for each file, and if two files have the same hash, assume they are the same file, decide which of them to delete based on their location in the file tree, and delete the duplicate. It also does some automatic renaming if it finds cruft in the filename (depending on the file type). It would then also work with the backup tool (unison) to make sure the correct things were backed up.
This script was great - it was the perfect problem for Perl to solve - except that on my main picture folder it could take the best part of a day to run. This was because it was calculating the MD5 hash of each file, so my first attempt to speed things up had it saving an XML file in each directory with the hash in there together with file size and inode number. That way it was easy to check if the file had changed and needed re-calculating. This has so far proved a fine solution.
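A minimal sketch of that cache idea in Go (not my actual program: it keeps the cache as an in-memory map rather than the per-directory XML file, the file name is hypothetical, and it assumes a Linux target where os.FileInfo.Sys() is a *syscall.Stat_t):

    // hashcache.go: only recalculate a file's MD5 when its size or inode changes.
    package main

    import (
        "crypto/md5"
        "encoding/hex"
        "fmt"
        "io"
        "os"
        "syscall"
    )

    // CacheEntry is the sort of record the Perl script kept per file.
    type CacheEntry struct {
        Size  int64
        Inode uint64
        MD5   string
    }

    // hashFile streams the file through MD5 rather than slurping it into memory.
    func hashFile(path string) (string, error) {
        f, err := os.Open(path)
        if err != nil {
            return "", err
        }
        defer f.Close()
        h := md5.New()
        if _, err := io.Copy(h, f); err != nil {
            return "", err
        }
        return hex.EncodeToString(h.Sum(nil)), nil
    }

    // cachedHash returns the stored hash if the file looks unchanged,
    // otherwise recalculates and updates the cache. Linux-only inode lookup.
    func cachedHash(path string, cache map[string]CacheEntry) (string, error) {
        info, err := os.Stat(path)
        if err != nil {
            return "", err
        }
        ino := info.Sys().(*syscall.Stat_t).Ino
        if e, ok := cache[path]; ok && e.Size == info.Size() && e.Inode == ino {
            return e.MD5, nil // unchanged: skip the expensive read
        }
        sum, err := hashFile(path)
        if err != nil {
            return "", err
        }
        cache[path] = CacheEntry{Size: info.Size(), Inode: ino, MD5: sum}
        return sum, nil
    }

    func main() {
        cache := map[string]CacheEntry{}
        sum, err := cachedHash("example.jpg", cache) // hypothetical file name
        fmt.Println(sum, err)
    }
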
However there was still that nagging problem of 100% single-CPU usage, and here I was with the hammer that is Go's parallelism, wondering if I could get an easy factor of 8 increase (or more if Go did file IO better). For all I knew Perl's MD5 implementation was laughably inefficient, and I had lots of faith that Go's performance would be natively much better here.
I didn't appreciate how easy Perl made these kinds of things. After a weekend's work I have a program that does about 70% of what the Perl script did (no backup integration, no auto-renaming) but is about 450 LOC to Perl's 280. Part of this is probably me still becoming fluent with the language and trying to write better, less hacky code the second time around. Part of it is certainly a function of the language: because you have to declare everything properly in Go, there is a lot of declarative code for data structures that just isn't needed in Perl. Regular expressions are much more cumbersome to work with, and for my purposes the XML handling is more bulky than the Perl equivalent.
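The parallel fan-out I had in mind is the standard worker-pool pattern: a channel of file paths feeding a fixed number of hashing goroutines. A rough sketch (the file names are placeholders, and this isn't lifted from my actual program):

    // parallelhash.go: hash files in parallel with one worker goroutine per CPU.
    package main

    import (
        "crypto/md5"
        "encoding/hex"
        "fmt"
        "io"
        "os"
        "runtime"
        "sync"
    )

    type result struct {
        path string
        md5  string
        err  error
    }

    // hashWorker pulls paths off a channel and pushes their MD5 sums out.
    func hashWorker(paths <-chan string, results chan<- result, wg *sync.WaitGroup) {
        defer wg.Done()
        for p := range paths {
            f, err := os.Open(p)
            if err != nil {
                results <- result{p, "", err}
                continue
            }
            h := md5.New()
            _, err = io.Copy(h, f)
            f.Close()
            results <- result{p, hex.EncodeToString(h.Sum(nil)), err}
        }
    }

    func main() {
        files := []string{"a.jpg", "b.jpg", "c.jpg"} // hypothetical file list

        paths := make(chan string)
        results := make(chan result)
        var wg sync.WaitGroup

        for i := 0; i < runtime.NumCPU(); i++ { // one worker per CPU
            wg.Add(1)
            go hashWorker(paths, results, &wg)
        }
        go func() { // feed the work queue, then close it
            for _, f := range files {
                paths <- f
            }
            close(paths)
        }()
        go func() { // close results once every worker has finished
            wg.Wait()
            close(results)
        }()

        for r := range results {
            fmt.Println(r.path, r.md5, r.err)
        }
    }

In theory the speedup is then just a matter of how many workers you start; in practice, read on.
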
Performance-wise this is a lesson in correctly understanding the problem before you start writing code. The code is no faster in any measurable way. Let's make this very clear: I run a Windows desktop (8-core 3.5GHz machine, 16GB RAM, Gigabit Ethernet etc.) and a Raspberry Pi 2 for Linux, which for all my love of the 2836 is not the fastest data engine on the planet. I normally run all these things on Linux, but I've started doing Go development work under Windows; in production things always run on the Pi (easy to remote access and more power efficient). Go is very attractive here because it lets me be cross-platform with ease.
Now how on earth were they showing the same runtime? Go's pprof on the problem was really enlightening: 100% of the CPU time was spent waiting in the runtime for something external. That is to say, the MD5 and all the regex-ing was so fast it didn't even register. Slightly more digging showed this was the network file reading.
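For reference, capturing a CPU profile from a Go program takes only a few lines with the standard runtime/pprof package (the output file name and doWork function here are just placeholders); the result is then inspected with go tool pprof.

    // profile.go: minimal CPU profiling via the standard runtime/pprof package.
    package main

    import (
        "log"
        "os"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("cpu.prof") // profile output file (name is arbitrary)
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()

        doWork() // the code under investigation goes here
    }

    func doWork() {
        // placeholder for the real workload
    }
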
SHIT!
Going back to the Perl script, it was definitely showing in top as using 100% of a CPU (well, reported as 25%, but it's a quad-core machine). So I wrote a test script that took a (large) file, read it in and wrote it straight to /dev/null. That also uses 100% of a CPU.

So after all it turned out I was not CPU limited but network limited. It's just that Perl was using 100% of a CPU regardless, whereas Go was releasing the CPU and ended up consuming about 7% for the same workload. Looking at the data transfer rates, regardless of who reads it, it is my NAS box that was setting the limit. If I had just done the calculation at the start of this project of how long it theoretically took to read all that data, I would have seen the 200Mbit/s that I have previously established as the upper limit for my NAS. But I neglected to do the research, so spent about 12 hours working on it.
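The sanity check I skipped is a one-liner. With a hypothetical 300GB of photos (my real figure will differ) and the 200Mbit/s NAS ceiling, the wire alone sets a floor of a few hours on the runtime, whichever language does the reading:

    // backofenvelope.go: how long does it take just to pull the data over the NAS link?
    package main

    import "fmt"

    func main() {
        const dataGB = 300.0                    // hypothetical collection size
        const linkMbit = 200.0                  // measured NAS ceiling
        seconds := dataGB * 8 * 1000 / linkMbit // GB -> Gbit -> Mbit -> seconds
        fmt.Printf("%.1f hours just to read the data\n", seconds/3600)
    }
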
Ah well, you know what they say: 6 months in the lab can save you 6 hours in the library.

So to sum up: the two CPU-limited problems I've thrown at Go so far have both come back running in a non-concurrent fashion (well, using a single CPU thread), limited not by the CPU usage of the algorithm but by some other factor (GC or IO). There is no doubt that Go has produced a solution that is superior in terms of resource usage and maximising throughput, but it also took more work to get to the same eventual performance.
My next planned investigation is to leverage my imaging knowledge and produce my own version of an ISP (image signal processor). I was thinking of using DCRAW as a starting point and porting it to Go, and maybe to OpenCL depending on how it goes. If that doesn't give me a proper CPU-bound problem to optimise then I give up!

Wednesday 19 September 2012

Facts and Opinions

I realised that a lot of the thinking, posts and justifications I have tried to write for this blog come down to trying to justify my opinions or point out where someone else has their facts wrong. Perhaps that is a bit harsh on myself; I've usually ended up trying to say something along the lines of 'be more open minded'. Then of course there's the great phrase 'if you open your mind too much your brain will fall out', i.e. you can't be fully open to every idea and still reach critical conclusions.
Anyway I'm getting sidetracked, the nature of Facts and Opinions as it appears to me:

No opinion is 100% defensible. For any opinion you invent, an example or an argument can be found that shows the flaw. Pick any opinion whatsoever and you can find a flaw in it. Conversely, pick any opinion whatsoever and you can find merit in it too. Let's try some obviously flawed statements and find the merit:
"Black is the only colour a person should wear." Well it would make shopping and getting ready in the morning MUCH easier.
"Since a mature rainforest is actually carbon neutral whereas farms are carbon negative, the question of deforestation is actually one of what we do with the trees we cut down". If you could make this work sustainably and bury all the wood for less than the carbon cost of transport you should be able to cut down on world hunger.
"The world was a better place for humans in the stone age." Clear gender roles, unambiguous deity requirements, ample scope for innovation, plenty of opportunity for free enterpise etc.
"The Amiga is a better computing platform than Windows 8". Faster boot time, lower hardware cost, quicker to learn, simple programming model etc.
From this I take that no matter how ludicrous anyone else's opinions may seem, there is going to be something they can say to defend them. Likewise, whatever opinions I hold are going to be equally subject to criticism.

No fact (ignoring Mathematics, but I'll come back to that) can be 100% true. This is pretty much a ground rule of science: something is only a valid theory if it is possible to disprove it. The mainstay example of this is evolution, which has been done to death elsewhere, but all of science is (almost by definition) filled with theories that could be wrong but where all available evidence indicates they are on the right track.
As for Mathematics, we can be sure things are 100% true there because we make up all the rules, so for most of life that is not applicable (where we can make up some of the rules but not all of them).

Where does this leave us? In much of life, conversation, debate and policy are determined by a mixture of fact and opinion. Laws and fortunes are decided on the outcome of which is better: the iPhone or the Android phone of the day, Windows or Linux, Conservative or Labour, whether oranges have a good or bad harvest this year, etc. Deciding which is right between a number of competing options represents a good chunk of what humans do. Any of the above examples might be found to have an outcome (the iPhone sells more) but is it really better? The answer is always going to be subjective and insubstantial.

So what is the point of all this? To try and change the way we look at debates and discussions and human interaction in general. Both parties should go into a conversation not just with the acceptance that they themselves might be wrong (which would be a big improvement on any number of internet arguments) but with the assertion that they almost certainly are wrong from the other person's perspective, and that their role in the discussion is to try and find the pros and cons of not just both sides, but of every side.
So there's my opinion on opinion. I put it out there for you to critique and show me where I am wrong (because by my own argument I am; the question is where).

Friday 14 October 2011

A new online privacy model.

I was reading through an old Slashdot post of mine and it got me thinking.
Privacy in the online world is a tricky thing. Many companies exist to sell your online information, Facebook and Google being the two headline examples. But it struck me how it is now almost impossible to do anything online without losing your privacy. One reaction might just be to give up on the companies that do this, and you get plenty of people who refuse to use Facebook or Google or whatever because of privacy concerns.

But you also can't use price comparison websites (they sell your information as part of their service) or most large companies' websites (Amazon, Screwfix, M&S, Tesco - sorry for the UK slant there). Basically anything you do at all on the web gives away your privacy.



That said, you can't go outside either, because people will take photos of you, and with the face recognition software that will soon be available you'll be tracked through those. By the way, I'm saying 'soon' as in any point within your lifetime, because once online, photos stay online forever; so if the technology is developed by anyone in the next 10-20 years then you need to be concerned.
Face it, technology has meant the end of privacy as we have expected it in the past, kind of like it has meant the end of copyright/distribution as the RIAA has known it. How we deal with this is the next question, but hiding under a rock is a very Luddite reaction.

I'm not saying we should all give our credit card details out to anyone, or that I should post photos of what I got up to with the wife last night on LinkedIn, but the world has changed and hiding from it won't help. We need a new model of privacy.

So what do I suggest? Well the first thing is we have to carry on with Zones of Trust. Google quite literally knows everything that I get up to. It has all my photos, all my calendar entries and all my personal documents and plans. While this is convenient it is also not really sensible. Even my fiancée does not have quite that level of access to my information. Likewise I have to remember that anything I do in a public place will become public record for eternity. Also realise that soon enough everything I have ever done in a public place will be similarly searchable and available.
Here's the turnaround though: it will be similarly available for everyone else too. Thinking of taking a new job? Well, not only does your boss Google-stalk you, but you can do the same to him and all the other interview candidates. I can easily imagine my nephews growing up in a world where that information is commonplace and accepted. Likewise your browsing habits - all your browsing habits - will be available. I imagine it will start off with someone launching a service where you can add your browsing history to your Facebook profile. People already like websites or +1 them to show off stuff they find, so why not show everything? Someone will do it, some kids will start using it amongst their friends, and it will spread.
All of this I see as a good thing. I don't think the older generation will get it, but I am starting to see more and more people with no expectation of privacy, and I think this ends in a more open and honest society.

Thursday 1 September 2011

Washing machine patent

Random patent idea.
Rather than setting a timer on the washing machine to come on overnight, have a sensor that works out from the colour and intensity of the light when it is nighttime, and runs the machine then automatically.

Just a thought

Tuesday 16 August 2011

Programming Hardware vs Programming Software

I think after all these years of being a hardware engineer looking at the world of software engineering with a mixture of scorn, pity, bafflement and incomprehension I've put my finger on a fundamental truth.
First some background:
Hardware engineering has no problem with parallelism and concurrency. In fact all the languages used rely on it so that designs can be synthesised into hardware with relative ease. However, the development pace is regarded as relatively slow.
Software engineering is (mostly) built around the von Neumann model of a processor randomly accessing memory. For better or for worse, most languages are one way or another linked to C's assumptions and models. Now maybe it would be more correct to describe it as a Turing architecture, but let's move on. These systems and ways of thought have huge problems with parallelism, but exhibit relatively rapid code development. The model of one thing happening before another is easy to deal with and therefore to debug.
What strikes me is that this falls down in complex systems where you actually are quite happy with things happening at the same time, but until now I haven't been able to put it into words. It became clear to me today, while trying to reverse engineer some C code that I didn't understand, that I have been going about this the wrong way.
I like a system to have clearly defined boundaries. A chip has input and output busses. You cannot, from outside a chip package, get inside and modify the internal registers without going through the interface; it just physically isn't possible. Well-defined interfaces are not only desirable, they are all that is physically possible.
Now if you talk to a software engineer they will say they like well defined interfaces too. They like structured code and hate things that cause spaghetti. I realised today that if that were true then the world of software would be very different.
At the simplest level it is considered a good thing that I can, in some code, initialise a bunch of data structures and then call a do_frame() function that accesses unknown elements of a variety of objects and produces a result. Worse than that, there are many situations where, hidden away in hundreds of lines of code, is a little reference to an object that pulls in the thing that does all the work. Throw in globals, pointers and complex objects, and the aim seems to be not only to abstract out the complexity (which is good) but to abstract out the functionality (which is bad). This, I think, is the important point: in most hardware I have seen, because there is a data flow and interfaces are isolated, even if you have abstracted the complexity you still have to show the functionality. A hardware block cannot sneakily change the contents of your SDRAM unless you explicitly connect it up.
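A toy illustration of what I mean, written in Go rather than C just to keep it short (the function names are invented for the example): the first version reaches into hidden state the caller never sees, the second makes the data flow visible at the call site, much like a hardware block with defined ports.

    // hidden.go: "abstracting out the functionality" versus showing it.
    package main

    import "fmt"

    var frameCount int // hidden state: nothing at the call site hints this changes

    // doFrameHidden produces its result by mutating package-level state.
    func doFrameHidden() int {
        frameCount++ // side effect invisible to the caller
        return frameCount * 2
    }

    // doFrameExplicit takes its inputs and returns its outputs; the data flow
    // is on the "pins" of the function, like ports on a hardware block.
    func doFrameExplicit(frameCount int) (result, newCount int) {
        newCount = frameCount + 1
        return newCount * 2, newCount
    }

    func main() {
        fmt.Println(doFrameHidden(), doFrameHidden()) // 2 4 - why? you have to read the body
        r, n := doFrameExplicit(0)
        fmt.Println(r, n) // 2 1 - everything that moved is right here
    }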

Now don't think that I'm trying to say hardware methodology is superior, and don't try and say that this is the difference between good code and bad code. Perhaps it is good language vs bad language, but whenever I complain about C's memory model and the dangers of how it handles structures and pointers, I am told that that sort of functionality is possible in any useful language. Granted, Verilog will allow you to do cross-hierarchy references of the type that let you break hierarchy, but the syntax to do that is so obvious and so rarely used that it stands out like a sore thumb when it appears, and therefore isn't used except in very rare circumstances. Languages like C seem to hide this sort of trickery, and therefore programmers embrace it, or at least excuse its existence.


So I guess that has been my epiphany: software is written to hide complexity in all its forms, whereas hardware has its complexity constrained by the physical need for defined interfaces and so has its complexity hidden by use of hierarchy.
As a side note, I've long thought that each individual engineer will make things just as complicated as he is able to understand. Even when they try to make things simpler, they'll often do so to reflect what they personally find complex, potentially introducing what another person would see as extra complexity in the process. Further, given those two forces, a system will converge to be slightly more complex than the team of people working on it can comprehend. I take great pride in trying to make things simple, but realise that often I spend so much time trying to break things down that I can't see the wood for the trees. Sometimes you read others' code and it is complex, but succinct.
Well I guess if this was easy then anyone could do it.

Tuesday 1 March 2011

Novel thoughts

So I've been working on my novel again.
I think it's reached a point whereby I can keep fiddling with it, but it either needs a complete re-write, or I just need to publish it.
I think I'll do something I've been thinking of for a while and set up a new blog to publish it, one section at a time. But then what should I call the blog?

You see, the problem is that I have written it for myself as something I think I'd like to read. Which means it's quite geeky, full of exposition, and suffers from poor character development. But then whenever I read it to try and improve it, I enjoy re-reading it, to the point that it's beginning to feel like masturbation.

So I just need to finish this part about how to commit crime in a crime-free society and then I think I should do something about it. The working title is "Sonnets from a Proton", if that gives you some idea of how embarrassing this could be...

A new form of government

After my last post I thought I'd post this.
A while ago I came up with an idea for the full devolution of democracy, using online voting to the nth degree, based as much as possible on various online voting/communal systems. I'm not saying it's without flaws, but I present it here for comment and improvement.

The basic idea goes that everyone has the vote on every single subject.
But, I hear you cry, this would be crazy; for a start there'd be too many decisions for every person to make.
Correct. That's why you can appoint proxies.
For example, if the subject to be voted upon concerned agricultural policy, and your mate Hugh nicely summed up your views on the subject, or at least summed them up enough that you trusted his views, then you'd let him vote on your behalf for all agriculture-related things.
Let's say you met this guy Jeremy down the pub and you liked his ideas on the transport system, you could delegate your transport related votes to him.

So a bill gets proposed in true online democratic fashion (how this happens I'll get back to) and people tag it in order to categorise it: Agriculture, Transport, Education, etc. Obviously most things would have several tags and therefore there would be conflict, so you would need an arbiter when topics collide. Depending on how important the subject was (determined by the amount of activity on that topic) you would then either delegate it again (perhaps to the equivalent of your MP) or make the resolution yourself.
Ah, but how would the bill get proposed? Simple: people could vote bills up or down. If your delegate chose to vote up or down a bill that was within his delegation remit, then he would do so with the force of all the votes he had control over.
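To make the mechanics concrete, here is a rough sketch of the delegation idea (names and topics lifted from the examples above; a real system would obviously need cycle detection, revocation and so on):

    // delegation.go: proxy voting by topic - a delegate carries the weight of
    // everyone who has delegated that topic to them, directly or indirectly.
    package main

    import "fmt"

    type Voter struct {
        Name      string
        Delegates map[string]string // topic -> name of proxy, e.g. "transport" -> "Jeremy"
    }

    // resolve follows the delegation chain for a topic until it reaches
    // someone who votes directly on it.
    func resolve(voters map[string]Voter, name, topic string) string {
        for {
            proxy, ok := voters[name].Delegates[topic]
            if !ok {
                return name
            }
            name = proxy
        }
    }

    // tally counts how many votes each direct voter wields on a given topic.
    func tally(voters map[string]Voter, topic string) map[string]int {
        weight := map[string]int{}
        for name := range voters {
            weight[resolve(voters, name, topic)]++
        }
        return weight
    }

    func main() {
        voters := map[string]Voter{
            "Alice":  {Name: "Alice", Delegates: map[string]string{"transport": "Jeremy"}},
            "Bob":    {Name: "Bob", Delegates: map[string]string{"transport": "Jeremy", "agriculture": "Hugh"}},
            "Hugh":   {Name: "Hugh", Delegates: map[string]string{}},
            "Jeremy": {Name: "Jeremy", Delegates: map[string]string{}},
        }
        fmt.Println(tally(voters, "transport"))   // Jeremy carries 3 votes (his own plus Alice's and Bob's)
        fmt.Println(tally(voters, "agriculture")) // Hugh carries 2; Alice and Jeremy keep their own
    }
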

Now there are a number of issues with this system, not least the common problem of people voting for more public spending and lower taxes. There'd be so many on the dole voting for more taxes on the rich that the economy would probably collapse. Given that every bill would have an economic impact, and assessing that impact would be non-trivial, how do you even assess the cost of each proposal, never mind balance the budget?
My idea (although it too is flawed) is the personal tax allocation. You buy into government services. For example, you can vote up or down the spending on the military, but the cost is shared equally between all taxpayers. You can vote for more or less money spent on roads or whatever, but you must pay your appropriate share. Two problems with this: how do you decide what is fair, and how do you prevent the race to the bottom?
Simple: when it comes to tax we formalise the agreement that already exists through lobbyists and bribes - he who pays the piper calls the tune - sort of. On a logarithmic scale, the more tax you pay the more say you have in financial matters. That is, when it comes to tax policy it costs you (at least) four times as much to buy twice the votes. If you as a rich man vote for yourself to pay less tax, you then have fewer votes with which to maintain your position.
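One way to realise that "four times as much to buy twice the votes" rule is to make financial voting weight grow as the square root of the tax paid. A minimal sketch (the unit tax value of 1000 is arbitrary, purely for illustration):

    // taxvotes.go: financial voting weight as the square root of tax paid,
    // so quadrupling your tax bill only doubles your say.
    package main

    import (
        "fmt"
        "math"
    )

    // financialVotes returns the voting weight on tax matters for a given tax bill.
    func financialVotes(taxPaid float64) float64 {
        const unit = 1000.0 // tax that buys exactly one vote (arbitrary)
        return math.Sqrt(taxPaid / unit)
    }

    func main() {
        for _, tax := range []float64{1000, 4000, 16000} {
            fmt.Printf("tax %6.0f -> %.1f votes\n", tax, financialVotes(tax))
        }
        // tax   1000 -> 1.0 votes
        // tax   4000 -> 2.0 votes
        // tax  16000 -> 4.0 votes
    }

Under a scheme like this, a rich man who votes his own tax bill down to half loses roughly 30% of his financial votes, which is exactly the self-correcting pressure I'm after.
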

As I say it's far from perfect, but it's certainly different from our current scheme.