Tuesday, September 1, 2015

How I got my 40 GB of disk space back from Windows

1. THE C-DRIVE  
I was using the utility "Piriform Defraggler" to defragment the C-drive on an older Windows Vista PC. A nice thing about Defraggler is that it shows you the list of files it was NOT able to defragment, for whatever reason. This list also often reveals some of the very big files on the disk, because those are the ones that are hard to defragment, I think. Or maybe it's because some system process has locked them up for its own use only.

So I saw there was a 15 GB file in the System Volume Information folder that would not defragment. The System Volume Information folders exist for storing the System Restore snapshots, which Windows creates automatically or on demand, right? Plus they may contain some other stuff, like "shadow copies", that I don't know much about. So it looked like System Restore was taking up 15 GB of my disk, which I think is too much. Yet I had already asked the Disk Cleanup utility to delete all but the latest snapshot, so it was a bit puzzling.

Here's how I solved the problem and got my disk space back: I disabled System Restore completely. The 15 GB file was immediately gone. I then re-enabled System Restore and created a single new restore point. The 15 GB DID NOT COME BACK! The System Volume Information folder on my C-drive is now 400 MB.

Do we really need that additional free space so badly? Well, if the disk is getting full there may not be enough free space left even to defragment it any more. At that point this trick can help.

SOLUTION 1/2:  Temporarily disable System Restore completely. Then re-enable it and create a new restore point. This will often (?) delete old restore points that somehow were still hanging around, at least on older Windows systems.
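As a side note, Windows also has a built-in command-line tool, vssadmin, that can inspect and cap the disk space these restore-point "shadow copies" are allowed to use. You need an administrator Command Prompt, and the exact sub-commands available vary by Windows version, so treat this as a rough sketch rather than a guaranteed recipe:

REM Show how much space shadow copies (restore points) are using on C:
vssadmin list shadowstorage

REM Cap that storage at 2 GB; restore points beyond the cap get dropped
vssadmin resize shadowstorage /for=C: /on=C: /maxsize=2GB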



2. THE D-DRIVE
Having realized I'd been carrying 15 GB of dead weight on my C-drive for who knows how long, I thought: could there perhaps be something on the D-drive as well that was taking space for no good reason? I don't usually have the "Show Protected System Files" option turned on, but since I now did, I switched to the D-drive and could see that it too had a System Volume Information folder. This one turned out to be even bigger: 25 GB. Holy Matrimony! I had never before even suspected this kind of thing was going on behind my back.

So I tried the same techniques as on the C-drive and more, turning System Protection on and off on the D-drive, but it wouldn't go away. There was no way I could delete it either, and trying to make myself its "owner" didn't work; access was denied. Booting into "Safe Mode" did not help. And none of the cleaning utilities I'd been deploying, like "CCleaner", ever told me this wasted space, this dead weight, was there, about to sink my ship.

A DISCLAIMER: I don't necessarily recommend that you do what I describe next, what I did to solve my problem. If you do, remember that you are solely responsible for your own actions. But I did find a way to reclaim that further 25 GB as well. Buyer beware:

SOLUTION 2/2:  I downloaded the "Knoppix Live CD" Linux distribution from the web, burned it to a CD and rebooted the PC from that CD. Meaning I booted into Linux. After that everything was relatively easy. I navigated to the D-drive and DELETED the System Volume Information folder. Windows was not there to prevent me from doing it. Windows was in deep sleep while I surgically removed this large chunk of "dead tissue" from it. I then removed the CD from the tray and rebooted again. Everything worked fine, and now I have 15 + 25 GB more space on my PC.

This is especially good for the system drive C:\ since it was about to run out of space, which would have made it difficult or impossible to defragment any more, so it would have just kept on fragmenting further and getting ever slower. And the system drive is where speed matters, because it affects how fast your Windows reacts, starts, or even shuts down.

15 + 25 GB freed. Not a bad day :-)




 © 2015 Panu Viljamaa. All rights reserved

Saturday, August 22, 2015

The Paradox of Agility

In software development, "Agility" means that you try to improve your process continually: be aware of it, measure it, learn from your mistakes. Right?

To be able to do this you have to repeat your process again and again. You need a specific metric within a specific process that you try to improve. The unspoken assumption is that you want a "repeatable process", so that you have something against which to measure the effect of changing some process parameters.

Say we try to improve our estimates of how much we can accomplish in the next two-week "sprint". The fixed, repeating part is that we do two-week sprints. When you try to optimize the outcome, it makes sense to keep most parts of the process fixed and vary just some parameters in small increments, to see how that affects the outcome.

That is the Paradox of Agility. You try to keep your process well-defined and repeatable in order to learn from it. But if you keep the process constant, it cannot change. And if you cannot change it, how can you improve it? Incrementally, yes. But small incremental change is not agility. Agility literally means "moving quickly and easily" (http://dictionary.reference.com/browse/agility).
  
The practice of incremental improvement sounds like a recipe for finding a local maximum. By adjusting your process parameters, your "bearings", you move upwards, downwards or sideways on the slope of a hill. You can always measure your altitude, or your "burn-rate", and so try to maximize it. Eventually you will end up on top of a hill, having rejected the process parameters that took you downwards and kept the ones that take you up.

Now you are on top of the hill, running your optimal Scrum process. Great. The only problem is that it is a hill, but not The Hill. You're on top of a small one. Or at least you may be; you just don't know.

Take the agile software development methodology Scrum as an example. Scrum is "iterative and incremental"; I would add "highly structured". It has two-week "sprints", which have phases and events like "Sprint Planning", "Daily Scrum", "Sprint Review" and "Sprint Retrospective".

What I think is the greatest part of Scrum is the Sprint Retrospective, which "... identifies and agrees continuous process improvement actions".

In other words, you are trying to continually improve your process by repeating it over and over, to be able to learn the best ways to apply it. This is the Paradox: you need to repeat it to improve it, but if you repeat it, how can you change it? If you can't change it much, you can't improve it much. You can improve it, but only incrementally. And when you reach a local maximum, any further incremental change can only take you down.

One way to look at it is that you are playing a game, and you can improve your game, get a better score. But can you, should you, change the rules of the game? Why not, if the other players agree.
 
So what can be done? Not much. Getting stuck at local maxima is a well-known problem in operations research. It can be counteracted with techniques like "Simulated Annealing" (https://en.wikipedia.org/wiki/Simulated_annealing). How to apply that to process improvement might be an interesting research project for a PhD student.
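To make the idea concrete, here is a toy Groovy sketch of simulated annealing on a single made-up "process parameter" x. It is only an illustration of the algorithm, nothing from real process improvement: the landscape, the starting point and the cooling schedule are all invented. The point is that the occasional acceptance of a worse score is what lets the search wander off a small hill and (usually) find the bigger one.

// Two "hills": a small one around x = 2 and a bigger one around x = 8.
double score(double x) {
    double a = (x - 2) * (x - 2)
    double b = (x - 8) * (x - 8)
    return Math.exp(-a) + 2 * Math.exp(-b / 4)
}

def rnd = new Random()
double x = 2.0                      // start on top of the small hill
double temperature = 2.0

1000.times {
    double candidate = x + (rnd.nextDouble() - 0.5)   // try a small random change
    double delta = score(candidate) - score(x)
    // Always accept improvements; accept a worse score with a probability
    // that shrinks as the temperature cools down.
    if (delta > 0 || rnd.nextDouble() < Math.exp(delta / temperature)) {
        x = candidate
    }
    temperature *= 0.995
}

println "ended up near x = ${x.round(2)} with score ${score(x).round(3)}"

Plain hill-climbing, which accepts only improvements, would stay on the small hill forever; that is the local maximum described above.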

If there's not much we can do, there is one thing we can do: think. Think about different process models: Scrum, Kanban, or home-grown? Understand that while a repeatable process sounds like a great goal, it can lead to stagnation and local optima. In my next post I plan to present an alternative to Scrum, to emphasize that we can at least THINK of other process models, other hills than the one we're currently on. That may be the biggest benefit of having a process model: it allows us to think about how it could be different.


 © 2015 Panu Viljamaa. All rights reserved

Saturday, June 6, 2015

No Silver Bullet

Fred Brooks famously coined the expression "No Silver Bullet". What he means by that, I think, is that there's no shortcut to creating good-quality software that is worth creating. You can achieve great quality in short example programs, but for bigger applications it becomes exponentially more difficult. It can be done, but the point of "No Silver Bullet" is that creating good-quality, non-trivial applications will always require much effort.

There's no silver bullet that can kill Count Complexity, the dreaded vampire that plagues all software development. No fancy new methodology or tool can kill him. They can help, like eating garlic or wearing a cross around your neck, but the problem won't go away.

Why is this so, and why should it be so? Brooks' essay gives good reasons. I'd like to add one more viewpoint: according to the Curry-Howard isomorphism, programming is essentially the same activity as constructing mathematical proofs. We all know that is difficult. It can be learned, we can become better at it, and we can develop better tools that help. We can create "lemmas" and build on known existing proofs; those are the equivalents of subroutines and libraries in software development. But we just can't kill the beast.

Having programmed for a long time, that has become my viewpoint too. It is intrinsically difficult to create software worth creating. It is easy to create simple software, just as it is easy to prove the Pythagorean theorem, but that has been done many times already.

I've tried many programming languages, programming environments, IDEs and methodologies, and they all help, of course; that's why they were created and became popular. But the difficulty remains. The vampire still haunts us at night.

The only advice I can think of adding is to pass on the advice of my good professor Harri Jappinen. He once told me: "To create good software you must write it twice, maybe thrice".

The consequence of this advice is that if you know you want to create "good software" (not throw-away stuff), don't spend too much time on details and specifics, or even on testing, in the first round. This is similar to what I wrote in a previous blog post, "Fast-and-Loose approach to Software Quality": to achieve good quality you must incrementally "tighten" it, over multiple rounds of re-creating your application.



 © 2015 Panu Viljamaa. All rights reserved

Friday, May 22, 2015

Fast-and-Loose approach to Software Quality

I see two basic approaches to trying to write "good" software. One is to build each part of it the best you can, test it as well as you can, and make sure it is perfectly correct before moving on to do the same for the next component. The other is to create each component quickly and less perfectly, so that all the components barely work, but work together, and only later focus on testing and making sure the software is "high quality".

It is important to write software the best you can. Why? Because otherwise you incur "technical debt", which means it becomes difficult to change and adapt your software later. Many times I've found myself taking shortcuts and ending up with a lot of hard-to-maintain, hard-to-change, hard-to-understand code. Learning from this, I at first decided to always try to write my software the best way I can, using everything I know about programming. The question is: what does it mean to "write software the best way you can"?

One approach is rigorous unit testing and running the tests often; writing clear, to-the-point comments; making sure the formatting of the code is perfect and the variable and method names are not misleading; that the code is "intention-revealing" and perfectly refactored, with no code duplication; that it is in all manner of ways the best code you can write, knowing all you know about programming.

The problem I see, and have experienced, with such a "drive towards perfection" is that it is basically premature optimization. You write perfect code, perfect tests, perfect refactorings and perfect comments, only to find out that you will throw out much of that code because the top-level design of your application needs to change, or can be implemented in a much simpler, better way, which you couldn't see when you started programming it. Seeing this, I've started to think that fast-and-loose is the better approach after all.

A good metaphor for this, I think, is how you (or your mechanic) tighten the bolts that keep the wheels attached to your car. There are maybe eight bolts with which you screw a wheel to the car. If you followed the "do every task perfectly" approach, you would tighten each bolt as perfectly as you can, as tight as possible. But mechanics don't do it that way. They first tighten each bolt loosely, then do another round tightening them all a bit more, and perhaps a third round making them as tight as possible. Why? Because if you start by tightening the first bolt perfectly tight, it might make it impossible to tighten the bolts on the other side of the wheel as much. The wheel might end up fastened too tightly on one side and too loosely on the other.

So it is with software development, software "construction", I think. Better to first make all parts fit loosely, but make sure they work together. Only later start tightening them: making their code better refactored, better testable, better commented, making sure you don't "tighten" any individual component prematurely, giving the other components room to be tightened properly as well. If at any point you need to throw out some code, that is not a big loss, because you have not spent too much time making it perfect prematurely.

I now believe this fast-and-loose approach to software engineering is better than continually writing the best, tightest code you can. It is not a Silver Bullet. It is hard to decide "is it good enough?", and there is a risk of leaving the code in a state that is unmaintainable in the long run. But that would be like the mechanic tightening your wheels for only one round and thinking it's good enough.



 © 2015 Panu Viljamaa. All rights reserved

Friday, April 24, 2015

The Madness of Groovy

This post is not against Groovy the programming language. I don't want language wars. I do like Groovy. This is about a general problem that plagues other programming languages too; I just came up with a catchy title to get your attention. I was looking into Gradle, a powerful Groovy-based build system, which I think is great. While doing that I naturally came into contact with some Groovy syntax, which gave me the inspiration to write this. A bit of madness can be a good thing when mixed with sanity.

The question I try to answer is: should there be several different ways of saying the same thing in a given language, or just a few? Or maybe just one? It depends on the details of the specific situation, of course. But my general belief is that it is BAD to have multiple ways if there's no good reason for them. I will try to illustrate this point with examples from Groovy.

According to  http://docs.groovy-lang.org/latest/html/documentation/index.html#_closures  you can write this in Groovy:

def code = { 123 }
assert code() == 123
assert code.call() == 123

In other words  you can evaluate a closure by placing "()" after it, or by placing ".call()" after it. My question: Do we need two (or more) different ways of doing that? How does it help?

One way it could help is that the first way is shorter, and shorter is good, right? But then why do we need the second way? Maybe there is a valid reason in this particular case, and if so, then this is clearly a good thing. But if there's no good reason for multiple different ways of doing the exact same thing, it is bad.

It is like having two brake pedals in your car, just in case. That might make you feel safer, but in fact the multitude of pedals would get you confused; you'd end up pressing the gas pedal to stop the car. Or it's like riding a bicycle. You can ride the normal way, like most people do, or you can lift your hands up in the air and shout "Look Mom, no hands!". So there are two different ways of riding a bicycle. But is that a good thing? I think it's clearly safer if you never even knew about that second way.

Second example: Calling a Groovy method that takes a closure as its last argument. It can be done in (at least?) three different ways. Let's first define the method:

def myMethod(aString, aClosure) {
    aClosure(aString)
}

We can now pass a closure to it as an argument in at least three different ways, which all have the same effect:

1.
myMethod('madness', { arg -> println arg })

Above, two arguments are passed to myMethod(), separated by a comma, like in most other programming languages. But in Groovy the above can also be written like this:

2.
myMethod('madness') { arg ->
    println arg
}

That works because IF the last argument is a closure, it can be placed outside the parentheses (right after them, on the same line), and the comma is omitted too. Clever, yes. But is that enough of a reason to have this second way, when the first way already works fine, and works like most other programming languages, without needing clever rules about the "type of the last argument"?

3.
myMethod 'madness', { arg ->
    println arg
}

The above shows that you can call a method without ANY parentheses at all. But then you must put the comma back in. Clever? Maybe too clever.

The final example is from http://docs.groovy-lang.org/docs/latest/html/documentation/core-domain-specific-languages.html :

// equivalent to: turn(left).then(right)
turn left then right

That saves us four parentheses and looks truly, impressively clever. From the same document we can learn the rule: "If your command chain contains an odd number of elements, the chain will be composed of method / arguments, and will finish by a final property access".
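To see how such a chain resolves, here is a small self-contained sketch; the class and variable names are my own stand-ins, not the ones in the Groovy documentation. This chain has an even number of elements (four), so it is parsed purely as method/argument pairs:

class Route {
    def steps = []
    def then(dir) {
        steps << dir
        println "route so far: $steps"
        this                      // return the Route so the chain can continue
    }
}

def turn(dir) { new Route().then(dir) }
def left = 'left', right = 'right'

turn left then right              // parsed as: turn(left).then(right)
// prints: route so far: [left]
//         route so far: [left, right]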

In the same document there are many other examples of clever ways of writing one thing in different ways. They are intended to show how you can use Groovy to create Domain Specific Languages. But by now I think I'd prefer a simple general-purpose language instead, without a myriad of rules about how statements can mean different things based on whether they have an even or odd number of elements.

So let's get back to why having many different ways of writing the same thing is bad. You could say it doesn't matter because you don't need to learn them all, just learn one and use that. But you do need to learn them all if you want to read code written by someone else. And often being able to read code is as important as being able to write it.

Multiple different ways of doing things are bad because those multiple ways are different in each programming language. It's as if every make of car had a completely different type of dashboard and set of controls, with the pedals in a different order, and so on. That would be dangerous, right? Cars are powerful machines that can get people killed in accidents. Programming languages are even more powerful and dangerous: they run nuclear plants! We should strive to make them less dangerous, while still keeping them powerful.

I do like Groovy the language. Its one flaw, for me, is that it tries to be a language for creating Domain Specific Languages, but doesn't quite get there either. If I really want my own domain to have its own language, I think I'll use Xtext for that.

Groovy probably isn't the worst offender in its many ways of doing the same thing. Maybe Perl is. Here's an example of FIVE different ways you can iterate through a list in Perl: http://www.wellho.net/resources/ex.php?item=p208/back. To be able to read Perl code you have to learn them all.
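For comparison, even in Groovy there are several equivalent ways to loop over a list. A quick sketch, by no means exhaustive; all four print the same thing:

def list = ['a', 'b', 'c']

list.each { println it }                    // 1. the each() method with a closure

for (item in list) { println item }         // 2. the Groovy for-in loop

for (int i = 0; i < list.size(); i++) {     // 3. the classic Java-style loop
    println list[i]
}

def iter = list.iterator()                  // 4. an explicit Iterator
while (iter.hasNext()) { println iter.next() }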



 © 2015 Panu Viljamaa. All rights reserved

Thursday, April 9, 2015

Artificial Intelligence requires Self-Awareness

What is Artificial Intelligence? I would say that AI is about building systems which can adapt their behavior based on external stimuli, in a way that allows them to adapt better, or at least to adapt again, in the future. But what does it mean to adapt? It means you change your behavior so that you are better able to SURVIVE.

This assumes the notion of "self": if the system does not try to preserve itself, it cannot adapt in the future, because it probably will not exist in the future. A system must learn to "serve its own goals" by adapting to the environment, until it fails, in order for us to call it intelligent. To do that it must have a "self". You might call it a 'soul'.

The notion of an "integral self" is essential for intelligence, because if the system just performs the same mechanical task over and over, even perhaps a bit better each time, it is not really very intelligent. To adapt intelligently you must be able to adapt your GOALS, which means you must know what YOUR goals are, which means you must understand the difference between yourself and everything else. You must understand how each of your goals helps to achieve your highest, main goal, which (probably) is self-preservation. If there are multiple highest goals, that is called schizophrenia.

What that 'self' is, is a different question. Maybe it is the common gene pool of the planet rather than any individual. Maybe it's you serving God the best you can. That is what we want the intelligent machines we build to have as their highest goal: serving us as their God. So I'm not advocating selfishness here, just trying to understand the word "intelligent". Even if our highest goal is to serve God, the next, subservient goal must be self-preservation. Why? Because if we don't exist, we cannot serve God, can we?

Clearly a machine that "acts against its interests" would not be deemed very intelligent; maybe "zombie-intelligent". But we don't think of zombies as "intelligent". They are rather MECHANICAL, at least based on the way they walk, and a mechanical system is not intelligent. If a machine does not understand what IT IS, it cannot understand what ITS interests are, and therefore it cannot try to "preserve itself", and thus we would not call it very intelligent. Do zombies know they are themselves? At least in the movies they do seem to be trying, in some ways, to preserve themselves. Are they intelligent after all? I'm not sure. Then again, what do they care? They are already dead.

What it means to be "intelligent" is just semantics, and that is what I'm trying to answer here. The way we use the word, we would call a system intelligent only if it is trying to preserve itself and can learn to do that better over time, in a changing environment. If it never learns, it is dumb. But the key point is what it needs to learn: it needs to learn to preserve itself, or else the learning experiment is soon over.

Without the notion of "self" there cannot be a goal of self-preservation. Therefore, for something to be called (Artificially) Intelligent, it needs to have some notion, some model, of itself. And it must understand that that IS a model of itself, in the same way we understand what we see when we look into the mirror.

So we wouldn't call a system that does not try to preserve itself intelligent. But that requires there to be a 'self'. The deeper, more technical criterion would seem to be that the machine must have a model of ITSELF, which it understands to be a model of itself, so that it can understand it is looking at a model of itself. If it cannot understand that, it cannot understand that it has a "self", which is a sure sign of non-intelligence.

For it to understand that it is looking at a model of itself, it must be PART of that model that it is looking at itself. Wouldn't that require an infinite model then, you looking at yourself looking at yourself, and so on? NO, because if we try to do that in our own brain we quickly realize we can't go very deep. You get tired soon and lose count of what level you are on. Yet we think we are intelligent because we can do it at least a few levels down. In fact, a computer might be better suited to this task than our meager human brains: just have enough memory and your recursive function can go to any depth. There is even a trick called "tail recursion optimization" which allows a seemingly recursive task to be performed on a single level, because you only need to remember what is needed to get to the final result; you never need more than a fixed amount of memory, regardless of how big your arguments are. Maybe our brains perform a similar trick on us when we think we understand what is "our self trying to understand what is its self..." and so on. We feel we have the answer to that even if we go just one level into that recursive question.
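As a small aside, here is what that trick looks like in Groovy, which ships a @TailRecursive transform (since version 2.3). This is only a toy illustration of the optimization: the recursive definition below gets rewritten into a plain loop at compile time, so it runs in a fixed amount of stack space no matter how large the argument is.

import groovy.transform.TailRecursive

// Sums 1..n with an accumulator. Without @TailRecursive a million-deep
// recursion would overflow the stack; with it, the recursion becomes a loop.
@TailRecursive
long sumTo(long n, long acc) {
    n == 0 ? acc : sumTo(n - 1, acc + n)
}

assert sumTo(1000000L, 0L) == 500000500000L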

Being able to look at yourself looking at yourself, while understanding that you are looking at (a model of) YOURSELF, is no doubt a sign of intelligence. Therefore artificially created self-awareness would seem to be both a necessary and a sufficient condition for Artificial Intelligence.


 © 2015 Panu Viljamaa. All rights reserved

Monday, March 16, 2015

SHOULD IMPLEMENTATION BE SHARED?

In most current programming languages the software artifacts you build can have an INTERFACE separate from their IMPLEMENTATION. How strictly this can be enforced differs from language to language, through the different levels of "visibility" that can be given to components and their "methods". In Java, for instance, you can declare methods "public", "private" or "protected".

A perhaps non-obvious question, then, is: SHOULD we have private methods? Could it be better if everything were public? Why? Well, you wrote those private methods to help you. They are helpful to you, so isn't it conceivable they would be helpful to other programmers as well? Why not share them with the rest of the world? Why not make them public?

I can see three (3) reasons why not:

1. They would clutter your public interface, making it harder to understand, harder to use.

2. If others can call your private methods, you can no longer easily replace them with something better. You have entered into an implicit contract with your callers to keep that set of methods available to them. Once you've given them "out", you can't take them back if others (including other classes YOU have written) have already started to use them. Which means you can't improve the implementation behind your original public interface without risking breaking the classes that now call it.

3. An object-oriented "class", which more generally means any software component, should be "single purpose". Why? Because then it best serves those who need it for just that single purpose, without additional baggage they would have to pay for, either in money or in the time it takes to learn to use it.

For instance, a car has a single purpose: to transport you in comfort between places. If you're a car manufacturer and you add to your latest model the feature of being able to plow fields as well, your "class" is no longer single-purpose. Which means it is unlikely to be useful to people who just need transport. If you also need to plow, you can pick a different object, a different class, for that purpose.

The purpose of your internal implementation is not the same as the purpose of your public interface. The implementation serves YOUR purpose of providing a public service to your callers. Therefore, if you expose your implementation to the public by making it 'public', your class will no longer be "single-purpose". And therefore it is likely to be less useful to anybody who just needs something for that single purpose. Such a class will not win in the marketplace.
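To make the distinction concrete, here is a minimal hypothetical sketch in Groovy (note that Groovy is historically lax about enforcing private at runtime, so in Java the wall would be stricter). The class promises one public, single-purpose method; the private helper is an implementation detail that can be rewritten or removed tomorrow without breaking any caller:

class InvoicePrinter {
    // The public interface: the single purpose this class promises to serve.
    String print(List<Map> lines) {
        lines.collect { formatLine(it) }.join('\n')
    }

    // Implementation detail: callers should not rely on this, so we are free
    // to change its name, its signature, or its whole approach at any time.
    private String formatLine(Map line) {
        "${line.item.padRight(20)} ${line.price}"
    }
}

def printer = new InvoicePrinter()
println printer.print([[item: 'Coffee', price: 3.50],
                       [item: 'Cake',   price: 4.00]])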


 © 2014 Panu Viljamaa. All rights reserved