Friday, August 3, 2012

Event Sourcing the Census


Over the past weeks I've been working on an NGO-like project to gather specific social data. Much like an official government census, but using social networks and, of course, less official. One aspect of the whole thing kept me aware of the time dependency of the data.

An idea that seemed to me largely related to the problem is the solution proposed by Event Sourcing in software engineering, which I discovered through a Greg Young lecture (the topic is also explained in a Martin Fowler post). Very briefly, it states: don't record state, record events. Why? Mostly because events keep much more information, which is destroyed when collapsed into the entity's current state.
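To make the contrast concrete, here's a tiny Python sketch; the names and data are illustrative, not from any real system:

```python
from dataclasses import dataclass
from datetime import date

# State-based: one mutable record; history is destroyed on every update.
citizen_state = {"education": "high school graduate"}

# Event-sourced: an append-only log; state is derived from it on demand.
@dataclass
class Event:
    citizen_id: int
    kind: str
    when: date

log = [
    Event(1, "enrolled_high_school", date(2005, 2, 1)),
    Event(1, "graduated_high_school", date(2008, 12, 10)),
]

def current_education(citizen_id):
    # Collapsing the log reproduces the current state, but the log also
    # keeps the *when*, which the state record above has already lost.
    return [e.kind for e in log if e.citizen_id == citizen_id][-1]
```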

With this in mind, and mixing both ideas: an official census is always state polling, right? At given intervals (normally years), census representatives poll citizens about their current status (living conditions, job, kids, ...). But this is not an optimal solution in terms of the data, and data is the whole purpose of making a census!

Wouldn't it be better to know when changes in the state happened? For example, instead of asking people every 5 years whether their children have already finished high school, it would be much better to know when the children graduated.

With this simple change we have a broader view in our hands. We can ask, for instance: how many graduated at the expected age? Is there a year in which high school graduation is larger than expected? Does it relate to other events? Is there a pattern in the events measured?
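As a rough illustration of the kind of question that becomes answerable (all data below is made up):

```python
from collections import Counter
from datetime import date

# Birth dates and graduation dates, as they might be derived from the log.
births = {1: date(1990, 5, 20), 2: date(1991, 8, 2)}
graduations = {1: date(2008, 12, 10), 2: date(2011, 12, 9)}

# Distribution of graduation ages: impossible to recover from a snapshot
# that only records "finished high school: yes/no" every five years.
ages = Counter(grad.year - births[cid].year
               for cid, grad in graduations.items())
print(ages)  # Counter({18: 1, 20: 1})
```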

My little project targets a very specific and small question, but for me it is an opportunity to validate the event-based view of citizens' public data. I hope it works, and possibly to expand the idea to other important fields.

Tuesday, October 4, 2011

Differential Kanban - WIP Limits Analysis

Avoiding Discreteness



When looking at a Kanban board, a good analogy, and one close to everyday experience, is to imagine balls in baskets. The balls, of course, are the tasks on the board, and each basket is a different stage a task passes through on its way to completion.

Those who like a bit of mathematics or algorithms will refer to this model as a discrete one. That means the values involved are NOT continuous; they vary in steps, like integers. One problem with discrete math is that its problems are generally harder to solve than continuous ones. So a common strategy is to pretend that your problem is continuous, and that's the proposal of this post.

Model


A first step in removing the discreteness of the problem is to consider a fluid instead of balls (a second one would be to consider a river instead of buckets, but let's stick with the first). In this post I will describe the behavior of a two-bucket system (A and B) in which:

  • Both buckets have the same WIP limit, meaning they have the same capacity.
  • At any time, the rate at which a bucket is filled cannot be positive once its WIP limit is reached.
  • B will only be filled if A has any fluid in it.
  • It takes time for the fluid to leave a bucket, modeling the delay imposed by the time to complete a task.
I will omit the technical details to be more objective, but this yields a nice and fairly simple set of coupled delay differential equations, which tend to be chaotic, but let's stay unbiased.
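For the curious, here is a rough numerical sketch of such a system; the flow rates, delay, and parameters are illustrative guesses of mine, not the exact equations behind the plots below:

```python
import numpy as np

WIP = 2.0       # shared WIP limit (bucket capacity)
tau = 1.0       # delay: time for fluid to pass through a bucket
rate_in = 0.5   # constant arrival rate into A
dt = 0.01
steps = 5000

A = np.zeros(steps)
B = np.zeros(steps)
lag = int(tau / dt)

for t in range(steps - 1):
    a_past = A[max(t - lag, 0)]   # A's level one delay ago
    b_past = B[max(t - lag, 0)]   # B's level one delay ago
    # Filling stops once a bucket hits its WIP limit.
    inflow_A = rate_in if A[t] < WIP else 0.0
    # B is only filled if A currently has fluid in it.
    flow_AB = 0.5 * min(a_past, 1.0) if (A[t] > 0 and B[t] < WIP) else 0.0
    outflow_B = 0.5 * min(b_past, 1.0) if B[t] > 0 else 0.0
    A[t + 1] = max(A[t] + dt * (inflow_A - flow_AB), 0.0)
    B[t + 1] = max(B[t] + dt * (flow_AB - outflow_B), 0.0)

# Plotting A against B traces a phase-space path like the graphs below.
```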

Results 

The graph below shows the path the system took. The x-axis is the amount in A and the y-axis the amount in B.
The system started with both buckets empty and evolved along the curve until it stopped at a stable position:



The WIP limit used here was 2, and we can see that the system explored the limits of WIP by "walking" around them.
Now, changing the WIP limit to 1.2, we observe a much more interesting solution:


The slight overshoot of the WIP limit is due to the continuous implementation and the parameters I used.

Different initial values
From these graphs we find evidence of what we call a stationary point: a state in which the system stays if it starts there. Those points could be found analytically from the differential equations.

The video below is an animation of the results as the WIP limit varies; the different lines correspond to different initial values:


In this video the WIP limit varies from 2 to 0.5. A direct result of this "experiment" is that if the WIP limit is less than the stationary point, we get a more uniform and consequently more predictable behavior, which isn't directly obvious:



 
Conclusion


The objective of this post is to explore this kind of continuous approach and to validate whether it could be a good model for more complex situations such as feedback loops, varying throughput, and determining optimal WIP limits.

Friday, July 22, 2011

TDD Language Proposal

Introduction

Yesterday I watched Uncle Bob's inspiring lecture The Last Programming Language and extracted the following concepts:

  • Languages are always restricting the way we write code.
  • The last big thing in programming was TDD.
Proposal

Joining the two concepts, I wondered whether we could have a programming language that restricts the programmer to doing TDD. But how could this language be structured? How could we inject TDD into the language design?

Doing TDD means writing tests first and writing only the code necessary to make them pass. What if we could, in the same file, write a unit test and the line that makes it pass? Obviously we would save a great amount of time by not shifting between test file and production file as we do today, but what else?

The main concept of this proposal is to value the test and its connection to the production line of code, to the point that we would lose direct control of the final code: the language would compile it for us.

OK, a bit crazy, I know, but is it possible to have such a language? Let's wander:

[METHOD] AddTest [TEST_STATEMENT], [PRODUCTION_LINE]

Example:

Buy AddTest "throws if cart is empty", 
             throw if cart.size == 0

OK, simple enough, but where does cart come from? Well, it would be inferred by the compiler and possibly added as a parameter of the method Buy. You might ask: what about a more complex example, in which different tests are testing the same line? Or if we are testing things inside a loop?

This is where homoiconicity comes in: instead of repeating lines, you contextualize the line of code so the compiler knows where it will be inserted in the final code. Example:

Buy AddTest "throws if item is not in stock", 
            [Buy.Iteration.cart] throw if item.NotInStock

Writing this line makes the compiler add a "for" to the Buy method that iterates over the cart parameter, and adds the line that checks whether each item is in stock or not.
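Out of curiosity, a slice of this idea can be emulated in today's languages. Here's a toy Python sketch; everything about it (the Method class, add_test, the exec-based "compiler") is my invention, not the proposed language:

```python
def raises(thunk, exc=Exception):
    try:
        thunk()
        return False
    except exc:
        return True

class Method:
    def __init__(self, name):
        self.name = name
        self.cases = []  # (test description, production line, test)

    def add_test(self, description, line, test):
        self.cases.append((description, line, test))

    def build(self):
        # "Compile": concatenate the registered lines into a function
        # body, then run every registered test against the result.
        body = "\n".join("    " + line for _, line, _ in self.cases)
        namespace = {}
        exec(f"def {self.name}(cart):\n{body or '    pass'}", namespace)
        fn = namespace[self.name]
        for description, _, test in self.cases:
            assert test(fn), description
        return fn

buy = Method("buy")
buy.add_test("throws if cart is empty",
             "if len(cart) == 0: raise ValueError('empty cart')",
             lambda fn: raises(lambda: fn([]), ValueError))
buy_fn = buy.build()  # the only way to get production code is via tests
```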

Conclusion

If we preach "we should only write production code to pass a unit test", why not make it a language restriction and say "the only way to write production code is to write its test"? Then we would take TDD really seriously and forget the frameworks, IDE tools, and metrics we have built to accommodate a language dysfunction; it's the same thing as implementing polymorphism in C.

Tuesday, July 5, 2011

Last Social Responsible Moments


Software developers underestimate their profession and should occupy a more responsible role in society. After Agile Brazil 2011 my mind was hit by several thoughts that all led in one direction: what is the purpose that drives developers?

So I asked myself that question and expanded it to all the other activities that motivate me: physics research, 3D modeling, game development, and Agile. One thing they have in common is the ability to change things in a powerful manner. Physics does it through facts that change our mental reality, 3D modeling and games through operations that change a fictitious and malleable world, and Agile through the effective organization of people.

After a while I concluded that they were all driven by a general frustration. Thinking way back, when I was around 12 years old, I remember I had the desire to change the world. As I grew up, those thoughts were demoted to a feeling of naivety, but in fact all I have done since then is try to comply with that primal desire: "Oh well, I can't change the world... but I can refactor this code down to two lines!" Any developers out there feel the same way?

A common term we hear in the agile world is local optimization, and I believe the majority of software developers are doing exactly that in their jobs (me included!). When I work for a company I am constrained by the company's objective, and all my ideas are narrowed down to the profitable ones. But think for a while: what is your company doing for society? Extracting oil? Entertaining? Selling cereal? Even if those are all good for society, we are locally optimizing the world's problems. Think about the worst of them: poverty, education, health, and so on. Do you believe that online entertainment ranks above them?

Making society happy doesn't necessarily mean doing good for society; see Coca-Cola, for example. Living in a capitalist world, we can make things that increase overall happiness and profit at the same time, which looks like a win-win case, right? But we can, and many times did, create a societal demand where none existed, and creating a new market might not be a healthy solution to the world's problems.

In the end we might only be giving panem et circenses to the world; developers and entrepreneurs should go deeper than any shallow middle-class desires. I believe society needs a red pill, and it could be our job to build it.

Since our alternatives to classical capitalism are far from reality, I still have to make money, try to balance the equation, and dedicate time to this, but it's getting hard not to feel guilty about doing nothing.


References:

Joshua Kerievsky - Prioritizing Happiness

Agile Brazil 2011 - Lean Startup Open Space

Friday, May 27, 2011

Inverse Test Coverage



When we work with automated tests in a project, a common and controversial metric is code coverage: the percentage of lines, or of branches in the decision graph, exercised during test execution.

I say controversial because its promiscuous use generally drops test quality; after all, we are what we measure, right? But it is a useful measure, because although good coverage doesn't mean good tests, bad coverage does mean bad tests.

I was thinking of a different approach to the subject. Test coverage measures the number of lines affected by a test execution; what if we could invert this operation and measure the number of tests affected by a line of code? If we could do that, we could give a greater meaning to the measure, in the sense that it tests not only the quality of your production code but also the quality of your tests.

Why? Imagine we have a line that, when removed completely, not only leaves the project building but also passing all tests. This leads to two possible conclusions. If the tests are really testing what matters, this result shows that the line of code doesn't matter: it could be a leftover from a refactoring, or simply dead code. But if the line is correct, then the tests are not testing what the line does, showing that although the line is covered, the result of its existence is not tested, which is a lot more powerful than simple coverage.

I came across this idea because, when using TDD, I tend to be a bit compulsive about doing exactly this: after making a test pass, I sometimes comment out the line that caused the green light just to see the test fail again.

OK, but even if it is a good idea, it is a bit hard to implement, right? If we have a project with, say, 10,000 logical lines of code, which corresponds to a medium-sized project, and our project builds and runs all tests on a good machine in 30 seconds (highly optimistic!), then removing each line, building, and running would take more than 3 days! That's a lot! But the lines selected for removal could be weighted by their age in the project or some other criterion, or it could be an IDE-integrated tool such as JUnit Max.
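A naive sketch of the procedure in Python; the file name and test command below are placeholders:

```python
import subprocess
from pathlib import Path

def surviving_lines(source_file, test_command):
    """Return lines whose removal still leaves every test passing."""
    path = Path(source_file)
    original = path.read_text().splitlines(keepends=True)
    suspicious = []
    for i in range(len(original)):
        # Drop line i, then rebuild and run the whole suite.
        path.write_text("".join(original[:i] + original[i + 1:]))
        result = subprocess.run(test_command, shell=True,
                                capture_output=True)
        if result.returncode == 0:   # build and tests still green
            suspicious.append((i + 1, original[i].strip()))
        path.write_text("".join(original))  # restore before next removal
    return suspicious

# Hypothetical usage: surviving_lines("shop.py", "python -m pytest -q")
```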

Monday, March 21, 2011

Monte Carlo and the inversion of continuous improvement cycles

Newton's Method

A usual slogan we hear in the agile community is continuous improvement: a cycle of inspecting problems and adapting to solve them, optimizing incrementally and in a restless manner.

When I was introduced to the concept, @toledorodrigo used an analogy that I really like:

 
Imagine the problem of finding the roots of a continuous function. For simple equations it's usually trivial to find them analytically, but as the function gets more and more complex it becomes harder, and sometimes impossible, to get an analytically precise solution.

But this is where Newton's method comes in, with the iterative procedure described below:
  1. Choose a starting x value.
  2. Trace the tangent of the function at that point.
  3. Find the root of that tangent.
  4. Take that root as the new x value and repeat from step 2.
The similarity with continuous improvement becomes clear when we identify tracing the tangent as the inspection phase, where we guess in which direction to search for the solution, and the discovery of the new candidate as the adaptation phase.
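A minimal sketch of that iteration, using f(x) = x² - 2 as a toy example (the function and starting point are my choices):

```python
def newton(f, df, x, iterations=10):
    for _ in range(iterations):
        x = x - f(x) / df(x)  # step 3: the tangent's root is the new guess
    return x

# f(x) = x^2 - 2 has a root at sqrt(2) ~ 1.41421356
print(newton(lambda x: x * x - 2, lambda x: 2 * x, x=1.0))
```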

Monte-Carlo Method

This analogy got me wondering about another iterative method that is also used when the analytical solution is too complex to get: the Monte Carlo method.

In this method we also do a lot of guessing, in fact a lot more guessing, but the first fundamental difference is that it is not governed by a local property such as the derivative at the current candidate. Normally we randomize in a global manner and look for global properties.

I got in touch with the Monte Carlo method during my undergraduate studies, where a colleague of mine used it on a complex problem. He had N particles interacting with each other, which gives O(N^2) complexity per frame, and that is a lot of work! The Monte Carlo method came in where he (and many others) applied small random perturbations to the system (adaptation), measured whether a global property of it got better or worse, and accepted the change if it passed that test (inspection). In this case the test was the total energy of the system, which had to become smaller and smaller.

The amount of change made in each iteration was proportional to the given temperature of the system. A common method in these kinds of simulations, and even in experimental settings, is raising and lowering the temperature in cycles, which helps the system relax so it doesn't stop at a local minimum and can reach the "optimal" state.
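Here is a sketch of that loop in Python, with a toy energy function standing in for the real N-particle system; the cycle shape and step sizes are my choices for illustration:

```python
import math
import random

def energy(state):
    # Toy global property to minimize; stands in for total system energy.
    return sum(x * x for x in state)

state = [random.uniform(-5, 5) for _ in range(10)]
for step in range(10000):
    # Raise and lower the temperature in cycles, as described above.
    temperature = 1.0 + math.cos(step / 500.0)
    # Adapt first: a small random perturbation scaled by the temperature.
    candidate = [x + random.gauss(0, 0.1 * temperature) for x in state]
    # Inspect afterwards: keep the change only if the global energy drops.
    if energy(candidate) < energy(state):
        state = candidate
```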

The beauty of the algorithm is that the randomness makes the step-by-step variations seem counterintuitive: things don't move the way they seem they should, given our mentally biased perspective.

Also notice a critical difference in the Monte Carlo method presented: the phases of inspection and adaptation are swapped. We first adapt, and only then inspect, and only if the changes are good for the global system are they accepted. Is this a silly algorithm? Maybe, but it's also similar to a very popular algorithm that has been running for millions of years in a sustainable manner and is at the core of Darwin's theory: the survival of the fittest. Apparently nature likes to vary first and inspect afterwards, killing the genes that don't improve the global property of staying alive.

Conclusion

Although I have only a bit more than 3 years of experience working with teams, I have noticed that many important changes came almost by chance. Of course, well-thought-out and planned improvements occur; that's us humans doing our part. But it seems this is another subject in agile that needs balance, and the natural part, mutating our genes, is neglected. To me the Agile Manifesto phrase "Responding to change over following a plan" takes on a whole new perspective: beyond responding, we should also create disturbances so that we don't get stuck following a plan.

Monday, January 3, 2011

Decision delaying and quantum computing

[Image: the Schrödinger equation]
The cat

A common and didactic story is told when someone learns quantum mechanics: Schrödinger's cat. It says that the nature of things is such that if a given entity has different possible states, then combinations of them are also possible states. To illustrate this, a radioactive atom, which is tested and accepted to be governed by the laws of quantum mechanics, is used as a trigger to release poison into a closed box when it changes state. We also put into the box a cat that was certainly alive. The interesting part is that if the atom can be in both states at the same time, and we assume the cat can be in two states, dead or alive, then the cat can also be in both states at the same time.


The Truth 


This was really amusing to me, and helped a lot when I decided to go to graduate school in physics. The mind-twisting idea is that it's not that we are ignorant and don't know whether the cat is dead or not; the combined state is, in fact, perfectly well defined. It's part of many cultures that we must have a "better answer" because we want the facts, we want the truth. But as Colonel Jessep stated, we can't handle it; our ego expects a certain behavior and nature slaps us in the face, yet it still gives us what we expect from it, because we don't see dead-alive cat states around that much.


Qubits 


What happens is that when we open the box we make a measurement; we say the state collapses, and we make nature decide which one it is. Note that we destroyed information, not discovered it. This is a bit counterintuitive, but the principle is so powerful that it is at the core of quantum computing: by not making nature choose and truncate information into a human-readable state, we use the exponential number of possibilities to solve problems.
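A toy illustration of the collapse in Python, with a qubit reduced to a two-amplitude vector (nothing here is a real quantum computing API):

```python
import numpy as np

alive = np.array([1.0, 0.0])
dead = np.array([0.0, 1.0])
cat = (alive + dead) / np.sqrt(2)   # equal superposition of both states

# Opening the box: sample an outcome with probability |amplitude|^2 ...
outcome = np.random.choice([0, 1], p=np.abs(cat) ** 2)
# ... and the superposition is gone; the amplitudes' information is lost.
cat = alive if outcome == 0 else dead
```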


So What?


What made me write this post is related to agile methods and, as the title says, to delaying decisions: the practice of deferring decisions until the last responsible moment, deciding things only when we really need the decision. The cute analogy is obvious, but what really amused me is that we commonly encourage ourselves to do it, yet we don't really accept a Schrödinger's cat when we see one; we keep making measurements and forcing a "valid" idea. If our brain is governed by the same laws, I wonder whether the creative-thinking part of our job is our quantum computer, constantly being sabotaged by our intrusive and dogmatic ego that keeps opening the damn box.