The story of yesterday's weather

This story has to do with my view on what innovation is and how it happens. It also has to do with getting old and having memories to share.

Back in 2002, when I embarked on the PhD journey, I was exposed to a couple of topics. The one that captured my imagination was software evolution analysis. I did not know anything about it, but analyzing time sounded highly philosophical. And it was.

As expected when jumping into a new field, I started by reading papers. After a short while I set out to implement one of the analyses I encountered. It was about determining the trend of a metric over time. I started typing, but I quickly realized that I was missing a concept to attach time-related behavior to. I then went back to searching for papers that would describe what to do, but I could not really find any. So, I built it myself and I called it the history. It turned out that by introducing this simple concept many analyses became easier to write, and the idea of modeling history as a first-class entity became the very center of my thesis.

Around the same time, I stumbled across a paper that claimed that one heuristic to follow when understanding a software system is to start from the places that were modified last, the assumption being that these parts will be among those that will need changing next. This bugged me for a while. Somehow I felt like not quite agreeing (there is a story behind how I came to realize that I did not agree with the statement, but we will leave it for another time). I was a novice and I could afford to be ignorant.

Thus, I proceeded to implement an algorithm that would compute the probability of this heuristic holding. I named it Yesterday’s Weather. I managed to dig up one of the cleaned-up versions of my original implementation. It is listed below.

Please do not read it in detail. Just look at it, and continue reading what comes. I promise there is a point.

yesterdayWeatherProbabilityWithTopPreviousWENM: topPreviousWENM
andTopCurrentENM: topCurrentENM
| currentVersions previousClassHistoriesSortedByWENM
yesterdayWeatherHits last2VersionsTopHistories last2Versions
last2HistoriesSortedByENM x valuesCount
previousVersionsTopHistories previousVersionsTopHistoriesNames
over |
currentVersions := OrderedCollection new.
currentVersions addLast: (self allVersionNames at: 1).

yesterdayWeatherHits := 0.

(2 to: self allVersionNames size) do: [: i |
self smelly: 'this algorithm is too big and complex'.

previousClassHistoriesSortedByWENM := (self classHistories
selectFromReferenceVersionCollection: currentVersions)
sortBy: [:a :b | a value getWENM >= b value getWENM].
currentVersions addLast: (self allVersionNames at: i).

previousVersionsTopHistories := OrderedCollection new.

x := previousClassHistoriesSortedByWENM first value getWENM.
valuesCount := 0.

previousClassHistoriesSortedByWENM do: [ :each |
(each value getWENM ~= x) ifTrue: [
valuesCount := valuesCount + 1. x := each value getWENM].
(valuesCount < topPreviousWENM) ifTrue: [
previousVersionsTopHistories addLast: each]].

last2VersionsTopHistories := OrderedCollection new.

last2Versions := OrderedCollection new.
last2Versions addLast: (self allVersionNames at: (i-1)).
last2Versions addLast: (self allVersionNames at: i).
last2HistoriesSortedByENM := (self classHistories
selectFromReferenceVersionCollection: last2Versions)
sortBy: [:a :b | a value getWENM >= b value getWENM].

x := last2HistoriesSortedByENM first value getENM.
valuesCount := 0.

last2HistoriesSortedByENM do: [ :each |
(each value getENM ~= x) ifTrue: [
valuesCount := valuesCount + 1. x := each value getENM].
(valuesCount < topCurrentENM) ifTrue: [
last2VersionsTopHistories addLast: each]].

previousVersionsTopHistoriesNames :=
previousVersionsTopHistories collect: [ :each |
each value name].
over := false.

last2VersionsTopHistories do: [:each |
((previousVersionsTopHistoriesNames includes:
(each value name)) and: [over not]) ifTrue: [
yesterdayWeatherHits := yesterdayWeatherHits + 1.
over := true]]].

^yesterdayWeatherHits/(self size - 1) asFloat.

I applied this algorithm on several systems, and it turned out I was right to doubt the heuristic. It only made sense in some situations. I was thrilled.

Then multiple things happened in a short amount of time.

First, I was highly annoyed by the look of the code. Look at it. Even if it is written in Smalltalk, and Smalltalk is a beautiful language, the code looks terrible. I tried to refactor it, but somehow it did not get better.

Second, I set myself to write a paper about it, when my professors came to me and said: you have to formalize it. "What do you mean?" I asked. "This is an algorithm, so you should be able to explain it to us mathematically," they said. I thought this was unnecessary, in particular given that I was always afraid of strange notations. After all, the code is a formalization in itself, and it does what it is supposed to do.

Finally, I had to give an internal presentation about it. I tried to explain what the algorithm does, but nobody seemed to quite get it. This is what bugged me the most. If they did not get it, why would my other peers?

So, I went back to the drawing board. Not to the code editor. To the drawing board. My target was to try to figure out a way to explain this piece to an outsider by using metaphors and visuals.

And so I did. Yesterday’s Weather boils down to picking a point in the past, and comparing what happened before that point with what happened after it. More precisely, since the original heuristic says that the parts that changed recently are relevant for the near future, we need to check whether the parts that were changed before the chosen point in time also got changed soon after it.

Graphically, it looks like this.


If we step back, what I needed was a simple intersection of two sets: the changed parts from the past and those from the future. So, in the end, the pseudocode looked like this:

 past:=all.topChanged(beginning, present)
 future:=all.topChanged(present, end)
 hit:=past.intersect(future)

That’s it. If you apply this algorithm for each version and average the result, you get a percentage that reflects the relevance of using Yesterday’s Weather as a heuristic to predict future changes. You can find more details about the heuristic in the paper. And, yes, it turned out that I was right to doubt the heuristic because it does not necessarily apply to all systems.
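The pseudocode above can be turned into a short runnable sketch. This is a hypothetical Python rendition, not my original Smalltalk; the representation of change data as per-version dictionaries of change sizes, and all the names, are assumptions made for illustration:

```python
def yesterdays_weather(changes_per_version, top_n):
    """changes_per_version: one dict per version transition, mapping a
    part name to how much it changed in that transition.
    Returns the fraction of split points at which a part that was among
    the top changed before the split is also among the top changed after."""

    def top_changed(start, end):
        # Sum change sizes over the version span and keep the top_n parts.
        totals = {}
        for changes in changes_per_version[start:end]:
            for part, size in changes.items():
                totals[part] = totals.get(part, 0) + size
        ranked = sorted(totals, key=totals.get, reverse=True)
        return set(ranked[:top_n])

    n = len(changes_per_version)
    hits = 0
    for present in range(1, n):       # each possible split point in time
        past = top_changed(0, present)
        future = top_changed(present, n)
        if past & future:             # yesterday's weather held here
            hits += 1
    return hits / (n - 1)
```

For example, with three version transitions in which one part dominates the changes throughout, the heuristic holds at every split point and the probability is 1.0; when the heavily changed parts before and after the split are disjoint, it is 0.0.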

Getting back to our story, armed with my new idea, I rewrote my code. It looked like this:

 detailedYWFor: yesterdayCheck for: tomorrowCheck
^ ( 3 to: self versions size ) collect: [ :i |
| yesterday tomorrow |
yesterday := self
selectByExpression: yesterdayCheck
appliedFromVersionIndex: 1
toVersionIndexAndPresentInIt: i - 1.
tomorrow := self
selectByExpression: tomorrowCheck
appliedFromVersionIndexAndPresentInIt: i - 1
toVersionIndex: self versions size.
yesterday intersectWith: tomorrow ]
 yWFor: yesterdayCheck for: tomorrowCheck
| dailyYW hits |
dailyYW := self detailedYWFor: yesterdayCheck for: tomorrowCheck.
hits := dailyYW sum: [ :each |
each isEmpty
ifTrue: [0]
ifFalse: [1]].
^ hits / (self versions size - 2)

It turns out that expressing this using mathematical notation is straightforward, too.
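As a rough reconstruction of that notation, derived from the code above (the symbols are my own, not the original figure): with $n$ versions and $Top_{[a,b]}$ the set of top changed histories over the version interval $[a,b]$, the probability is the fraction of split points at which the two top sets intersect:

```latex
YW \;=\; \frac{1}{n-2}\,
  \Bigl|\,\bigl\{\, i \in \{3,\dots,n\} \;:\;
    Top_{[1,\,i-1]} \,\cap\, Top_{[i-1,\,n]} \neq \emptyset \,\bigr\}\,\Bigr|
```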


In retrospect, it all seems straightforward. However, it was not as easy as it appears. To get this simple code to work, I had to extend the model with a couple of basic concepts. In the end, this proved to have a conceptual impact as it made the model highly generic. If you are interested in more details, you can read my PhD work.

As exciting as the technical details might be, the essence of the story is not of a technical nature. The content was there all along. I could get the computer to compute what I wanted, although in a convoluted way. The main problem was that I could not explain it to other people. It was through this failure that I found what I did not understand. That is right: my understanding was incomplete and I could not explain my intuition. I had to go and rethink what I was doing. Once my ideas got clearer, the explanation got cleaner. In the end, I could explain Yesterday’s Weather in less than 5 minutes. 5 minutes.

My lesson:

Form without content is worthless. Content without form is pointless.

Why pointless? Because without form nobody understands the content, and thus, the package as a whole has no value.

Form is important, but it is often addressed at the end of the process. That’s too late because you will not get it right the first time. You need to test and iterate. And, here is the magic: To test your idea you need the form, and when the form gets clean, the content becomes simple. The act of creation must address both content and form at the same time. This, to me, is at the core of the innovation process.

Posted by Tudor Girba at 9 June 2012, 10:11 pm with tags representation, presentation, research

Holistic software assessment

We need a radical change in the way we approach software assessment both in practice and in research.

Assessment is what we need to do before taking a decision. It is a critical software engineering activity, often accounting for as much as 50% of the overall development effort. However, in practice this activity is regarded as secondary and is dealt with in an ad-hoc way. This does us no service. We should recognize it explicitly and approach it holistically as a discipline.

Why as a holistic discipline?

Because software evolution is a multidimensional phenomenon that manifests itself in multiple forms and at multiple levels of abstraction. For example, software evolution spans multiple topics such as modeling, data mining, visualization, human-computer interaction, or even language design. There exists an extensive body of research in each of these areas, but the approaches are mostly disparate and thus have little overall impact. We need a new kind of research effort that deals with their integration.

Ultimately, assessment is a human activity that concerns taking decisions in specific situations. Thus, to be effective, assessment must go beyond general technicalities and deal with those specific situations. For example, instead of having a predefined generic tool, we should be able to craft one that deals with the constraints of the system under study.

To accommodate the scale of the problem, the research methods should be adapted to the task as well. First, it is critical to integrate tool building into the research process, because without scalable tools we cannot handle large systems. Second, we have to work collaboratively, both to share the practical costs and to integrate our approaches.

Holistic software assessment might sound too ambitious. But, it is simply too expensive to do it otherwise.

Posted by Tudor Girba at 19 March 2011, 12:34 am with tags assessment, research

The invisible strings of wrong assumptions

Please take a quick look at this movie.

This poor dog is laughable. Not seeing the obvious is always laughable. At least from outside, or in retrospect. The situation is different from the inside and during the action. It is much more serious. In fact, it is this seriousness that makes it miss the obvious. Ok, perhaps being a dog does not help the situation, but this kind of spinning around does happen to humans, too.

"Who must play cannot play," said James Carse in one of my favorite books. He said it in a slightly different context but it also holds true when it comes to us finding a way out of a corner. If we get too serious about the goal we risk missing the paths that are just in front of us. The more serious our state of mind is, the more focused we get. The more focused we get, the more our field of exploration narrows. The result is that whatever is just outside is not even considered.

And how do we decide what is considered and what not? Based on past experiences summarized in the form of assumptions. This is a great mechanism because it helps us deal with crisis situations. Assumptions help us prioritize rapidly by providing us with proven solutions. The more serious the situation, the more important assumptions get, and the more we focus on getting things done via explored paths.

But, it is exactly this mechanism that impedes us from finding new solutions. When assumptions get priority, we walk known paths. Thus, the chances of finding new paths diminishes dramatically.

Sometimes, like in the case of our dog, the situation is actually not that critical and we can have a good laugh at how we did not see what was in front of us. Yet, oftentimes it is important not to get stuck, especially when there actually isn’t any real obstacle.

If we find ourselves running in the same circle more than twice, it probably means that we should remember that to be able to find something new, we first have to acknowledge the possibility of it. And the first step in that direction is to get less serious and embrace the promise of a foolish thought.

Posted by Tudor Girba at 26 October 2009, 1:28 am with tags research, assessment

Seven years in research

After seven years, I will move away from the academic ivory tower and into the wild territory of freelancing. Starting October 1, I will work full-time at Sw-eng. Software Engineering GmbH.

While research is not Tibet, the past seven years were still an experience to remember. If nothing else, I got to earn a living without being expected to produce anything sensible.

Seriously, it was fascinating. I spent the day questioning issues that otherwise I would have taken for granted. Add to this that I was surrounded by eager minds that did not leave room for sloppy ideas, and you will get a glimpse of why "fascinating" is the right word.

So, what will I do from now on? I will offer consulting and coaching services in software engineering in general, and I will particularly focus on software assessment (see the tiny note below). For a long time I argued that software assessment should take a more prominent position in software development. From now on I will get proactive about it, and I will bring with me the techniques I learnt and developed while in research.

The shift in my day-to-day focus will be perceived in this blog as well. I will continue to talk about presentation, representation and the rest, but I will start adding a flavor of the new things I will encounter.

Among others, in my quest for promoting software assessment I will feature more prominently issues embodied in or related to Moose, the analysis platform in whose development I was deeply involved for the past 7 years. While Moose is a piece of software, the essence of its philosophy is the empowerment of the analyst to control the course of the analysis. Thus, it is a vehicle that teaches that software assessment is a human activity, rather than a tooling issue. And this is one of the messages I want to bring in the practice of software development.

But let’s not get ahead of ourselves. All in all, I look forward to what is to come, and I would be happy if you will join me on this ride.

Tiny note: If you want to learn more about the services I provide, please feel free to contact me.

Posted by Tudor Girba at 22 September 2009, 12:34 pm with tags research, assessment, moose

Demo driven research

Research is about revealing what no one has revealed before. It consists of some distinct non-linear activities: understand the domain, find a new idea, make it applicable, gather experience reports, validate it, educate your peers, and eventually get it used in practice.

In other words it is a fascinating endeavor, and like any fascinating endeavor it has its challenges. Given that research is about navigating uncharted territories, the most pressing challenge is how to know whether you are on the right track. Or, as Yogi Berra once put it:

You’ve got to be careful if you do not know where you are going, because you might not get there.
Yogi Berra

We should try as much as possible to assess where we are and where we are heading. The question is: what should the process of research be to make sure we do not lose our way?

Before we go further, let us first take a look at what are the forces that influence the research environment in academia.

The engine of research is not the Professor and it is not really the PostDoc. It is the PhD student because he is the one that has the time and energy to actually go through all the required steps.

The design of a PhD requires the student to formulate, build, and defend a thesis that is distinct enough from every other thesis around. I call this a "two arms' length" situation because it is similar to what we used to do in school before starting gymnastics exercises: each of us would raise and spread our arms to make sure we were at least two arms apart from each other. That way we were safe from injuring each other while performing physical exercises. In the same way, the PhD student must be two conceptual arms away from anyone around him so that he is safe. While this design ensures originality, it also discourages deep collaboration: being at two arms' length prevents handshaking. That is the reason why most PhD students have a hard time finding collaborators and end up working alone.

From another point of view, regardless of what people might think, research is a highly competitive world. As a result, we naturally tend to regard our peers as competitors. Accordingly, it follows that it is better to keep our work a secret until it is ready for prime time. This line of thought again tends to push students to work alone in their own corner.

However, lonesome research is less than optimal because it implies that research efforts are predominantly invested into rather small topics. Furthermore, when we work alone we are left defenseless against our greatest enemies: our own assumptions.

There is a way to find collaborators. Not on the horizontal plane, but on the vertical one. Instead of looking for peers that are interested in exactly the same things as we are, hence our possible competitors, we can look for people that either need what we do, or can provide input to what we do. Collaborating along these lines leads to win-win situations.

The effective way of fighting our assumptions is to expose them to other people and to obtain feedback. Not everyone can provide useful feedback, but those that are truly interested can. Demo-driven research is a process whose goal is finding the interested people and obtaining their feedback. I decompose this process into 7 distinct basic pieces.

1. Have a running model

Science is about models. Models are a representation of the real world, and they help us grasp the world by presenting only the details relevant for the problem at hand. To get a PhD you must create a model. This model has to solve a significant problem and it has to be original.

Models tend to remain abstract. They look nice on paper, but when it comes to the crude reality, they do not apply because of nitty-gritty details.

2. Have a story

No matter how interesting a model is, it is still just about facts. And the fact is that facts are boring.

At conferences we get presented models after models, facts after facts, until we get to ask ourselves if the wireless network is still working. We just shut down. A shut down brain will not provide any feedback.

A good story, however, can capture interest. We learn through stories, we choose based on stories, and we even pay for stories. Stories are the way we perceive the world around us.

Your goal is to obtain feedback from interested people, but for that you need to find the interested people. Have a story together with your model. Get them intrigued. Get them excited. Get them interested.

3. Make your story fit your model

Not any story will do. Too many times we are sold fancy stories only to find out that the essence does not match the advertisement. Announcing what is not there will not get you far. People will easily see through it, and even if they do not at first, they will not stay with you for long. Your goal is to build trust.

The role of a story is not just to sell, but to exercise the model in another, more accessible language. Your story should reflect your model. Make them raise their brows, but do not embellish. Intrigue honestly.

4. Make your model fit your story

Marketing is the art and science of telling a story about a product, and it is often viewed as being a cheap layer on top to make us buy something.

When I started my PhD, I wanted to build an analysis based on the metaphor of yesterday’s weather and apply it to predicting how software systems will change. That was the story, but when I tried to implement it the result was a rather incomprehensible, two-page-long piece of code. The problem was not that the computation was wrong; the problem was that I just could not explain it to people. I tried to reorganize the code, but it just did not get any better. So, I went back to the model, introduced history as a first-class concept, and the implementation turned into a simple 5 lines of code. History as a first-class concept became the very essence of my PhD thesis.

The relationship between the model and the story should be bi-directional, and you should not wait until the end to put a story on top of a model. The story development should be intertwined with the model.

5. Just demo

At the end, just go and demo. Both the model and the story will grow. The most important thing is to overcome the basic fear of being exposed. When you go and demo, it will be just you, your model and your story. You will be naked. And so it should be.

When you demo, keep in mind that you are asking people for the most precious of all gifts: their time and their energy. So, be sure to ask for permission. If they give you 5 minutes, stop after 5 minutes and ask whether you should stop or whether it would be OK to go on for another 5 minutes. If they want you to stop, then stop. If they want you to go on, ask for the amount of time. If they give you 10, then take 10. If they give you 1 hour, then take your time and elaborate for 1 hour. If they give you 2 hours, then take 1:30 hours.

To be sure you can demo, your demo should be ready at any time, so always have the starting point ready. And once you have a running demo, freeze it (Oscar Nierstrasz).

When you have a running model, you are tempted to start with the latest and greatest idea that you came up with and built last night. At least I am. Of course, some situations ask for exactly this kind of demo, but in most cases it hurts more than it helps, because what you worked on last night is not engineered and might not work in all situations. So, fight this temptation and show a frozen demo from last month that works. Remember, the main goal is to deliver the story, to get the audience interested enough to give you real feedback.

These days, it is considered a must to accompany your deliveries with slides. There used to be a time when slides were called visual aids. "Visual aid" reveals the intention of the item, while "slide" describes the technology of the item. Perhaps it is because we focused for so long on getting the technology right that we forgot about the intention. Slides are visual aids. That is, they are visual and they are aids. Just aids. So, do not use them as walls of technological defense, because they cannot deliver the story. You are the only one who can.

6. Listen

Perhaps counterintuitively, the most important part of a demo is listening.

When you demo, the attention is directed at you. It is an empowering feeling, and you might start to like it. The problem with this is that you will tend to emphasize the parts that make you look good. Your goal should be to gather feedback, not to create an image.

Listening is hard especially when the feedback is negative. We tend to hasten to present our side of the story. While this is sometimes useful, it is more beneficial for you to listen fully to what your audience has to say.

How do you feel about the comments in an email notifying you that your paper got rejected? If you are like me, you will tend to think that the reviewers were shallow and did not get it. Even if sometimes this is the case, it does not help anyone if you feel oppressed and continue to complain. It is much healthier to fight these initial bursts, concentrate on learning as much as you can from the reviews, and try to improve your work and its presentation. You will be surprised at how often, even if the idea still stands, the presentation is not right. I know I am.

You need to listen. One way to force yourself into doing it, is to remember that your greatest enemies are your own assumptions.

7. Give feedback yourself

Feedback is a gift. Be happy when you get it, but do not forget to return it.

When you provide feedback, do not remain shallow. Get involved. Invest your energy and dive into all details you can get your hands on. Do not stop at the model. Look at the story, too. Form an opinion. Propose alternatives.

If done correctly, while providing feedback you get to observe from the outside how difficult it is to see obvious things from inside. You get to learn that the point of view is all that matters.

Posted by Tudor Girba at 8 September 2009, 12:57 am with tags research, demo