Holistic software assessment

To manage software evolution, we first need to understand the mechanisms that govern it. Software evolution is a multidimensional phenomenon that manifests itself in multiple forms and at multiple levels of abstraction. First, the evolution of software systems can be observed in various forms: source code and its history, runtime information, bug reports, specifications and diagrams, build and deployment scripts, tests, formal documents, or informal message exchanges. Second, a complex software system relies on several layers of abstraction: the semantics of the different programming languages used, the semantics of the interactions between these languages, the semantics of the multitude of technologies involved, the architectural constraints, the actual data being manipulated, or the way features are mapped onto all of these. Third, many actors are interested in the effects of software evolution from several perspectives, such as ensuring that the right functionality is implemented right, that the internal software structure is maintainable, that the system is secure, or that it has the desired performance. These are but a handful of issues.

I maintain that to assess software evolution effectively we need to approach it holistically.

I have developed several techniques that cover a wide spectrum of software analyses, ranging from the coarse-grained analysis of system architecture to the fine-grained understanding of runtime behavior, and from code duplication detection to the semantic analysis of the natural-language vocabulary used in source code. While these works might appear divergent, they are complementary, as each reveals a different facet of a software system. No single technique is enough to provide an overview of a system. That is why we need to approach the assessment of software systems from multiple points of view.

This philosophy is embodied in Moose, an open-source platform for software and data analysis. Moose is a comprehensive platform that tackles the complete cycle of analysis: from providing a generic meta-modeling and data exchange engine \cite{Duca08a} to offering a scriptable infrastructure for interactive visualizations \cite{Meye06a}. Being at the center of the Moose community allowed me to interact with several dozen researchers and students and to form a broader perspective on the issue of software evolution. I summarize the lessons learnt as follows:

Comprehensive analyses should be based on multiple data sources.
Software systems are large and typically rely on data from multiple sources, such as source code in multiple languages, deployment descriptors, and so on. To provide an overview of the system, analyses should take all these sources into account. For example, in the context of enterprise systems, an accurate identification of quality problems should complement the source code with other data sources related to the specific technologies used \cite{Peri10a}.
The process of designing analyses should control all steps, from fact extraction through modeling to presentation.
The analysis process involves several types of activities: extracting facts from data sources, modeling the information, developing mining and analysis techniques, visualizing the information, and even constructing complete interactive user interfaces for exposing the underlying data to the user \cite{Nier05c}. Each of these activities influences the resulting analysis, and thus the research process should explicitly tackle them all.
Analyses should be expressed on a central meta-model.
Every analysis reveals new points of view, but for these views to be combinable and comparable, we need to express them in a common meta-model or formalism. For example, during my PhD I argued that to analyze software evolution effectively we need an explicit meta-model. I introduced the notion of history as a first-class entity, and I defined the Hismo meta-model. Hismo places historical data and structural data in the same space and eases the expression of complex evolution analyses \cite{Girb06a} (a small sketch follows this list).
Lessons learnt from assessment should affect forward engineering practices and models.
During assessment we have the chance to draw higher-level lessons about how to improve forward engineering. For example, we used our experience in dynamic analysis to develop a novel approach that makes back-in-time debugging practical \cite{Lien08b}.
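
To make the meta-model idea concrete, here is a minimal sketch of history as a first-class entity, written in Python rather than in the Smalltalk of the actual Hismo implementation; the ClassVersion and ClassHistory names and the number-of-methods measurement are illustrative simplifications, not the real Hismo API.

from dataclasses import dataclass
from typing import List

@dataclass
class ClassVersion:
    """A structural snapshot of one class at one moment in time
    (a simplification; Hismo versions full structural models)."""
    name: str
    number_of_methods: int

class ClassHistory:
    """A history is a first-class entity: it owns the ordered versions
    of one structural entity and answers evolution questions directly."""
    def __init__(self, versions: List[ClassVersion]):
        self.versions = versions

    def evolution_of_number_of_methods(self) -> int:
        # Sum of absolute changes between consecutive versions,
        # in the spirit of Hismo's evolution measurements.
        return sum(
            abs(after.number_of_methods - before.number_of_methods)
            for before, after in zip(self.versions, self.versions[1:])
        )

# Structural and historical queries live in the same space:
history = ClassHistory([
    ClassVersion("Parser", 10),
    ClassVersion("Parser", 14),
    ClassVersion("Parser", 12),
])
print(history.versions[-1].number_of_methods)    # structural query: 12
print(history.evolution_of_number_of_methods())  # historical query: 6

The point of the design is that the history, not the individual version, is the entity being queried, so evolution measurements can be expressed and combined in the same way as structural ones.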

I have long been concerned with testing research ideas in real contexts. To this end, I collaborated with industrial partners on both research and consulting projects. For more than a year now, I have gone a step further and dedicated myself full time to bringing my assessment experience into industry.

During these assessment projects, it soon became apparent that real-life assessment problems tend to be specific, to be intertwined with context-specific details, and to require answers in a short amount of time. In these situations, generic, fully automatic analysis tools do not work, because they cannot deal with the given context. As a consequence, even though plenty of automatic tools exist, engineers resort to manual code reading. However, given the size of systems, the manual solution leads to decisions based on partial and inaccurate information.

As a solution, I introduced a new approach named humane assessment. It is called “humane” because it is suited to our human abilities: on the one hand, we need tools to deal with the vastness of the data; on the other hand, we need these tools to be custom-built. Thus, I advocate that assessment is only effective when engineers possess the knowledge and ability to craft their own tools. To put this idea into practice, I coach engineers to use the tool-building capabilities of the Moose platform.

I intend to continue being involved in industrial projects for two reasons. First, being exposed to real problems is critical for research. Second, it is only through direct contact that research results can make their way into industrial practice.