Static Code Analysis with SonarQube

Developers sometimes ask themselves: “What are we actually doing here?” This has nothing to do with a sudden existential crisis or the disintegration of their personal worldview. It has far more to do with the fact that their inner software architect has woken up and is demanding a picture of the whole.

This article demonstrates how answers to the following questions can be found with SonarQube:

  • What does the software’s structure look like?
  • Which areas are particularly affected by programming mistakes?
  • Where were incorrect dependencies incorporated?
  • What is the actual state of my layers, and are they still intact?

The question that really should be asked, however, is why the architect slept so long that such self-doubt could arise in the first place. The answer is unfortunately quite simple: typical everyday mistakes. It can always happen that one or two points are simply overlooked, even though you should really know better. Perhaps no thought is given to whether a piece of logic really belongs in a component or whether it should be extracted; that is something to rack your brains over later and possibly refactor. Or it is just not possible to retrace why a test has suddenly failed, even though everything seems correct. Time is short, and there are more tickets waiting.

This results in the following problem areas:

  • No or insufficient tests: Was the software correct before I adapted it? Is it still correct afterwards?
  • Time pressure: should not exist, but unfortunately it still does.
  • Legacy code: there are some legacies that no one wants to inherit.

Such mistakes quickly add up. Even when the same developers keep working on the project, the code base still erodes. There are many names for such projects:

  • Big Ball of Mud: There is no recognisable architecture and there are no clear dependencies. Everything seems to be interwoven with everything else.
  • Gas factory: A gas factory consumes a lot of gas and, on the face of it, produces only hot air, because no one understands it. Effectively the direct opposite of the Big Ball of Mud.
  • Spaghetti Code: Long methods. Popularly seen together with Copy Pasta.
  • Inner platform: Parts of the application are made so heavily configurable in their behaviour that they become a weak copy of the platform they were built with.
  • Sumo marriage: Parts of the application that should be separate from one another are so tightly coupled that they are effectively inseparable.

What exactly you have, where best to start cleaning up, and how to measure whether a change actually is an improvement: static code analysis can help with all of this.

What does static code analysis do?

Static code analysis takes compiled code or source text, applies metrics to it and generates numbers. So that this does not have to be done by hand, there are ready-made tools for it. The best-known in the Java world are probably FindBugs, PMD and checkstyle (a contrived example of their typical findings follows the list):

  • FindBugs is the only one of the three that works at the bytecode level. It searches for hard errors such as problems in class hierarchies, incorrect array handling, impossible casts, or equals and hashCode methods that have not been overridden in pairs.
  • PMD, whose name has no official meaning by the way, looks for inefficient code. This includes empty code blocks, unused variables or wasteful use of Strings and StringBuffers.
  • checkstyle, as the third member of the trio, examines programming style and, like PMD, works at the source-text level. This allows adherence to coding guidelines or formatting rules to be enforced.
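The division of labour can be illustrated with a small contrived class. The comments only sketch which of the three tools would typically complain about which part; this is an illustrative example, not the output of an actual analysis:

// Hypothetical class; comments mark which tool would typically flag what.
public class ToolFindings {

    // FindBugs (bytecode level): equals() is overridden without hashCode().
    @Override
    public boolean equals(Object other) {
        return other instanceof ToolFindings;
    }

    // PMD (source level): unused local variable, wasteful String handling.
    public String greet(String name) {
        int unused = 42;
        String result = "";
        for (int i = 0; i < 3; i++) {
            result = result + name + " "; // concatenation in a loop
        }
        return result;
    }

    // checkstyle (source level): naming and formatting violations.
    public void Badly_Named_Method(){if(true){System.out.println("!");}}
}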

Depending on their nature and area of use, these tools apply more or fewer metrics to produce their respective statistics. The term “metric” itself, however, says very little about what a metric actually is. If you look up where the word comes from, you end up with Latin: “ars metrica”, the science of measurement. But if you ask the Institute of Electrical and Electronics Engineers what a software metric is, you get the following answer:

“software quality metric: A function whose inputs are software data and whose output is a single numerical value that can be interpreted as the degree to which software possesses a given attribute that affects its quality.” – IEEE Standard 1061, 1998

Ultimately, this means that a metric is a function that generates a number for arbitrary inputs. These numbers are constructed so that they can be compared with one another, as long as they were generated by the same function. In this way, conclusions can be drawn about the inputs with respect to that function.

An example of this is the McCabe metric, also called cyclomatic complexity. This very fundamental metric calculates the number of different paths through a piece of code. The formula is very simple: the number of control structures such as if, while and case, plus boolean operators such as && and ||, is totalled, and 1 is added. Let’s look at this using an example:

String nameOfDayInWeek(int nr) {
    switch(nr) {
        case 1: return "Monday";
        case 2: return "Tuesday";
        case 3: return "Wednesday";
        case 4: return "Thursday";
        case 5: return "Friday";
        case 6: return "Saturday";
        case 7: return "Sunday";
    }
    return "";
}

This very simple method returns the name of the weekday corresponding to its 1-indexed position within the week. Its cyclomatic complexity is eight: 1 plus 7 case branches. This is a relatively high value: a maximum of 10 is generally regarded as acceptable and still adequately testable. So, in order to reduce the complexity of this method, it is refactored:

String nameOfDayInWeek(int nr) {
    String[] names =  new String[] {
        "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday",
        "Sunday"
    };
    if(nr > 0 && nr <= names.length) {
        return names[nr - 1];
    }
    return "";
}

The cyclomatic complexity of this method is three: 1, plus 1 for the if, plus 1 for the &&. The different approach reduces the complexity, yet it is fairly indisputable that the first version is quicker to understand.

Now, if you want to use all of these tools at the same time, each needs to be configured and their results merged into a common picture. On top of that, there is inevitably duplication in the evaluated metrics and other indicators. PMD, for example, due to its relatively vague field of activity, overlaps with checkstyle with respect to code style, while it also, just like FindBugs, looks out for dead code. In these and other places, SonarQube can bring improvements.

SonarQube

SonarQube fundamentally consists of three roughly separated components: a scanner that takes code and analyses it, a database in which the analysis results are saved, and a web component that displays the gathered results after processing. This allows the scanner to be invoked from almost anywhere, for example from a Maven build, a CI server or from an IDE. SonarQube is licensed under the LGPL v3, making it open source.

When it comes to analyses and metrics, SonarQube uses, among others, the tools already named. The analyses of PMD and checkstyle are built in, while FindBugs can be added via the plug-in interface. Many further functions can be added in the same way. In addition to language support, for example for Java or PHP, plug-ins also offer integration with source control systems such as Git and Subversion, or with GitHub pull requests.

For the analysis results derived from the metrics, SonarQube provides reference values. An Issue is created for every violation of such a reference value. Issues are sorted by category and severity.

For the following screenshots, an analysis of Apache Log4j in version 1.2.18-SNAPSHOT was carried out.

Categories

During the analysis, SonarQube divides the metric violations, called Issues, into three categories in addition to their severity (a small contrived illustration follows the list):

  • Code Smell: Examples of this are excessive cyclomatic complexity, code marked as @Deprecated, or pointless mathematical operations such as rounding a constant. In most cases such Issues indicate more fundamental problems. If the cyclomatic complexity of a method is too high, for example, this may point to a design fault in the architecture.
  • Vulnerability: Issues affecting security land in this category. This refers not only to security in the sense of SQL injection or hard-coded passwords, but also to internal safeguards. The Issue “public static fields should be constants”, for example, is classified here.
  • Bug: Classic craftsmanship errors are encountered here, for example mistakes that contradict the Java specification. This means that Issues such as comparing classes via their non-fully-qualified names, endless loops or dereferencing variables known to be null are located here.
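To make the categories a little more tangible, here is a contrived class based on the issue types named above. The assignment in the comments is an illustrative sketch, not actual SonarQube output:

// Contrived examples of the three Issue categories described above.
public class CategoryExamples {

    // Vulnerability: "public static fields should be constants" –
    // a mutable public static field can be changed from anywhere.
    public static String globalState = "mutable";

    // Bug: dereferencing a variable that is known to be null.
    public int length() {
        String value = null;
        return value.length(); // guaranteed NullPointerException
    }

    // Code Smell: useless mathematical operation, e.g. rounding a constant.
    public double answer() {
        return Math.round(42.0); // rounding a constant achieves nothing
    }
}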

Severity

Issues are further divided according to their severity, which ranges from a full development stop down to a note in the margin.

  • Blocker: The most severe category contains Issues such as hard-coded passwords. As the name suggests, no project with Issues of this type should be developed further before they are rectified.
  • Critical: This is where Issues land that could have a severe negative impact on the program flow, for example unhandled exceptions.
  • Major: Issues that land here belong to the area of code style and conventions: empty code blocks without an explanatory comment, missing @Override annotations or commented-out code.
  • Minor: Issues of this severity are syntactically correct, but their semantics leave a lot to be desired: unnecessary type conversions, duplications or unused return values belong here.
  • Information: This severity is assigned to Issues that should be dealt with at some stage but can otherwise be neglected for now, for example TODO comments or @Deprecated code that should eventually be removed.

Automatic Interpretation

For a rough evaluation of the code base, SonarQube uses a concept that is often quoted elsewhere: technical debt. Each Issue is assigned the length of time required to rectify it. The sum of this technical debt is then put into relation to the entire effort that the project represents, and this yields the Maintainability Rating:
Maintainability Rating = Technical Debt / Development Cost
Depending on the ratio of these two indicators, SonarQube assigns the Maintainability Rating as follows:

  • A: 0 – 0.1
  • B: 0.11 – 0.2
  • C: 0.21 – 0.5
  • D: 0.51 – 1
  • E: > 1

Of course, neither you nor SonarQube knows exactly how much development effort has gone into a project, let alone how much effort a single line of code required. So this is always an estimate: SonarQube assumes that each line of code takes 30 minutes of development time, regardless of how old it already is. That certainly does not apply to every single line, but it works as a rough average for large projects over a longer period of time.

A sample calculation: we have a small project of 2,500 lines and have accumulated 50 days of technical debt. With a typical working day of eight hours, which SonarQube assumes, 16 lines of code can be written per day, or 0.0625 days are required per line. This results in the following calculation: 50 / (0.0625 * 2,500) = 0.32. According to the table above, that is a C grade.
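The sample calculation can be retraced in a few lines of Java. The 30 minutes per line, the eight-hour day and the rating thresholds are taken from the text above; everything else, including the class and method names, is a hypothetical sketch:

// Sketch of the Maintainability Rating calculation described above.
public class MaintainabilityRating {

    // Assumptions from the text: 30 minutes of development time per line,
    // eight working hours per day.
    static final double DAYS_PER_LINE = 0.5 / 8.0; // = 0.0625

    static double rating(double technicalDebtDays, int linesOfCode) {
        double developmentCostDays = DAYS_PER_LINE * linesOfCode;
        return technicalDebtDays / developmentCostDays;
    }

    // Thresholds taken from the rating table above.
    static char grade(double rating) {
        if (rating <= 0.1) return 'A';
        if (rating <= 0.2) return 'B';
        if (rating <= 0.5) return 'C';
        if (rating <= 1.0) return 'D';
        return 'E';
    }

    public static void main(String[] args) {
        double r = rating(50, 2_500); // 50 / (0.0625 * 2500) = 0.32
        System.out.println(r + " -> " + grade(r));
    }
}

Running main prints “0.32 -> C”, matching the sample calculation and the table above.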

If we take a closer look at this evaluation scale, we see that more than ten percent of the development effort has to accumulate as technical debt before a project receives a grade worse than A. From my own experience, I can say that every sufficiently large project achieves exactly this A grade in the overall view. That is less astonishing when we consider that large projects run for a long time, and over such a long period a lot of thoroughly average code is produced. It is therefore interesting to calculate the Maintainability Rating separately for the categories already introduced, Bugs, Vulnerabilities and Code Smells, as shown in the “Categories” screenshot above.

Manual Interpretation

By now we have a rather unmanageably large volume of statistics and numbers. The interesting part of code analysis is not simply to accept these statistics and believe them, but to give them meaning through interpretation. A few simple rules should be followed in the process:

  • Relative values beat absolute ones: 5,000 Issues in a project? Sounds like a lot, but not any more once you know that this is only five per class.
  • Changes beat the status quo: Entirely in line with forward thinking, it is more interesting that 5,000 Issues could be closed in the last release and only 500 new ones came in, than that 4,500 are still open. The positive development is what matters most.
  • Understand the metrics: If a metric is not understood, it can neither be applied nor can its results be interpreted. It is great to know what cyclomatic complexity is, for example; but whoever equates it with legibility or even maintainability is on the wrong track.
  • Put metrics in relation to one another: As the previous point already hinted, a solitary metric says very little. A high cyclomatic complexity can certainly mean that the affected code is difficult to read. However, there are also metrics such as the maximum nesting depth or the complexity of boolean expressions, which have a thing or two to say about that as well.

Understanding metrics

The most popular, but at the same time most abused, metric is Lines of Code, usually abbreviated to LOC. But that LOC is not simply LOC becomes clear with a quick look at common reference literature:

    • Lines of Code (LOC): All physically existing lines, including comments, brackets, blank lines, etc.
    • Source Lines of Code (SLOC): Like LOC, just without blank lines and comments
    • Comment Lines of Code (CLOC): All comment lines
    • Non-Comment Lines of Code (NCLOC): All lines that are not blank lines, comments, brackets, includes, etc.

(Source: https://de.wikipedia.org/wiki/Lines_of_Code)

If the size of a project is to be quantified, which of these reference values should be used? Another question: how large is the influence of different code style guides on the number of LOC compared to the number of NCLOC?

For its figures on the number of lines of code, SonarQube uses NCLOC.

Put metrics in relation to one another

Let’s look at another piece of example code:

DTO search(List<List<DTO>> rawData, int id) {
  if(rawData != null) {
    for(List<DTO> sublist : rawData) {
      for(DTO dto : sublist) {
        if(dto.getId() == id) {
          return dto;
        }
      }
    }
  }
  return null;
}

The method shown above searches a doubly nested list for an object with a given ID. According to McCabe, it has a cyclomatic complexity of 5, which is within bounds. At the same time, however, it has a maximum nesting depth of four. SonarQube defines a maximum of 3, so this method has its first Issue. Had we only considered McCabe, it would have passed as green and never attracted attention in the report.

One early return later, we have solved the problem:

DTO search(List<List<DTO>> rawData, int id) {
  if(rawData == null) {
    return null;
  }

  for(List<DTO> sublist : rawData) {
    for(DTO dto : sublist) {
      if(dto.getId() == id) {
        return dto;
      }
    }
  }
  return null;
}

The metrics are satisfied, but are we? There is a reason why such Issues belong to the category Code Smell: it is very probable that the design went wrong here. Why is a doubly nested list passed into the method at all? Does the code intentionally work at this low level of abstraction? Or was the domain design perhaps bypassed? Why wasn’t the matching DTO fetched from the data source instead of calling a method with high runtime complexity? Many would also argue that the DTO pattern is itself an anti-pattern. From these questions alone we can conclude that the actual errors are not to be found in individual methods, but have to be looked for at a higher level.

A further combination of metrics would be the already mentioned size of the code base, counted in lines, and the number of Issues. If the code base is large enough, the absolute number of Issues will be correspondingly large; this is also reflected in the Maintainability Rating. If we now add a third dimension, the project runtime, the picture becomes far more interesting. If the project grows very strongly very quickly, a high or low number of Issues takes on a completely different meaning. If it shrinks, however, and the Issues remain the same, it may be that there simply was not enough time to deal with them. If sufficient time was available, though, this development acquires a negative aftertaste. At this point it should also be noted that the number of different developers working on a project, and their turnover, is an excellent metric too.

A further point was already named above: changes beat the status quo. SonarQube records histories that can be evaluated wonderfully. For example, the development of the project size can be compared with the number of duplications:

In the same period in which 5,000 lines of code were removed, 2.5% of code duplications were added. That can mean that redundant code was created, or that non-redundant code was deleted, which in turn means the remaining duplications carry more weight. If we also take into consideration that Log4j has a total of 16,000 lines of code, the latter is more probable.

Experience

  1. Assuming a sufficiently large code base, many legacy projects achieve an A rating. As already mentioned, that is not very surprising: if the average code contribution is decently average, then the overall average looks good. The breakdown by Issue category helps enormously here.
  2. In the embedded world there is a rule of thumb that for every 30 rule violations, 3 minor and one significant bug can be expected. This rule helps to extrapolate the bugs that will occur in operation from the number of Issues found during development, because it is usually difficult to establish a direct connection between these indicators: users simply never run into certain predicted NullPointerExceptions, for example, because the application is just not operated in the necessary way. The averages mentioned above help here as well. The exact numbers will certainly need fine-tuning to fit the respective application area, but the pattern should hold (a small sketch of this extrapolation follows the list).
  3. Working live with SonarQube can itself lead to gas factories. If a dedicated SonarQube monitor is set up in the office, the team members will make sure that Issues are either rectified immediately or avoided from the outset. Although this tendency is certainly praiseworthy, it can lead to overly complicated thinking and to losing sight of the actual goal. This behavioural pattern has a name: over-engineering. It is advisable to look together at the end of a sprint, or at comparable points in time, at how the code base is developing.
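The extrapolation rule from the second point can be sketched as follows; the 30 : 3 : 1 ratios come from the rule of thumb above and, as noted there, need fine-tuning for the concrete application area:

// Rough extrapolation of expected bugs in operation from the number of
// Issues found during development, using the 30 : 3 : 1 rule of thumb
// mentioned above (the ratios are an assumption and need adjusting).
public class BugExtrapolation {

    public static void main(String[] args) {
        int issues = 5_000;

        int groupsOfThirty = issues / 30;
        int expectedMinorBugs = groupsOfThirty * 3;
        int expectedSignificantBugs = groupsOfThirty;

        System.out.printf("%d issues -> ~%d minor and ~%d significant bugs%n",
                issues, expectedMinorBugs, expectedSignificantBugs);
    }
}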

Summary

In summary, the following can be said:

  • Code analysis provides a feeling for the code base. Only in this way can well-founded statements be made as to which areas of the project are particularly endangered, unstable or in need of renovation.
  • Regular analyses can increase team motivation. A positive Issue balance at the end of a sprint and an upward-trending history graph are good motivators for a development team and serve as evidence of one’s own work.
  • Analysis results can serve as a basis for argumentation. With the help of the project history and a selection of indicators that can be illustrated well, you are in a much better position to discuss a possibly required technical release with customers or decision makers.

The original article from the magazine “Java aktuell” is available here for download.


Josha von Gizycki
Software Developer
Constantly learning to expand his knowledge and exchanging experiences and ideas with like-minded people are, in his opinion, the most important building blocks for continuous improvement.