Aggregated software documentation

In software development, software documentation is an essential building block for preserving knowledge and creating transparency.
Unlike source code, which only answers the how, documentation can explain why certain architectural decisions were made.

“Code tells you how, comments tell you why.”
Jeff Atwood, founder of Stack Overflow

Furthermore, documentation lets the author deliberately choose the focus, the degree of abstraction and the perspective, which is not possible with source code.

Software documentation as utopia

However, most developers treat software documentation as an afterthought.
From the developers’ point of view, a rudimentary readme is often sufficient (GitHub best practices).
The readme seems to be the ideal compromise between no software documentation and the user manual.
It fulfills the following purposes:

Brief description with technology overview
(Local) set-up
Maintainability due to small scope

One criticism that all software documentation has to put up with is that it quickly becomes outdated and is often poorly tested.
One reason for this is that documentation is usually a static projection of a software system, and software is volatile.
Because software changes constantly, a gap opens between what the documentation describes and the actual state of the software system.
Depending on the resolution and level of detail of the documentation, this gap can grow so large that the documentation quickly becomes unusable without constant updating.

In addition, documentation does not create any direct added value for developers who have already internalized everything relevant.
In the heat of the moment, reworking documentation may seem like pointless busywork to many developers. Its benefits only become apparent when an old project is dusted off after a months-long break and needs to be restarted.
All the “trivialities” that were not worth automating or documenting then have to be painstakingly reconstructed.
Afterwards, you often hear the rationalization that the costs of documenting would have outweighed the benefits anyway.

However, this perception may be wrong: if the right level of detail is chosen, even more extensive software documentation can remain usable without constant reworking.

Problem definition of the software documentation

In a customer project, we faced exactly this problem: we had to document a historically grown system of different software components retrospectively, including all of these trivialities.
In this case, retrospectively means that most of the developers involved in building the system were no longer available.
We were entering a system landscape that had grown over decades, and the developer culture had favored specialists in a few projects over generalists.
As a result, the documentation rarely went beyond the readmes mentioned above.

So we could only use Git repositories and log archives as sources, combined with the trained eye that comes from decades of experience as software developers.
This restriction seems dramatic at first, but a surprising amount can be derived from the sources mentioned and presented in a suitable form, as will be demonstrated later.

Our efforts were essentially aimed at two aspects: documenting the system landscape and documenting the software components, called services.
The customer expected the software documentation to simplify the onboarding of new employees and to provide a better overview of the interrelationships, complexity and criticality of the systems.

Here is a brief look at the essential characteristics of the two perspectives.

System landscape in the software documentation

The software documentation of the system landscape provides a comprehensive overview of the technological infrastructure of a project or organization.
It provides a useful framework for architects and managers to develop a common understanding of systemic dependencies and interrelationships.
The level of abstraction here is relatively high and the level of technical detail is low. Typical contents are:

Interaction diagrams: Diagrams that show the interactions between the various system components, including external interfaces and services.

Architecture diagrams: Graphical representations of the system architecture that illustrate the components, their dependencies and how they are deployed in the infrastructure.

Dependencies and integrations: Descriptions of dependencies between different system components and integrations with external services or systems.

Deployment architecture: Information about the deployment environment, including cloud platforms, server configurations and network topologies.

Software components (services)

The documentation of software components focuses on the description of a specific component or module within a software project.
This documentation provides detailed insights into the functionality, choice of technology, structure and interfaces of a single unit of the system.
Typical content is the readme mentioned above.
The target audience for this artifact is technically experienced people, usually software developers.
The documentation of the software component provides developers with a comprehensive understanding of the internal workings of a particular unit of the system, which facilitates development, maintenance and troubleshooting.

Theory of software documentation

Fortunately, despite the large number of projects (more than 50), the Git repositories and the technologies used followed a homogeneous structure.
The aim was to generate as much documentation as possible automatically.
However, it quickly became clear that data had to be supplied manually in order to bring the semantics, service groupings and descriptions up to the desired standard.
The following section outlines which information could be extracted automatically from which sources.

Technology stack

Maven, Gradle, npm: The technology stack of a project provides insight into the tools and frameworks used for development and the build process.
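As a rough illustration of this step, the following TypeScript sketch derives the build tools of a repository from the file names it contains. It is not the scanner used in the project. The function name `detectBuildTools` and the marker table are assumptions made for this example; only the well-known file names (`pom.xml`, `build.gradle`, `package.json`) are standard.

```typescript
// Hypothetical helper: maps well-known build files to the tool they indicate.
type BuildTool = "Maven" | "Gradle" | "npm";

const BUILD_FILE_MARKERS = new Map<string, BuildTool>([
  ["pom.xml", "Maven"],
  ["build.gradle", "Gradle"],
  ["build.gradle.kts", "Gradle"],
  ["package.json", "npm"],
]);

// Takes repository-relative file paths and returns the detected tools, sorted.
function detectBuildTools(filePaths: string[]): BuildTool[] {
  const found = new Set<BuildTool>();
  for (const path of filePaths) {
    // Only the file name matters, not the directory it lives in.
    const segments = path.split("/");
    const fileName = segments[segments.length - 1] ?? "";
    const tool = BUILD_FILE_MARKERS.get(fileName);
    if (tool !== undefined) found.add(tool);
  }
  return Array.from(found).sort();
}
```

Called with `["pom.xml", "frontend/package.json", "README.md"]`, the function reports Maven and npm. Frameworks and versions would need a closer look at the build files themselves.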

Module dependencies

Maven, Gradle: These build management tools enable the definition and management of module dependencies that influence the structure and composition of the project.
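For Maven, the module structure is declared in the `<modules>` section of a `pom.xml`. The sketch below pulls the module names out of such a file with a regular expression. This is a simplification chosen for brevity, not a full XML parser and not necessarily how the project's scanner works; the function name `extractMavenModules` is hypothetical.

```typescript
// Hypothetical helper: lists the <module> entries of a Maven pom.xml.
// Assumption: a regex is good enough here; profiles and property
// placeholders are not resolved.
function extractMavenModules(pomXml: string): string[] {
  // Drop XML comments so commented-out modules are not reported.
  const withoutComments = pomXml.replace(/<!--[\s\S]*?-->/g, "");
  const modules: string[] = [];
  const pattern = /<module>\s*([^<]+?)\s*<\/module>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(withoutComments)) !== null) {
    const moduleName = match[1];
    if (moduleName) modules.push(moduleName);
  }
  return modules;
}
```

A Gradle build would need a different approach, for example reading the `include` statements in `settings.gradle`.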

Library dependencies

Maven, Gradle, npm: Information about external and internal libraries and their dependencies is crucial to understanding the functionality and integration of the code.
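For npm projects, the library dependencies can be read directly from the `dependencies` and `devDependencies` fields of `package.json`. The following sketch shows one way to flatten them into a list. The type `LibraryDependency` and the function name are assumptions for this example, and Maven or Gradle would need their own readers.

```typescript
// Hypothetical record type for one declared library.
interface LibraryDependency {
  library: string;
  versionRange: string;
  scope: "runtime" | "dev";
}

// Reads the two standard dependency sections of a package.json document.
function parsePackageJsonDependencies(packageJsonText: string): LibraryDependency[] {
  const manifest = JSON.parse(packageJsonText) as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  const collected: LibraryDependency[] = [];
  const collect = (section: Record<string, string> | undefined, scope: "runtime" | "dev"): void => {
    if (!section) return;
    for (const library of Object.keys(section).sort()) {
      const versionRange = section[library];
      if (typeof versionRange === "string") collected.push({ library, versionRange, scope });
    }
  };
  collect(manifest.dependencies, "runtime");
  collect(manifest.devDependencies, "dev");
  return collected;
}
```

The result only contains the declared version ranges. The versions actually installed would have to come from the lock file.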

Runtime/Service dependencies

The analysis of access logs, Spring Boot application properties and Kubernetes YAMLs provides insights into the runtime dependencies of the component.
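One simple heuristic for the Spring Boot part is to look for property values that contain a URL and to treat the host as a runtime dependency. The sketch below does this for a `.properties` file. The property keys in the usage note are invented, the function name is hypothetical, and YAML configuration or access logs are not covered.

```typescript
// Hypothetical record: a property whose value points at another host.
interface RuntimeDependency {
  property: string; // key in application.properties
  host: string; // host name found in the property value
}

function extractUrlDependencies(propertiesText: string): RuntimeDependency[] {
  const found: RuntimeDependency[] = [];
  for (const rawLine of propertiesText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments ("#" and "!" both start a comment).
    if (line === "" || line.charAt(0) === "#" || line.charAt(0) === "!") continue;
    // Keys and values are separated by the first "=" or ":".
    const separatorIndex = line.search(/[=:]/);
    if (separatorIndex < 0) continue;
    const property = line.slice(0, separatorIndex).trim();
    const value = line.slice(separatorIndex + 1).trim();
    // Matches http(s)://host and jdbc:<driver>://host; other values are ignored.
    const urlMatch = /^(?:https?|jdbc:[a-z0-9]+):\/\/([^\/:?\s]+)/i.exec(value);
    const host = urlMatch ? urlMatch[1] : undefined;
    if (host) found.push({ property, host });
  }
  return found;
}
```

For a line such as `billing.base-url=https://billing.internal:8443/api` (an invented key), the function reports the host `billing.internal`. Placeholders like `${...}` are not resolved.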

Releases

Tags from Git, Maven, Nexus: The identification of releases and versions makes it possible to document important milestones and the progress of the project.
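To illustrate the Git part, the sketch below turns a list of tag names (for example the output of `git tag`) into a release list sorted from newest to oldest. It assumes tags of the form `1.2.3` or `v1.2.3`, which is a convention we chose for the example and not something the source repositories are known to follow. All names are hypothetical.

```typescript
// Hypothetical record for one release derived from a Git tag.
interface ReleaseTag {
  tag: string;
  version: [number, number, number]; // major, minor, patch
}

function parseReleaseTags(tagNames: string[]): ReleaseTag[] {
  const releases: ReleaseTag[] = [];
  for (const rawTag of tagNames) {
    const tag = rawTag.trim();
    // Tags that do not look like MAJOR.MINOR.PATCH are ignored.
    const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(tag);
    if (match === null) continue;
    releases.push({ tag, version: [Number(match[1]), Number(match[2]), Number(match[3])] });
  }
  // Numeric comparison, newest first, so that 1.10.0 sorts before 1.9.3.
  releases.sort(
    (a, b) =>
      b.version[0] - a.version[0] ||
      b.version[1] - a.version[1] ||
      b.version[2] - a.version[2],
  );
  return releases;
}
```

The numeric comparison matters: a plain string sort would place `1.9.3` after `1.10.0`.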

Deployments

Kubernetes and deploy scripts: Kubernetes manifests and the project-specific deployment scripts show how and where an application is deployed to its environments.

Runtimes

Dockerfile and pipeline specification: Dockerfiles and pipeline specifications define, and thereby document, the build and deployment process of a project.
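The runtime of a service can often be read from the `FROM` instructions of its Dockerfile. The following sketch collects the external base images and skips references to earlier build stages. It is a simplified reading of the Dockerfile syntax (no `ARG` substitution, no line continuations), and the function name is hypothetical.

```typescript
// Hypothetical helper: returns the external base images of a Dockerfile.
function extractBaseImages(dockerfile: string): string[] {
  const stageNames = new Set<string>(); // names introduced with "AS <name>"
  const images: string[] = [];
  for (const line of dockerfile.split(/\r?\n/)) {
    // FROM [--platform=<platform>] <image> [AS <name>]
    const match = /^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i.exec(line);
    if (match === null) continue;
    const image = match[1];
    const alias = match[2];
    // "FROM build" refers to an earlier stage, not to a real image.
    if (image && !stageNames.has(image.toLowerCase()) && images.indexOf(image) < 0) {
      images.push(image);
    }
    if (alias) stageNames.add(alias.toLowerCase());
  }
  return images;
}
```

In a multi-stage build, the last reported image is usually the one the service actually runs on, while earlier ones describe the build environment.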

Contributors

Git commits: The analysis of Git commits makes it possible to track the participation of developers in the project over time and to recognize their contributions.
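A minimal sketch of such an analysis, assuming the commit history was exported with `git log --date=short --pretty=format:%an%x09%ad` (author name, a tab, then the date as YYYY-MM-DD). The function and type names are hypothetical, and the grouping by year is our choice for the example.

```typescript
// Hypothetical record: number of commits by one author in one year.
interface ContributorActivity {
  author: string;
  year: number;
  commits: number;
}

function commitsPerAuthorAndYear(gitLogOutput: string): ContributorActivity[] {
  const byKey = new Map<string, ContributorActivity>();
  for (const line of gitLogOutput.split(/\r?\n/)) {
    // Expected line format: "<author>\t<YYYY-MM-DD>"
    const match = /^(.+)\t(\d{4})-\d{2}-\d{2}$/.exec(line.trim());
    if (match === null) continue;
    const author = match[1];
    if (!author) continue;
    const year = Number(match[2]);
    const key = author + "\t" + year;
    const entry = byKey.get(key);
    if (entry) entry.commits += 1;
    else byKey.set(key, { author, year, commits: 1 });
  }
  // Sorted by author, then by year, for stable output in generated pages.
  return Array.from(byKey.values()).sort((a, b) =>
    a.author < b.author ? -1 : a.author > b.author ? 1 : a.year - b.year,
  );
}
```

Authors who commit under several names or e-mail addresses appear as separate entries unless the repository maintains a `.mailmap` file.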

Project Status

Git activity, Jira: The activity in Git and in Jira tickets provides insights into the current status and progress of the project.
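For the Git side, a very rough status can be derived from the date of the last commit, for example as printed by `git log -1 --format=%cI`. The sketch below classifies a project by the age of that commit. The three status names and the thresholds of 30 and 365 days are assumptions for this example, and Jira is not considered.

```typescript
// Hypothetical status levels; the thresholds below are illustrative only.
type ProjectStatus = "active" | "maintenance" | "dormant";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function classifyProjectStatus(
  lastCommit: Date,
  now: Date,
  activeDays = 30,
  maintenanceDays = 365,
): ProjectStatus {
  const ageInDays = (now.getTime() - lastCommit.getTime()) / MS_PER_DAY;
  if (ageInDays <= activeDays) return "active";
  if (ageInDays <= maintenanceDays) return "maintenance";
  return "dormant";
}
```

Passing `now` as a parameter keeps the function deterministic, which makes it easy to test and to re-run for a past reporting date.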

Implementation of the software documentation

The entire source code was available on GitLab. Using the GitLab API, all projects could be listed and cloned locally. GitLab also provides some metadata that was relevant for the service documentation. With Grafana queries, the effective service dependencies could be resolved live, as outgoing and incoming requests were logged in Prometheus.
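To give an idea of how request metrics can become dependency edges, the sketch below converts the JSON result of a Prometheus instant query (result type `vector`) into a list of edges. The response shape follows the Prometheus HTTP API. The label names `source_service` and `target_service` are pure assumptions, since the article does not say how the requests were labeled, and the function name is hypothetical.

```typescript
// Shape of an instant-query response with resultType "vector".
interface PrometheusVectorResponse {
  status: string;
  data: {
    resultType: string;
    result: { metric: Record<string, string>; value: [number, string] }[];
  };
}

// Hypothetical record: one observed call relation between two services.
interface DependencyEdge {
  from: string;
  to: string;
  sampleValue: number; // value of the queried expression, e.g. a request rate
}

function toDependencyEdges(
  response: PrometheusVectorResponse,
  fromLabel = "source_service", // assumed label name
  toLabel = "target_service", // assumed label name
): DependencyEdge[] {
  if (response.status !== "success" || response.data.resultType !== "vector") return [];
  const edges: DependencyEdge[] = [];
  for (const sample of response.data.result) {
    const from = sample.metric[fromLabel];
    const to = sample.metric[toLabel];
    // Series without both labels cannot be mapped to an edge.
    if (!from || !to) continue;
    // Prometheus encodes sample values as strings.
    edges.push({ from, to, sampleValue: Number(sample.value[1]) });
  }
  return edges;
}
```

Edges derived this way reflect the traffic that was actually observed, so rarely used dependencies may be missing if the query window is short.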

Analysis of the software documentation

The source code was analyzed using special scanners, and the resulting data was stored in a Postgres database. The scanners are programs written in TypeScript that extract the information listed above. They run as a cron job in a GitLab pipeline, so the generated documentation is as up to date as possible. The documentation is published via GitLab CI/CD pipelines.

Once the analysis phase was complete, the service dependencies and logical groupings were visualized (see Figure 1) and Markdown files were generated (see Figure 2). To do this, all data fragments had to be logically linked together. The visualizations were created with PlantUML in Markdown.
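The generation step can be pictured roughly as follows. The sketch renders a list of dependencies as a Markdown section with an embedded PlantUML component diagram. It assumes a Markdown renderer that supports `plantuml` code blocks, as GitLab does when its PlantUML integration is enabled. The type and function names are hypothetical and the layout is much simpler than the real pages.

```typescript
// Hypothetical record: service "from" calls service "to".
interface ServiceDependency {
  from: string;
  to: string;
}

// Three backticks, built by concatenation to keep this listing readable.
const FENCE = "`" + "`" + "`";

function renderDependencySection(title: string, dependencies: ServiceDependency[]): string {
  // Square brackets delimit component names in PlantUML, so strip them.
  const clean = (serviceName: string): string => serviceName.replace(/[\[\]]/g, "");
  const arrows = dependencies.map(
    (dependency) => "[" + clean(dependency.from) + "] --> [" + clean(dependency.to) + "]",
  );
  return [
    "## " + title,
    "",
    FENCE + "plantuml",
    "@startuml",
    ...arrows,
    "@enduml",
    FENCE,
    "",
  ].join("\n");
}
```

Because the diagram is plain text inside the Markdown file, it is regenerated together with the rest of the page on every scanner run.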

Figure 1: Diagram of a service architecture. The main service sits in the center and connects to external services on the left and user interfaces on the right. It includes multiple internal modules and a database at the bottom.

Figure 2: Screenshot of the generated documentation page for Service 1. It contains sections such as General information, Environments and Modules, with details such as GitLab project URLs, host criteria and specific script paths for the test and dev environments.

Summary

Good documentation can make the lives of developers and managers much easier and make planning more reliable. A lack of understanding of the system acts as a brake, like sand in the gears. Generated software documentation, as shown here, can counteract this. Through aggregation and automation, key points of criticism such as the cost/benefit ratio and timeliness were significantly mitigated.
