frankie-tales

A call to minimalistic programming

2025-09-10T17:00:00Z

Minimalism in development is a forgotten virtue of our time that deserves more attention. A straightforward summary of some minimalism principles is available here. Briefly, based on the manifesto for minimalism, the principles of minimalism in software engineering are the following.

Fight for Pareto's law: look for the 20% of effort that will yield 80% of the results.
Prioritize: minimalism isn't about not doing things but about focusing first on the important.
The perfect is the enemy of the good: first do it, then do it right, then do it better.
Kill the baby: don't be afraid of starting all over again. Fail soon, learn fast.
Add value: continuously consider how you can support your team and enhance your position in that field or skill.
Basics first: always follow top-down thinking, starting with the best practices of computer science.
Think differently: simple is more complicated than complex, which means you'll need to use your creativity.
Synthesis is the key to communication: we have to write code for humans, not machines.
Keep it plain: try to limit your designs to only a few layers of indirection.
Clean kipple and redundancy: minimalism is all about removing distractions.

Most of those principles are coherent with each other and relate heavily to the well-known KISS principle, which is central to the Unix philosophy.

An extended and fascinating book about the practical application of such principles is Eric S. Raymond's "The Art of Unix Programming", which I strongly recommend reading. I can also recommend a now-classic volume on the same topic by John Ousterhout, "A Philosophy of Software Design". Both outline practical examples of how minimalism in design can be effectively embraced, with a focus on doing the right thing sooner rather than later.

The same principles could (or maybe should) be applied even to programming languages, but this is often a neglected aspect of the minimalistic approach. Note that one of the most successful languages of all time is C, which has a straightforward syntax and yet is not easy to use correctly (what is simple is not necessarily easy, too). That's because programmers need to create their own abstractions and layers to build their vision of a software design. This is precisely the opposite of the C++ or Java approach, where the entire specification spans thousands of pages, and many high-level abstractions are integral parts of the language.

The same can be said of Python nowadays, which started as a simple language, more readable and cleaner than Perl, but now has a large and elaborate specification. Hundreds of pages are now needed to describe a once-simple language, to which tons of new features and abstractions have been added to enrich its expressiveness. If one considers its standard library and modules, the situation appears even worse. Can such an approach be considered easier? I don't think so.

Let me ask: how can a program be considered simple if it relies on hundreds (or even thousands, counting dependencies recursively) of external modules, as well as hundreds of syntactical constructs and glue code? Some language ecosystems also allow multi-versioned dependencies, so that a program can depend on multiple versions of the same module at once (yes, JavaScript, I'm talking about you), with the concrete possibility of introducing obscure bugs as a result. At the opposite extreme, there is the consideration that we only know and deeply understand what we make ourselves.

Minimalism also means actively seeking a balance between these two opposing approaches. Reusing third-party modules and packages can be an immediate answer to pressing deadlines, but in the long run it can also introduce instability and dependencies on unmaintained software. Long dependency chains, where changes happen independently of the main program and are driven by third-party motivations, often with the wrong timing for the projects that depend on them, can cause breakages at multiple levels.

Of course, to reach the right tradeoff, a few things need to be considered. No single programmer can be smarter than the many libraries and modules out there, which multiple developers may have spent weeks, months, or even years refining. That's true, but it is also true that not all libraries or modules are written with the same level of quality and effort. For instance, we all know cases of elementary modules available for Node that could easily be avoided, and are instead imported out of some form of laziness. Sometimes the features actually needed are only a small portion of the whole library or module, and could be reimplemented with very reasonable effort and time. This is even more true nowadays, when AI tools can significantly increase productivity in such cases. I would condense these concepts into some additional mottos:

Limit your external dependencies: avoid depending on modules or libraries unless they significantly reduce the total development time, have rock-stable interfaces and features, and follow a clear and stable roadmap.
Reproducibility of the software stack is a must: these days, an SBOM (software bill of materials) has become recommended or even mandatory, but it should not be limited to documenting external dependencies and their versions. The full process of building a runtime environment should also be fully defined and remain consistent in the long term.
Do not follow the latest oh!-so-cool technology: while that is fine for an amateur project developed in your spare time, it is not a good idea to depend on a technology whose future is not clearly stated and that lacks a well-established development team and proven long-term sustainability. I consider it a risk even to depend on a single-company project, and even more so if that company is a startup.

In short, all of this can be considered minimalism in coding style.
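As a minimal sketch of the reproducibility motto, and assuming a Python-based stack (my assumption here, nothing above prescribes a language), the standard library alone is enough to record the exact versions of the packages installed in an environment. The function name installed_versions is made up for this illustration, and the resulting list is only the raw material for an SBOM, not a complete one: it says nothing about the interpreter, the system libraries, or how the environment was built.

```python
from importlib import metadata


def installed_versions():
    """Return a sorted list of (name, version) pairs for the current environment.

    Hypothetical helper for illustration: it only sees Python distributions,
    not system packages or compiled libraries.
    """
    pairs = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            continue  # distribution without readable metadata
        name = meta.get("Name")
        version = meta.get("Version")
        if name and version:  # skip entries with incomplete metadata
            pairs.add((name.lower(), version))
    return sorted(pairs)


if __name__ == "__main__":
    # Print a pinned, requirements-style inventory of the environment.
    for name, version in installed_versions():
        print(f"{name}=={version}")
```

Committing such an inventory next to the code makes it at least possible to notice when the environment drifts, and a short list is usually a good sign that the first motto is being respected as well.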

Moreover, using a well-established framework such as Django for your mid-to-long-term web project is probably better than using the latest Node.js-based framework, created six months ago, that looks like the next 'big thing'. But that's probably only common sense. Instead, ask yourself whether your project could be created from scratch using a simple Jamstack system and some microservices for well-defined and minimal parts. In many cases, that is more than enough for the many CMS-based sites out there: indeed, I keep asking myself why so many websites are still based on WordPress, when most of them could easily be converted into a handful of static pages and the simple JavaScript snippets that they use in any case. This can be expressed as minimalism in defining computing architectures, which can also make it easier to scale applications up.

So minimalism principles can be considered at multiple levels: for programming languages, libraries, architectures, and design. However, they require skills, in-depth research, and a significant amount of time dedicated to continuous refactoring and to reflection on viable alternatives. And that's probably the key point: developers with deadlines and urgency imposed by PMs are too often tempted to follow the easiest and most feature-rich paths and to deliver a solution of any kind, without much reflection on the final balance among effort, quality, efficiency, and durability of results.

Of course, speaking of minimalism, a special mention is due to the whole suckless effort on the uncompromising side of minimalism. And why not?

Ok, ok, I'm joking. But you got the point.

Does HPC mean High-Pain Computing?

2025-09-06T19:40:00Z

Please forgive the silly joke in the title of this semi-serious post, but lately I have been thinking about the strange fate of an area of general computing in which I have been spending more and more time recently, as I did in the near and distant past. For my job, I have used a series of scientific HPC clusters worldwide to solve multiple computing problems as efficiently as possible by distributing computation across numerous nodes. Over the last thirty years, all such platforms have consistently shared the same characteristics, which invariably pose a problem for the average scientist (often a junior researcher dedicated to a short-term project) in any application domain.

To borrow Fred Brooks' distinction between essential and accidental difficulties, HPC technologies pose both intrinsic and incidental difficulties for this category of users. The intrinsic one is due to the inherent complexity of creating a parallel and distributed solution to any problem, ideally in a way that does not cancel out the gains through the growing communication time among computational agents. This is already a relevant problem per se, which can often be beyond the abilities, knowledge, and interests of the average researcher in bioinformatics, physics, mathematics, remote sensing, or whatever other research domain.
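To give a feeling for this intrinsic tradeoff, here is a deliberately naive sketch. It assumes (my simplification, not a model of any real cluster) that the work divides perfectly among the workers and that every additional worker adds a fixed communication cost. The function names and the numbers are purely illustrative.

```python
def parallel_time(serial_time, workers, comm_cost_per_worker):
    """Toy model: perfectly divisible work plus a linear communication overhead."""
    compute = serial_time / workers
    communication = comm_cost_per_worker * (workers - 1)
    return compute + communication


def best_worker_count(serial_time, comm_cost_per_worker, max_workers):
    """Return the worker count (1..max_workers) with the lowest modeled run time."""
    return min(
        range(1, max_workers + 1),
        key=lambda n: parallel_time(serial_time, n, comm_cost_per_worker),
    )


if __name__ == "__main__":
    serial_time = 100.0  # time units on a single worker (illustrative)
    comm_cost = 1.0      # extra time units per additional worker (illustrative)

    best = best_worker_count(serial_time, comm_cost, max_workers=64)
    fastest = parallel_time(serial_time, best, comm_cost)
    print(best, fastest, f"{serial_time / fastest:.2f}")  # prints: 10 19.0 5.26

    # Asking for every available worker is slower than the sweet spot.
    print(parallel_time(serial_time, 64, comm_cost))      # prints: 64.5625
```

Even in this toy setting, reserving 64 workers is more than three times slower than using 10, and the best achievable speedup is about 5x rather than 64x. Real applications have far messier communication patterns, which is exactly why this part of the problem is hard.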

The incidental difficulty is instead always due to the accessibility of the platforms and to the technologies used for their implementation. By and large, all such HPC clusters are a large pool of multi-core hosts with plenty of memory, connected through multiple high-speed networks that also serve some sort of multi-tier distributed POSIX file system and/or object storage. Users can log in on a limited number of such hosts, which are connected to all the others and run some type of scheduling system (e.g., Slurm or HTCondor) through which multiple computational nodes can be reserved for a limited period of time to execute batch jobs or even interactive ones (mainly for debugging). In most cases, such clusters can also be used with some MPI implementation for proper parallel computational modeling, based on message passing among computing agents that run on multiple cores and hosts, with or without multi-threading (e.g., via OpenMP). Alternatively, GPUs can also be reserved and exploited via CUDA/OpenCL. In many cases, such implementations are vendor-oriented and require specific libraries and compilers, which adds another layer of complexity.

The incidental problems start when casual users discover that all such computing nodes invariably run some legacy enterprise Linux distribution that is maintained for a period of ten years or even more, until a full reinstallation of the whole cluster. On top of such legacy systems (which, as such, are unusable for any practical purpose), these scientific clusters essentially offer a few different mechanisms for creating a general computational environment:

Environment Modules
Containers (Singularity or Apptainer)
Anaconda/Miniconda-like environments (or free forks like Miniforge)
Some specific software/application to run

Except for containers, these solutions are all based on binary hubs, which exposes users to possible breakages when the application being developed needs exotic language bindings for extensions: the poor users then enter the mysterious and dangerous world of ABI violations and chains of broken dependencies. Moreover, such hubs are not always consistent, and any upgrade by the admin team can cause sudden breakages overnight.

The final solution (or apparently so) nowadays is containerization, with a target environment where the user code can find all and only the correct dependencies and versions for the whole software stack of the application. This holds, at least, as long as the third-party hubs of base distributions and languages ensure complete consistency and retain past binaries and versions for any medium- or long-term need. Of course, a full source-based stack with proper version tracking à la Guix would help to avoid depending on external binary hubs and seems the way to go. Indeed, a small interest group around such a solution has existed for a few years, but I am not aware of many HPC clusters that consistently offer this kind of implementation to users. That said, writing Guile Scheme descriptors to prepare an execution environment may not be within the reach of the average researcher in biochemistry or astrophysics.

Unfortunately, as I wrote in a past post on this site, this moves the whole responsibility for maintaining a software stack onto the shoulders of the final users, who are often the infamous junior profiles I mentioned before. These are non-IT specialists who are expected to adopt such HPC platforms to implement solutions as part of their daily job in their specific scientific domain.

The result, to be honest, is that the average researcher simply tries to get away from the whole thing as soon as possible because of the significant complexity involved, while the private sector has introduced specialized roles, such as data and software engineers, to manage such problems properly (which is indeed the only reasonable approach). Adding insult to injury, in some academic areas an interest in HPC is also viewed with contempt or as a waste of time, if not openly discouraged.

All this explains why a tour of any of the major HPC clusters worldwide often guarantees hilarious experiences in terms of who is doing what and how.

Sometimes, I almost feel like I can hear them swearing...