Engineering Data Solutions

👋 Welcome to my page!

I am Jonathan, a Data Engineer based in Munich, Germany. Here I share little snippets of things that I learn, find interesting, or worth discussing.

I will post mostly about topics related to the practice of data engineering. These might center on Microsoft Azure, Linux, and open source technology such as Airflow, DuckDB, or dlt. I also plan to publish posts on data governance, management, and strategy, as I firmly believe that data (engineering) projects are most successful if they are truly part of an organization’s broader business strategy.

At some point you’ll find some of my personal projects here too (once those are ready to share 😉) - from a fun facts CLI and a recipe management web server written in Go to a data analytics platform for my personal finances.

You can find my socials and recent posts below and check the “About” page in case you’d like to know more about me!

Exploring Project Nessie - a transactional catalogue over Iceberg tables

This week, I dove into Project Nessie - an open-source transactional data catalogue for Apache Iceberg tables. I’d heard about Nessie’s Git-like semantics and was curious about its potential for better managing data versioning and auditability in my projects.

Docker Compose setup for Nessie Server and CLI

To experiment locally, I leveraged Docker, conveniently supported by a guide provided by the Nessie team. Following their materials, I put together a straightforward Docker Compose file that neatly places both the Nessie server and CLI into the same Docker network. This setup greatly simplifies communication between the containers. ...

Azure Data Factory Data Flow - Oddity of the week

This week, I stumbled into an interesting puzzle while investigating an issue for a client. They had recently transitioned from System A to System B, and one of their critical metrics suddenly showed significant discrepancies. Business-wise, these numbers were expected to remain identical, so naturally, it called for some digging. I started by checking the basics:

- Source API: Was the new system feeding incorrect data? No issues there.
- Transformed Reporting Tables: Were calculations or transformations misconfigured? Again, everything seemed correct.

...

How I studied for the AZ-104 exam

I recently obtained the AZ-104 Microsoft Certified: Azure Administrator Associate certification. This is my second Azure certification after the Data Engineer Associate from last year. I am deliberately focusing on the Azure platform as I believe deep knowledge of one cloud provider is more beneficial than shallow knowledge across multiple providers. Also, all my clients over the last two years have used Azure. For this certification I changed up my study routine a bit and found it really helpful. It took me less time this time around, and I also felt more confident in my abilities. I therefore wanted to take a quick note and reflect on the resources I used and the methods I tried to follow. ...

February 25, 2025 3 min

Azure DNS aliases can reference other Azure resources

Azure DNS is Microsoft’s hosting service for DNS domains. It allows users to manage their infrastructure and related domain information in one central place. This alone could be a decent argument for Azure DNS. But one of the key benefits is the tight integration with Azure’s resources: Azure DNS aliases provide dynamic references to Azure resources and can be created at the zone apex level. This provides key benefits such as automatic updates of the DNS record set when an underlying IP address changes (thus preventing dangling DNS records), load balancing of the apex domain, and direct reference to Azure Content Delivery Network endpoints. ...

February 3, 2025 1 min

Short rundown of Azure blob storage's access tiers

There are different access tiers for blobs residing in Azure Storage. These tiers include

- Hot
- Cool
- Cold
- Archive

A default tier is set at the storage account level and can be overridden for individual blobs. The tiers differ with respect to storage and access cost, minimum storage duration, and latency (time until data is retrieved) and are thus suited to different scenarios.

Hot

This tier is optimized for frequent reads or writes. It is thus suited to actively used data. It incurs the highest storage cost but the lowest access cost and is the default option when creating storage accounts. ...
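To make the trade-off between tiers a bit more tangible, here is a minimal Python sketch. It only models two of the attributes mentioned above: the minimum storage duration (0, 30, 90, and 180 days for Hot, Cool, Cold, and Archive, as I understand the current Azure documentation) and whether data is readable without rehydration. The `AccessTier` class and the `coldest_eligible_tier` helper are hypothetical names made up for illustration, not part of any Azure SDK, and the selection rule ignores access cost entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessTier:
    name: str
    min_storage_days: int  # deleting or moving data earlier incurs an early-deletion charge
    online: bool           # False: the blob must be rehydrated before it can be read


# Ordered from warmest to coldest. Durations are assumptions taken from the
# Azure documentation at the time of writing, not values read from an API.
TIERS = [
    AccessTier("Hot", 0, True),
    AccessTier("Cool", 30, True),
    AccessTier("Cold", 90, True),
    AccessTier("Archive", 180, False),
]


def coldest_eligible_tier(retention_days: int, needs_online_access: bool = True) -> AccessTier:
    """Pick the coldest tier whose minimum storage duration fits the planned retention.

    Hypothetical helper: a real decision would also weigh read frequency and access cost.
    """
    if retention_days < 0:
        raise ValueError("retention_days must not be negative")
    eligible = [
        tier
        for tier in TIERS
        if tier.min_storage_days <= retention_days
        and (tier.online or not needs_online_access)
    ]
    # Hot has no minimum duration and is online, so the list is never empty here.
    return max(eligible, key=lambda tier: tier.min_storage_days)


if __name__ == "__main__":
    for days, online in [(10, True), (45, True), (365, True), (365, False)]:
        tier = coldest_eligible_tier(days, online)
        print(f"{days} days, online access needed={online}: {tier.name}")
```

The helper deliberately prefers the coldest tier that avoids early-deletion charges, which is only one possible rule of thumb. Data that is read often may still be cheaper in a warmer tier despite the higher storage cost.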