Master data maintenance is a time-consuming activity for many businesses. Retailers selling large numbers of different articles or manufacturers processing raw materials into finished goods can easily accumulate databases containing hundreds of thousands of master data items, each of which may carry many hundreds of attributes. This blog post proposes a way of populating these attributes by applying a transformer-based Large Language Model.
Read More ›
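The post itself describes the approach in detail; as a rough illustration of the general idea, here is a minimal sketch that infers a single attribute value from an article's free-text description using a zero-shot classifier from the Hugging Face transformers library. The attribute, candidate values, and model are illustrative assumptions, not taken from the post.

```python
# Hypothetical illustration: infer a "color" attribute for an article from its
# description with a zero-shot classifier; attribute, values and model are assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

description = "Lightweight running shoe with a breathable mesh upper in a bright red finish"
candidate_values = ["red", "blue", "black", "white", "green"]

result = classifier(description, candidate_labels=candidate_values)
print(result["labels"][0])  # highest-scoring candidate value for the attribute
```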
In my previous blog post on reinforcement learning I demonstrated a gentle introduction to the field using Keras-RL2. While writing it I found it quite difficult to get an overview of the many reinforcement learning frameworks available today, all of which are at different levels of maturity. In this post I will dive into how to set up a reinforcement learning experiment using Stable-Baselines3, which provides an even quicker way to get started.
Read More ›
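To give a flavour of how quick the setup is, here is a minimal sketch following the Stable-Baselines3 quick-start pattern; the algorithm (PPO), environment, and timestep budget are illustrative choices, not necessarily those used in the post.

```python
# Minimal Stable-Baselines3 sketch; PPO, CartPole and the timestep budget are
# illustrative assumptions, not taken from the post.
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Passing the environment id as a string lets SB3 create and wrap it for us
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)

# Evaluate the trained policy over a few episodes
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```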
I recently wrote a Python program to perform some performance calculations over the results of a regression model. The program was not too complex, but the sheer number of calculations made it quite slow, resulting in a runtime of several hours. In my search for a way to utilize the multiple cores of the machine the program was running on, I came across the Ray package from https://www.ray.io/, which makes it very easy to distribute a single-core workload across multiple cores. Read More ›
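As a taste of the pattern, here is a minimal Ray sketch that turns a per-chunk calculation into parallel tasks; the chunking and the placeholder computation are illustrative assumptions, not the actual regression scoring from the post.

```python
# Minimal Ray parallelization sketch; the chunking scheme and the placeholder
# calculation are illustrative assumptions.
import ray

ray.init()  # by default Ray uses all CPU cores on the machine

@ray.remote
def evaluate_chunk(chunk):
    # Placeholder for the per-chunk performance calculation;
    # in the real program this would score regression results.
    return sum(x * x for x in chunk)

# Split the work into chunks and run them as parallel Ray tasks
chunks = [list(range(i, i + 1_000)) for i in range(0, 10_000, 1_000)]
futures = [evaluate_chunk.remote(c) for c in chunks]
results = ray.get(futures)  # blocks until all tasks have finished
print(sum(results))
```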
In this post I will present a fast solution for loading CSV data into HANA using Node.js. Over the last year of working with Node.js as a generic command-line programming tool, I have noticed that, because it is optimized for fast I/O, many problems run considerably faster when converted to JavaScript and executed in Node.js. Read More ›
Back in 2014 the city of New York put online a dataset of yellow cab rides comprising a full year of data. I remember struggling quite a bit with the sheer volume of that dataset, trying out various alternatives for reading it in completely. A few years later SAP introduced an “Express edition” of their HANA in-memory database, which allowed you to run a 32 GB database on your own hardware. That was enough to load a full year's worth of data and analyze it using a standard SQL approach. Read More ›