Simple Multicore programming in Python

I recently had written a Python program to perform some performance calculations over the results from a regression model. The program was not too complex but the sheer number of calculations to be performed made it quite slow and resulted in a runtime of several hours. In my search for a way to utilize the multiple cores of the machine the program was running on I came across the Ray package from https://www.ray.io/ which makes it very easy to distribute a single-core process onto multiple cores. Read More ›

Bulk import of CSV data into HANA using Node.js and xargs

In this post I will present a fast solution for loading CSV data into HANA by using Node.js. In the last year I have been working with Node.js as a generic command line programming tool I have noticed that as it has been optimized for fast I/O many problems are really faster when converted into JavaScript and run in Node.js. Read More ›

Timelapse data exploration of NYC taxi rides

Back in 2014 the city of New York put online a dataset with yellow cab rides comprising a full year of data. Back then I remember struggling quite a bit with managing the sheer volume of the dataset involved, trying out various alternatives for reading in the full dataset. After a few years SAP introduced an “Express edition” of their HANA in-memory database which allowed you to run a 32 GB database just from your own hardware. That was enough to load a full years’ worth of data and be able to analyze it using a standard SQL approach. Read More ›

Explainable Forecasting Using HANA and APL

This is part 2 in a two-part series of blogs on large-scale and explainable forecasting using APL. In part 1 I have outlined a way to utilize the APL library for in-database training of a regression model in HANA in order to be used together with an external Node.js inference script. In this part of the blog I will dive deeper into built-in functionality to retrieve insights into a trained model which is called the ‘model debrief’. Read More ›

Large-scale Forecasting Using HANA, APL and Node.js

Last year I became involved in a project for a retailer based in The Netherlands who had finished construction of a new distribution center. This new innovative DC is fully mechanized and is operated with a minimal amount of personnel, which is different from conventional distribution centers where sudden large order spikes are fulfilled by having more staff pick these orders in parallel. This blog post describes my work of developing a large-scale machine learning model to forecasting the goods movements between various parts of the supply chain. Read More ›