Logging means tracking the events that happen when a piece of software runs

https://www.codemotion.com/magazine/dev-hub/big-data-analyst/logging-in-python-a-broad-gentle-introduction/

In Python the easiest way to do that is to add print() statements wherever your code performs an action that you want to track. If you need more advanced logging (e.g. saving the logs to a file or to a database table), the easiest way is to use the logging module from the standard library
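As a minimal sketch of the file-based case, the snippet below sends log records to a file instead of printing them. The logger name, the log file location and the messages are assumptions made up for this example.

```python
import logging
import os
import tempfile

# Hypothetical log file location, chosen only for this sketch
log_path = os.path.join(tempfile.gettempdir(), "example_app.log")

# A named logger keeps this configuration separate from the root logger
logger = logging.getLogger("example_app")
logger.setLevel(logging.INFO)

# Write every record to the file with a timestamp and a severity level
file_handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
file_handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(message)s")
)
logger.addHandler(file_handler)

logger.info("Job started")
logger.warning("Input file is empty, nothing to process")
```

Compared with print(), each record carries a timestamp and a level, and the destination can be changed by swapping the handler without touching the calls that emit the messages.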

Another point to consider is: when should you log and when should you raise an exception? In a nutshell, as always, it depends on what you have to do. How is the software expected to behave? Is the error so critical that further steps cannot or should not be performed? In…
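One way to picture the distinction, as a sketch only: log and carry on when the problem is recoverable, raise when the following steps can no longer be trusted. The function names and the sample data are hypothetical.

```python
import logging

logger = logging.getLogger("example_app")


def parse_rows(rows):
    """Recoverable problem: log a warning, skip the bad row, keep going."""
    parsed = []
    for row in rows:
        try:
            parsed.append(int(row))
        except ValueError:
            logger.warning("Skipping row that is not a number: %r", row)
    return parsed


def average(values):
    """Critical problem: with no data the next steps make no sense, so raise."""
    if not values:
        raise ValueError("no valid rows to average")
    return sum(values) / len(values)


values = parse_rows(["10", "oops", "20"])
print(average(values))
```

Here the malformed row is only worth a warning because the remaining rows are still usable, while an empty result stops the run.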

Let’s create a calendar using Python and QlikView

QlikView document

Calendar-based reports are a common requirement, so it is handy to have a calendar already in place, ready to copy and paste into the script of your QlikView reports

The idea is to use Python to generate a CSV file containing the holidays in the required timeframe, and then to use QlikView to ETL the data and create the report
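The CSV generation step could look roughly like the sketch below, which uses only the standard library. The column names, the function names and the one-entry holiday list are assumptions for illustration; the real list would cover every holiday in the timeframe.

```python
import csv
from datetime import date

# Illustrative holiday list only; extend it with the full set of holidays
HOLIDAYS = {
    date(2020, 1, 1): "New Year's Day (Capodanno)",
}


def holidays_in_range(holidays, start, end):
    """Return the (date, name) pairs that fall inside [start, end], sorted."""
    return sorted(
        (day, name) for day, name in holidays.items() if start <= day <= end
    )


def write_holidays_csv(path, rows):
    """Write one line per holiday, with dates formatted as dd/mm/yyyy."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "holiday"])
        for day, name in rows:
            writer.writerow([day.strftime("%d/%m/%Y"), name])
```

A call such as `write_holidays_csv("holidays.csv", holidays_in_range(HOLIDAYS, date(2020, 1, 1), date(2020, 12, 31)))` would then produce the file to load from the QlikView script.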

Since I am currently based in Italy, I will classify working and non-working days according to Italian legislation. For example, for 2020:

  • New Year’s Day (Capodanno) — 01/01/2020
  • Epiphany…

Quick and easy Python script to convert images to PDF

convert images to PDF

You often need to create a PDF from a set of images. The idea is to have an .exe file that can be run from any folder on the PC, takes some image filenames as input and creates a PDF file. The images appear in the PDF in the same order in which they are passed to the .exe. By default the output filename is merge.pdf

Let’s have a look at the code:

Python script

Basically the steps are:
1 take the image filenames as input
2 open…

Let’s use the Dogs vs. Cats Kaggle dataset to try out GCP AutoML

https://mc.ai/build-a-machine-learning-model-on-cloud-using-google-automl/

This is my first project with Google AutoML and I was very curious to try it out because I have seen a lot of interesting posts about it here on Medium. For example, I suggest reading this very interesting post from Sriram Gopal, where he explains all the steps to approach a similar project using Google AutoML

I decided to use the Dogs vs. Cats dataset from Kaggle. The goal is to classify whether an image contains a dog or a cat

Let’s start by using the CRISP-DM process (Cross-Industry Standard Process for Data Mining):

  1. Business Understanding
  2. Data…

Django application to keep track of working hours and paychecks

Application homepage

This time no cool ML-related stuff, just a good old management application. The motivation for this project was that I wanted to get rid of a bunch of Excel spreadsheets I was using for these tasks

Business Understanding

The idea is to have a calendar view of working and non-working days, the specific types of working hours, and the paychecks

I am referring to the Italian legislation regarding working and non-working days and the types of working hours:

Day Types:

  • Working Day — WOD
  • Non Working Day — NWD

Non working days in Italy…

Udacity Data Scientist Nanodegree Program Project

https://en.wikipedia.org/wiki/Oenothera_speciosa

This project is part of the Udacity Data Scientist Nanodegree Program: Image Classifier Project, and the goal was to apply deep learning techniques to train an image classifier to recognize different species of flowers

Let’s start by using the CRISP-DM process (Cross-Industry Standard Process for Data Mining):

  1. Business Understanding
  2. Data Understanding
  3. Prepare Data
  4. Data Modeling
  5. Evaluate the Results
  6. Deploy

Business Understanding

Image classification is a pretty common task nowadays: it consists of taking an image and a set of classes as input and outputting the probability that the image belongs to one or more of the given classes. About this…

Analyse COVID-19 data with Microsoft Power BI

https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Italy

As always, let’s apply the CRISP-DM process (Cross-Industry Standard Process for Data Mining) to tackle the problem:

  1. Business Understanding
  2. Data Understanding
  3. Prepare Data
  4. Data Modeling
  5. Evaluate the Results
  6. Deploy

Business Understanding

The goal is to get some insights into the spread of COVID-19 in Italy

Epidemics are often modelled with a logistic function. More information about this can easily be found online. I recommend this really interesting post on Wired and this YouTube video, which focuses on the mathematics behind this modelling assumption
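For reference, a logistic curve can be sketched in a few lines of Python. The parameter names and the numbers used below are purely illustrative assumptions, not values fitted to real data.

```python
import math


def logistic(t, capacity, growth_rate, midpoint):
    """Logistic curve: slow start, rapid growth, then saturation.

    capacity    -- the value the curve levels off at (final total cases)
    growth_rate -- how steep the curve is around the midpoint
    midpoint    -- the time at which half of the capacity is reached
    """
    return capacity / (1.0 + math.exp(-growth_rate * (t - midpoint)))


# Made-up parameters: 1000 total cases, midpoint on day 30
for day in (0, 15, 30, 45, 60):
    print(day, round(logistic(day, capacity=1000, growth_rate=0.2, midpoint=30)))
```

The curve reaches exactly half of the capacity at the midpoint and flattens as it approaches the capacity, which is why it is a natural first approximation for cumulative case counts.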

Visualize Data with Power BI

https://www.open.online/wp-content/uploads/2019/04/istat-litalia-e-fuori-dalla-recessione-nel-primo-trimestre-pil-a-02.png

The ISTAT (Italian National Institute of Statistics) is the main producer of official statistics in Italy including the census of population, economic censuses and a number of social and environmental surveys and analyses

As always, let’s apply the CRISP-DM process (Cross-Industry Standard Process for Data Mining) to tackle the problem:

  1. Business Understanding
  2. Data Understanding
  3. Prepare Data
  4. Data Modeling
  5. Evaluate the Results
  6. Deploy

Business Understanding

The goal is to get some insights into the regional distribution of the population in Italy and how it has changed over the years

Data Understanding

ISTAT lets you select from different data topics: in this case we are…

Udacity Data Scientist Nanodegree Program Project

https://www.figure-eight.com/dataset/combined-disaster-response-data/

This project is part of the Udacity Data Scientist Nanodegree Program: Disaster Response Pipeline Project, and the goal was to apply the data engineering skills learned in the course to analyze disaster data from Figure Eight and build a model for an API that classifies disaster messages. As always, let’s apply the CRISP-DM process (Cross-Industry Standard Process for Data Mining) to tackle the problem:

  1. Business Understanding
  2. Data Understanding
  3. Prepare Data
  4. Data Modeling
  5. Evaluate the Results
  6. Deploy

Business Understanding

During and immediately after a natural disaster, millions of messages reach disaster response organizations, either directly or through social media. Disaster response…

Udacity Data Scientist Nanodegree Program Project

https://www.ledgerinsights.com/wp-content/uploads/2019/09/charity-810x476.jpg

This project is part of the Udacity Data Scientist Nanodegree Program: Finding Donors for CharityML Project, and the goal was to apply supervised learning techniques to data collected for the U.S. census to help a fictitious charity organization, CharityML, identify the people most likely to donate to its cause

Let’s start by using the CRISP-DM process (Cross-Industry Standard Process for Data Mining):

  1. Business Understanding
  2. Data Understanding
  3. Prepare Data
  4. Data Modeling
  5. Evaluate the Results
  6. Deploy

Business Understanding

CharityML is a fictitious charity organization that wants to expand its potential donor base by sending letters to residents of the region where…

Simone Rigoni

Data Engineer
