Sunday 20 October 2019

Twilight of the unicorn Data Scientist: how the sexiest job of this decade is going to change


Hello and welcome to Open Citizen Data Science!

After two very interesting days of full immersion in the world of citizen data science at Alteryx Inspire in London, some trends are starting to become clear.



Thursday 26 September 2019

6 Red Flags to avoid for a successful partnership with Data Science Consulting firms

Hello and welcome to Open Citizen Data Science!

In many industries Data Science is still a relatively new field, so hiring a consultant to try new ways of harnessing the available data is often seen as a relatively low-risk path.



Sunday 21 July 2019

Extracting OpenStreetMap features with Alteryx

Hello and welcome to Open Citizen Data Science!

Today we will see how to harness the power of OpenStreetMap for geolocated analysis:

Sunday 24 March 2019

Estimating Census Area income using open data and Alteryx Part I: Influencing variables and Correlation VS Random Forest

Hello and welcome to Open Citizen Data Science!

Today we will see possible ways to estimate which parts of a city are relatively better or worse off than average, using the census variables we gathered in our previous post along with the open income data available at city level.



Sunday 17 March 2019

Digging deeper in the census data part 7: Building type and status

Hello and welcome to Open Citizen Data Science!

Today we will look into the last part of the census data, regarding building types (residential vs. non-residential), their composition and their status.
This last part is pretty important as it describes the physical landscape in many ways, allowing us to determine, for example, whether we're looking at an old hamlet or part of an urban sprawl.

Let's take for example the center of the area with the most buildings:

Not what one could expect

Sunday 3 March 2019

Digging deeper in the census data part 6: Building Usage

Hello and welcome to Open Citizen Data Science!

Today we will explore building usage by residents, plus occupation and ownership rates.



Sunday 24 February 2019

Digging deeper in the census data part 5: Ethnic Distribution

Hello and welcome to Open Citizen Data Science!


In our previous post we dealt with jobs and unemployment, which along with education levels showed some strong influence over socio-economic conditions.

Today we will deal with census data about something that has recently become a hot topic in many countries: immigration and how it changes the ethnic make-up of many areas.

Sunday 17 February 2019

Digging deeper in the census data part 4: Jobs and Unemployment

Hello and welcome to Open Citizen Data Science!



In our previous post we dealt with educational levels and discovered how outliers can be more frequent than expected, making some areas hard to analyze for many variables.

Following educational levels, employment is another extremely important set of information.


Sunday 10 February 2019

Digging deeper in the census data part 3: Determining Education Levels

Hello and welcome to Open Citizen Data Science!

In this article we will look into the census variables related to educational levels.
Closely related to the demographic segments we covered in our previous article, education is an important variable in defining them, especially in Italy, where university graduates are relatively fewer than the EU average.


Sunday 3 February 2019

Digging deeper in the census data part 2: Age segmentation

Hello and welcome to Open Citizen Data Science!

In this article we will look into the census variables related to population age.
Knowing whether a neighbourhood is populated mostly by working-age people rather than pensioners could make a world of difference, depending on what is being researched.

Not only do the general tastes of different generations change, but so does the kind of services that are required. A lack of schools near a zone with a high percentage of pre-schoolers, for example, could indicate either a potential business niche for private child care or a place in troubled socio-economic status.

Sunday 27 January 2019

Digging deeper in the census data: analyzing the available data for demographic segmentation

Hello and welcome to Open Citizen Data Science!

In our previous post we found out that using raw variables out of the box could lead to sub-optimal results, even when a simple target is involved and the data source is of high quality. We also saw how adding a few simple metrics could greatly improve the result.

Our new sample obtained from the research in the previous post should be fairly homogeneous for our initial target (highly populated areas), but is it enough? Let's see:

Sunday 20 January 2019

Data Wrangling with Alteryx Part II: Understanding your data to avoid pitfalls

Hello and welcome to Open Citizen Data Science!

Following our previous article, we tried to optimize our search by finding the census areas with the largest population.
Looking at the top 10 most populated places, we find that our search covers areas that tend to have wildly different demographic profiles:


Apparently our search also found a church with over 4000 people registered as residents!
Some areas also seem to be highly populated just because the census area is pretty large, meaning that the population density is not as high as we hoped.
If our objective is market targeting for commercial purposes, this is definitely not going to work, so we need to refine our search.
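To make the difference between raw population and density concrete, here is a minimal Python sketch. It is not the Alteryx workflow from the post: the area IDs, populations and surface areas are invented for illustration, and the helper names `with_density` and `top_by` are hypothetical.

```python
# Hypothetical census areas: IDs, populations and surface areas are invented
# for illustration and are NOT taken from the ISTAT census data.
areas = [
    {"id": "A", "population": 4200, "area_km2": 6.0},
    {"id": "B", "population": 2500, "area_km2": 0.5},
    {"id": "C", "population": 3100, "area_km2": 1.0},
]


def with_density(rows):
    """Return copies of the rows with a residents-per-km2 field added."""
    return [{**r, "density": r["population"] / r["area_km2"]} for r in rows]


def top_by(rows, key, n=10):
    """Return the n rows with the largest value for the given key."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


enriched = with_density(areas)

# Ranking by raw population favours large areas...
by_population = [r["id"] for r in top_by(enriched, "population")]
# ...while ranking by density favours compact, crowded ones.
by_density = [r["id"] for r in top_by(enriched, "density")]

print(by_population)
print(by_density)
```

In this toy data the largest area tops the population ranking but comes last by density, which is the kind of pitfall described above.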

Sunday 13 January 2019

Basic Data Wrangling and geo-visualization with Alteryx

Hello and welcome to Open Citizen Data Science!

Following up on our previous article we will now try to take a dataset and actually assess the value of the data within. The beauty of a census dataset is that it's about things that are relatable to us all: people and places.

Starting from the dataset we created, and using the techniques from the previous articles, we will now enrich our data with visual geographic information, which will allow us to see what the data we're analyzing describes and where it is located.

Sunday 6 January 2019

Data Preparation using ISTAT Open Data with Alteryx

Hello and welcome to Open Citizen Data Science!

Every Data Science project starts with Data Preparation.
While it may be considered a fairly mundane task, it is instead a fundamental step to ensure proper and coherent results and avoid inconsistent performance in your future data modeling.

As a first example, we will start with a very user-friendly data set, the 2011 Italian Census: ISTAT Open Data. 

This is a solid data source, with data that has been collected and verified by statisticians, formatted as text tables. It is also a great place to start for Data Science beginners, as it offers a great variety of structured data about something that is easy to relate to: people and geography.

Wednesday 2 January 2019

5 Myths about Data Scientists

Hello and welcome to Open Citizen Data Science!

It's 2019 and, while Data Science is starting to be better understood by businesses, there are still many myths about what a Data Scientist can do.

Do you remember 1980s and early 1990s movies that featured programmers or hackers? They appeared like magicians that could do the impossible just by having a computer handed to them.

Likewise, today we live in an era where Machine Learning is enabling self-driving cars and voice assistants, making some of the myths of the 1980s closer to reality:

The old Knight Rider series is starting to become close to reality: "Alexa, drive my car here" is a plausible scenario today