Hello and welcome to Open Citizen Data Science!
In our previous post we found that using raw variables out of the box can lead to sub-optimal results, even when the target is simple and the data source is of high quality, and that adding a few simple metrics can greatly improve the result.
Our new sample, obtained from the research in the previous post, should be fairly homogeneous for our initial target (highly populated areas), but is it enough? Let's see:
Sunday, 27 January 2019
Sunday, 20 January 2019
Data Wrangling with Alteryx Part II: Understanding your data to avoid pitfalls
Hello and welcome to Open Citizen Data Science!
Following our previous article, we tried to optimize our search by finding the census areas with the largest population.
Looking at the top 10 most populated places, we find that our search covers areas with wildly different demographic profiles:
Apparently, our search also found a church with over 4000 people registered as residents!
Some areas also seem to be highly populated only because the census area is quite large, meaning that the population density is not as high as we hoped.
If our objective is market targeting for commercial purposes, this is definitely not going to work, so we need to refine our search.
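The density pitfall can be illustrated with a minimal sketch. The posts themselves work in Alteryx; the Python below is only an illustration, and the area names, populations, and surface values are invented, not taken from the ISTAT data. It ranks the same areas by raw population and by population density to show how the two rankings can disagree.

```python
# Hypothetical census areas: names and numbers are invented for illustration.
areas = [
    {"name": "A", "population": 4200, "area_km2": 6.0},
    {"name": "B", "population": 3100, "area_km2": 0.4},
    {"name": "C", "population": 2800, "area_km2": 0.5},
]


def with_density(rows):
    """Return copies of the rows with population density (people per km2) added."""
    return [{**r, "density": r["population"] / r["area_km2"]} for r in rows]


def top_by(rows, key, n):
    """Return the n rows with the highest value for the given key."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


enriched = with_density(areas)
by_population = [r["name"] for r in top_by(enriched, "population", 2)]
by_density = [r["name"] for r in top_by(enriched, "density", 2)]

print("Top 2 by population:", by_population)
print("Top 2 by density:", by_density)
```

In this made-up sample, area A leads on raw population only because it is large, and it drops out of the top two once the areas are ranked by density.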
Sunday, 13 January 2019
Basic Data Wrangling and geo-visualization with Alteryx
Hello and welcome to Open Citizen Data Science!
Following up on our previous article, we will now take a dataset and assess the value of the data within. The beauty of a census dataset is that it's about things that are relatable to us all: people and places.
We will start with the dataset we created and, using the techniques from the previous articles, enrich our data with visual geographic information, which will allow us to see what the data we're analyzing is and where it is located.
Sunday, 6 January 2019
Data Preparation using ISTAT Open Data with Alteryx
Hello and welcome to Open Citizen Data Science!
Every Data Science project starts with Data Preparation.
While it may be considered a fairly mundane task, it is a fundamental step to ensure proper and coherent results and avoid inconsistent performance in your future data modeling.
As a first example, we will start with a very user-friendly data set, the 2011 Italian Census: ISTAT Open Data.
This is a solid data source, with data that has been collected and verified by statisticians, formatted as text tables. It is also a great place to start for Data Science beginners, as it offers a great variety of structured data about something that is easy to relate to: people and geography.
Wednesday, 2 January 2019
5 Myths about Data Scientists
Hello and welcome to Open Citizen Data Science!
It's 2019 and, while Data Science is becoming better understood by businesses, there are still many myths about what a Data Scientist can do.
Do you remember 1980s and early 1990s movies that featured programmers or hackers? They were portrayed as magicians who could do the impossible just by being handed a computer.
Likewise, today we live in an era where Machine Learning is enabling self-driving cars and voice assistants, making some of the myths of the 1980s closer to reality:
The old Knight Rider series is getting close to reality: "Alexa, drive my car here" is a plausible scenario today.