Sunday, 17 February 2019

Digging deeper into the census data, part 4: Jobs and Unemployment

Hello and welcome to Open Citizen Data Science!



In our previous post we dealt with educational levels and discovered how outliers can be more frequent than expected, making some areas hard to analyze for many variables.

Following educational levels, employment is another extremely important set of information.



Let's do a recap of the available fields:


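- Popolazione residente - totale di 15 anni e più appartenente alle forze di lavoro totale (residents aged 15 and over in the labour force, total)
- Popolazione residente - totale di 15 anni e più occupata (FL) (residents aged 15 and over who are employed)
- Popolazione residente - totale di 15 anni e più disoccupata in cerca nuova occupazione (residents aged 15 and over who are unemployed and seeking a new job)
- Popolazione residente - totale di 15 anni e più non appartenente alle forze di lavoro (NFL) (residents aged 15 and over not in the labour force)
- Popolazione residente - totale di 15 anni e più casalinghi/e (residents aged 15 and over who are homemakers)
- Popolazione residente - totale di 15 anni e più studenti (residents aged 15 and over who are students)
- Popolazione residente - totale di 15 anni e più in altra condizione (residents aged 15 and over in another condition)
- Popolazione residente che si sposta giornalmente nel comune di dimora abituale (residents who commute daily within their municipality of usual residence)
- Popolazione residente che si sposta giornalmente fuori del comune di dimora abituale (residents who commute daily outside their municipality of usual residence)
- Popolazione residente - totale di 15 anni e più percettori di reddito da lavoro o capitale (residents aged 15 and over earning income from work or capital)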


Let's assume employment is highly correlated with education, so it makes sense to reuse many of our previous filters and to take retirement age into account as well.

The census helps here too, as it gives us some data on how the working-age population is split. The information is far from detailed, though, so we'll have to make some further approximations.

To do so, we will take the total population and exclude the following:

- Everyone below 15 and over 65 years old: while some people are still employed after 65 (especially after the 2012 pension reform), in 2011 they were not that many, and they are also balanced out by the people who chose to retire before 65.

- People who declared themselves homemakers and consider that their job

- People who declared themselves students

- People who declared themselves in "other" condition, which could mean inability to work

The census already seems to do this with the "Popolazione residente - totale di 15 anni e più appartenente alle forze di lavoro totale" (labour force) field, so let's run a quick test and see how our assumption holds up:


Less than 2% difference on average, so we can reliably use the field to establish the employable population.
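The comparison can be sketched in a few lines of Python. This is only a minimal sketch: the key names (`pop_15_64`, `homemakers`, `students`, `other_condition`, `labour_force`) are hypothetical stand-ins for the census fields, the sample rows are made up, and averaging the per-section relative difference is just one plausible way to measure the gap.

```python
def estimate_labour_force(section):
    """Approximate the employable population of one census section."""
    return (
        section["pop_15_64"]          # residents aged 15 to 65 (hypothetical key)
        - section["homemakers"]
        - section["students"]
        - section["other_condition"]
    )


def mean_relative_difference(sections):
    """Average |estimate - reported| / reported over sections with a labour force."""
    diffs = []
    for s in sections:
        reported = s["labour_force"]
        if reported > 0:  # skip sections where the ratio is undefined
            diffs.append(abs(estimate_labour_force(s) - reported) / reported)
    return sum(diffs) / len(diffs) if diffs else 0.0


# Made-up sections, for illustration only.
sample = [
    {"pop_15_64": 100, "homemakers": 10, "students": 8,
     "other_condition": 2, "labour_force": 80},
    {"pop_15_64": 60, "homemakers": 5, "students": 5,
     "other_condition": 1, "labour_force": 50},
]

print(f"{mean_relative_difference(sample):.2%}")
```

If the average stays small on the real data, the reported field can stand in for our hand-made estimate.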
Now that we have established that a field gives us the active population, let's look at the extremes as usual. Rather than taking the top section, let's look at the most populous one among the top 10:

A residence on the outskirts of Milano's urban area. Workers likely use this place as a temporary residence.

What about the opposite?

A retirement home. The population there is definitely not active.
The data seems reliable enough, but the value is not exactly interesting on its own. The share of employable people who are actually employed could be more interesting.

Once again, let's look at the most populous among the top 10 areas with the highest ratio of employed people to employable population:

Rural area, likely dedicated to farming
What about the least employed?

Does this look familiar?
It looks like we need to filter out areas that have no employable population. Let's see what happens:
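As a rough sketch of this step, the ratio and the filter could look like the code below. The keys `labour_force` and `employed` are hypothetical names for the corresponding census fields, and the sample rows are made up.

```python
def employment_rate(section):
    """Employed residents over labour force, or None when the labour force is empty."""
    labour_force = section["labour_force"]
    if labour_force == 0:
        return None  # ratio undefined: nobody is employable here
    return section["employed"] / labour_force


def rank_sections(sections):
    """Return (rate, section) pairs sorted from least to most employed."""
    rated = [(employment_rate(s), s) for s in sections]
    usable = [(rate, s) for rate, s in rated if rate is not None]
    # Sort on the rate only, so the section dicts are never compared.
    return sorted(usable, key=lambda pair: pair[0])


# Made-up sections, for illustration only.
sample = [
    {"id": "A", "labour_force": 0, "employed": 0},
    {"id": "B", "labour_force": 6, "employed": 0},
    {"id": "C", "labour_force": 20, "employed": 19},
]

ranking = rank_sections(sample)
print([s["id"] for _, s in ranking])
```

Section A disappears from the ranking because it has no employable residents, while section B stays in with a rate of zero.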

A hostel, 14 residents, 6 of whom are employable
This is tricky. 6 people are employable and reside at a hostel, yet none of them declares to be working. The apparent anomaly is that 4 people declare that they receive an income. Since they are not employed, they must be earning money from capital, which is not exactly likely for people living in a hostel.

Let's try filtering out areas where income earners outnumber employed people:
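A minimal sketch of such a filter follows. The keys `income_earners`, `employed` and `population` are hypothetical names for the census fields; the first sample row reuses the hostel figures quoted above, the second is made up.

```python
def split_by_income_plausibility(sections):
    """Separate sections where earners outnumber employed residents from the rest."""
    kept, dropped = [], []
    for s in sections:
        if s["income_earners"] > s["employed"]:
            dropped.append(s)  # more earners than workers: treated as suspicious
        else:
            kept.append(s)
    return kept, dropped


sample = [
    {"id": "hostel", "population": 14, "employed": 0, "income_earners": 4},
    {"id": "rural", "population": 40, "employed": 18, "income_earners": 18},
]

kept, dropped = split_by_income_plausibility(sample)
lost_population = sum(s["population"] for s in dropped)
print(len(kept), "sections kept;", lost_population, "residents dropped")
```

Summing the population of the dropped sections is what tells us how much of the country each new filter costs.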

Not the nicest building we've seen
This is more believable, but this filter cost us another million and a half people and 10 thousand square kilometers, leaving just over 51 million people available for analysis.

We now have good data about workers. What about earners? Let's look at earners vs. workers:


Town outskirts, rural area
100% of employable people are employed and the area looks pretty well-off.
What about the opposite?

Not the nicest building we've seen... And apparently no paid workers living here

1 worker who declares no pay and nobody of retirement age: it's definitely an outlier, but the only thing that stands out is that 4 people declare that they move within the city every day (for work? It's in the same section) despite only one being employed.

There are two fields about people commuting daily (within and outside the city). The trouble is that their sum is almost always larger than the number of employed people, so either one person can answer both or the data here is exceedingly dirty.
Even a more lenient filter, which leaves out only the areas where either value on its own (rather than their sum) exceeds the number of employed people, excludes too many areas, so we must declare this data not useful for information about workers.
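The lenient version of that check can be sketched as follows. The keys `commute_within`, `commute_outside` and `employed` are hypothetical names for the census fields; the first sample row mirrors the section above (1 employed, 4 moving within the city) and the others are made up.

```python
def commuting_exceeds_employment(section):
    """True when either commuting count alone is larger than the employed count."""
    employed = section["employed"]
    return (
        section["commute_within"] > employed
        or section["commute_outside"] > employed
    )


def share_dropped(sections):
    """Fraction of sections that the lenient commuting filter would exclude."""
    if not sections:
        return 0.0
    flagged = sum(1 for s in sections if commuting_exceeds_employment(s))
    return flagged / len(sections)


sample = [
    {"employed": 1, "commute_within": 4, "commute_outside": 0},
    {"employed": 10, "commute_within": 6, "commute_outside": 3},
    {"employed": 5, "commute_within": 2, "commute_outside": 7},
    {"employed": 8, "commute_within": 3, "commute_outside": 4},
]

print(f"{share_dropped(sample):.0%} of sections would be dropped")
```

When that share is large on the real data, the filter throws away too much to be worth keeping.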

This quirk in the data could be explained by a few hypotheses (illegally employed people unwilling to declare that they work, non-working people filling in the field by mistake, or some confusion about self-employed people), but since the data does not contain enough detail we have no way to verify them.

What about our earlier assumption about education levels and employment?
We can calculate an education index by taking the percentage of each title obtained over the eligible population and giving higher weights to higher titles, in the following way:
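The weights themselves are not listed in the text, so the following is only a sketch of how a weighted index of this kind could be computed. The title categories and the weights are hypothetical placeholders, not the values behind the results in this post.

```python
# Hypothetical weights: higher titles count more. Placeholder values only.
HYPOTHETICAL_WEIGHTS = {
    "primary": 1,
    "lower_secondary": 2,
    "upper_secondary": 3,
    "university": 4,
}


def education_index(title_counts, eligible_population, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted sum of the share of eligible residents holding each title."""
    if eligible_population <= 0:
        return None  # no eligible residents: index undefined
    return sum(
        weight * title_counts.get(title, 0) / eligible_population
        for title, weight in weights.items()
    )


# Made-up counts for a single section.
sample_counts = {
    "primary": 10,
    "lower_secondary": 20,
    "upper_secondary": 15,
    "university": 5,
}

index = education_index(sample_counts, eligible_population=50)
print(round(index, 2))
```

With these placeholder weights the index runs from 0 (nobody holds a title) to 4 (everyone holds a university degree).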



This index, calculated on our filtered population, has a correlation of 0.2176 with employment: significant but not very strong. This can be explained by the fact that students are also included in the index calculation, and we have no way of knowing the education level of just the employable people.
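For reference, here is a small sketch of how a correlation like this can be computed, assuming Pearson's coefficient is the one meant. The two lists are made-up per-section values of an education index and an employment rate, not the census data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values, one per section.
education = [1.8, 2.1, 2.3, 2.6, 3.0]
employment = [0.90, 0.85, 0.95, 0.88, 0.97]

r = pearson(education, employment)
print(round(r, 4))
```

A value near 1 or -1 means the two series move together almost linearly, while a value near 0 means there is little linear relationship.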

This concludes this article. Stay tuned for our next one on ethnic distribution!
