One of my friends is a Syrian refugee who was granted asylum in Sweden last year. I've also been wanting to try data analysis, so it seemed fitting to analyze something relevant to my friend. This is my first ever analysis in pandas, so apologies in advance for the code abomination.

In this analysis, I use pandas for dataframes, numpy for the numeric types (I need to count and do some math), and matplotlib for plotting graphs.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

The next step is to clean up the dataframe for further analysis. The steps are:

  • Read csv
  • Group by origin country and year resettled
  • Remove destination country column (because it’s the same value in every row)
  • Remove non-integer values (because you can’t do math magic with them)
  • Convert year and value to integer (hello, math magic)
## data prep
df = pd.read_csv('unhcr_resettlement_residence_swe.csv')[1:] # read csv, skipping the first data row
df = df.groupby(['Origin', 'Year'], as_index=False).sum() # group by two columns
df = df.drop('Country / territory of asylum/residence', axis=1) # drop destination country column
df = df[(df != '*').all(axis=1)] # remove any row that contains a '*' value
df.Year = df.Year.astype(np.int64) # convert year to int
df.Value = df.Value.astype(np.int64) # convert value to int

df
     Origin                              Year  Value
0    Afghanistan                         1984      7
1    Afghanistan                         1985      4
2    Afghanistan                         1986      4
3    Afghanistan                         1987      1
4    Afghanistan                         1988      1
5    Afghanistan                         1991      2
6    Afghanistan                         1992     18
7    Afghanistan                         1997      1
8    Afghanistan                         1998      5
9    Afghanistan                         1999     16
10   Afghanistan                         2000    339
11   Afghanistan                         2001    270
12   Afghanistan                         2002    156
13   Afghanistan                         2003    244
14   Afghanistan                         2004    314
15   Afghanistan                         2005    183
16   Afghanistan                         2006    353
17   Afghanistan                         2007    185
18   Afghanistan                         2008    414
19   Afghanistan                         2009    318
20   Afghanistan                         2010    336
21   Afghanistan                         2011    404
22   Afghanistan                         2012    438
23   Afghanistan                         2013    219
24   Afghanistan                         2014    328
25   Afghanistan                         2015    222
26   Afghanistan                         2016     20
27   Albania                             1991      1
28   Albania                             1992      1
29   Albania                             2003      3
...  ...                                 ...     ...
705  Various/unknown                     2009      2
706  Various/unknown                     2013      2
707  Venezuela (Bolivarian Republic of)  2015      4
708  Viet Nam                            1984     76
709  Viet Nam                            1985     48
710  Viet Nam                            1986    171
711  Viet Nam                            1987    232
712  Viet Nam                            1988     94
713  Viet Nam                            1990    939
714  Viet Nam                            1991    656
715  Viet Nam                            1992    474
716  Viet Nam                            1993    197
717  Viet Nam                            1994     32
718  Viet Nam                            1995      4
719  Viet Nam                            1997     21
720  Viet Nam                            2002      1
721  Viet Nam                            2004     10
722  Viet Nam                            2006     10
723  Viet Nam                            2009      2
724  Viet Nam                            2010      6
726  Yemen                               1992      1
727  Yemen                               2004      1
728  Yemen                               2005      4
729  Yemen                               2006      1
730  Zimbabwe                            2006      4
731  Zimbabwe                            2008      1
732  Zimbabwe                            2011      1
733  Zimbabwe                            2014      7
734  Zimbabwe                            2015      6
735  Zimbabwe                            2016      9

725 rows × 3 columns
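
For what it's worth, here is a minimal sketch of the same kind of cleanup on a tiny made-up frame, so it can run without the UNHCR file. It is a variation and not what the code above does: it swaps the `'*'` comparison for `pd.to_numeric` with `errors="coerce"`, and it filters before grouping. The toy rows and the `raw` and `clean` names are my own assumptions for illustration.

```python
import pandas as pd

# Toy rows standing in for the UNHCR export; the numbers are made up.
raw = pd.DataFrame({
    "Origin": ["Iraq", "Iraq", "Yemen", "Yemen"],
    "Year": ["1991", "1992", "1992", "2004"],
    "Value": ["12", "*", "1", "3"],
})

# errors="coerce" turns anything that is not a number (such as '*') into NaN.
raw["Value"] = pd.to_numeric(raw["Value"], errors="coerce")

# Drop the rows that had no usable number, then cast both columns to int.
clean = raw.dropna(subset=["Value"]).astype({"Year": "int64", "Value": "int64"})

# Sum per origin country and year, keeping the keys as ordinary columns.
clean = clean.groupby(["Origin", "Year"], as_index=False)["Value"].sum()
```

Converting before grouping means the sum always runs on real integers rather than on whatever type the raw column happened to have.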

Since I want to plot a graph with multiple lines, I need to supply one dataframe per line. This step creates one dataframe per origin country and cleans it up. If no refugees were resettled in a given year, that year doesn’t exist in the dataframe, so I have to check for missing years and, where one is missing, create it and set its value to 0.

## create one dataframe per origin country
UniqueNames = df.Origin.unique()
# map each country name to the rows of df that belong to it
DataFrameDict = {name: df[df.Origin == name] for name in UniqueNames}

def clean_up_dataframe(df):
    country = df.Origin.unique()[0]
    df = df.drop('Origin', axis=1)
    df.index = df.Year
    df = df.drop('Year', axis=1)
    df = df.rename(columns={'Value': country})

    df2 = pd.DataFrame({'Year':range(1983,2016+1), country:0}) # dummy dataframe
    df2.index = df2.Year
    df2[country] = df[country]
    df2 = df2.fillna(0)
    df2 = df2[country]

    return df2
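
As far as I can tell, the "create the missing years and set them to 0" step can also be done with `reindex`. This is only a sketch of that alternative, not the function above: the country name `Exampleland` and its counts are made up, and `counts` and `filled` are names I picked for illustration.

```python
import pandas as pd

# Made-up counts for a hypothetical country with gaps in its years.
counts = pd.Series([7, 4, 2], index=[1984, 1985, 1991], name="Exampleland")
counts.index.name = "Year"

# reindex lays the series over the full year range; fill_value=0 fills the gaps.
all_years = range(1983, 2016 + 1)
filled = counts.reindex(all_years, fill_value=0)
```

Because the gaps are filled during the reindex itself, the values stay integers instead of turning into floats on the way through NaN.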

And because Syria is in the Middle East, I want to focus on the MENA region (Middle East and North Africa). However, the full list is too big, and I’ve yet to figure out how to make the graph look pretty. What I do instead is group the countries into subregions and plot each one.

## original MENA, too big
UniqueNames_og_mena = ['Algeria', 'Bahrain', 'Djibouti', 'Egypt', 'Iran', 'Iraq', 'Israel', 'Jordan',
'Kuwait', 'Lebanon', 'Libya', 'Mauritania', 'Morocco', 'Oman', 'Palestine', 'Qatar',
'Sahrawi Arab Democratic Republic', 'Saudi Arabia', 'Somalia', 'Sudan', 'Syria', 'Tunisia',
'United Arab Emirates', 'Yemen', 'Afghanistan', 'Armenia', 'Azerbaijan', 'Chad', 'Comoros',
'Cyprus', 'Eritrea', 'Georgia', 'Mali', 'Niger', 'Pakistan', 'Turkey']

## MENA
UniqueNames_mena = ['Algeria', 'Bahrain', 'Djibouti', 'Egypt', 'Iran (Islamic Rep. of)', 'Iraq', 'Jordan',
               'Kuwait', 'Lebanon', 'Libya', 'Mauritania', 'Saudi Arabia', 'Somalia', 'Sudan',
               'Syrian Arab Rep.', 'Tunisia', 'Yemen', 'Afghanistan',
               'Armenia', 'Azerbaijan', 'Chad', 'Eritrea', 'Georgia', 'Pakistan', 'Turkey']

## LEVANT
UniqueNames_levant = [ 'Iraq', 'Jordan', 'Lebanon', 'Syrian Arab Rep.']

## NORTH AFRICA
UniqueNames_north_africa = ['Algeria', 'Djibouti', 'Egypt', 'Libya', 'Mauritania',  'Somalia', 'Sudan',
                'Tunisia', 'Chad', 'Eritrea']

def plot(region_name, region_list):
    df1 = clean_up_dataframe(DataFrameDict[region_list[0]])
    ax = df1.plot(figsize=(15,10))

    for i in region_list[1:]:
        df = clean_up_dataframe(DataFrameDict[i])
        df.plot(ax=ax)

    plt.xlabel('Year')
    plt.ylabel('Value')
    plt.title('Resettled Refugees in Sweden from {} Region Between 1983-2016'.format(region_name))
    ax.legend()

    plt.show()
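
A possible shortcut, sketched here under a few assumptions: instead of plotting one series at a time, the cleaned frame could be pivoted into one wide table with a column per country. `region_table` is a hypothetical helper of mine, not part of the analysis above. It assumes each (Origin, Year) pair appears only once, which the groupby step should guarantee, and it pads countries that are absent from the data with zeros instead of raising an error. The toy numbers are made up.

```python
import pandas as pd

def region_table(df, countries, first_year=1983, last_year=2016):
    """Return a Year x country table of counts, with 0 for missing years."""
    subset = df[df["Origin"].isin(countries)]
    wide = subset.pivot(index="Year", columns="Origin", values="Value")
    # Add the years (rows) and countries (columns) that have no data at all.
    wide = wide.reindex(index=range(first_year, last_year + 1), columns=countries)
    return wide.fillna(0)

# Made-up numbers in the same Origin/Year/Value layout as the cleaned frame.
toy = pd.DataFrame({
    "Origin": ["Iraq", "Iraq", "Jordan"],
    "Year": [1991, 1992, 1991],
    "Value": [10, 20, 1],
})
levant = region_table(toy, ["Iraq", "Jordan", "Lebanon"], 1990, 1993)
```

Calling `levant.plot(figsize=(15, 10))` on the result would then draw one line per column on a single set of axes.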

## plot('All MENA', UniqueNames_og_mena) # list is too big
plot('MENA', UniqueNames_mena)

You can see that a lot of Iraqi refugees were resettled between 1990 and 1995, which coincides with the Gulf War (1990-91).

plot('Levant', UniqueNames_levant)

This graph shows only refugees from the Levant region. As expected, a lot of Iraqis were resettled during the 1990s, while Syrian refugees spiked after 2010, which coincides with the Arab Spring (2010-12).

plot('North Africa', UniqueNames_north_africa)

In North Africa, Somali refugees spiked around 2010, the result of a non-functioning government and the clan wars that followed. You can also see a lot of Eritrean refugees, who are fleeing indefinite military conscription. Families of those who flee the military are targeted as well.