Overview

This analysis explores global health and economic indicators sourced from the World Health Organization’s Global Health Observatory. The dataset covers 222 countries with the following variables:

Variable Description
country Country name
code Country code
gni Gross National Income per capita (USD)
skulspend Education expenditures per capita
death Death rate per 1,000
healthexp Health expenditures per capita
unemp Unemployment rate
water % population with access to clean water
skul % enrolled in secondary school
maternal Maternal death rate per 100,000 live births

Setup

library(tidyverse)
library(dplyr)
library(ggplot2)
library(knitr)
library(psych)
library(janitor)

Load Data

ind <- read_csv('data/globalind.csv', col_names = TRUE)
kable(head(ind, 10), caption = "First 10 rows of the Global Health Indicators dataset")
First 10 rows of the Global Health Indicators dataset
country code gni skulspend death healthexp unemp water skul maternal
Afghanistan AFG 500 2.86 7.11 80.29 11.71 75.09 NA 620
Albania ALB 5270 3.10 10.79 NA 13.07 95.07 94.28 8
Algeria DZA 3610 7.04 5.40 214.85 12.25 94.44 NA 78
American Samoa ASM NA NA 5.60 NA NA 99.77 NA NA
Andorra AND NA NA NA 3336.92 NA 100.00 NA NA
Angola AGO 1690 2.42 7.82 50.74 10.35 57.17 NA 222
Antigua and Barbuda ATG 15010 3.45 5.96 830.34 NA NA NA 21
Argentina ARG 9010 5.02 8.51 863.71 11.46 NA 110.13 45
Armenia ARM 4470 2.71 12.79 551.54 12.18 99.97 87.25 27
Aruba ABW 24840 NA 9.69 NA NA NA NA NA
str(ind)
## spc_tbl_ [222 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ country  : chr [1:222] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##  $ code     : chr [1:222] "AFG" "ALB" "DZA" "ASM" ...
##  $ gni      : num [1:222] 500 5270 3610 NA NA ...
##  $ skulspend: num [1:222] 2.86 3.1 7.04 NA NA 2.42 3.45 5.02 2.71 NA ...
##  $ death    : num [1:222] 7.11 10.79 5.4 5.6 NA ...
##  $ healthexp: num [1:222] 80.3 NA 214.9 NA 3336.9 ...
##  $ unemp    : num [1:222] 11.7 13.1 12.2 NA NA ...
##  $ water    : num [1:222] 75.1 95.1 94.4 99.8 100 ...
##  $ skul     : num [1:222] NA 94.3 NA NA NA ...
##  $ maternal : num [1:222] 620 8 78 NA NA 222 21 45 27 NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   country = col_character(),
##   ..   code = col_character(),
##   ..   gni = col_number(),
##   ..   skulspend = col_double(),
##   ..   death = col_double(),
##   ..   healthexp = col_number(),
##   ..   unemp = col_double(),
##   ..   water = col_double(),
##   ..   skul = col_double(),
##   ..   maternal = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

The dataset contains 222 observations across 10 variables — 2 character fields (country, code) and 8 numeric indicators.

Data Exploration

Filtering by Income Level

Countries with GNI per capita exceeding $50,000, sorted by education spending:

ind %>%
  select(country, code, gni, skulspend) %>%
  filter(gni >= 50000) %>%
  arrange(desc(skulspend)) %>%
  kable(caption = "High-income countries ranked by education spending")
High-income countries ranked by education spending
country code gni skulspend
Iceland ISL 62110 7.72
Sweden SWE 54830 7.17
Denmark DNK 62710 6.38
Australia AUS 53620 6.10
United States USA 64650 6.05
Norway NOR 78610 5.90
Finland FIN 50060 5.88
North America NAC 62515 5.61
Netherlands NLD 50170 5.30
Switzerland CHE 82490 5.22
Luxembourg LUX 79580 4.97
Qatar QAT 58420 3.23
Ireland IRL 65330 3.10
Singapore SGP 55260 2.51
Bermuda BMU 111800 1.87
Cayman Islands CYM 62420 NA
Faroe Islands FRO 64250 NA
Isle of Man IMN 79300 NA

Key Questions

Which country has the highest health expenditure per capita?

ind %>%
  select(country, healthexp) %>%
  arrange(desc(healthexp)) %>%
  head(10) %>%
  kable(caption = "Top 10 countries by health expenditure per capita")
Top 10 countries by health expenditure per capita
country healthexp
United States 11702.41
North America 11076.74
Switzerland 10309.76
Norway 7704.44
Luxembourg 6757.00
Denmark 6438.39
Ireland 6092.18
Sweden 6027.97
Germany 5930.33
Australia 5901.11

How many countries have unemployment rates above 10%?

high_unemp <- ind %>%
  filter(unemp > 10)

cat("Number of countries with unemployment > 10%:", nrow(high_unemp))
## Number of countries with unemployment > 10%: 56

Among countries with 100% clean water access, which has the lowest health expenditure?

ind %>%
  select(country, water, healthexp) %>%
  filter(water == 100) %>%
  arrange(healthexp) %>%
  head(10) %>%
  kable(caption = "Countries with 100% clean water access, ranked by health expenditure")
Countries with 100% clean water access, ranked by health expenditure
country water healthexp
Thailand 100 305.09
Turkmenistan 100 483.74
Romania 100 809.59
Tuvalu 100 1071.31
Bahrain 100 1110.00
Nauru 100 1143.74
Hungary 100 1163.25
Chile 100 1278.18
Kuwait 100 1532.56
Greece 100 1675.12

Summary Statistics

# Using the psych package for comprehensive descriptive statistics
describe(ind[3:10]) %>%
  kable(digits = 2, caption = "Descriptive statistics for all numeric indicators")
Descriptive statistics for all numeric indicators
vars n mean sd median trimmed mad min max range skew kurtosis se
gni 1 200 15296.93 20321.32 6055.00 10949.01 7168.37 220.00 111800.00 111580.00 1.92 3.47 1436.93
skulspend 2 161 4.65 1.94 4.55 4.50 1.81 0.00 13.63 13.63 1.21 3.54 0.15
death 3 218 8.11 2.87 7.52 7.88 2.31 1.22 18.00 16.78 0.80 0.98 0.19
healthexp 4 189 1253.79 2073.63 388.55 760.57 495.74 16.42 11702.41 11685.99 2.59 7.33 150.83
unemp 5 191 8.27 5.64 6.58 7.54 4.17 0.14 28.05 27.91 1.17 0.93 0.41
water 6 201 89.36 15.32 97.01 92.49 4.43 37.20 100.00 62.80 -1.55 1.34 1.08
skul 7 121 93.90 25.25 100.37 95.05 18.13 31.42 157.70 126.28 -0.39 0.23 2.30
maternal 8 189 143.00 207.71 60.00 98.84 78.58 1.00 1223.00 1222.00 2.43 6.99 15.11

Notable observations: - The mean GNI and median GNI may diverge significantly, indicating income inequality across countries. - Missing data is present across multiple indicators — handled via na.rm = TRUE in calculations.

Correlation Analysis

Correlation Matrix

cor_matrix <- round(cor(ind[3:10], use = "pairwise.complete.obs"), 2)
kable(cor_matrix, caption = "Correlation matrix of global health indicators")
Correlation matrix of global health indicators
gni skulspend death healthexp unemp water skul maternal
gni 1.00 0.11 0.01 0.93 -0.19 0.46 0.54 -0.41
skulspend 0.11 1.00 0.00 0.18 0.14 0.20 0.31 -0.25
death 0.01 0.00 1.00 0.12 0.03 0.00 0.15 0.08
healthexp 0.93 0.18 0.12 1.00 -0.12 0.41 0.49 -0.34
unemp -0.19 0.14 0.03 -0.12 1.00 0.05 0.10 -0.02
water 0.46 0.20 0.00 0.41 0.05 1.00 0.73 -0.78
skul 0.54 0.31 0.15 0.49 0.10 0.73 1.00 -0.74
maternal -0.41 -0.25 0.08 -0.34 -0.02 -0.78 -0.74 1.00

Key findings from the correlation matrix: - GNI and health expenditure show the strongest positive correlation - Education spending and school enrollment are positively correlated - Maternal death rate tends to be negatively correlated with income and health spending

Visualizations

Distribution of Key Variables

par(mfrow = c(1, 2))
hist(ind$gni, main = "Distribution of GNI per Capita",
     xlab = "GNI (USD)", col = "steelblue", border = "white")
hist(ind$healthexp, main = "Distribution of Health Expenditure",
     xlab = "Health Expenditure (USD)", col = "coral", border = "white")

Both distributions exhibit a strong positive skew — most countries cluster at lower values with a long tail toward wealthier nations. A log transformation can normalize these distributions:

par(mfrow = c(1, 2))
hist(log10(ind$gni), main = "Log-transformed GNI",
     xlab = "log10(GNI)", col = "steelblue", border = "white")
hist(log10(ind$healthexp), main = "Log-transformed Health Expenditure",
     xlab = "log10(Health Exp)", col = "coral", border = "white")

Scatter Plots

plot(ind$gni, ind$healthexp,
     main = "GNI vs. Health Expenditure per Capita",
     xlab = "Gross National Income per Capita (USD)",
     ylab = "Health Expenditure per Capita (USD)",
     pch = 19,
     col = rgb(0.2, 0.4, 0.6, 0.6),
     cex = 1.2)
grid()

The scatter plot confirms the strong positive relationship between national income and health spending.

Multi-variable Scatter Plot Matrix

pairs(~ gni + healthexp + skul, data = ind,
      main = "Scatter Plot Matrix: GNI, Health Expenditure, School Enrollment")

Schools Dataset Analysis

The California schools dataset provides school-level data including poverty rates, student demographics, and academic performance.

schools <- read_csv('data/schools.csv', col_names = TRUE)
describe(schools[5:19]) %>%
  kable(digits = 2, caption = "Descriptive statistics for schools dataset")
Descriptive statistics for schools dataset
vars n mean sd median trimmed mad min max range skew kurtosis se
Students 1 503 564.11 219.24 550.00 551.95 223.87 137.00 1602.00 1465.00 0.70 1.06 9.78
p_minority 2 503 48.25 32.06 42.18 47.20 37.92 0.18 100.00 99.82 0.35 -1.24 1.43
Diversity 3 498 2.97 1.71 3.37 3.07 2.11 0.04 5.00 4.96 -0.39 -1.32 0.08
teachers 4 503 33.55 10.79 33.00 32.91 10.38 11.00 76.80 65.80 0.60 0.45 0.48
spendrat 5 503 3305.50 705.92 3190.00 3230.12 510.01 199.00 6802.00 6603.00 1.25 4.67 31.48
avgsal 6 503 31698.44 2849.83 31681.00 31748.12 3208.35 21009.00 38630.00 17621.00 -0.19 -0.28 127.07
teachrat 7 503 16.56 2.41 16.80 16.61 2.22 9.60 28.50 18.90 0.01 1.44 0.11
p_5yearsexp 8 503 21.97 7.81 21.60 21.64 8.01 3.50 47.20 43.70 0.44 0.11 0.35
poorpct 9 503 47.23 30.31 42.75 46.77 36.40 0.00 99.54 99.54 0.19 -1.19 1.35
Language limit 10 503 13.42 17.99 5.26 9.50 7.25 0.00 80.80 80.80 1.81 2.59 0.80
p_gifted 11 503 8.26 7.36 7.01 7.43 5.93 0.00 77.25 77.25 4.09 30.82 0.33
p_special 12 503 11.66 4.65 11.15 11.27 4.18 1.96 28.43 26.47 0.84 0.91 0.21
p_moved 13 503 20.33 9.95 19.55 19.74 8.66 0.00 69.14 69.14 0.78 1.77 0.44
p_passed 14 503 69.61 17.17 72.92 70.78 18.22 19.79 99.02 79.23 -0.56 -0.49 0.77
attendance 15 503 93.73 15.90 96.60 96.45 0.74 0.00 98.40 98.40 -5.70 30.63 0.71
round(cor(schools[5:19], use = "pairwise.complete.obs"), 2) %>%
  kable(caption = "Correlation matrix for schools variables")
Correlation matrix for schools variables
Students p_minority Diversity teachers spendrat avgsal teachrat p_5yearsexp poorpct Language limit p_gifted p_special p_moved p_passed attendance
Students 1.00 0.31 -0.03 0.94 -0.36 0.26 0.61 0.78 0.18 0.46 0.00 -0.40 0.10 -0.17 0.01
p_minority 0.31 1.00 -0.27 0.29 0.16 0.39 0.19 0.27 0.84 0.61 0.22 -0.41 0.54 -0.66 0.10
Diversity -0.03 -0.27 1.00 -0.04 -0.22 -0.08 0.07 -0.04 -0.36 -0.19 -0.24 0.24 -0.10 0.38 0.01
teachers 0.94 0.29 -0.04 1.00 -0.21 0.24 0.34 0.84 0.18 0.39 -0.02 -0.30 0.12 -0.17 0.01
spendrat -0.36 0.16 -0.22 -0.21 1.00 0.17 -0.56 -0.12 0.23 -0.05 0.07 0.22 0.22 -0.19 0.14
avgsal 0.26 0.39 -0.08 0.24 0.17 1.00 0.26 0.53 0.16 0.24 0.18 -0.35 0.11 -0.10 0.12
teachrat 0.61 0.19 0.07 0.34 -0.56 0.26 1.00 0.29 0.04 0.31 0.07 -0.47 -0.05 -0.05 -0.02
p_5yearsexp 0.78 0.27 -0.04 0.84 -0.12 0.53 0.29 1.00 0.15 0.32 0.07 -0.28 0.07 -0.12 0.11
poorpct 0.18 0.84 -0.36 0.18 0.23 0.16 0.04 0.15 1.00 0.59 0.18 -0.18 0.61 -0.72 0.13
Language limit 0.46 0.61 -0.19 0.39 -0.05 0.24 0.31 0.32 0.59 1.00 0.09 -0.37 0.30 -0.37 0.05
p_gifted 0.00 0.22 -0.24 -0.02 0.07 0.18 0.07 0.07 0.18 0.09 1.00 -0.18 0.03 -0.15 0.05
p_special -0.40 -0.41 0.24 -0.30 0.22 -0.35 -0.47 -0.28 -0.18 -0.37 -0.18 1.00 -0.05 0.17 0.03
p_moved 0.10 0.54 -0.10 0.12 0.22 0.11 -0.05 0.07 0.61 0.30 0.03 -0.05 1.00 -0.49 0.32
p_passed -0.17 -0.66 0.38 -0.17 -0.19 -0.10 -0.05 -0.12 -0.72 -0.37 -0.15 0.17 -0.49 1.00 -0.06
attendance 0.01 0.10 0.01 0.01 0.14 0.12 -0.02 0.11 0.13 0.05 0.05 0.03 0.32 -0.06 1.00
par(mfrow = c(1, 2))
hist(schools$poorpct, main = "Distribution of Poverty Rate",
     xlab = "% Students in Poverty", col = "tomato", border = "white")
hist(log10(schools$attendance), main = "Distribution of Attendance (log)",
     xlab = "log10(Attendance)", col = "mediumpurple", border = "white")

plot(schools$p_moved, schools$attendanc,
     main = "Student Mobility vs. Attendance",
     xlab = "% Students Moved",
     ylab = "Attendance Rate",
     pch = 19,
     col = rgb(0.6, 0.2, 0.4, 0.5))
grid()

pairs(~ poorpct + p_passed + p_moved, data = schools,
      main = "Scatter Matrix: Poverty, Pass Rate, Mobility")

Key Takeaways

  1. Income drives health spending: GNI per capita explains a large proportion of variation in health expenditure across countries.
  2. Skewed distributions: Most global indicators are positively skewed, reflecting vast disparities between wealthy and developing nations.
  3. Education and health are linked: Countries investing more in education tend to have better health outcomes.
  4. Poverty impacts education: In the California schools data, poverty rates are closely tied to academic performance metrics.

Data sources: WHO Global Health Observatory, California Schools Dataset