Air pollution in urban areas of India contribute to increased infant mortality rates more than water pollution while considering what cities are most affected by that air and water pollution as well as an increasing infant mortality rate?
Packages:
library(ggplot2)
library(MASS)
library(stargazer)
library(AER)
library(plm)
library(tidyverse)
library(lmtest)
library(sandwich)
library(dplyr)
library(ggdark)
library(readr)
library(cowplot)
library(plotly)
library(ggpubr)
library(haven)
im_air <- read_dta("/Volumes/GoogleDrive/My Drive/Econ 432/Final Project1/Sustainablity Data/Infant-Mortality/IM-City/Final-Data/im_air.dta")
im_water <- read_dta("/Volumes/GoogleDrive/My Drive/Econ 432/Final Project1/Sustainablity Data/Infant-Mortality/IM-City/Final-Data/im_water.dta")
Describe variables you plan on using for your analysis including type of variable (e.g. numeric, factor, dummy) and its units if numeric
summary(im_air$e_spm_mean)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 9.0 140.2 216.3 228.0 302.1 838.4 1183
summary(im_air$e_so2_mean)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.4392 8.3235 13.9850 17.4624 21.9975 125.9810 1200
summary(im_air$e_no2_mean)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.011 15.856 24.200 27.959 34.692 209.607 1171
summary(im_air$c_IM)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0594 9.0238 19.9659 23.4621 33.0903 499.8054 1084
summary(im_water$lnfcoli)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 2.963 5.886 5.668 7.643 14.557 677
summary(im_water$bod)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.2667 1.7239 2.6818 5.2090 5.0909 100.0000 358
summary(im_water$do)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.400 6.250 7.038 6.833 7.721 11.000 355
summary(im_water$c_IM)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0594 5.9762 17.7755 21.2724 32.3459 168.2521 954
You are likely going to revise this as you continue your data analysis. However, it is important to start thinking about the model:
Choose a specification (level-level, log-log, log-level, level-log). In other words, is your outcome variable in log or level form? What about your control variables? (you may have level and log Xs in a model)
log-level because the y variable is needed to be log in order for the result to be statistically significant same for x variables.
Choose type of regression/model (multiple regression, panel regression, binary regression, multinomial logit regression). Explain your choice.
Multiple regression because we will be using multiple regression to evaluate statistically significance of the variables
Copy and Modify the following code to propose 3 different regression models. In other words, replace \(Y\) and the \(X\)’s for the names of your variables (e.g. instead of \(X_{1i}\), write \(age_i\)). Start from the simplest model and then add variables to make it more complex. You will be assessed based on your choice of variables (You should not include variables that are endogenous)
model.1: \(cIM=\beta_0+\beta_1e.spm.mean_{1i}+YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
model.2: \(cIM=\beta_0+\beta_1e.spm.mean_{1i}+\beta_2e.so2.mean_{2i} +\beta_3e.no2.mean_{3i} + YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
model.3: \(cIM =\beta_0+\beta_1nfcoli_{1i}+ YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
model.4: \(cIM=\beta_0+\beta_1nfcoli_{1i}+\beta_5bod_{2i}+\beta_6do_{3i} + YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
Most likely, the chosen data comes with multiple variables. Given your answer to 4, you want to use tidyverse
to subset your data and select the columns (both \(y\) and \(x\)’s that you will use). Name this subset: data
water = im_water %>% select(c_IM, bod, do, lnfcoli, pop_urban, state, year)
water = na.omit(water)
dim(water)
## [1] 578 7
air = im_air %>% select(e_spm_mean, e_so2_mean, e_no2_mean, state, year, pop_urban, c_IM)
air = na.omit(air)
water = water[-c(530:578),]
dim(air)
## [1] 529 7
a <- data(1:10, start=c(1987), frequency=12)
b <- data(1:12, start=c(2015,1), frequency=12)
library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))
head(air)
## # A tibble: 6 × 7
## e_spm_mean e_so2_mean e_no2_mean state year pop_urban c_IM
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 132. 8.57 15.0 Andhra Pradesh 1990 4180. 24.6
## 2 186. 14.9 21.9 Andhra Pradesh 1991 4351. 18.5
## 3 172. 12.6 27.0 Andhra Pradesh 1992 4493. 15.1
## 4 149. 7.66 15.9 Andhra Pradesh 1993 4634. 17.2
## 5 128. 15.2 18.7 Andhra Pradesh 1994 4776. 12.3
## 6 162. 16.3 34.7 Andhra Pradesh 1995 4918. 8.88
head(water)
## # A tibble: 6 × 7
## c_IM bod do lnfcoli pop_urban state year
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 17.2 16.7 5.08 6.07 4634. Andhra Pradesh 1993
## 2 24.6 16.9 5.05 3.81 5909. Andhra Pradesh 2002
## 3 26.4 17.5 7.52 7.93 6193. Andhra Pradesh 2004
## 4 5.60 3.08 7.83 1.74 1338. Andhra Pradesh 1998
## 5 2.70 2.5 7.05 2.05 1332. Andhra Pradesh 1995
## 6 8.09 0.867 7.41 7.40 1159. Andhra Pradesh 2002
Rename variables in an optimal way. Think about the name of the variables in your data and consider whether there is any improvement. Use tidyverse
and mutate()
to change the name of a variable.
For example, if the name of the variable is long or includes spaces, you should modify the name to a single word (e.g. if the name of column is what's your age
, modify the name to age
)
#```{r} data = data %>% mutate(lnc_IM = lnc_IM…43)
data = data %>% mutate(pop_urban = pop_urban…25)
data = data %>% mutate(year = year…3)
data = data %>% mutate(state = state…1)
data = data %>% mutate(city = city…2)
data <- data %>% mutate(lne_spm_mean = log(e_spm_mean))
data <- data %>% mutate(lne_so2_mean = log(e_so2_mean))
data <- data %>% mutate(lne_no2_mean = log(e_no2_mean))
data = select(data, -1, -2, -4, -5, -6, -7,-8,-9) #```
Write code to summarize the variables in your model from question 4 and write a sentence describing each. will be taken off if you use the wrong function.
Remember, use summary()
or table()
depending on the type of variable.
summary(air$c_IM)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05936 11.15546 20.57571 22.59976 32.45787 99.03008
summary(water$c_IM)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05936 5.57547 17.22860 20.25643 29.87463 89.40035
Description:
For each \(x\), complete the following:
summary(air$pop_urban)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 155.2 715.6 1310.9 2170.4 2396.3 12905.8
Description:
summary(air$state)
## Length Class Mode
## 529 character character
Description:
summary(air$year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1987 1992 1996 1996 2000 2004
Description:
Do these steps for all the \(x's\) in your model
tidyverse
and ggplot2
year_mean1 = air %>%
group_by(year) %>%
summarise_at(vars(e_spm_mean), list(e_spm_mean = mean))
p3.0 <- year_mean1 %>%
ggplot(aes(x = year, y = e_spm_mean)) +
geom_line(color="blue")+
geom_point(color="violetred2")+
geom_smooth(se = FALSE, method = lm, color = "aquamarine") +
ggtitle(paste("PM Average Pollution")) +
theme_bw() +
scale_x_continuous() +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=12)) +
theme(axis.text.y=element_text(size=16)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=12)) +
ylab("Mean Pollution (mu g/m3)") + dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
year_mean3.0 = air %>%
group_by(year) %>%
summarise_at(vars(e_so2_mean), list(e_so2_mean = mean))
p5.0 <- year_mean3.0 %>%
ggplot(aes(x = year, y = e_so2_mean)) +
geom_line(color="blue")+
geom_point(color="violetred2")+
geom_smooth(se = FALSE, method = lm, color = "aquamarine") +
ggtitle(paste("SO2 Average Pollution")) +
theme_bw() +
scale_x_continuous() +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=12)) +
theme(axis.text.y=element_text(size=16)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=12)) +
ylab("Mean Pollution (mu g/m3)") +
xlab("Years") + dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
year_mean4.0 = air %>%
group_by(year) %>%
summarise_at(vars(e_no2_mean), list(e_no2_mean = mean))
p4.0 <- year_mean4.0 %>%
ggplot(aes(x = year, y = e_no2_mean)) +
geom_smooth(se = FALSE, method = lm, color = "aquamarine") +
geom_line(color="blue")+
geom_point(color="violetred2")+
ggtitle(paste("NO2 Average Pollution")) +
theme_bw() +
scale_x_continuous() +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=12)) +
theme(axis.text.y=element_text(size=16)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=12)) +
ylab("Mean Pollution (mu g/m3)") +
xlab("Years") + dark_mode() +
theme(plot.title = element_text(hjust = 0.5))
figure3 <- plot_grid(
p3.0, p4.0, p5.0,
align="center"
)
annotate_figure(figure3,
top = text_grob("Air Quality Indicator Average Levels from 1987-2007", color = "white", face = "bold", size = 18),
fig.lab.face = "bold"
) + dark_mode()
Description:
In this graph we are looking at the overall trends in average air quality in different states within India between 1987 and 2007. From the 1987 to 2007 SO2 dropped dramatically from ~22.5 micrograms to ~10 micrograms. Similarly, the particulate matter levels have decreased from ~287.5 micrograms to ~187.5 micrograms. In contrast, NO2 average pollution have been volatile where the mean pollution levels went from 34 micrograms in 1987 to 22 micrograms in 1995 then back to around 34 micrograms in the early 2000s (overall NO2 Pollution have increased.
air1 = air %>%
group_by(state, year) %>%
summarise_at(vars(e_spm_mean), list(e_spm_mean = mean))
p3.1 <- air1 %>%
ggplot(aes(x = year, y = log(e_spm_mean))) +
geom_line(color="blue") +
geom_point(color="violetred2", size = .5 )+
ggtitle(paste("PM Average Pollution by State")) +
theme_bw() +
scale_x_continuous() +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=12)) +
theme(axis.text.y=element_text(size=16)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=12)) +
ylab("Mean Pollution (mu g/m3)")+
xlab("Years") +
facet_wrap(~state) +
dark_mode() +
theme(plot.title = element_text(hjust = 0.5))
air2 = air %>%
group_by(state, year) %>%
summarise_at(vars(e_so2_mean), list(e_so2_mean = mean))
p5.1 <- air2 %>%
ggplot(aes(x = year, y = log(e_so2_mean))) +
geom_line(color="blue") +
geom_point(color="violetred2", size = .5 )+
ggtitle(paste("SO2 Average Pollution by State")) +
theme_bw() +
scale_x_continuous() +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=12)) +
theme(axis.text.y=element_text(size=16)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=12)) +
ylab("Mean Pollution (mu g/m3)") +
xlab("Years") +
facet_wrap(~state) +
dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
air3 = air %>%
group_by(state, year) %>%
summarise_at(vars(e_no2_mean), list(e_no2_mean = mean))
p4.1 <- air3 %>%
ggplot(aes(x = year, y = log(e_no2_mean))) +
geom_line(color="blue")+
geom_point(color="violetred2", size = .5 )+
ggtitle(paste("NO2 Average Pollution by State")) +
theme_bw() +
scale_x_continuous() +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=12)) +
theme(axis.text.y=element_text(size=16)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=12)) +
ylab("Mean Pollution (mu g/m3)") +
xlab("Years") +
facet_wrap(~state, scales="fixed" ) +
dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
p3.1
Description:
This graph is showing us the average particulate matter levels in India’s largest states from around 1887 to 2007. Look closely we can see a similar decreasing or stable trend for each state. The most drastic fall in average particulate matter was in Himachal Pradesh, decreasing at ~5.5 micrograms in 1987 to ~3 micrograms in the early 2000s. The two states that are suffering from relatively increasing particulate matter pollution is Karnataka and Rajasthan . From 1987 to 2007, Karnataka and Rajasthan’s particulate matter polution had increased almost 20%.
p4.1
Description:
This graph is showing us the average nitrous oxide levels in India’s largest states from around 1887 to 2007. Look closely we can see a similar steadily increasing trend for each state. The most drastic increase in average NO2 was in Madhya Pradesh and Karnataka, increasing around 2 micrograms from 1987 to 2007 (almost 100% increase). The two states that are suffering from relatively constant NO2 pollution are West Bengal and Delhi. Rajasthan has managed to reduce NO2 pollution from 1987 to 2007, where NO2 mean pollution dropped from ~5 micrograms to ~3 micrograms.
p5.1
Description:
This graph is showing us the average sulfur dioxide levels in India’s largest states from around 1887 to 2007. Look closely we can see a similar decreasing or stable trend for each state. The largest decrease in average SO2 levels was in West Bengal and Rajasthan, decreasing at ~4 micrograms in 1987 to ~2 micrograms in the late 2000s. The two states that are suffering from relatively increasing particulate matter pollution are Maharashtra and Kerala. From 1987 to 2007, Maharashtra and Kerala’s SO2 mean polution had increased almost 50%.
year_mean2 = water %>%
group_by(year) %>%
summarise_at(vars(bod), list(bod = mean))
p3 <- year_mean2 %>%
ggplot(aes(x = year, y = bod)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = 1.5 )+
geom_smooth(se = FALSE, method = lm, color = "aquamarine") +
ggtitle(paste("Biochemical Oxygen Demand in Water")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=5)) +
theme(axis.text.y=element_text(size=6)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=12)) +
theme(legend.text=element_text(size=18)) +
ylab("Mean Pollution (mg/l) ") +
xlab("Years") + dark_mode() +
theme(plot.title = element_text(hjust = 0.5))
year_mean3 = water %>%
group_by(year) %>%
summarise_at(vars(lnfcoli), list(lnfcoli = mean))
p5 <- year_mean3 %>%
ggplot(aes(x = year, y = lnfcoli)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = 1.5 )+
geom_smooth(se = FALSE, method = lm, color = "aquamarine") +
ggtitle(paste("F-Coli Concentration in Water")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=5)) +
theme(axis.text.y=element_text(size=6)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=12)) +
theme(legend.text=element_text(size=18)) +
ylab("Mean Pollution (1,000’s/100 ml)") +
xlab("Years") + dark_mode() +
theme(plot.title = element_text(hjust = 0.5))
year_mean4 = water %>%
group_by(year) %>%
summarise_at(vars(do), list(do = mean))
p4 <- year_mean4 %>%
ggplot(aes(x = year, y = do)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = 1.5 )+
geom_smooth(se = FALSE, method = lm, color = "aquamarine") +
ggtitle(paste("Dissolved Oxygen in Water")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=5)) +
theme(axis.text.y=element_text(size=6)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=12)) +
theme(legend.text=element_text(size=18)) +
xlab("Years") + dark_mode() +
theme(plot.title = element_text(hjust = 0.5))
figure2 <- plot_grid(
p3, p4, p5,
align="hv"
)
annotate_figure(figure2,
top = text_grob("Water Quality Indicator Average Levels from 1987-2007", color = "white", face = "bold", size = 16),
fig.lab.face = "bold"
) + dark_mode()
Description
In this graph we are looking at the overall trends in average water quality in different states within India between 1987 and 2007. The overall trends are pretty mixed. Plot 3 demonstrates that BOD steadily got worse throughout the late 1980s to mid 1990s but began to improve after 1997, but the average shows us that BOD has only really gotten around 20% better from 1987 to 2007. DO has been on average decreasing over time suggesting worsening water quality. F-coli drops dramatically in the 1990s, but when going into the 2000s it begins to increase slightly. F-Coli’s decrease is interesting suggesting that despite the alarmingly fast growth in sewage generation, domestic water pollution is still decreasing.
water1 = water %>%
group_by(state, year) %>%
summarise_at(vars(bod), list(bod = mean))
p3.1.0 <- water1 %>%
ggplot(aes(x = year, y = bod)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = .5 )+
ggtitle(paste("Biochemical Oxygen Demand in Water by State")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=5)) +
theme(axis.text.y=element_text(size=6)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=12)) +
theme(legend.text=element_text(size=18)) +
ylab("Mean Pollution (mg/l) ") +
xlab("Years")+
facet_wrap(~state, scales="fixed" ) +
dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
water2 = water %>%
group_by(state, year) %>%
summarise_at(vars(lnfcoli), list(lnfcoli = mean))
p5.1.0 <- water2 %>%
ggplot(aes(x = year, y = lnfcoli)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = .5 )+
ggtitle(paste("F-Coli Concentration in Water by State")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
scale_y_continuous(breaks = seq(1.9, 12, 2.5))+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=5)) +
theme(axis.text.y=element_text(size=6)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=12)) +
theme(legend.text=element_text(size=18)) +
ylab("Mean Pollution (1,000’s/100 ml)") +
xlab("Years") +
facet_wrap(~state, scales="fixed" ) +
dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
water3 = water %>%
group_by(state, year) %>%
summarise_at(vars(do), list(do = mean))
summary(water3$do)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.950 6.021 6.724 6.472 7.242 9.058
p4.1.0 <- water3 %>%
ggplot(aes(x = year, y = do)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = .5 )+
ggtitle(paste("Dissolved Oxygen in Water by State")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=5)) +
theme(axis.text.y=element_text(size=6)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=12)) +
theme(legend.text=element_text(size=18)) +
xlab("Years") +
ylab("Mean Pollution (mg/l)") +
facet_wrap(~state, scales="fixed" ) +
dark_mode() +
theme(plot.title = element_text(hjust = 0.5))
p3.1.0
Description:
This graph is showing us the average biochemical oxygen demand in India’s largest states from around 1887 to 2007. Look closely we can see a similar slightly increasing or low decreased trend for each state. The largest decrease in average BOD was in Gujurat, decreasing at ~35mg/l in 1995 to ~5 mg/l in the late 2000s. The two states that are suffering from relatively increasing BOD are Maharashtra and Delhi. From 1987 to 2007, Maharashtra and Delhi’s BOD has increased more than 5% on average.
p4.1.0
Description:
This graph is showing us the average dissolved oxygen in India’s largest states from around 1887 to 2007. Look closely we can see a similar slightly decreasing or relatively constants trends for each state. Delhi has the worst water quality with their average DO levels below ~4 mg/l from 1987 to 2007 meaning water quality is so bad fish can barely survive to decompose particulate matter. The two states that are suffering from relatively increasing DO are Punjab and Gujurat. From 1987 to 2007, Punjab’s DO has inxrease more than 25% and Gujurat’s DO has increased more than 12.5% on average. The largest decrease in average DO was in Bihar, decreasing at ~8mg/l in 1990 to ~5 mg/l in 1992 .
p5.1.0
Description:
This graph is showing us the average F-Coli levels in India’s largest states from around 1887 to 2007. We can see a similar decreasing or relatively decreasing trend for each state. The two states that have decreased F-Coli levels the most are Karnataka and Gujurat. In 1987, Karnataka had ~6.9 micrograms where in 2007 it had decreases to ~4.4 micrograms.
year_mean = air %>%
group_by(year) %>%
summarise_at(vars(c_IM), list(c_im = mean))
year_mean %>%
ggplot(aes(x = year, y = c_im)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = 1.5 )+
ggtitle(paste("Average Infant Mortality Rate by State from 1987-2007")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=12)) +
theme(axis.text.y=element_text(size=9)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=18)) +
ylab("Infant Deaths Per 1000 Births") +
xlab("Years") + dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
Description: This graph is showing us the average infant deaths per 1000 births(c_IM) in India from around 1887 to 2007. Infant mortality rates are a good way to measure how effective environmental regulations are for several reasons. One is that, compared to measures of adult health, infant health is probably more affected by short- and medium-term changes in pollution. Another reason is that the first year is a vulnerable time, so a loss of life expectancy can be large. Infant mortality in urban India fell a lot between 1987 and 2004, from ~34.5 deaths per 1,000 live births to ~15.7. ## Plot 5
state_mean = air %>%
group_by(state, year) %>%
summarise_at(vars(c_IM), list(c_im = mean))
state_mean %>%
ggplot(aes(x = year ,y = c_im)) +
geom_line(color = "blue")+
geom_point(color="violetred2", size = .5 )+
ggtitle(paste("Average Infant Mortality Rate by State from 1987-2007")) +
theme_bw() +
scale_x_continuous(breaks = seq(1987,2007, 5)) +
scale_y_continuous()+
theme(plot.title = element_text(hjust = 0.5))+
theme(axis.text.x=element_text(size=5)) +
theme(axis.text.y=element_text(size=6)) +
theme(axis.title.x=element_text(size=16)) +
theme(axis.title.y=element_text(size=16, angle=90)) +
theme(plot.title=element_text(size=14)) +
theme(legend.text=element_text(size=18)) +
ylab("Infant Deaths Per 1000 Births") +
xlab("Years") +
facet_wrap(~state) + dark_mode()+
theme(plot.title = element_text(hjust = 0.5))
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
Description:
This graph is showing us the average infant deaths per 1000 births(c_IM) in India’s largest states from around 1887 to 2007. Look closely we can see a similar decreasing trend for each state. The most drastic fall in average c_IM was West Bengal, decreasing from a ~6% to ~1% infant mortality rate. The two states that are suffering from a relatively constant infant mortality rate are Delhi and Andhra Pradesh. FroM 1987 to 2007, Andhra Pradesh and Delhi had around a 2% infant mortality rate.
Revise your answer for question 6:
Level-level because the most important x factors are in level form. The control variables are in in level form in order to group accordingly to particular regions in India.
Multiple regression because we are looking at how much the levels of SPM in the air and F-Coli in the water are good estimators of infant mortality.
model.1: \(cIM=\beta_0+\beta_1e.spm.mean_{1i}+YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
model.2: \(cIM=\beta_0+\beta_1e.spm.mean_{1i}+\beta_2e.so2.mean_{2i} +\beta_3e.no2.mean_{3i} + YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
model.3: \(cIM=\beta_0+\beta_1nfcoli_{1i}+ YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
model.4: \(cIM=\beta_0+\beta_1nfcoli_{1i}+\beta_5bod_{2i}+\beta_6do_{3i} + YearFixedEffects+u_{it} + StateFixedEffects+u_{it} + Pop.UrbanFixedEffects+u_{it}\)
mult.mod.1.0 = lm(c_IM ~ e_spm_mean + pop_urban -1 + year -1
, (as.factor(state))
, data = air)
mult.mod.1.1 = lm(c_IM ~ e_spm_mean + e_so2_mean + e_no2_mean + pop_urban -1 + year -1
, (as.factor(state))
, data = air)
mult.mod.1.2 = lm(c_IM ~ lnfcoli + pop_urban -1 + year-1
, (as.factor(state))
, data = water )
mult.mod.1.3 = lm(c_IM ~ lnfcoli + bod + do + pop_urban -1 + year -1
, (as.factor(state))
, data = water )
rob_se2.1.0 <- list(sqrt(diag(vcovHC(mult.mod.1.0, type = "HC1"))),
sqrt(diag(vcovHC(mult.mod.1.1, type = "HC1"))),
sqrt(diag(vcovHC(mult.mod.1.2, type = "HC1"))),
sqrt(diag(vcovHC(mult.mod.1.3, type = "HC1"))))
stargazer(mult.mod.1.0, mult.mod.1.1, mult.mod.1.2, mult.mod.1.3,
digits = 2,
header = FALSE,
type = "latex",
se = rob_se2.1.0,
title = "OLS of Determinants of Infant Mortality Rate in India",
model.numbers = FALSE,
column.labels = c("(1)", "(2)", "(3)","(4)"),
column.sep.width = "-10pt")
Complete the following bullet describing the goodness of fit of the regression and state which regression is better. If you estimated more than 3 regressions, please specify model name for each bullet point. Add bullet as needed. Only compare models with the same \(Y\).
linearHypothesis(mult.mod.1.1, c("pop_urban=0", "year=0") , white.adjust = "hc1")
## Linear hypothesis test
##
## Hypothesis:
## pop_urban = 0
## year = 0
##
## Model 1: restricted model
## Model 2: c_IM ~ e_spm_mean + e_so2_mean + e_no2_mean + pop_urban - 1 +
## year - 1
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 526
## 2 524 2 5382.6 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This is the overall regression F-statistic and the null hypothesis is obviously different from testing when the control variables are zero. We should keep both variables as they both add a high level of significance to the model.
The estimated coefficients of my best model can be interpreted because each variable has significant coeffecient values especially e_spm_mean and e_so2_mean because they have allowed me to reject the null hypothesis.
5 estimated coefficients interpreted from best model
mult.mod.1.1
##
## Call:
## lm(formula = c_IM ~ e_spm_mean + e_so2_mean + e_no2_mean + pop_urban -
## 1 + year - 1, data = air, subset = (as.factor(state)))
##
## Coefficients:
## e_spm_mean e_so2_mean e_no2_mean pop_urban year
## 0.119990 -1.219331 -0.012742 -0.003133 0.015993
cor(x = air$e_spm_mean, y=air$c_IM)
## [1] 0.05386448
(Infant mortality rate .1% change means 1 infant per 1000) - x
: e_spm_mean
- interpretation: For every 1 unit decrease in e_spm_mean, the infant mortality rate decreases by .1% .
- intuition: Particulate matter levels have had a positive correlation with infant mortality rate.
x
: e_so2_mean
x
: e_no2_mean
x
: pop_urban
x
: year
The potential for a selection bias is evident because the infant mortality rates vary greatly from the state level surveys to the vital statistics data while being highly correlated.
The potential for a downward bias is further evident because most infant deaths are not reported in India, with research also suggesting the levels are much higher. Which creates a equality gap that is represented in a bias distribution of partial observations that ultimately affect the individual outcomes.
The reason a confounding bias might exist is because there is almost always a systematic discrepancy in the measures of exposure effects and public health outcomes caused by the interchanging effect of the exposure to external risk factors. This can be assumed to be happening through the effect at which infant mortality rate has gone down because of better air quality but is volatily getting worst with poot water quality.
The intriguing observation was the drastic decreasing trend in India's infant mortality rate, coupled with improvements in air quality. This prompted an investigation into whether worsening air pollution, compounded by India's rising heat levels, exacerbates infant mortality rates more severely than water pollution. Remarkably, the correlation between decreasing air pollution and declining infant deaths appeared non-coincidental. Through rigorous modeling, estimating infant mortality rates based on air and water quality levels, the air pollution model exhibited a superior goodness-of-fit of 0.971, compared to 0.81 for the water pollution model. This finding propelled a deeper exploration into the geographic variations of this phenomenon across India's diverse urban landscapes. Notably, Delhi, grappling with the nation's worst water quality, maintained a relatively steady and low infant mortality rate. Conversely, regions like Himachal Pradesh witnessed worsening air quality yet decreasing infant mortality rates. These paradoxical observations underscore the urgency of investigating low-tech, localized solutions to bridge the gap between unreported and unobserved public health deaths linked to escalating pollution levels and climate hazards. This research endeavors to unmask the intricate interplay between air pollution, water quality, and infant mortality, shedding light on the disproportionate impact of air pollution while illuminating potential pathways for targeted interventions and policy reforms tailored to India's urban complexities.