library(lubridate)
library(dplyr)
library(stringr)
library(ggvis)
library(tidyr)
library(pander)
Abstract: I examined the employment and salary statistics of Baltimore City employees based on the 2011-2015 data from the Baltimore Open Data Portal. Among all City agencies, the Police Department had the largest number of employees over the 2011-2015 time period. Overall Baltimore City had a one-year employee retention rate of 87%, a two-year retention rate of 78%, a three-year retention rate of 71%, and a four-year retention rate of 65%. The number of employees who switched jobs among different City agencies were small, about 1.1%. The result indicates that the majority of employee attrition was due to employees leaving the City workforce all together. It also means that City workers, if they chose to stay working for the City, liked to stay with the same agencies. Analysis at a more granular scale shows that smaller agencies, if they continue to exist over the years, had higher retention rates than larger agencies. From analyzing salary data, I found that every year since 2011 there has been a steady, albeit modest, pay increase for City employees. The Police Department stood out as the only agency that has been experiencing a workforce reduction (since 2012) while increasing its salary spending at the same time. Data showed that the extra spending had impact on majority of the personnel in the Police Department, and was not the results of hiring a few high paying individuals. The data also showed that the Police Department had a higher pay scale than all the other City agencies. The data products from this analysis are available for download here.
Tags: data cleaning
data analysis
model fitting
data visualization
R
This project analyzes Baltimore City Employee Salaries data from 2011 to 2015 provided at the Baltimore Open Data Portal. In particular, by identifying trends and features in these data I hope to uncover some of the consequences of the current policies on personnel allocation and financial support. To carry out this analysis, I will be reading in the data from separate files, clean the data, visualize them, and fit appropriate models to them.
The five data files, one for each fiscal year, contain the same types of information. For example, each file records employee names, hiring dates, job titles, affiliating agencies, and salaries. While the variables in these data files may have names that differ from year to year, the fact that the same types of information were provided for all five years makes processing the data easy. Indeed, one can set up a loop that reads in the data, clean them up, and merge them into a master data set. This loop approach is scalable and saves memory space, so that’s what I did.
In the following, I will first define helper functions to carry out various tasks of cleaning the data and creating additional variables. The steps include:
HireDate
data,# Remove "$" signs in salary data and convert them from character type to numeric type
format_salary_data <- function(strings_salary) {
numeric_salary <- strings_salary %>%
str_replace("[$]", "") %>%
as.numeric()
}
I’ve learned from analyzing the 2015 data that the entries of Agency
include the last three digits in the entries of AgencyID
some of the times but not all the time. In the following I will remove the three digits reference in Agency
(if they appear) and standardize the spellings of the entries in Agency
.
Agency
columnThe following function removes the three digit location reference (if they appear). It removes location reference that comes in the form “(nnn)”, “Location xxx”, or simply “(”. Since few departments indicated their employees as part-time, wkly, or full-time, I decided to focus this analysis on the inter-departmental differences only. As a result, this function will also remove any “(part-time)” or “(wkly)” designations.
# Identify and remove any location reference in a given character string
rm_location_ref <- function(cstr) {
idx <- str_locate(cstr, "[(]|Location [:digit:]{3}|[:digit:]{3}")
if (is.na(idx[1])) return(cstr)
else {
idx <- idx[1] - 2
# return(substring(cstr, 1, idx))
return(str_trim(substring(cstr, 1, idx), "right"))
}
}
# Apply rm_location_ref function to each character string in the `Agency` column
rm_location_ref_apply2vec <- function(cstr_vec) {
unlist(lapply(cstr_vec, rm_location_ref))
}
Some agency names have been spelled in multiple ways. For example,
AgencyID
, “A64”, as the “Fire Department.” As a result, “Fire Academy Recruits” will be considered as a subsidiary agency of the “Fire Department.”AgencyID
, “W02”, as others under “Youth Summer”, the single entry of “OED-Participants-Youth Summer” will be converted into “Youth Summer.”The following function standardizes names for agencies and their subsidiary departments.
sync_name <- function(agency_vec) {
agency_vec <- rm_location_ref_apply2vec(agency_vec)
agency_vec <- agency_vec %>% str_replace("Fire Academy Recruits", "Fire Department")
agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept Locatio", "HLTH-Health Department")
agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept. Locatio", "HLTH-Health Department")
agency_vec <- agency_vec %>% str_replace("HLTH-Heatlh Dept. Locatio", "HLTH-Health Department")
agency_vec <- agency_vec %>% str_replace("HLTH-Heatlh $", "HLTH-Health Department")
agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept$", "HLTH-Health Department")
agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept.$", "HLTH-Health Department")
agency_vec <- agency_vec %>% str_replace("HLTH-Heatlh Dept.$", "HLTH-Health Department")
agency_vec <- agency_vec %>% str_replace("Mayors Office", "Mayor's Office")
agency_vec <- agency_vec %>% str_replace("Civil Rights & Wage Enfor$", "Civil Rights & Wage Enforce")
agency_vec <- agency_vec %>% str_replace("COMP-Communication Ser$", "COMP-Communication Services")
agency_vec <- agency_vec %>% str_replace("COMP-Comptroller's O$", "COMP-Comptroller's Office")
agency_vec <- agency_vec %>% str_replace("Municipal & Zoning Appeal$", "Municipal & Zoning Appeals")
agency_vec <- agency_vec %>% str_replace("TRANS-Cross Guard-S$", "TRANS-Cross Guard-Summer")
agency_vec <- agency_vec %>% str_replace("OED-Participants-Youth Summer", "Youth Summer")
}
Among the agencies listed in each year’s salary file, some agencies were represented by their subsidiary departments. This resulted in 50+ different agencies. Since this level of refined differentiation can be overwhelming and may create confusion for a regular citizen, I limit the content of the Agency
variable to the (umbrella) agencies only. The original super-agency-subsidiary-department description will be preserved in a different variable called AgencyDept
. In this way, I can group data together at the super-agency level, and still be able to identify which department within an agency that a particular employee worked for by looking up content in the AgencyDept
variable.
# Group sub-agencies under super-agencies, the sub-agency's name is provided as a character string
sub2sup_agencies <- function(cstr) {
if (is.na(cstr)) return(cstr)
if (str_detect(cstr, "COMP")) return("Comptroller's Office")
if (str_detect(cstr, "DPW")) return("Dept of Public Works")
if (str_detect(cstr, "ERS")) return("Employee's Retirement Sys")
if (str_detect(cstr, "FIN")) return("Dept of Finance")
if (str_detect(cstr, "FPR")) return("Fire & Police Retirement Sys Admin")
if (str_detect(cstr, "HLTH")) return("Dept of Health")
if (str_detect(cstr, "HR")) return("Human Resourcese")
if (str_detect(cstr, "M-R")) return("Mayor's Office")
if (str_detect(cstr, "OED")) return("Office of Employment Dev")
if (str_detect(cstr, "R&P")) return("Recreation & Parks")
if (str_detect(cstr, "TANF")) return("Temp Assist for Needy Families")
if (str_detect(cstr, "TRANS")) return("Dept of Transportation")
return(cstr)
}
# Apply `sub2sup_agencies` function to a vector of sub-agency names that are character strings
sub2sup_agencies_apply2vec <- function(cstr_vec) {
unlist(lapply(cstr_vec, sub2sup_agencies))
}
HireDate
These are records that have generic names, such as “BPD 1”, “BPD 2”, …, “BPD 96”, in the Name
column. It appears that the payroll information of these records are credible, in the sense that they were actual salaries paid to certain individuals, However, these records can be linked to no identifiable persons. More specifically, for these unnamed individuals their JobTitle
’s are identical to their names; their Agency
were all listed as “Police Department”; and the first three digits of AgencyID
are all “A99.”
Since Name
and HireDate
are the only demographic information provided in these data files, I will be using these two pieces of information to match records and merge them into a master data set. This means that I need to flag records that have generic Name
’s and no HireDate
in order to correctly match them later. I will give “Jan-1-2200”, a future date, as the HireDate
to the these records.
flag_hiredate <- function(date_vec) {
idx <- which(is.na(date_vec))
# print(paste(as.character(length(idx)), "missing hire-date records identified.", sep=" "))
date_vec[idx] <- mdy("01-01-2200")
return(date_vec)
}
Some entries of Name
and Agency
have included trailing space(s), which will lead to separate entries when we merge data records later on.
trim_trailing_space_cstr <- function(cstr) {
str_trim(cstr, side="right")
}
trim_trailing_space2vec <- function(cstr_vec) {
unlist(lapply(cstr_vec, trim_trailing_space_cstr))
}
BmoreSalaries1115 <- data.frame()
for (i in 11:15) {
# Standardize names of variables in the incoming data file. This step is necessary since in some files, "AgencyID" is called "DEPTID", "Agency" is called "DESCR", etc.
new_names <- c("Name", "JobTitle", "AgencyID", "Agency", "HireDate", "AnnualSalary", "GrossPay")
# Read data from files
if (i == 11) {
new_csv = paste("Baltimore_City_Employee_Salaries_20",as.character(i),".csv", sep="")
} else {
new_csv = paste("Baltimore_City_Employee_Salaries_FY20",as.character(i),".csv", sep="")
}
generic_df <- read.csv(new_csv, stringsAsFactors=FALSE, col.names = new_names)
# Prepare data
generic_df$Name <- trim_trailing_space2vec(generic_df$Name)
generic_df$JobTitle <- trim_trailing_space2vec(generic_df$JobTitle)
generic_df$AgencyID <- trim_trailing_space2vec(generic_df$AgencyID)
generic_df$Agency <- trim_trailing_space2vec(generic_df$Agency)
generic_df$HireDate <- mdy(generic_df$HireDate)
generic_df$HireDate <- flag_hiredate(generic_df$HireDate)
# [1] "75 missing hire-date records identified."
# [1] "96 missing hire-date records identified."
# [1] "52 missing hire-date records identified."
# [1] "70 missing hire-date records identified."
# [1] "10 missing hire-date records identified."
generic_df$AnnualSalary <- format_salary_data(generic_df$AnnualSalary)
generic_df$GrossPay <- format_salary_data(generic_df$GrossPay)
generic_df$AgencyDept <- sync_name(generic_df$Agency)
generic_df$Agency <- sub2sup_agencies_apply2vec(generic_df$AgencyDept)
# Use "Name" and "HireDate" as unique employee identifier; arrange relevant variables to be next to each other in the data frame
generic_df <- generic_df %>%
select(Name, HireDate, JobTitle, Agency, AgencyDept, AgencyID, AnnualSalary, GrossPay)
# Re-name variables to reflect the year data were collected
new_names <- c("Name",
"HireDate",
paste("JobTitle",as.character(i), sep=""),
paste("Agency",as.character(i), sep=""),
paste("AgencyDept",as.character(i), sep=""),
paste("AgencyID",as.character(i), sep=""),
paste("AnnualSalary",as.character(i), sep=""),
paste("GrossPay",as.character(i), sep=""))
names(generic_df) <- new_names
# Merge data from different years into a single data frame
if (i==11) {
BmoreSalaries1115 <- generic_df
} else {
BmoreSalaries1115<-merge(generic_df, BmoreSalaries1115, by=c("Name", "HireDate"), all=TRUE)
}
}
At this point, all data have been read into the master data frame BmoreSalaries1115
. This data frame has 29,425 records with each record characterized by 32 variables. Here is a glimpse of the master data frame at the current stage:
glimpse(BmoreSalaries1115)
## Observations: 29,425
## Variables: 32
## $ Name (chr) "Aaron,Keairah T", "Aaron,Keontae E", "Aaron,Patricia G", "Aaron,Petra L", "Abaine...
## $ HireDate (date) 2013-06-19, 2013-06-10, 1979-10-24, 2006-09-25, 2009-07-23, 2013-07-24, 2013-05-0...
## $ JobTitle15 (chr) NA, NA, "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY", "EPIDEMIOLO...
## $ Agency15 (chr) NA, NA, "Office of Employment Dev", "States Attorneys Office", "Dept of Health", "...
## $ AgencyDept15 (chr) NA, NA, "OED-Employment Dev", "States Attorneys Office", "HLTH-Health Department",...
## $ AgencyID15 (chr) NA, NA, "A03031", "A29045", "A65026", "A99005", "A40001", "A90005", "A64120", "A99...
## $ AnnualSalary15 (dbl) NA, NA, 55314, 74000, 64500, 46309, 60060, 42702, 62175, 77343, 44548, 18408, 1840...
## $ GrossPay15 (dbl) NA, NA, 53626, 73000, 64404, 59620, 54060, 20251, 83757, 92575, 55943, 1051, 1051,...
## $ JobTitle14 (chr) NA, "AIDE BLUE CHIP", "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY...
## $ Agency14 (chr) NA, "Youth Summer", "Office of Employment Dev", "States Attorneys Office", "Dept o...
## $ AgencyDept14 (chr) NA, "Youth Summer", "OED-Employment Dev", "States Attorneys Office", "HLTH-Health ...
## $ AgencyID14 (chr) NA, "W02200", "A03031", "A29005", "A65026", "A99416", "A40001", NA, "A64120", "A99...
## $ AnnualSalary14 (dbl) NA, 11310, 53428, 68300, 62000, 43999, 52000, NA, 62175, 70918, 42438, 11310, 1131...
## $ GrossPay14 (dbl) NA, 874, 52868, 67439, 58655, 39687, 47020, NA, 61452, 87900, 53668, NA, NA, NA, 1...
## $ JobTitle13 (chr) "AIDE BLUE CHIP", "AIDE BLUE CHIP", "Facilities/Office Services II", "ASSISTANT ST...
## $ Agency13 (chr) "Youth Summer", "Youth Summer", "Office of Employment Dev", "States Attorneys Offi...
## $ AgencyDept13 (chr) "Youth Summer", "Youth Summer", "OED-Employment Dev", "States Attorneys Office", "...
## $ AgencyID13 (chr) "W02278", "W02200", "A03031", "A29005", "A65026", NA, "A40001", NA, "A64120", "A99...
## $ AnnualSalary13 (dbl) 11310, 11310, 51862, 64000, 57900, NA, 52000, NA, 40650, 68847, 41194, NA, NA, NA,...
## $ GrossPay13 (dbl) NA, NA, 51771, 63909, 57429, NA, 3477, NA, 44160, 66496, 46254, NA, NA, NA, NA, 32...
## $ JobTitle12 (chr) NA, NA, "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY", "EPIDEMIOLO...
## $ Agency12 (chr) NA, NA, "Office of Employment Dev", "States Attorneys Office", "Dept of Health", N...
## $ AgencyDept12 (chr) NA, NA, "OED-Employment Dev", "States Attorneys Office", "HLTH-Health Department",...
## $ AgencyID12 (chr) NA, NA, "A03031", "A29005", "A65026", NA, NA, NA, "A64215", "A99398", "A38410", NA...
## $ AnnualSalary12 (dbl) NA, NA, 51862, 64000, 57900, NA, NA, NA, 34146, 58244, 41194, NA, NA, NA, 44143, 3...
## $ GrossPay12 (dbl) NA, NA, 52247, 59027, 57130, NA, NA, NA, 35538, 62669, 46700, NA, NA, NA, 44638, 2...
## $ JobTitle11 (chr) NA, NA, "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY", "EPIDEMIOLO...
## $ Agency11 (chr) NA, NA, "Office of Employment Dev", "States Attorneys Office", "Dept of Health", N...
## $ AgencyDept11 (chr) NA, NA, "OED-Employment Dev", "States Attorneys Office", "HLTH-Health Department",...
## $ AgencyID11 (chr) NA, NA, "A03031", "A29005", "A65026", NA, NA, NA, "A64063", "A99398", NA, NA, NA, ...
## $ AnnualSalary11 (dbl) NA, NA, 50845, 56595, 56500, NA, NA, NA, 33476, 50919, NA, NA, NA, NA, 43278, 2638...
## $ GrossPay11 (dbl) NA, NA, 45506, 51589, 50633, NA, NA, NA, 3889, 51422, NA, NA, NA, NA, 39117, 23758...
unique_names <- table(BmoreSalaries1115$Name)
length(unique_names)
## [1] 28832
There are 28,832 unique names in the data frame, which means that some of the records share the same employee names. So, do these records represent different people or the same ones? Given that I don’t have other demographic information of City employees, I looked up their HireDate
, which were different. Given that employees who had switched agencies over the years were linked to the same HireDate
, it seems that HireDate
is a valid piece of information to differentiate employees who had the same names. For this reason I decided to use both Name
and HireDate
as unique identifiers of individual employees.
Here is a summary of the data:
summary(BmoreSalaries1115)
## Name HireDate JobTitle15 Agency15 AgencyDept15
## Length:29425 Min. :1952-06-23 Length:29425 Length:29425 Length:29425
## Class :character 1st Qu.:2001-09-17 Class :character Class :character Class :character
## Mode :character Median :2009-06-11 Mode :character Mode :character Mode :character
## Mean :2006-04-02
## 3rd Qu.:2013-06-10
## Max. :2200-01-01
##
## AgencyID15 AnnualSalary15 GrossPay15 JobTitle14 Agency14
## Length:29425 Min. : 900 Min. : 0 Length:29425 Length:29425
## Class :character 1st Qu.: 33354 1st Qu.: 34666 Class :character Class :character
## Mode :character Median : 48126 Median : 52753 Mode :character Mode :character
## Mean : 51710 Mean : 55782
## 3rd Qu.: 68112 3rd Qu.: 75768
## Max. :238772 Max. :235768
## NA's :15406 NA's :15675
## AgencyDept14 AgencyID14 AnnualSalary14 GrossPay14 JobTitle13
## Length:29425 Length:29425 Min. : 0 Min. : 0 Length:29425
## Class :character Class :character 1st Qu.: 11310 1st Qu.: 22065 Class :character
## Mode :character Mode :character Median : 36400 Median : 45267 Mode :character
## Mean : 39757 Mean : 45634
## 3rd Qu.: 60598 3rd Qu.: 67816
## Max. :238772 Max. :238772
## NA's :10440 NA's :13664
## Agency13 AgencyDept13 AgencyID13 AnnualSalary13 GrossPay13
## Length:29425 Length:29425 Length:29425 Min. : 0 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 11310 1st Qu.: 24536
## Mode :character Mode :character Mode :character Median : 35947 Median : 44001
## Mean : 38293 Mean : 44313
## 3rd Qu.: 57000 3rd Qu.: 65615
## Max. :238772 Max. :238681
## NA's :10285 NA's :13504
## JobTitle12 Agency12 AgencyDept12 AgencyID12 AnnualSalary12
## Length:29425 Length:29425 Length:29425 Length:29425 Min. : 377
## Class :character Class :character Class :character Class :character 1st Qu.: 28726
## Mode :character Mode :character Mode :character Mode :character Median : 42853
## Mean : 43402
## 3rd Qu.: 59933
## Max. :238772
## NA's :13373
## GrossPay12 JobTitle11 Agency11 AgencyDept11 AgencyID11
## Min. : -77 Length:29425 Length:29425 Length:29425 Length:29425
## 1st Qu.: 23871 Class :character Class :character Class :character Class :character
## Median : 43061 Mode :character Mode :character Mode :character Mode :character
## Mean : 43176
## 3rd Qu.: 63551
## Max. :245353
## NA's :13373
## AnnualSalary11 GrossPay11
## Min. : 377 Min. : 7
## 1st Qu.: 31609 1st Qu.: 27609
## Median : 43614 Median : 41100
## Mean : 46009 Mean : 42323
## 3rd Qu.: 59916 3rd Qu.: 58322
## Max. :238772 Max. :170423
## NA's :14968 NA's :14968
It appears that in 2012 some employees received negative pay. Let’s take a closer look.
sum(BmoreSalaries1115$GrossPay12<0, na.rm=T)
## [1] 19
idx12<-which(BmoreSalaries1115$GrossPay12 < 0)
negative_pay12 <- BmoreSalaries1115[idx12,] %>%
select(JobTitle12, Agency12, GrossPay12)
negative_pay12$JobTitle12 <- as.factor(negative_pay12$JobTitle12)
negative_pay12$Agency12 <- as.factor(negative_pay12$Agency12)
summary(negative_pay12)
## JobTitle12 Agency12 GrossPay12
## AIDE BLUE CHIP : 2 Dept of Health:17 Min. :-77
## JTPA TRAINEE A MOMR :14 Youth Summer : 2 1st Qu.:-56
## SR COMPANION STIPEND HLTH: 3 Median :-49
## Mean :-45
## 3rd Qu.:-28
## Max. : -7
There were 19 people who received negative pay in 2012. Two of them were under “Youth Summer”, and the other 17 were under the “Dept of Health.” Those who served in “Youth Summer” had “AIDE BLUE CHIP” as JobTitle
, while those who worked for the the “Dept of Health” either served as “JTPA TRAINEE A MOMR” or as “SR COMPANION STIPEND HLTH.” All 19 people have been with the City for less than a year, and their negative pays were less than $100 each.
We can rule out the possibility that these negative gross pays were the results of overpay from the previous year since all 19 individuals had been working for the City for less than a year. Given that no negative pay records were found in years before and after 2012, it is likely that these are typos. A less likely but possible explanation is that the City offered benefits which required a “co-pay”" from the employees. Given that this is an exploratory project to practice using R and not an investigative report, I will leave the numbers as they are. Those who are interested in finding the exact answer may download the processed data files to continue with their investigations.
numeric_names <- function(cstr) {
str_detect(cstr, "^[:digit:]|[:digit:]$")
}
AnyNumericName <- unlist(lapply(BmoreSalaries1115$Name, numeric_names))
sum(AnyNumericName) # [1] 96
## [1] 96
NumericNameIndv <- subset(BmoreSalaries1115, AnyNumericName)
head(NumericNameIndv)
## Name HireDate JobTitle15 Agency15 AgencyDept15 AgencyID15 AnnualSalary15 GrossPay15
## 2656 BPD 1 2200-01-01 BPD 1 Police Department Police Department A99200 74517 82988
## 2657 BPD 10 2200-01-01 BPD 10 Police Department Police Department A99200 66086 104206
## 2658 BPD 11 2200-01-01 <NA> <NA> <NA> <NA> NA NA
## 2659 BPD 12 2200-01-01 <NA> <NA> <NA> <NA> NA NA
## 2660 BPD 13 2200-01-01 <NA> <NA> <NA> <NA> NA NA
## 2661 BPD 14 2200-01-01 <NA> <NA> <NA> <NA> NA NA
## JobTitle14 Agency14 AgencyDept14 AgencyID14 AnnualSalary14 GrossPay14 JobTitle13
## 2656 BPD Police Department Police Department A99200 60598 74946 BPD
## 2657 BPD Police Department Police Department A99200 56312 55117 BPD
## 2658 BPD Police Department Police Department A99200 65351 67579 BPD
## 2659 BPD Police Department Police Department A99200 60598 62915 BPD
## 2660 BPD Police Department Police Department A99200 64163 66005 BPD
## 2661 BPD Police Department Police Department A99200 64163 65174 BPD
## Agency13 AgencyDept13 AgencyID13 AnnualSalary13 GrossPay13 JobTitle12 Agency12
## 2656 Police Department Police Department A99334 61157 92411 BPD 1 Police Department
## 2657 Police Department Police Department A99334 65020 94672 BPD 10 Police Department
## 2658 Police Department Police Department A99334 69893 85584 BPD 11 Police Department
## 2659 Police Department Police Department A99347 65816 77536 BPD 12 Police Department
## 2660 Police Department Police Department A99295 62324 65310 BPD 13 Police Department
## 2661 Police Department Police Department A99347 62324 72900 BPD 14 Police Department
## AgencyDept12 AgencyID12 AnnualSalary12 GrossPay12 JobTitle11 Agency11 AgencyDept11
## 2656 Police Department A99 88565 93671 BPD Police Department Police Department
## 2657 Police Department A99 69893 77961 BPD Police Department Police Department
## 2658 Police Department A99 62905 74054 BPD Police Department Police Department
## 2659 Police Department A99 66982 81539 BPD Police Department Police Department
## 2660 Police Department A99 61157 104831 BPD Police Department Police Department
## 2661 Police Department A99 69893 79995 BPD Police Department Police Department
## AgencyID11 AnnualSalary11 GrossPay11
## 2656 A99 85372 89290
## 2657 A99 68523 85543
## 2658 A99 61672 64022
## 2659 A99 66810 103011
## 2660 A99 59388 95306
## 2661 A99 65669 85301
Over the year, there were 96 individuals whose names were “BPD”+digits. These individuals were all employees of the Police Department. These are the only records where individuals were represented under generic names in these data files. While we don’t know who these individuals are, we can still ask questions such as how many unnamed individuals were employed each year and how much they were paid each year. The following is a summary of answers to those questions. Again, I will leave the records as they are in the data set since they seem to represent real people and real salary expense.
Number of Employees (n) |
Minimum Gross Pay ($) |
1st Quarter Gross Pay ($) |
Median Gross Pay ($) |
Mean Gross Pay ($) |
3rd Quarter Gross Pay ($) |
Maximum Gross Pay ($) |
Total Gross Pay ($) |
|
---|---|---|---|---|---|---|---|---|
2011 | 75 | 43,200 | 63,600 | 72,100 | 74,500 | 85,000 | 127,000 | 5,591,031 |
2012 | 96 | 54,000 | 68,000 | 76,500 | 78,400 | 85,500 | 111,000 | 7,522,470 |
2013 | 52 | 59,300 | 73,100 | 82,200 | 84,700 | 95,400 | 113,000 | 4,402,887 |
2014 | 70 | 10,100 | 67,500 | 77,100 | 79,700 | 91,900 | 135,000 | 5,576,180 |
2015 | 10 | 59,300 | 73,100 | 75,600 | 79,700 | 92,000 | 104,000 | 797,282 |
In the boxplot above, each box presents the middle 50% of gross pays received by unnamed employees in a given year. The horizontal cut inside the box represents the median pay. The width of each box varies with respect to the number of unnamed employees serving in that year. That is, a wider box indicates a larger number of unnamed workers. The dots above and below each box represent cases of gross pays that were beyond 1.5 times the interquartile range (the middle 50% pay range) of gross pay of that particular year.
Our master data set shows that the City’s workforce had 14,019 employees in 2015, 18,985 employees in 2014, 19,140 employees in 2013, 16,052 employees in 2012, and 14,457 employees in 2011. So how was the workforce distributed among different City agencies?
We can answer this question by counting the number of employees listed under different agencies. For ease of analysis I will first slice the relevant data out of the master data set.
Agency1115 <- select(BmoreSalaries1115, Name, HireDate, contains("Agency1"), contains("GrossPay"))
sum_agency15 = as.data.frame(table(Agency1115$Agency15), responseName = "2015", stringsAsFactors = FALSE)
sum_agency14 = as.data.frame(table(Agency1115$Agency14), responseName = "2014", stringsAsFactors = FALSE)
sum_agency13 = as.data.frame(table(Agency1115$Agency13), responseName = "2013", stringsAsFactors = FALSE)
sum_agency12 = as.data.frame(table(Agency1115$Agency12), responseName = "2012", stringsAsFactors = FALSE)
sum_agency11 = as.data.frame(table(Agency1115$Agency11), responseName = "2011", stringsAsFactors = FALSE)
TotAgency1115 = merge(sum_agency11,sum_agency12,by="Var1",all = TRUE)
TotAgency1115 = merge(TotAgency1115,sum_agency13,by="Var1",all = TRUE)
TotAgency1115 = merge(TotAgency1115,sum_agency14,by="Var1",all = TRUE)
TotAgency1115 = merge(TotAgency1115,sum_agency15,by="Var1",all = TRUE)
names(TotAgency1115)[1]="Agency"
# Convert all NA's into 0
for (i in 2:6) {
idx <- which(is.na(TotAgency1115[[i]]))
TotAgency1115[[i]][idx] <- 0
}
The above plot shows that most agencies had a steady workforce over the years. One striking feature is the spike of “Youth Summer” employees in 2013 (4,725 participants) and 2014 (4,819 participants). It appears that the summer program, which gave participants work experience and a small stipend, had a modest enrollment in 2012 ( 1,755 participants) and flourished in 2013 and 2014. We know that the summer programs expanded in 2015, hosting more than 8000 youths. However, the 2015 program participants were either no longer considered as city employees or somehow their data were not incorporated into the data set like in the previous years.
Another notable feature of the plot is that the Police Department has consistently been the largest agencies in the City. Indeed, it had the largest city workforce in 2011 (3,428 employees), 2012 (3,475 employees), and 2015 (3,097 employees). Its head counts in 2013 and 2014 were only surpassed by the Summer Youth Program.
While maintaining its dominance in the City’s workforce, the police workforce has been on the decline since 2012. And it is one of the few agencies what were doing so in recent years. The readers may be interested in comparing this plot with the plot of salary spending, which will be examined in a later section.
Between 2011 and 2015 only the “Mayor’s Office” and “Recreation & Parks” experienced a steady increase in the workforce. Between the two agencies, “Recreation & Parks” was about one-and-a-half the size of the “Mayor’s Office”. Nonetheless, the “Mayor’s Office” had a salary budget twice that of “Recreation & Parks” (see salary analysis). This likely means that majority of the “Recreation & Parks” employees were part-time, and that the positions at the “Mayor’s Office” were mostly full-time.
Finally, you may wonder if “Youth Cust”, “Youth Summer”, and “Youth Temp Adult” should be grouped together since their names imply that these are programs for youths. Based on the data in the salary data files, it is not a clear call. First of all, each of these designations has a distinct three-digit AgencyID
. Secondly, they co-existed in four of the five years–2015 is the only year where all three programs had zero participant. Based on these observations, I decided to leave them as separate Agency
’s in the analysis.
Here are the numbers that generated the plot above.
2011 (n) |
2012 (n) |
2013 (n) |
2014 (n) |
2015 (n) |
|
---|---|---|---|---|---|
Circuit Court | 154 | 149 | 139 | 140 | 141 |
City Council | 88 | 90 | 87 | 93 | 96 |
Civil Rights & Wage Enforce | 0 | 0 | 12 | 12 | 11 |
Community Relations | 10 | 12 | 0 | 0 | 0 |
Comptroller’s Office | 91 | 86 | 95 | 90 | 91 |
Council Services | 5 | 6 | 6 | 6 | 6 |
Dept of Finance | 265 | 281 | 286 | 288 | 290 |
Dept of Health | 941 | 870 | 957 | 915 | 958 |
Dept of Public Works | 2,356 | 2,327 | 2,332 | 2,337 | 2,369 |
Dept of Transportation | 1,490 | 1,535 | 1,576 | 1,423 | 1,368 |
Elections | 6 | 11 | 8 | 8 | 8 |
Employee’s Retirement Sys | 27 | 28 | 29 | 33 | 32 |
Enoch Pratt Free Library | 442 | 439 | 475 | 463 | 473 |
Fire & Police Retirement Sys Admin | 21 | 22 | 22 | 25 | 27 |
Fire Department | 1,726 | 1,730 | 1,640 | 1,576 | 1,615 |
General Services | 396 | 411 | 400 | 389 | 357 |
Housing & Community Dev | 513 | 533 | 433 | 446 | 384 |
Human Resourcese | 52 | 55 | 59 | 66 | 66 |
Law Department | 97 | 105 | 107 | 110 | 108 |
Legislative Reference | 6 | 6 | 5 | 5 | 5 |
Liquor License Board | 30 | 36 | 29 | 20 | 24 |
Mayor’s Office | 424 | 401 | 672 | 687 | 724 |
Municipal & Zoning Appeals | 8 | 9 | 9 | 9 | 9 |
Office of Employment Dev | 224 | 208 | 195 | 165 | 162 |
Orphan’s Court | 10 | 11 | 12 | 11 | 10 |
Planning Department | 47 | 45 | 47 | 48 | 50 |
Police Department | 3,428 | 3,475 | 3,278 | 3,211 | 3,097 |
Recreation & Parks | 650 | 756 | 874 | 953 | 966 |
Sheriff’s Office | 175 | 182 | 191 | 206 | 210 |
Special City Services | 2 | 2 | 3 | 4 | 4 |
States Attorneys Office | 375 | 362 | 364 | 358 | 358 |
Temp Assist for Needy Families | 45 | 50 | 0 | 0 | 0 |
Wage Commissioner | 3 | 0 | 0 | 0 | 0 |
War Memorial Commission | 4 | 4 | 0 | 0 | 0 |
Youth Cust | 328 | 29 | 6 | 0 | 0 |
Youth Summer | 0 | 1,755 | 4,725 | 4,819 | 0 |
Youth Temp Adult | 18 | 31 | 67 | 69 | 0 |
Employee attrition is costly to an organization. Here I examined the retention rates of Baltimore City agencies. I first look at the retention of employees across all agencies as a whole, and then break down the numbers by agency.
I decided to count an employee as retained only if he/she remained working for the same agency from one year to the next. This means that if an employee left the agency either by transferring to another agency within the City government or by dropping out of the City workforce altogether, he/she would be counted toward the attrition.
First, let’s calculate the size of the workforce for each year of the data we have.
# Workforce in '15
Workforce15 = sum(!is.na(Agency1115$Agency15)) # [1] 14019
# Workforce in '14
Workforce14 = sum(!is.na(Agency1115$Agency14)) # [1] 18985
# Workforce in '13
Workforce13 = sum(!is.na(Agency1115$Agency13)) # [1] 19140
# Workforce in '12
Workforce12 = sum(!is.na(Agency1115$Agency12)) # [1] 16052
# Workforce in '11
Workforce11 = sum(!is.na(Agency1115$Agency11)) # [1] 14457
Next, I am going to calculate, with each year’s passing,
AgencyOnly1115 <- Agency1115 %>% select(Name, HireDate, contains("Agency1"))
AgencyOnly1115 <- AgencyOnly1115 %>% mutate(same_agency1112=Agency11==Agency12)
AgencyOnly1115 = AgencyOnly1115 %>% mutate(same_agency1213=Agency12==Agency13)
AgencyOnly1115 = AgencyOnly1115 %>% mutate(same_agency1314=Agency13==Agency14)
AgencyOnly1115 = AgencyOnly1115 %>% mutate(same_agency1415=Agency14==Agency15)
# Drop out of City workforce '11 to '12
NLeftTot11 = sum(!is.na(AgencyOnly1115$Agency11) & is.na(AgencyOnly1115$Agency12)) # [1] 1845
LeftTot11 = NLeftTot11/Workforce11 # [1] 0.13
# New hires to the City workforce `11 to `12
NNewHires12 = sum(!is.na(AgencyOnly1115$Agency12) & is.na(AgencyOnly1115$Agency11)) # [1] 3440
NewHires12 = NNewHires12/Workforce12 # [1] 0.21
# Switched agencies from '11 - '12
NSwitchAgency1112 = sum(!AgencyOnly1115$same_agency1112, na.rm=T) # [1] 84
SwitchAgency1112 = NSwitchAgency1112/Workforce11 # [1] 0.0058
# Stayed with the same agencies from '11 - '12
NSameAgency1112 = sum(AgencyOnly1115$same_agency1112, na.rm=T) # [1] 12528
# Employee retention rate after one year (ref. 2011)
Retain1112 = NSameAgency1112/Workforce11 # [1] 0.87
# Let's check the numbers:
Workforce11 - NLeftTot11 + NNewHires12
## [1] 16052
Workforce12
## [1] 16052
Workforce11 - NLeftTot11 - NSwitchAgency1112
## [1] 12528
NSameAgency1112
## [1] 12528
# ------------------------------------------------
# Drop off from '12 to '13
NLeftTot12 = sum(!is.na(AgencyOnly1115$Agency12) & is.na(AgencyOnly1115$Agency13)) # [1] 2754
LeftTot12 = NLeftTot12/Workforce12 # [1] 0.17
# New hires to the City workforce `12 to `13
NNewHires13 = sum(!is.na(AgencyOnly1115$Agency13) & is.na(AgencyOnly1115$Agency12)) # [1] 5842
NewHires13 = NNewHires13/Workforce13 # [1] 0.31
# Switched agencies from '12 - '13
NSwitchAgency1213 = sum(!AgencyOnly1115$same_agency1213, na.rm=T) # [1] 337
SwitchAgency1213 = NSwitchAgency1213/Workforce12 # [1] 0.021
# Stayed with the same agencies from '12 - '13
NSameAgency1213 = sum(AgencyOnly1115$same_agency1213, na.rm=T) # [1] 12961
# City worker retention rate after one year (ref. 2012)
Retain1213 = NSameAgency1213/Workforce12 # [1] 0.81
# Let's check the numbers:
Workforce12 - NLeftTot12 + NNewHires13
## [1] 19140
Workforce13
## [1] 19140
Workforce12 - NLeftTot12 - NSwitchAgency1213
## [1] 12961
NSameAgency1213
## [1] 12961
# Stayed with same agency from '11 - '13
NSameAgency1113 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1112, na.rm=T) # [1] 11070
# City worker retention rate after two year (ref. 2011)
Retain1113 = NSameAgency1113/Workforce11 # [1] 0.77
# ------------------------------------------------
# Drop off from '13 to '14
NLeftTot13 = sum(!is.na(AgencyOnly1115$Agency13) & is.na(AgencyOnly1115$Agency14)) # [1] 4799
LeftTot13 = NLeftTot13/Workforce13 # [1] 0.25
# New hires to the City workforce `13 to `14
NNewHires14 = sum(!is.na(AgencyOnly1115$Agency14) & is.na(AgencyOnly1115$Agency13)) # [1] 4644
NewHires14 = NNewHires14/Workforce14 # [1] 0.24
# Switched agencies from '13 - '14
NSwitchAgency1314 = sum(!AgencyOnly1115$same_agency1314, na.rm=T) # [1] 86
SwitchAgency1314 = NSwitchAgency1314/Workforce13 # [1] 0.0045
# Stayed with same agency from '13 - '14
NSameAgency1314 = sum(AgencyOnly1115$same_agency1314, na.rm=T) # [1] 14255
# City worker retention rate after one year (ref. 2013)
Retain1314 = NSameAgency1314/Workforce13 # [1] 0.74
# Let's check the numbers:
Workforce13 - NLeftTot13 + NNewHires14
## [1] 18985
Workforce14
## [1] 18985
Workforce13 - NLeftTot13 - NSwitchAgency1314
## [1] 14255
NSameAgency1314
## [1] 14255
# Stayed with same agency from '12 - '14
NSameAgency1214 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1314, na.rm=T) # [1] 11295
# City worker retention rate after two year (ref. 2012)
Retain1214 = NSameAgency1214/Workforce12 # [1] 0.7
# Stayed with same agency from '11 - '14
NSameAgency1114 = sum(AgencyOnly1115$same_agency1112 & AgencyOnly1115$same_agency1213
& AgencyOnly1115$same_agency1314, na.rm=T) # [1] 10076
# City worker retention rate after three year (ref. 2011)
Retain1114 = NSameAgency1114/Workforce11 # [1] 0.7
# ------------------------------------------------
# Drop off from '14 to '15
NLeftTot14 = sum(!is.na(AgencyOnly1115$Agency14) & is.na(AgencyOnly1115$Agency15)) # [1] 6491
LeftTot14 = NLeftTot14/Workforce14 # [1] 0.34
NNewHires15 = sum(!is.na(AgencyOnly1115$Agency15) & is.na(AgencyOnly1115$Agency14)) # [1] 1525
NewHires15 = NNewHires15/Workforce15 # [1] 0.11
# Switched agencies from '14 - '15
NSwitchAgency1415 = sum(!AgencyOnly1115$same_agency1415, na.rm=T) # [1] 191
SwitchAgency1415 = NSwitchAgency1415/Workforce14 # [1] 0.01
# Stayed with same agency from '14 - '15
NSameAgency1415 = sum(AgencyOnly1115$same_agency1415, na.rm=T) # [1] 12303
# City worker retention rate after one year (ref. 2014)
Retain1415 = NSameAgency1415/Workforce14 # [1] 0.65
# Let's check the numbers:
Workforce14 - NLeftTot14 + NNewHires15
## [1] 14019
Workforce15
## [1] 14019
Workforce14 - NLeftTot14 - NSwitchAgency1415
## [1] 12303
NSameAgency1415
## [1] 12303
# Stayed with same agency from '13 - '15
NSameAgency1315 = sum(AgencyOnly1115$same_agency1314 & AgencyOnly1115$same_agency1415, na.rm=T) # [1] 11143
# City worker retention rate after two years (ref. 2013)
Retain1315 = NSameAgency1315/Workforce13 # [1] 0.58
# Stayed with same agency from '12 - '15
NSameAgency1215 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1314
& AgencyOnly1115$same_agency1415, na.rm=T) # [1] 9956
# City worker retention rate after three years (ref. 2012)
Retain1215 = NSameAgency1215/Workforce12 # [1] 0.62
# Stayed with same agency from '11 - '15
NSameAgency1115 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1112
& AgencyOnly1115$same_agency1314 & AgencyOnly1115$same_agency1415, na.rm=T) # [1] 9167
# City worker retention rate after four years (ref. 2011)
Retain1115 = NSameAgency1115/Workforce11 # [1] 0.63
Let’s organize the numbers into tables and visualize them.
Beginning Workforce (n) |
New Hires (n) |
Left City After One Yr (n) |
Switched Agencies (n) |
Same Agencies After One Yr (n) |
Same Agencies After Two Yrs (n) |
Same Agencies After Three Yrs (n) |
Same Agencies After Four Yrs (n) |
|
---|---|---|---|---|---|---|---|---|
2011 | 14,457 | NA | 1,845 | 84 | 12,528 | 11,070 | 10,076 | 9,167 |
2012 | 16,052 | 3,440 | 2,754 | 337 | 12,961 | 11,295 | 9,956 | NA |
2013 | 19,140 | 5,842 | 4,799 | 86 | 14,255 | 11,143 | NA | NA |
2014 | 18,985 | 4,644 | 6,491 | 191 | 12,303 | NA | NA | NA |
2015 | 14,019 | 1,525 | NA | NA | NA | NA | NA | NA |
Beginning Workforce (%) |
Hiring Rates (%) |
Leaving Rates (%) |
Switching Rates (%) |
Remained After One Yr (%) |
Remained After Two Yrs (%) |
Remained After Three Yrs (%) |
Remained After Four Yrs (%) |
|
---|---|---|---|---|---|---|---|---|
2011 | 100 | NA | 12.76 | 0.58 | 86.66 | 76.57 | 69.7 | 63.41 |
2012 | 100 | 21.43 | 17.16 | 2.1 | 80.74 | 70.37 | 62.02 | NA |
2013 | 100 | 30.52 | 25.07 | 0.45 | 74.48 | 58.22 | NA | NA |
2014 | 100 | 24.46 | 34.19 | 1.01 | 64.8 | NA | NA | NA |
2015 | 100 | 10.88 | NA | NA | NA | NA | NA | NA |
At first glance, one may be surprised by the appearance of progressively worsening retention rates of the City workforce over the years. Indeed, you may even question whether these numbers are believable. I had the same question, and here is what I found: The retention rate of 65% from 2014 to 2015 seems really low. However, that’s because we did not account for the fact that youth programs accounted for 4,888 employees in 2014 and none in 2015. In fact, those 4,888 participants amounted to 75% of the workforce reduction in 2015. So, much of the workforce fluctuation we see in this plot were related to the three youth programs the City hosted between 2011 and 2014.
Knowing that the above statistics are biased by the youth programs, which by design were not meant to keep participants on long-term basis, I decided to recalculate the statistics for City employees who were not participants of any of the three youth programs over the years.
Before we leave the summary table and plot though, I want to point out that over the years the number of employees who switched jobs among different City agencies remained small, between 0.45% and 2.1%. This result carries two pieces of information: First, it shows that the majority of employee attrition was due to employees leaving the City workforce all together. Secondly, it shows that City workers, if they chose to stay working for the City, liked to stay with the same agencies.
In the following, let’s isolate a data set to include only City workers who were not participants of any of the three youth programs.
youth_idx = grepl("Youth", Agency1115$Agency14) | grepl("Youth", Agency1115$Agency13) |
grepl("Youth", Agency1115$Agency12) | grepl("Youth", Agency1115$Agency11)
AgencyNoYouth1115 = AgencyOnly1115[!youth_idx,]
Removing youth program participants reduces the number of records from 29,425 to 20,134 in the years between 2011 and 2015. Let’s run the same analysis as before and look at the numbers.
Beginning Workforce (n) |
New Hires (n) |
Left City After One Yr (n) |
Switched Agencies (n) |
Same Agencies After One Yr (n) |
Same Agencies After Two Yrs (n) |
Same Agencies After Three Yrs (n) |
Same Agencies After Four Yrs (n) |
|
---|---|---|---|---|---|---|---|---|
2011 | 14,108 | NA | 1,511 | 78 | 12,519 | 11,063 | 10,069 | 9,167 |
2012 | 14,235 | 1,638 | 1,640 | 321 | 12,274 | 11,012 | 9,956 | NA |
2013 | 14,325 | 1,730 | 1,737 | 71 | 12,517 | 11,136 | NA | NA |
2014 | 14,067 | 1,479 | 1,612 | 171 | 12,284 | NA | NA | NA |
2015 | 13,960 | 1,505 | NA | NA | NA | NA | NA | NA |
Beginning Workforce (%) |
Hiring Rates (%) |
Leaving Rates (%) |
Switching Rates (%) |
Remained After One Yr (%) |
Remained After Two Yrs (%) |
Remained After Three Yrs (%) |
Remained After Four Yrs (%) |
|
---|---|---|---|---|---|---|---|---|
2011 | 100 | NA | 10.71 | 0.55 | 88.74 | 78.42 | 71.37 | 64.98 |
2012 | 100 | 11.51 | 11.52 | 2.26 | 86.22 | 77.36 | 69.94 | NA |
2013 | 100 | 12.08 | 12.13 | 0.5 | 87.38 | 77.74 | NA | NA |
2014 | 100 | 10.51 | 11.46 | 1.22 | 87.32 | NA | NA | NA |
2015 | 100 | 10.78 | NA | NA | NA | NA | NA | NA |
This plot shows a stable City workforce, which is what most of us expect. Specifically, it shows that the City has a one-year retention rate of 87%, a two-year retention rate of 78%, a three-year retention rate of 71%, and a four-year retention rate of 65%. That the attrition rate decreases over the year shows that the workforce stabilizes as worker tenure increases.
Let’s break down the aggregate numbers to get a clearer picture of how the City’s workforce changed over the years. Note, I included Youth Programs participants in the following analysis since it is useful to know whether a Youth Program participant stayed with the same program or switched to work for a different City agency.
same_Agency1112 = as.data.frame(table(AgencyOnly1115$Agency12[AgencyOnly1115$same_agency1112==T]),
responseName = "11-12", stringsAsFactors = FALSE)
same_Agency1113 = as.data.frame(table(AgencyOnly1115$Agency13[AgencyOnly1115$same_agency1112==T
& AgencyOnly1115$same_agency1213==T]),
responseName = "11-13", stringsAsFactors = FALSE)
same_Agency1114 = as.data.frame(table(AgencyOnly1115$Agency14[AgencyOnly1115$same_agency1112==T
& AgencyOnly1115$same_agency1213==T
& AgencyOnly1115$same_agency1314==T]),
responseName = "11-14", stringsAsFactors = FALSE)
same_Agency1115 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1112==T
& AgencyOnly1115$same_agency1213==T
& AgencyOnly1115$same_agency1314==T
& AgencyOnly1115$same_agency1415==T]),
responseName = "11-15", stringsAsFactors = FALSE)
same_Agency1213 = as.data.frame(table(AgencyOnly1115$Agency13[AgencyOnly1115$same_agency1213==T]),
responseName = "12-13", stringsAsFactors = FALSE)
same_Agency1214 = as.data.frame(table(AgencyOnly1115$Agency14[AgencyOnly1115$same_agency1213==T
& AgencyOnly1115$same_agency1314==T]),
responseName = "12-14", stringsAsFactors = FALSE)
same_Agency1215 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1213==T
& AgencyOnly1115$same_agency1314==T
& AgencyOnly1115$same_agency1415==T]),
responseName = "12-15", stringsAsFactors = FALSE)
same_Agency1314 = as.data.frame(table(AgencyOnly1115$Agency14[AgencyOnly1115$same_agency1314==T]),
responseName = "13-14", stringsAsFactors = FALSE)
same_Agency1315 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1314==T
& AgencyOnly1115$same_agency1415==T]),
responseName = "13-15", stringsAsFactors = FALSE)
same_Agency1415 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1415==T]),
responseName = "14-15", stringsAsFactors = FALSE)
TotSameAgency1115 = merge(same_Agency1112,same_Agency1113,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1114,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1115,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1213,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1214,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1215,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1314,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1315,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1415,by="Var1",all = TRUE)
names(TotSameAgency1115)[1]="Agency"
# Convert all NA's into 0
for (i in 2:11) {
idx <- which(is.na(TotSameAgency1115[[i]]))
TotSameAgency1115[[i]][idx] <- 0
}
# dim(TotSameAgency1115)
# The Agencies "Wage Commissioner" and "Youth Cust" did not retain any employees over the years
RetentionRateByAgency = rbind(TotSameAgency1115, c("Wage Commissioner", numeric(10)), c("Youth Cust", numeric(10))) %>%
arrange(Agency)
for (i in 2:11) {
RetentionRateByAgency[[i]] <- as.numeric(RetentionRateByAgency[[i]])
}
RetentionRateByAgency$X11.12=RetentionRateByAgency$X11.12/TotAgency1115$X2011*100
RetentionRateByAgency$X11.13=RetentionRateByAgency$X11.13/TotAgency1115$X2011*100
RetentionRateByAgency$X11.14=RetentionRateByAgency$X11.14/TotAgency1115$X2011*100
RetentionRateByAgency$X11.15=RetentionRateByAgency$X11.15/TotAgency1115$X2011*100
RetentionRateByAgency$X12.13=RetentionRateByAgency$X12.13/TotAgency1115$X2012*100
RetentionRateByAgency$X12.14=RetentionRateByAgency$X12.14/TotAgency1115$X2012*100
RetentionRateByAgency$X12.15=RetentionRateByAgency$X12.15/TotAgency1115$X2012*100
RetentionRateByAgency$X13.14=RetentionRateByAgency$X13.14/TotAgency1115$X2013*100
RetentionRateByAgency$X13.15=RetentionRateByAgency$X13.15/TotAgency1115$X2013*100
RetentionRateByAgency$X14.15=RetentionRateByAgency$X14.15/TotAgency1115$X2014*100
The following set of plots shows the retention rates of each of the 37 City agencies over one-, two-, three-, and four-year periods.
Clearly, those agencies that achieved 100% retention rate were doing a good job in retaining their employees. At the same time, these agencies were relatively small, most of them had less than 10 employees (see workforce plot), so it is not entirely fair to compare them with agencies that had thousands of employees, such as the Police Department, the Department of Public Works, or the Fire Department.
Those who experience 0% retention rate can be separated into two groups: In one group the agencies (e.g., Temp Assist for Needy Families) simply no longer existed; in the other group the agencies (e.g., the youth programs) simply did not mean to keep employees on long term basis. As a result, the 0% retention rate was not too surprising.
The Fire Department had the second highest retention rate over a four-year period. Given that it had over 1500 employees over the years, this is indeed an achievement. The third highest retention rate belongs to the Dept of Public Works, which had over 2300 employees and among whom was the longest-tenure employee in all City agencies.
First let’s take a look at the pay distribution over the years.
gross_pay1115 <- Agency1115 %>%
select(GrossPay11, GrossPay12, GrossPay13, GrossPay14, GrossPay15) %>%
gather(Yr, GrossPay, na.rm=TRUE)
lut <- c("GrossPay15"="2015","GrossPay14"="2014","GrossPay13"="2013",
"GrossPay12"="2012","GrossPay11"="2011")
gross_pay1115$Yr <- lut[gross_pay1115$Yr]
The distributions for all five years clearly skew to the right (the end for higher gross pay), suggesting that a small number of employees received much higher pay than others–we expect them to be the managers, heads of departments/agencies.
In addition, the distributions are bimodal and possibly trimodal. The peaks on the left in 2012 - 2014’s distributions are most likely due to the large number of participants of the youth programs. The antimode/valley at ~$20,000 might be a natural breaking point between part-time and full-time employees. Among those who were paid ~$20,000 and above, the distributions shows two peaks: The main peak with lower pay corresponds to the rank-and-file employees, while the other peak with higher pay represents those in leadership positions. Given that a position’s pay also depends on the nature of the work and tenure of the employee, the division between the two groups is not likely to be clear-cut.
Let’s look at a numerical summary of the pay distributions.
GrossPay1115 <- select(Agency1115, GrossPay11, GrossPay12, GrossPay13, GrossPay14, GrossPay15)
summary(GrossPay1115)
## GrossPay11 GrossPay12 GrossPay13 GrossPay14 GrossPay15
## Min. : 7 Min. : -77 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 27609 1st Qu.: 23871 1st Qu.: 24536 1st Qu.: 22065 1st Qu.: 34666
## Median : 41100 Median : 43061 Median : 44001 Median : 45267 Median : 52753
## Mean : 42323 Mean : 43176 Mean : 44313 Mean : 45634 Mean : 55782
## 3rd Qu.: 58322 3rd Qu.: 63551 3rd Qu.: 65615 3rd Qu.: 67816 3rd Qu.: 75768
## Max. :170423 Max. :245353 Max. :238681 Max. :238772 Max. :235768
## NA's :14968 NA's :13373 NA's :13504 NA's :13664 NA's :15675
You may notice the negative pays in the 2012 data. We have discussed them in the anomalies section of this report, so I will not repeat it here. The NA counts in each year’s summary are due to combining all five year’s data into one master data set–not all employees worked for the City for all five years.
The above summary shows that median salary of City employees rose above $50,000 for the first time in 2015. At first, I did not think that this was due to a pay raise given to City employees in 2015. Instead, remembering that 2015 data included zero youth program participant instead of the thousands we saw in previous years, I assumed that removing thousands of lowest pay entries from the 2015 distribution would naturally resulted in a higher median value for the 2015 pay distribution.
To check this hypothesis, I look at the pay summary of City employees who were not youth program participants.
NYGrossPay1115 <- Agency1115[!youth_idx,] %>% select(GrossPay11, GrossPay12, GrossPay13,
GrossPay14, GrossPay15)
summary(NYGrossPay1115)
## GrossPay11 GrossPay12 GrossPay13 GrossPay14 GrossPay15
## Min. : 60 Min. : -77 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 28406 1st Qu.: 31899 1st Qu.: 32082 1st Qu.: 33075 1st Qu.: 34862
## Median : 41821 Median : 47224 Median : 47859 Median : 50031 Median : 52943
## Mean : 43316 Mean : 48544 Mean : 49483 Mean : 51866 Mean : 55983
## 3rd Qu.: 58715 3rd Qu.: 65584 3rd Qu.: 67802 3rd Qu.: 70674 3rd Qu.: 75866
## Max. :170423 Max. :245353 Max. :238681 Max. :238772 Max. :235768
## NA's :6026 NA's :5899 NA's :5916 NA's :6307 NA's :6440
Clearly, my assumption was not correct! On the contrary, after removing youth programs participants, the remaining distributions suggest a steady increase in salary for City employees every year since 2011!
Let’s look at the overall pay distributions without youth programs participants. In the following histograms, the solid vertical line in the middle marks the median pay, and the dashed vertical lines mark the 25 and 75 percentiles of pay respectively.
In the following, I aggregate salary data based on the agency of the employee, and examine the relation between number of positions and total salary spending by agency.
SalaryExpend15 <- Agency1115 %>% group_by(Agency15) %>% summarise(TotExpend15 = sum(GrossPay15, na.rm=TRUE)) %>% na.omit()
SalaryExpend14 <- Agency1115 %>% group_by(Agency14) %>% summarise(TotExpend14 = sum(GrossPay14, na.rm=TRUE)) %>% na.omit()
SalaryExpend13 <- Agency1115 %>% group_by(Agency13) %>% summarise(TotExpend13 = sum(GrossPay13, na.rm=TRUE)) %>% na.omit()
SalaryExpend12 <- Agency1115 %>% group_by(Agency12) %>% summarise(TotExpend12 = sum(GrossPay12, na.rm=TRUE)) %>% na.omit()
SalaryExpend11 <- Agency1115 %>% group_by(Agency11) %>% summarise(TotExpend11 = sum(GrossPay11, na.rm=TRUE)) %>% na.omit()
TotSalaryExpend1115 = merge(SalaryExpend11,SalaryExpend12,by.x="Agency11", by.y="Agency12",all = TRUE)
TotSalaryExpend1115 = merge(TotSalaryExpend1115,SalaryExpend13,by.x="Agency11", by.y="Agency13",all = TRUE)
TotSalaryExpend1115 = merge(TotSalaryExpend1115,SalaryExpend14,by.x="Agency11", by.y="Agency14",all = TRUE)
TotSalaryExpend1115 = merge(TotSalaryExpend1115,SalaryExpend15,by.x="Agency11", by.y="Agency15",all = TRUE)
names(TotSalaryExpend1115)[1]="Agency"
# Convert all NA's into 0
for (i in 2:6) {
idx <- which(is.na(TotSalaryExpend1115[[i]]))
TotSalaryExpend1115[[i]][idx] <- 0
}
Salary spending of the different departments had been steady, and a few departments even saw a moderate increase over the years. One striking feature, if you recall the number of positions by agency plot earlier, is the almost none-existent spending on the youth programs in comparison with other City agencies. This reinforce my belief that participants of the youth programs were not the traditional employees in other City agencies.
While most of the increase were relatively modest and gradual, two features jump out of the plot. First, spending of the Mayor’s Office was double from 2012 to 2013 and had stayed at that level since. Secondly, spending at the Police Department experienced two stages of large increase: one from 2011 to 2012, and the other from 2014 to 2015. Each of these increases was about $30,000,000.
In comparison with the number of positions by agency plot, which shows a steady decrease of the police workforce since 2012, the rather large salary spending increase by the Police Department in 2015 is even more intriguing. Did the increase impact majority of the personnel in the police workforce? Or, was it the result of hiring a few top managers? I will answer this question in a later section.
Here are the numbers that generated the plot above.
2011 ($) |
2012 ($) |
2013 ($) |
2014 ($) |
2015 ($) |
|
---|---|---|---|---|---|
Circuit Court | 6,992,230 | 7,418,225 | 6,905,444 | 6,930,433 | 7,600,118 |
City Council | 2,930,632 | 3,220,308 | 3,486,672 | 3,642,744 | 3,699,890 |
Civil Rights & Wage Enforce | 0 | 0 | 641,071 | 568,643 | 501,455 |
Community Relations | 431,928 | 609,175 | 0 | 0 | 0 |
Comptroller’s Office | 4,064,869 | 4,602,130 | 4,743,322 | 4,665,648 | 5,098,746 |
Council Services | 321,864 | 382,368 | 443,624 | 365,916 | 388,532 |
Dept of Finance | 10,617,647 | 12,368,165 | 12,860,218 | 13,376,656 | 14,156,029 |
Dept of Health | 29,572,629 | 32,256,450 | 31,131,996 | 30,085,148 | 31,313,971 |
Dept of Public Works | 84,146,174 | 94,566,693 | 95,628,158 | 99,802,845 | 106,487,490 |
Dept of Transportation | 42,344,106 | 48,484,935 | 47,636,882 | 48,785,120 | 51,615,941 |
Elections | 57,217 | 143,771 | 96,500 | 87,038 | 74,679 |
Employee’s Retirement Sys | 1,224,019 | 1,277,099 | 1,310,384 | 1,575,017 | 1,757,640 |
Enoch Pratt Free Library | 14,084,339 | 16,056,586 | 16,891,040 | 17,076,783 | 17,344,375 |
Fire & Police Retirement Sys Admin | 1,019,820 | 1,200,031 | 1,224,966 | 1,332,487 | 1,581,667 |
Fire Department | 101,253,987 | 113,689,987 | 113,686,926 | 121,142,250 | 125,612,177 |
General Services | 16,069,391 | 18,692,706 | 19,551,601 | 19,402,148 | 17,874,549 |
Housing & Community Dev | 20,302,024 | 23,216,475 | 18,515,544 | 18,877,111 | 18,598,793 |
Human Resourcese | 2,108,471 | 2,271,436 | 2,625,640 | 3,109,768 | 3,062,501 |
Law Department | 5,720,770 | 6,609,865 | 6,776,727 | 6,963,661 | 7,566,482 |
Legislative Reference | 320,193 | 358,489 | 274,506 | 292,461 | 326,889 |
Liquor License Board | 1,191,696 | 1,414,959 | 1,304,304 | 863,551 | 1,037,420 |
Mayor’s Office | 14,657,869 | 15,676,067 | 29,291,244 | 31,350,202 | 34,096,255 |
Municipal & Zoning Appeals | 206,074 | 290,158 | 268,809 | 240,884 | 245,083 |
Office of Employment Dev | 8,737,769 | 9,126,557 | 9,241,597 | 8,113,639 | 7,988,847 |
Orphan’s Court | 228,469 | 317,985 | 342,348 | 311,950 | 328,993 |
Planning Department | 2,110,358 | 2,513,763 | 2,470,085 | 2,444,078 | 2,919,818 |
Police Department | 199,573,347 | 229,243,749 | 230,392,751 | 228,004,357 | 256,359,188 |
Recreation & Parks | 13,458,586 | 15,113,724 | 15,048,382 | 16,059,910 | 17,008,779 |
Sheriff’s Office | 7,325,127 | 8,561,306 | 9,110,996 | 10,199,283 | 10,839,221 |
Special City Services | 22,831 | 31,774 | 41,331 | 21,215 | 35,563 |
States Attorneys Office | 19,729,108 | 21,037,262 | 21,653,893 | 21,593,103 | 21,486,610 |
Temp Assist for Needy Families | 75,982 | 106,872 | 0 | 0 | 0 |
Wage Commissioner | 154,204 | 0 | 0 | 0 | 0 |
War Memorial Commission | 144,577 | 166,438 | 0 | 0 | 0 |
Youth Cust | 502,534 | 52,877 | 3,640 | 0 | 0 |
Youth Summer | 0 | 1,749,083 | 1,635,391 | 1,719,443 | 0 |
Youth Temp Adult | 167,336 | 227,664 | 269,957 | 235,644 | 0 |
Combining results from the employment and salary analyses above, the following plot explores the relation between the City’s workforce and total salary spending.
In the plot above, the colors of the dots are different for different years, and the sizes of the dots are proportional to the sizes of the workforce. The visualization reinforces some of the results we obtained earlier: The years 2013 and 2014 had particularly large workforce because of the addition of youth programs participants. At the same time, these were not traditional employees since they certainly weren’t paid much–here we see that the total salary spending didn’t increase much from 2012 to 2014 despite the fact that thousands of people were added to the payroll. In 2015, youth programs participants weren’t included in the salary data file, and so the City’s workforce appears to have shrunk considerably from 2014 to 2015. Nonetheless, the overall increase in salary spending from 2014 to 2015 is one of the largest over the years. Here are the numbers.
Size of Workforce (n) |
Salary Spending ($) |
|
---|---|---|
2011 | 14,457 | 611,868,176 |
2012 | 16,052 | 693,055,130 |
2013 | 19,140 | 705,505,949 |
2014 | 18,985 | 719,239,135 |
2015 | 14,019 | 767,007,701 |
Let’s look at the numbers at a more granular level and examine salary spending vs. number of employees by agency.
The above is a log-log plot. It shows each year’s data in a different color. Visually most of the data points lie along a straight line, so I chose to fit a power law to the data. The best fit lines are displayed in the same colors as the data that were used to derive them. One may notice that the power law describes the 2015 data particularly well because there are few outlier points. The best fit parameter values, RMSE, and the \(R^2\) of the fits are given in the following table.
Coefficient | Exponent | RMSE | R-sq | |
---|---|---|---|---|
2011 | 24.82 | 1.04 | 0.85 | 0.86 |
2012 | 27.52 | 1.02 | 1.01 | 0.8 |
2013 | 28.26 | 1.01 | 1.2 | 0.75 |
2014 | 39.1 | 0.96 | 1.05 | 0.77 |
2015 | 29.14 | 1.08 | 0.49 | 0.95 |
The increase in the Police Department’s salary spending from 2014 to 2015 is most intriguing, so I chose it as the subject of my focus study. The aim was to find out where the extra spending went into during this time period. First, let’s extract a subset of the data for those who were in the police workforce in 2014 or 2015 and calculate the relevant statistics.
police1415 = BmoreSalaries1115 %>%
subset(BmoreSalaries1115$Agency15=="Police Department"
| BmoreSalaries1115$Agency14=="Police Department") %>%
select(Name, HireDate, JobTitle15, JobTitle14, Agency15, Agency14,
GrossPay15, GrossPay14, AnnualSalary15, AnnualSalary14)
policeWF14 = sum(police1415$Agency14=="Police Department", na.rm=T) # [1] 3211
policeWF15 = sum(police1415$Agency15=="Police Department", na.rm=T) # [1] 3097
LeftIdx = (police1415$Agency14=="Police Department") & ((police1415$Agency15!="Police Department")
| is.na(police1415$Agency15))
NLeft = sum(LeftIdx, na.rm=T) # [1] 328
RetainIdx = (police1415$Agency14=="Police Department") & (police1415$Agency15=="Police Department")
NRetain = sum(RetainIdx, na.rm=T) # [1] 2883
RetainRate = NRetain/policeWF14 # [1] 0.9
NewHireIdx = (police1415$Agency15=="Police Department") & (is.na(police1415$Agency14)
| (police1415$Agency14!="Police Department"))
NNewHire = sum(NewHireIdx) # [1] 214
The police workforce in 2014 and 2015 were 3,211 and 3,097 respectively. 328 individuals left the department in 2014, and 214 were hired into the department in 2015. Overall, the workforce retention rate from 2014 to 2015 was 90%. The following plot shows the department’s salary distributions.
Clearly, both years’ distributions skew to the right, meaning that a few individuals received much higher paid than most of the other employees. In general, the 2015 distribution appears to have shifted to the right, which indicates that the increase spending had a broad impact. Let’s look more specifically at the distribution of the pay difference in the plot below.
The above plot shows that for those who remained in the police workforce from 2014 to 2015, the distribution of pay difference was nearly normally distributed. The average of the pay difference was $13,700, and the median of the pay difference was $12,000. The 25 and 75 percentiles were at $6,740 and $18,700. Again, this plot reinforces our earlier conclusion that the increase in salary spending had impact on majority of the personnel in the Police Department.
Finally, let’s take a look at the data of those who garnered the largest pay increase from 2014 to 2015.
police1415 <- police1415 %>%
mutate(PayInc=police1415$GrossPay15-police1415$GrossPay14) %>%
arrange(desc(PayInc))
police1415 %>%
select(Name, HireDate, JobTitle15, JobTitle14, GrossPay15, GrossPay14,
AnnualSalary15, AnnualSalary14, PayInc) %>%
head(n=10)
## Name HireDate JobTitle15 JobTitle14 GrossPay15 GrossPay14
## 1 Banach,Paul F 2014-05-29 Operations Officer III POLICE COMMAND STAFF I 105820 3923
## 2 Swinton,Dwayne L 1990-10-23 POLICE SERGEANT POLICE SERGEANT 134567 40409
## 3 De Jesus,Michael 1994-02-09 POLICE OFFICER POLICE OFFICER 100900 8049
## 4 Adams,Damon R 2003-10-23 POLICE OFFICER POLICE OFFICER 97765 7688
## 5 Gladstone,Keith A 1992-11-20 POLICE SERGEANT POLICE SERGEANT 132188 52263
## 6 Fleurimond,Richard 1999-10-12 POLICE OFFICER (EID) POLICE OFFICER (EID) 166056 86326
## 7 Walrath,Mark A 1992-05-05 POLICE LIEUTENANT EID POLICE LIEUTENANT EID 194656 115639
## 8 Camarote,Marc J 1996-10-03 POLICE SERGEANT EID POLICE SERGEANT EID 210509 136940
## 9 Paradise,John J 1994-03-28 POLICE LIEUTENANT EID POLICE LIEUTENANT EID 161310 92518
## 10 Leitch,Eric R 2000-02-10 POLICE SERGEANT POLICE SERGEANT 182621 119683
## AnnualSalary15 AnnualSalary14 PayInc
## 1 101200 102000 101897
## 2 90273 81327 94158
## 3 76496 68322 92852
## 4 68013 62380 90078
## 5 88682 79249 79925
## 6 74290 66320 79730
## 7 103808 92706 79017
## 8 86874 77564 73568
## 9 101999 91156 68793
## 10 84130 75288 62938
It appears that those who received the largest pay increase were not new hires. Instead, for some of them, for some reason, their were paid well below the projected salaries in 2014, and in 2015 they received the normal pay for their positions. At the same time, there were also others who received close to $80,000 of pay increase on top of their normal salary, resulting in double the amount of salary for their positions. Could those be overtime pay?
Whatever the reason behind the pay increase, it appears that the pay scale of the Police Department is higher than that of other agencies. The two boxplots below compare the salary distributions of the Police Department and all City agencies in 2014 and 2015. It shows that Police Department had a higher median pay than the rest of the City and, indeed, it got a larger pay increase from 2014 to 2015 than the rest of the City.
So far in this project, I have examined the employment and salary statistics of Baltimore City employees based on the 2011-2015 data from the Baltimore Open Data Portal.
In terms of employment statistics, I found that the Police Department had the largest number of employees among all City agencies over the 2011-2015 time period. The size of the police workforce, nonetheless, has declined modestly since 2012. I then examined the City-wide employee retention rates and the retention rates of each City agencies. Overall, Baltimore City had a one-year employee retention rate of 87%, a two-year retention rate of 78%, a three-year retention rate of 71%, and a four-year retention rate of 65%. The number of employees who switched jobs among different City agencies were small, about 1.1%. The result indicates that the majority of employee attrition was due to employees leaving the City workforce all together. It also means that City workers, if they chose to stay working for the City, liked to stay with the same agencies. Analysis at a more granular scale shows that smaller agencies, if they continue to exist over the years, had higher retention rates than larger agencies. Nonetheless, over long-term periods (four years), larger departments such as the Fire Department and the Dept of Public Works have been able to achieve some of the highest retention rates among all City agencies.
The salary distributions of City employees show that every year since 2011 there has been a steady, albeit modest, pay increase. One interesting observation was the Police Department’s workforce reduction that coincided with its increase in salary spending. Upon further examination of the data, I found that the extra spending had impact on majority of the personnel in the Police Department, and was not the results of hiring a few high paying individuals. The data also showed that the Police Department had a higher pay scale than all the other City agencies.
The data products from this analysis are available for download here.