library(lubridate)
library(dplyr)
library(stringr)
library(ggvis)
library(tidyr)
library(pander)

Abstract: I examined the employment and salary statistics of Baltimore City employees based on the 2011-2015 data from the Baltimore Open Data Portal. Among all City agencies, the Police Department had the largest number of employees over the 2011-2015 time period. Overall Baltimore City had a one-year employee retention rate of 87%, a two-year retention rate of 78%, a three-year retention rate of 71%, and a four-year retention rate of 65%. The number of employees who switched jobs among different City agencies were small, about 1.1%. The result indicates that the majority of employee attrition was due to employees leaving the City workforce all together. It also means that City workers, if they chose to stay working for the City, liked to stay with the same agencies. Analysis at a more granular scale shows that smaller agencies, if they continue to exist over the years, had higher retention rates than larger agencies. From analyzing salary data, I found that every year since 2011 there has been a steady, albeit modest, pay increase for City employees. The Police Department stood out as the only agency that has been experiencing a workforce reduction (since 2012) while increasing its salary spending at the same time. Data showed that the extra spending had impact on majority of the personnel in the Police Department, and was not the results of hiring a few high paying individuals. The data also showed that the Police Department had a higher pay scale than all the other City agencies. The data products from this analysis are available for download here.

Tags: data cleaning data analysis model fitting data visualization R

1 Introduction

This project analyzes Baltimore City Employee Salaries data from 2011 to 2015 provided at the Baltimore Open Data Portal. In particular, by identifying trends and features in these data I hope to uncover some of the consequences of the current policies on personnel allocation and financial support. To carry out this analysis, I will be reading in the data from separate files, clean the data, visualize them, and fit appropriate models to them.

2 Data Preparation

The five data files, one for each fiscal year, contain the same types of information. For example, each file records employee names, hiring dates, job titles, affiliating agencies, and salaries. While the variables in these data files may have names that differ from year to year, the fact that the same types of information were provided for all five years makes processing the data easy. Indeed, one can set up a loop that reads in the data, clean them up, and merge them into a master data set. This loop approach is scalable and saves memory space, so that’s what I did.

In the following, I will first define helper functions to carry out various tasks of cleaning the data and creating additional variables. The steps include:

Read in data from individual files,
Format HireDate data,
Standardize the spelling of agency names and create necessary variables to group subsidiary departments with their umbrella agencies,
Convert salary data to numeric values,
Merge data into a master data frame.

2.1 Convert Salary Data from Character Strings to Numerics

# Remove "$" signs in salary data and convert them from character type to numeric type 
format_salary_data <- function(strings_salary) {
  numeric_salary <- strings_salary %>% 
    str_replace("[$]", "") %>%
    as.numeric()
}

2.2 Standardize Agency Names

I’ve learned from analyzing the 2015 data that the entries of Agency include the last three digits in the entries of AgencyID some of the times but not all the time. In the following I will remove the three digits reference in Agency (if they appear) and standardize the spellings of the entries in Agency.

2.2.1 Remove redundant location reference in the `Agency` column

The following function removes the three digit location reference (if they appear). It removes location reference that comes in the form “(nnn)”, “Location xxx”, or simply “(”. Since few departments indicated their employees as part-time, wkly, or full-time, I decided to focus this analysis on the inter-departmental differences only. As a result, this function will also remove any “(part-time)” or “(wkly)” designations.

# Identify and remove any location reference in a given character string
rm_location_ref <- function(cstr) {
  idx <- str_locate(cstr, "[(]|Location [:digit:]{3}|[:digit:]{3}")
  if (is.na(idx[1])) return(cstr) 
  else {
    idx <- idx[1] - 2
    # return(substring(cstr, 1, idx))
    return(str_trim(substring(cstr, 1, idx), "right"))
  }
}

# Apply rm_location_ref function to each character string in the `Agency` column
rm_location_ref_apply2vec <- function(cstr_vec) {
  unlist(lapply(cstr_vec, rm_location_ref))
}

2.2.2 Standardize the spellings of agency names

Some agency names have been spelled in multiple ways. For example,

“HLTH-Health Department”, “HLTH-Health Dept”, “HLTH-Heatlh Dept.”, and “HLTH-Health Dept.” will be unified as “HLTH-Health Department.”
“Mayor’s Office” and “Mayors Office” will be unified as “Mayor’s Office.”
The designation “Fire Academy Recruits” was only used in the 2011 data and it shared the same first three-digit AgencyID, “A64”, as the “Fire Department.” As a result, “Fire Academy Recruits” will be considered as a subsidiary agency of the “Fire Department.”
Out of data from all five year, only one person in 2013 was designated as “OED-Participants-Youth Summer.” Since the entry has the same AgencyID, “W02”, as others under “Youth Summer”, the single entry of “OED-Participants-Youth Summer” will be converted into “Youth Summer.”

The following function standardizes names for agencies and their subsidiary departments.

sync_name <- function(agency_vec) {
  agency_vec <- rm_location_ref_apply2vec(agency_vec)
  agency_vec <- agency_vec %>% str_replace("Fire Academy Recruits", "Fire Department")
  agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept Locatio", "HLTH-Health Department")
  agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept. Locatio", "HLTH-Health Department")
  agency_vec <- agency_vec %>% str_replace("HLTH-Heatlh Dept. Locatio", "HLTH-Health Department")
  agency_vec <- agency_vec %>% str_replace("HLTH-Heatlh $", "HLTH-Health Department")
  agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept$", "HLTH-Health Department")
  agency_vec <- agency_vec %>% str_replace("HLTH-Health Dept.$", "HLTH-Health Department")
  agency_vec <- agency_vec %>% str_replace("HLTH-Heatlh Dept.$", "HLTH-Health Department")
  agency_vec <- agency_vec %>% str_replace("Mayors Office", "Mayor's Office")
  agency_vec <- agency_vec %>% str_replace("Civil Rights & Wage Enfor$", "Civil Rights & Wage Enforce")
  agency_vec <- agency_vec %>% str_replace("COMP-Communication Ser$", "COMP-Communication Services")
  agency_vec <- agency_vec %>% str_replace("COMP-Comptroller's O$", "COMP-Comptroller's Office")
  agency_vec <- agency_vec %>% str_replace("Municipal & Zoning Appeal$", "Municipal & Zoning Appeals")
  agency_vec <- agency_vec %>% str_replace("TRANS-Cross Guard-S$", "TRANS-Cross Guard-Summer")
  agency_vec <- agency_vec %>% str_replace("OED-Participants-Youth Summer", "Youth Summer")
}

2.2.3 Group subsidiary departments with their umbrella agencies

Among the agencies listed in each year’s salary file, some agencies were represented by their subsidiary departments. This resulted in 50+ different agencies. Since this level of refined differentiation can be overwhelming and may create confusion for a regular citizen, I limit the content of the Agency variable to the (umbrella) agencies only. The original super-agency-subsidiary-department description will be preserved in a different variable called AgencyDept. In this way, I can group data together at the super-agency level, and still be able to identify which department within an agency that a particular employee worked for by looking up content in the AgencyDept variable.

# Group sub-agencies under super-agencies, the sub-agency's name is provided as a character string
sub2sup_agencies <- function(cstr) {
  if (is.na(cstr)) return(cstr)
  if (str_detect(cstr, "COMP")) return("Comptroller's Office")
  if (str_detect(cstr, "DPW")) return("Dept of Public Works")
  if (str_detect(cstr, "ERS")) return("Employee's Retirement Sys")
  if (str_detect(cstr, "FIN")) return("Dept of Finance")
  if (str_detect(cstr, "FPR")) return("Fire & Police Retirement Sys Admin")
  if (str_detect(cstr, "HLTH")) return("Dept of Health")
  if (str_detect(cstr, "HR")) return("Human Resourcese")
  if (str_detect(cstr, "M-R")) return("Mayor's Office")
  if (str_detect(cstr, "OED")) return("Office of Employment Dev")
  if (str_detect(cstr, "R&P")) return("Recreation & Parks")
  if (str_detect(cstr, "TANF")) return("Temp Assist for Needy Families")
  if (str_detect(cstr, "TRANS")) return("Dept of Transportation")
  return(cstr)
}

# Apply `sub2sup_agencies` function to a vector of sub-agency names that are character strings
sub2sup_agencies_apply2vec <- function(cstr_vec) {
  unlist(lapply(cstr_vec, sub2sup_agencies))
}

2.3 Flag Records with No `HireDate`

These are records that have generic names, such as “BPD 1”, “BPD 2”, …, “BPD 96”, in the Name column. It appears that the payroll information of these records are credible, in the sense that they were actual salaries paid to certain individuals, However, these records can be linked to no identifiable persons. More specifically, for these unnamed individuals their JobTitle’s are identical to their names; their Agency were all listed as “Police Department”; and the first three digits of AgencyID are all “A99.”

Since Name and HireDate are the only demographic information provided in these data files, I will be using these two pieces of information to match records and merge them into a master data set. This means that I need to flag records that have generic Name’s and no HireDate in order to correctly match them later. I will give “Jan-1-2200”, a future date, as the HireDate to the these records.

flag_hiredate <- function(date_vec) {
  idx <- which(is.na(date_vec))
  # print(paste(as.character(length(idx)), "missing hire-date records identified.", sep=" "))
  date_vec[idx] <- mdy("01-01-2200")  
  return(date_vec)
}

2.4 Trim Trailing Space in Data Entries

Some entries of Name and Agency have included trailing space(s), which will lead to separate entries when we merge data records later on.

trim_trailing_space_cstr <- function(cstr) {
  str_trim(cstr, side="right")
}

trim_trailing_space2vec <- function(cstr_vec) {
  unlist(lapply(cstr_vec, trim_trailing_space_cstr))
}

2.5 Merge Data into One Master Data Set

BmoreSalaries1115 <- data.frame()

for (i in 11:15) {
  # Standardize names of variables in the incoming data file. This step is necessary since in some files, "AgencyID" is called "DEPTID", "Agency" is called "DESCR", etc.
  new_names <- c("Name", "JobTitle", "AgencyID", "Agency", "HireDate", "AnnualSalary", "GrossPay")
  
  # Read data from files
  if (i == 11) {
    new_csv = paste("Baltimore_City_Employee_Salaries_20",as.character(i),".csv", sep="")
  } else {
    new_csv = paste("Baltimore_City_Employee_Salaries_FY20",as.character(i),".csv", sep="")
  }
  generic_df <- read.csv(new_csv, stringsAsFactors=FALSE, col.names = new_names)
  
  # Prepare data
  generic_df$Name <- trim_trailing_space2vec(generic_df$Name)
  generic_df$JobTitle <- trim_trailing_space2vec(generic_df$JobTitle)
  generic_df$AgencyID <- trim_trailing_space2vec(generic_df$AgencyID)
  generic_df$Agency <- trim_trailing_space2vec(generic_df$Agency)
  
  generic_df$HireDate <- mdy(generic_df$HireDate)
  
  generic_df$HireDate <- flag_hiredate(generic_df$HireDate)
# [1] "75 missing hire-date records identified."
# [1] "96 missing hire-date records identified."
# [1] "52 missing hire-date records identified."
# [1] "70 missing hire-date records identified."
# [1] "10 missing hire-date records identified."
  
  generic_df$AnnualSalary <- format_salary_data(generic_df$AnnualSalary)
  generic_df$GrossPay <- format_salary_data(generic_df$GrossPay)
  generic_df$AgencyDept <- sync_name(generic_df$Agency)
  generic_df$Agency <- sub2sup_agencies_apply2vec(generic_df$AgencyDept)
  
  # Use "Name" and "HireDate" as unique employee identifier; arrange relevant variables to be next to each other in the data frame
  generic_df <- generic_df %>% 
    select(Name, HireDate, JobTitle, Agency, AgencyDept, AgencyID, AnnualSalary, GrossPay)
  
  # Re-name variables to reflect the year data were collected
  new_names <- c("Name", 
                 "HireDate",
                 paste("JobTitle",as.character(i), sep=""),
                 paste("Agency",as.character(i), sep=""),
                 paste("AgencyDept",as.character(i), sep=""),
                 paste("AgencyID",as.character(i), sep=""),
                 paste("AnnualSalary",as.character(i), sep=""),
                 paste("GrossPay",as.character(i), sep=""))
  names(generic_df) <- new_names

  # Merge data from different years into a single data frame
  if (i==11) {
    BmoreSalaries1115 <- generic_df
  } else {
    BmoreSalaries1115<-merge(generic_df, BmoreSalaries1115, by=c("Name", "HireDate"), all=TRUE)   
  }
}

At this point, all data have been read into the master data frame BmoreSalaries1115. This data frame has 29,425 records with each record characterized by 32 variables. Here is a glimpse of the master data frame at the current stage:

glimpse(BmoreSalaries1115)

## Observations: 29,425
## Variables: 32
## $ Name           (chr) "Aaron,Keairah T", "Aaron,Keontae E", "Aaron,Patricia G", "Aaron,Petra L", "Abaine...
## $ HireDate       (date) 2013-06-19, 2013-06-10, 1979-10-24, 2006-09-25, 2009-07-23, 2013-07-24, 2013-05-0...
## $ JobTitle15     (chr) NA, NA, "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY", "EPIDEMIOLO...
## $ Agency15       (chr) NA, NA, "Office of Employment Dev", "States Attorneys Office", "Dept of Health", "...
## $ AgencyDept15   (chr) NA, NA, "OED-Employment Dev", "States Attorneys Office", "HLTH-Health Department",...
## $ AgencyID15     (chr) NA, NA, "A03031", "A29045", "A65026", "A99005", "A40001", "A90005", "A64120", "A99...
## $ AnnualSalary15 (dbl) NA, NA, 55314, 74000, 64500, 46309, 60060, 42702, 62175, 77343, 44548, 18408, 1840...
## $ GrossPay15     (dbl) NA, NA, 53626, 73000, 64404, 59620, 54060, 20251, 83757, 92575, 55943, 1051, 1051,...
## $ JobTitle14     (chr) NA, "AIDE BLUE CHIP", "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY...
## $ Agency14       (chr) NA, "Youth Summer", "Office of Employment Dev", "States Attorneys Office", "Dept o...
## $ AgencyDept14   (chr) NA, "Youth Summer", "OED-Employment Dev", "States Attorneys Office", "HLTH-Health ...
## $ AgencyID14     (chr) NA, "W02200", "A03031", "A29005", "A65026", "A99416", "A40001", NA, "A64120", "A99...
## $ AnnualSalary14 (dbl) NA, 11310, 53428, 68300, 62000, 43999, 52000, NA, 62175, 70918, 42438, 11310, 1131...
## $ GrossPay14     (dbl) NA, 874, 52868, 67439, 58655, 39687, 47020, NA, 61452, 87900, 53668, NA, NA, NA, 1...
## $ JobTitle13     (chr) "AIDE BLUE CHIP", "AIDE BLUE CHIP", "Facilities/Office Services II", "ASSISTANT ST...
## $ Agency13       (chr) "Youth Summer", "Youth Summer", "Office of Employment Dev", "States Attorneys Offi...
## $ AgencyDept13   (chr) "Youth Summer", "Youth Summer", "OED-Employment Dev", "States Attorneys Office", "...
## $ AgencyID13     (chr) "W02278", "W02200", "A03031", "A29005", "A65026", NA, "A40001", NA, "A64120", "A99...
## $ AnnualSalary13 (dbl) 11310, 11310, 51862, 64000, 57900, NA, 52000, NA, 40650, 68847, 41194, NA, NA, NA,...
## $ GrossPay13     (dbl) NA, NA, 51771, 63909, 57429, NA, 3477, NA, 44160, 66496, 46254, NA, NA, NA, NA, 32...
## $ JobTitle12     (chr) NA, NA, "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY", "EPIDEMIOLO...
## $ Agency12       (chr) NA, NA, "Office of Employment Dev", "States Attorneys Office", "Dept of Health", N...
## $ AgencyDept12   (chr) NA, NA, "OED-Employment Dev", "States Attorneys Office", "HLTH-Health Department",...
## $ AgencyID12     (chr) NA, NA, "A03031", "A29005", "A65026", NA, NA, NA, "A64215", "A99398", "A38410", NA...
## $ AnnualSalary12 (dbl) NA, NA, 51862, 64000, 57900, NA, NA, NA, 34146, 58244, 41194, NA, NA, NA, 44143, 3...
## $ GrossPay12     (dbl) NA, NA, 52247, 59027, 57130, NA, NA, NA, 35538, 62669, 46700, NA, NA, NA, 44638, 2...
## $ JobTitle11     (chr) NA, NA, "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY", "EPIDEMIOLO...
## $ Agency11       (chr) NA, NA, "Office of Employment Dev", "States Attorneys Office", "Dept of Health", N...
## $ AgencyDept11   (chr) NA, NA, "OED-Employment Dev", "States Attorneys Office", "HLTH-Health Department",...
## $ AgencyID11     (chr) NA, NA, "A03031", "A29005", "A65026", NA, NA, NA, "A64063", "A99398", NA, NA, NA, ...
## $ AnnualSalary11 (dbl) NA, NA, 50845, 56595, 56500, NA, NA, NA, 33476, 50919, NA, NA, NA, NA, 43278, 2638...
## $ GrossPay11     (dbl) NA, NA, 45506, 51589, 50633, NA, NA, NA, 3889, 51422, NA, NA, NA, NA, 39117, 23758...

unique_names <- table(BmoreSalaries1115$Name)
length(unique_names)

## [1] 28832

There are 28,832 unique names in the data frame, which means that some of the records share the same employee names. So, do these records represent different people or the same ones? Given that I don’t have other demographic information of City employees, I looked up their HireDate, which were different. Given that employees who had switched agencies over the years were linked to the same HireDate, it seems that HireDate is a valid piece of information to differentiate employees who had the same names. For this reason I decided to use both Name and HireDate as unique identifiers of individual employees.

Here is a summary of the data:

summary(BmoreSalaries1115)

##      Name              HireDate           JobTitle15          Agency15         AgencyDept15      
##  Length:29425       Min.   :1952-06-23   Length:29425       Length:29425       Length:29425      
##  Class :character   1st Qu.:2001-09-17   Class :character   Class :character   Class :character  
##  Mode  :character   Median :2009-06-11   Mode  :character   Mode  :character   Mode  :character  
##                     Mean   :2006-04-02                                                           
##                     3rd Qu.:2013-06-10                                                           
##                     Max.   :2200-01-01                                                           
##                                                                                                  
##   AgencyID15        AnnualSalary15     GrossPay15      JobTitle14          Agency14        
##  Length:29425       Min.   :   900   Min.   :     0   Length:29425       Length:29425      
##  Class :character   1st Qu.: 33354   1st Qu.: 34666   Class :character   Class :character  
##  Mode  :character   Median : 48126   Median : 52753   Mode  :character   Mode  :character  
##                     Mean   : 51710   Mean   : 55782                                        
##                     3rd Qu.: 68112   3rd Qu.: 75768                                        
##                     Max.   :238772   Max.   :235768                                        
##                     NA's   :15406    NA's   :15675                                         
##  AgencyDept14        AgencyID14        AnnualSalary14     GrossPay14      JobTitle13       
##  Length:29425       Length:29425       Min.   :     0   Min.   :     0   Length:29425      
##  Class :character   Class :character   1st Qu.: 11310   1st Qu.: 22065   Class :character  
##  Mode  :character   Mode  :character   Median : 36400   Median : 45267   Mode  :character  
##                                        Mean   : 39757   Mean   : 45634                     
##                                        3rd Qu.: 60598   3rd Qu.: 67816                     
##                                        Max.   :238772   Max.   :238772                     
##                                        NA's   :10440    NA's   :13664                      
##    Agency13         AgencyDept13        AgencyID13        AnnualSalary13     GrossPay13    
##  Length:29425       Length:29425       Length:29425       Min.   :     0   Min.   :     0  
##  Class :character   Class :character   Class :character   1st Qu.: 11310   1st Qu.: 24536  
##  Mode  :character   Mode  :character   Mode  :character   Median : 35947   Median : 44001  
##                                                           Mean   : 38293   Mean   : 44313  
##                                                           3rd Qu.: 57000   3rd Qu.: 65615  
##                                                           Max.   :238772   Max.   :238681  
##                                                           NA's   :10285    NA's   :13504   
##   JobTitle12          Agency12         AgencyDept12        AgencyID12        AnnualSalary12  
##  Length:29425       Length:29425       Length:29425       Length:29425       Min.   :   377  
##  Class :character   Class :character   Class :character   Class :character   1st Qu.: 28726  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character   Median : 42853  
##                                                                              Mean   : 43402  
##                                                                              3rd Qu.: 59933  
##                                                                              Max.   :238772  
##                                                                              NA's   :13373   
##    GrossPay12      JobTitle11          Agency11         AgencyDept11        AgencyID11       
##  Min.   :   -77   Length:29425       Length:29425       Length:29425       Length:29425      
##  1st Qu.: 23871   Class :character   Class :character   Class :character   Class :character  
##  Median : 43061   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 43176                                                                              
##  3rd Qu.: 63551                                                                              
##  Max.   :245353                                                                              
##  NA's   :13373                                                                               
##  AnnualSalary11     GrossPay11    
##  Min.   :   377   Min.   :     7  
##  1st Qu.: 31609   1st Qu.: 27609  
##  Median : 43614   Median : 41100  
##  Mean   : 46009   Mean   : 42323  
##  3rd Qu.: 59916   3rd Qu.: 58322  
##  Max.   :238772   Max.   :170423  
##  NA's   :14968    NA's   :14968

2.6 Anomalies in Data Entries

2.6.1 Negative gross pay in 2012

It appears that in 2012 some employees received negative pay. Let’s take a closer look.

sum(BmoreSalaries1115$GrossPay12<0, na.rm=T)

## [1] 19

idx12<-which(BmoreSalaries1115$GrossPay12 < 0)
negative_pay12 <- BmoreSalaries1115[idx12,] %>% 
  select(JobTitle12, Agency12, GrossPay12)
negative_pay12$JobTitle12 <- as.factor(negative_pay12$JobTitle12)
negative_pay12$Agency12 <- as.factor(negative_pay12$Agency12)
summary(negative_pay12)

##                      JobTitle12           Agency12    GrossPay12 
##  AIDE BLUE CHIP           : 2   Dept of Health:17   Min.   :-77  
##  JTPA TRAINEE A MOMR      :14   Youth Summer  : 2   1st Qu.:-56  
##  SR COMPANION STIPEND HLTH: 3                       Median :-49  
##                                                     Mean   :-45  
##                                                     3rd Qu.:-28  
##                                                     Max.   : -7

There were 19 people who received negative pay in 2012. Two of them were under “Youth Summer”, and the other 17 were under the “Dept of Health.” Those who served in “Youth Summer” had “AIDE BLUE CHIP” as JobTitle, while those who worked for the the “Dept of Health” either served as “JTPA TRAINEE A MOMR” or as “SR COMPANION STIPEND HLTH.” All 19 people have been with the City for less than a year, and their negative pays were less than $100 each.

We can rule out the possibility that these negative gross pays were the results of overpay from the previous year since all 19 individuals had been working for the City for less than a year. Given that no negative pay records were found in years before and after 2012, it is likely that these are typos. A less likely but possible explanation is that the City offered benefits which required a “co-pay”" from the employees. Given that this is an exploratory project to practice using R and not an investigative report, I will leave the numbers as they are. Those who are interested in finding the exact answer may download the processed data files to continue with their investigations.

2.6.2 Unnamed employees at the police department

numeric_names <- function(cstr) {
  str_detect(cstr, "^[:digit:]|[:digit:]$")
}

AnyNumericName <- unlist(lapply(BmoreSalaries1115$Name, numeric_names))
sum(AnyNumericName)    # [1] 96

## [1] 96

NumericNameIndv <- subset(BmoreSalaries1115, AnyNumericName)
head(NumericNameIndv)

##        Name   HireDate JobTitle15          Agency15      AgencyDept15 AgencyID15 AnnualSalary15 GrossPay15
## 2656  BPD 1 2200-01-01      BPD 1 Police Department Police Department     A99200          74517      82988
## 2657 BPD 10 2200-01-01     BPD 10 Police Department Police Department     A99200          66086     104206
## 2658 BPD 11 2200-01-01       <NA>              <NA>              <NA>       <NA>             NA         NA
## 2659 BPD 12 2200-01-01       <NA>              <NA>              <NA>       <NA>             NA         NA
## 2660 BPD 13 2200-01-01       <NA>              <NA>              <NA>       <NA>             NA         NA
## 2661 BPD 14 2200-01-01       <NA>              <NA>              <NA>       <NA>             NA         NA
##      JobTitle14          Agency14      AgencyDept14 AgencyID14 AnnualSalary14 GrossPay14 JobTitle13
## 2656        BPD Police Department Police Department     A99200          60598      74946        BPD
## 2657        BPD Police Department Police Department     A99200          56312      55117        BPD
## 2658        BPD Police Department Police Department     A99200          65351      67579        BPD
## 2659        BPD Police Department Police Department     A99200          60598      62915        BPD
## 2660        BPD Police Department Police Department     A99200          64163      66005        BPD
## 2661        BPD Police Department Police Department     A99200          64163      65174        BPD
##               Agency13      AgencyDept13 AgencyID13 AnnualSalary13 GrossPay13 JobTitle12          Agency12
## 2656 Police Department Police Department     A99334          61157      92411      BPD 1 Police Department
## 2657 Police Department Police Department     A99334          65020      94672     BPD 10 Police Department
## 2658 Police Department Police Department     A99334          69893      85584     BPD 11 Police Department
## 2659 Police Department Police Department     A99347          65816      77536     BPD 12 Police Department
## 2660 Police Department Police Department     A99295          62324      65310     BPD 13 Police Department
## 2661 Police Department Police Department     A99347          62324      72900     BPD 14 Police Department
##           AgencyDept12 AgencyID12 AnnualSalary12 GrossPay12 JobTitle11          Agency11      AgencyDept11
## 2656 Police Department        A99          88565      93671        BPD Police Department Police Department
## 2657 Police Department        A99          69893      77961        BPD Police Department Police Department
## 2658 Police Department        A99          62905      74054        BPD Police Department Police Department
## 2659 Police Department        A99          66982      81539        BPD Police Department Police Department
## 2660 Police Department        A99          61157     104831        BPD Police Department Police Department
## 2661 Police Department        A99          69893      79995        BPD Police Department Police Department
##      AgencyID11 AnnualSalary11 GrossPay11
## 2656        A99          85372      89290
## 2657        A99          68523      85543
## 2658        A99          61672      64022
## 2659        A99          66810     103011
## 2660        A99          59388      95306
## 2661        A99          65669      85301

Over the year, there were 96 individuals whose names were “BPD”+digits. These individuals were all employees of the Police Department. These are the only records where individuals were represented under generic names in these data files. While we don’t know who these individuals are, we can still ask questions such as how many unnamed individuals were employed each year and how much they were paid each year. The following is a summary of answers to those questions. Again, I will leave the records as they are in the data set since they seem to represent real people and real salary expense.

Salary Spending on Unnamed Employees in the Police Department 2011-2015
	Number of Employees (n)	Minimum Gross Pay ($)	1st Quarter Gross Pay ($)	Median Gross Pay ($)	Mean Gross Pay ($)	3rd Quarter Gross Pay ($)	Maximum Gross Pay ($)	Total Gross Pay ($)
2011	75	43,200	63,600	72,100	74,500	85,000	127,000	5,591,031
2012	96	54,000	68,000	76,500	78,400	85,500	111,000	7,522,470
2013	52	59,300	73,100	82,200	84,700	95,400	113,000	4,402,887
2014	70	10,100	67,500	77,100	79,700	91,900	135,000	5,576,180
2015	10	59,300	73,100	75,600	79,700	92,000	104,000	797,282

In the boxplot above, each box presents the middle 50% of gross pays received by unnamed employees in a given year. The horizontal cut inside the box represents the median pay. The width of each box varies with respect to the number of unnamed employees serving in that year. That is, a wider box indicates a larger number of unnamed workers. The dots above and below each box represent cases of gross pays that were beyond 1.5 times the interquartile range (the middle 50% pay range) of gross pay of that particular year.

3 Analysis & Results

Our master data set shows that the City’s workforce had 14,019 employees in 2015, 18,985 employees in 2014, 19,140 employees in 2013, 16,052 employees in 2012, and 14,457 employees in 2011. So how was the workforce distributed among different City agencies?

3.1 Distribution of Workforce among Agencies

We can answer this question by counting the number of employees listed under different agencies. For ease of analysis I will first slice the relevant data out of the master data set.

Agency1115 <- select(BmoreSalaries1115, Name, HireDate, contains("Agency1"), contains("GrossPay"))

sum_agency15 = as.data.frame(table(Agency1115$Agency15), responseName = "2015", stringsAsFactors = FALSE)
sum_agency14 = as.data.frame(table(Agency1115$Agency14), responseName = "2014", stringsAsFactors = FALSE)
sum_agency13 = as.data.frame(table(Agency1115$Agency13), responseName = "2013", stringsAsFactors = FALSE)
sum_agency12 = as.data.frame(table(Agency1115$Agency12), responseName = "2012", stringsAsFactors = FALSE)
sum_agency11 = as.data.frame(table(Agency1115$Agency11), responseName = "2011", stringsAsFactors = FALSE)

TotAgency1115 = merge(sum_agency11,sum_agency12,by="Var1",all = TRUE)
TotAgency1115 = merge(TotAgency1115,sum_agency13,by="Var1",all = TRUE)
TotAgency1115 = merge(TotAgency1115,sum_agency14,by="Var1",all = TRUE)
TotAgency1115 = merge(TotAgency1115,sum_agency15,by="Var1",all = TRUE)

names(TotAgency1115)[1]="Agency"

# Convert all NA's into 0
for (i in 2:6) {
  idx <- which(is.na(TotAgency1115[[i]]))
  TotAgency1115[[i]][idx] <- 0
}

The above plot shows that most agencies had a steady workforce over the years. One striking feature is the spike of “Youth Summer” employees in 2013 (4,725 participants) and 2014 (4,819 participants). It appears that the summer program, which gave participants work experience and a small stipend, had a modest enrollment in 2012 ( 1,755 participants) and flourished in 2013 and 2014. We know that the summer programs expanded in 2015, hosting more than 8000 youths. However, the 2015 program participants were either no longer considered as city employees or somehow their data were not incorporated into the data set like in the previous years.

Another notable feature of the plot is that the Police Department has consistently been the largest agencies in the City. Indeed, it had the largest city workforce in 2011 (3,428 employees), 2012 (3,475 employees), and 2015 (3,097 employees). Its head counts in 2013 and 2014 were only surpassed by the Summer Youth Program.

While maintaining its dominance in the City’s workforce, the police workforce has been on the decline since 2012. And it is one of the few agencies what were doing so in recent years. The readers may be interested in comparing this plot with the plot of salary spending, which will be examined in a later section.

Between 2011 and 2015 only the “Mayor’s Office” and “Recreation & Parks” experienced a steady increase in the workforce. Between the two agencies, “Recreation & Parks” was about one-and-a-half the size of the “Mayor’s Office”. Nonetheless, the “Mayor’s Office” had a salary budget twice that of “Recreation & Parks” (see salary analysis). This likely means that majority of the “Recreation & Parks” employees were part-time, and that the positions at the “Mayor’s Office” were mostly full-time.

Finally, you may wonder if “Youth Cust”, “Youth Summer”, and “Youth Temp Adult” should be grouped together since their names imply that these are programs for youths. Based on the data in the salary data files, it is not a clear call. First of all, each of these designations has a distinct three-digit AgencyID. Secondly, they co-existed in four of the five years–2015 is the only year where all three programs had zero participant. Based on these observations, I decided to leave them as separate Agency’s in the analysis.

Here are the numbers that generated the plot above.

Number of Positions by Agency 2011-2015
	2011 (n)	2012 (n)	2013 (n)	2014 (n)	2015 (n)
Circuit Court	154	149	139	140	141
City Council	88	90	87	93	96
Civil Rights & Wage Enforce	0	0	12	12	11
Community Relations	10	12	0	0	0
Comptroller’s Office	91	86	95	90	91
Council Services	5	6	6	6	6
Dept of Finance	265	281	286	288	290
Dept of Health	941	870	957	915	958
Dept of Public Works	2,356	2,327	2,332	2,337	2,369
Dept of Transportation	1,490	1,535	1,576	1,423	1,368
Elections	6	11	8	8	8
Employee’s Retirement Sys	27	28	29	33	32
Enoch Pratt Free Library	442	439	475	463	473
Fire & Police Retirement Sys Admin	21	22	22	25	27
Fire Department	1,726	1,730	1,640	1,576	1,615
General Services	396	411	400	389	357
Housing & Community Dev	513	533	433	446	384
Human Resourcese	52	55	59	66	66
Law Department	97	105	107	110	108
Legislative Reference	6	6	5	5	5
Liquor License Board	30	36	29	20	24
Mayor’s Office	424	401	672	687	724
Municipal & Zoning Appeals	8	9	9	9	9
Office of Employment Dev	224	208	195	165	162
Orphan’s Court	10	11	12	11	10
Planning Department	47	45	47	48	50
Police Department	3,428	3,475	3,278	3,211	3,097
Recreation & Parks	650	756	874	953	966
Sheriff’s Office	175	182	191	206	210
Special City Services	2	2	3	4	4
States Attorneys Office	375	362	364	358	358
Temp Assist for Needy Families	45	50	0	0	0
Wage Commissioner	3	0	0	0	0
War Memorial Commission	4	4	0	0	0
Youth Cust	328	29	6	0	0
Youth Summer	0	1,755	4,725	4,819	0
Youth Temp Adult	18	31	67	69	0

3.2 Retention of City Employees

Employee attrition is costly to an organization. Here I examined the retention rates of Baltimore City agencies. I first look at the retention of employees across all agencies as a whole, and then break down the numbers by agency.

I decided to count an employee as retained only if he/she remained working for the same agency from one year to the next. This means that if an employee left the agency either by transferring to another agency within the City government or by dropping out of the City workforce altogether, he/she would be counted toward the attrition.

First, let’s calculate the size of the workforce for each year of the data we have.

# Workforce in '15
Workforce15 = sum(!is.na(Agency1115$Agency15))    # [1] 14019

# Workforce in '14
Workforce14 = sum(!is.na(Agency1115$Agency14))    # [1] 18985

# Workforce in '13
Workforce13 = sum(!is.na(Agency1115$Agency13))    # [1] 19140

# Workforce in '12
Workforce12 = sum(!is.na(Agency1115$Agency12))    # [1] 16052

# Workforce in '11
Workforce11 = sum(!is.na(Agency1115$Agency11))    # [1] 14457

Next, I am going to calculate, with each year’s passing,

number of City workers dropping out of the system,
number of new hires into the system,
number of City workers who switched agencies,
number of City workers who stayed with the same agencies, and
retention rate as a function the number of City workers who stayed with the same agencies.

AgencyOnly1115 <- Agency1115 %>% select(Name, HireDate, contains("Agency1"))

AgencyOnly1115 <- AgencyOnly1115 %>% mutate(same_agency1112=Agency11==Agency12)
AgencyOnly1115 = AgencyOnly1115 %>% mutate(same_agency1213=Agency12==Agency13)
AgencyOnly1115 = AgencyOnly1115 %>% mutate(same_agency1314=Agency13==Agency14)
AgencyOnly1115 = AgencyOnly1115 %>% mutate(same_agency1415=Agency14==Agency15)


# Drop out of City workforce '11 to '12
NLeftTot11 = sum(!is.na(AgencyOnly1115$Agency11) & is.na(AgencyOnly1115$Agency12))    # [1] 1845
LeftTot11 = NLeftTot11/Workforce11 # [1] 0.13

# New hires to the City workforce `11 to `12
NNewHires12 = sum(!is.na(AgencyOnly1115$Agency12) & is.na(AgencyOnly1115$Agency11))     # [1] 3440
NewHires12 = NNewHires12/Workforce12 # [1] 0.21

# Switched agencies from '11 - '12
NSwitchAgency1112 = sum(!AgencyOnly1115$same_agency1112, na.rm=T)   # [1] 84
SwitchAgency1112 = NSwitchAgency1112/Workforce11  # [1] 0.0058

# Stayed with the same agencies from '11 - '12
NSameAgency1112 = sum(AgencyOnly1115$same_agency1112, na.rm=T)    # [1] 12528

# Employee retention rate after one year (ref. 2011)
Retain1112 = NSameAgency1112/Workforce11    # [1] 0.87

# Let's check the numbers:
Workforce11 - NLeftTot11 + NNewHires12

## [1] 16052

Workforce12

## [1] 16052

Workforce11 - NLeftTot11 - NSwitchAgency1112

## [1] 12528

NSameAgency1112

## [1] 12528

# ------------------------------------------------
# Drop off from '12 to '13
NLeftTot12 = sum(!is.na(AgencyOnly1115$Agency12) & is.na(AgencyOnly1115$Agency13))    # [1] 2754
LeftTot12 = NLeftTot12/Workforce12    # [1] 0.17

# New hires to the City workforce `12 to `13
NNewHires13 = sum(!is.na(AgencyOnly1115$Agency13) & is.na(AgencyOnly1115$Agency12))   # [1] 5842
NewHires13 = NNewHires13/Workforce13    # [1] 0.31

# Switched agencies from '12 - '13
NSwitchAgency1213 = sum(!AgencyOnly1115$same_agency1213, na.rm=T)   # [1] 337
SwitchAgency1213 = NSwitchAgency1213/Workforce12    # [1] 0.021

# Stayed with the same agencies from '12 - '13
NSameAgency1213 = sum(AgencyOnly1115$same_agency1213, na.rm=T)    # [1] 12961

# City worker retention rate after one year (ref. 2012)
Retain1213 = NSameAgency1213/Workforce12    # [1] 0.81

# Let's check the numbers:
Workforce12 - NLeftTot12 + NNewHires13

## [1] 19140

Workforce13

## [1] 19140

Workforce12 - NLeftTot12 - NSwitchAgency1213

## [1] 12961

NSameAgency1213

## [1] 12961

# Stayed with same agency from '11 - '13
NSameAgency1113 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1112, na.rm=T)   # [1] 11070

# City worker retention rate after two year (ref. 2011)
Retain1113 = NSameAgency1113/Workforce11    # [1] 0.77

# ------------------------------------------------
# Drop off from '13 to '14
NLeftTot13 = sum(!is.na(AgencyOnly1115$Agency13) & is.na(AgencyOnly1115$Agency14))   # [1] 4799
LeftTot13 = NLeftTot13/Workforce13    # [1] 0.25

# New hires to the City workforce `13 to `14
NNewHires14 = sum(!is.na(AgencyOnly1115$Agency14) & is.na(AgencyOnly1115$Agency13))  # [1] 4644
NewHires14 = NNewHires14/Workforce14    # [1] 0.24

# Switched agencies from '13 - '14
NSwitchAgency1314 = sum(!AgencyOnly1115$same_agency1314, na.rm=T)   # [1] 86
SwitchAgency1314 = NSwitchAgency1314/Workforce13    # [1] 0.0045

# Stayed with same agency from '13 - '14
NSameAgency1314 = sum(AgencyOnly1115$same_agency1314, na.rm=T)    # [1] 14255

# City worker retention rate after one year (ref. 2013)
Retain1314 = NSameAgency1314/Workforce13    # [1] 0.74

# Let's check the numbers:
Workforce13 - NLeftTot13 + NNewHires14

## [1] 18985

Workforce14

## [1] 18985

Workforce13 - NLeftTot13 - NSwitchAgency1314

## [1] 14255

NSameAgency1314

## [1] 14255

# Stayed with same agency from '12 - '14
NSameAgency1214 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1314, na.rm=T)   # [1] 11295

# City worker retention rate after two year (ref. 2012)
Retain1214 = NSameAgency1214/Workforce12    # [1] 0.7


# Stayed with same agency from '11 - '14
NSameAgency1114 = sum(AgencyOnly1115$same_agency1112 & AgencyOnly1115$same_agency1213 
                      & AgencyOnly1115$same_agency1314, na.rm=T)    # [1] 10076

# City worker retention rate after three year (ref. 2011)
Retain1114 = NSameAgency1114/Workforce11    # [1] 0.7

# ------------------------------------------------
# Drop off from '14 to '15
NLeftTot14 = sum(!is.na(AgencyOnly1115$Agency14) & is.na(AgencyOnly1115$Agency15))    # [1] 6491
LeftTot14 = NLeftTot14/Workforce14    # [1] 0.34

NNewHires15 = sum(!is.na(AgencyOnly1115$Agency15) & is.na(AgencyOnly1115$Agency14))   # [1] 1525
NewHires15 = NNewHires15/Workforce15  # [1] 0.11

# Switched agencies from '14 - '15
NSwitchAgency1415 = sum(!AgencyOnly1115$same_agency1415, na.rm=T)   # [1] 191
SwitchAgency1415 = NSwitchAgency1415/Workforce14    # [1] 0.01

# Stayed with same agency from '14 - '15
NSameAgency1415 = sum(AgencyOnly1115$same_agency1415, na.rm=T)    # [1] 12303

# City worker retention rate after one year (ref. 2014)
Retain1415 = NSameAgency1415/Workforce14    # [1] 0.65

# Let's check the numbers:
Workforce14 - NLeftTot14 + NNewHires15

## [1] 14019

Workforce15

## [1] 14019

Workforce14 - NLeftTot14 - NSwitchAgency1415

## [1] 12303

NSameAgency1415

## [1] 12303

# Stayed with same agency from '13 - '15
NSameAgency1315 = sum(AgencyOnly1115$same_agency1314 & AgencyOnly1115$same_agency1415, na.rm=T)   # [1] 11143

# City worker retention rate after two years (ref. 2013)
Retain1315 = NSameAgency1315/Workforce13   # [1] 0.58


# Stayed with same agency from '12 - '15
NSameAgency1215 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1314 
                      & AgencyOnly1115$same_agency1415, na.rm=T)    # [1] 9956

# City worker retention rate after three years (ref. 2012)
Retain1215 = NSameAgency1215/Workforce12    # [1] 0.62


# Stayed with same agency from '11 - '15
NSameAgency1115 = sum(AgencyOnly1115$same_agency1213 & AgencyOnly1115$same_agency1112 
                      & AgencyOnly1115$same_agency1314 & AgencyOnly1115$same_agency1415, na.rm=T)    # [1] 9167

# City worker retention rate after four years (ref. 2011)
Retain1115 = NSameAgency1115/Workforce11    # [1] 0.63

Let’s organize the numbers into tables and visualize them.

City Employee Retention 2011-2015
	Beginning Workforce (n)	New Hires (n)	Left City After One Yr (n)	Switched Agencies (n)	Same Agencies After One Yr (n)	Same Agencies After Two Yrs (n)	Same Agencies After Three Yrs (n)	Same Agencies After Four Yrs (n)
2011	14,457	NA	1,845	84	12,528	11,070	10,076	9,167
2012	16,052	3,440	2,754	337	12,961	11,295	9,956	NA
2013	19,140	5,842	4,799	86	14,255	11,143	NA	NA
2014	18,985	4,644	6,491	191	12,303	NA	NA	NA
2015	14,019	1,525	NA	NA	NA	NA	NA	NA

City Employee Retention Rates 2011-2015
	Beginning Workforce (%)	Hiring Rates (%)	Leaving Rates (%)	Switching Rates (%)	Remained After One Yr (%)	Remained After Two Yrs (%)	Remained After Three Yrs (%)	Remained After Four Yrs (%)
2011	100	NA	12.76	0.58	86.66	76.57	69.7	63.41
2012	100	21.43	17.16	2.1	80.74	70.37	62.02	NA
2013	100	30.52	25.07	0.45	74.48	58.22	NA	NA
2014	100	24.46	34.19	1.01	64.8	NA	NA	NA
2015	100	10.88	NA	NA	NA	NA	NA	NA

At first glance, one may be surprised by the appearance of progressively worsening retention rates of the City workforce over the years. Indeed, you may even question whether these numbers are believable. I had the same question, and here is what I found: The retention rate of 65% from 2014 to 2015 seems really low. However, that’s because we did not account for the fact that youth programs accounted for 4,888 employees in 2014 and none in 2015. In fact, those 4,888 participants amounted to 75% of the workforce reduction in 2015. So, much of the workforce fluctuation we see in this plot were related to the three youth programs the City hosted between 2011 and 2014.

Knowing that the above statistics are biased by the youth programs, which by design were not meant to keep participants on long-term basis, I decided to recalculate the statistics for City employees who were not participants of any of the three youth programs over the years.

Before we leave the summary table and plot though, I want to point out that over the years the number of employees who switched jobs among different City agencies remained small, between 0.45% and 2.1%. This result carries two pieces of information: First, it shows that the majority of employee attrition was due to employees leaving the City workforce all together. Secondly, it shows that City workers, if they chose to stay working for the City, liked to stay with the same agencies.

In the following, let’s isolate a data set to include only City workers who were not participants of any of the three youth programs.

youth_idx = grepl("Youth", Agency1115$Agency14) | grepl("Youth", Agency1115$Agency13) | 
            grepl("Youth", Agency1115$Agency12) | grepl("Youth", Agency1115$Agency11)

AgencyNoYouth1115 = AgencyOnly1115[!youth_idx,]

Removing youth program participants reduces the number of records from 29,425 to 20,134 in the years between 2011 and 2015. Let’s run the same analysis as before and look at the numbers.

City Employee Retention 2011-2015
(Not Including Youth Programs Participants)
	Beginning Workforce (n)	New Hires (n)	Left City After One Yr (n)	Switched Agencies (n)	Same Agencies After One Yr (n)	Same Agencies After Two Yrs (n)	Same Agencies After Three Yrs (n)	Same Agencies After Four Yrs (n)
2011	14,108	NA	1,511	78	12,519	11,063	10,069	9,167
2012	14,235	1,638	1,640	321	12,274	11,012	9,956	NA
2013	14,325	1,730	1,737	71	12,517	11,136	NA	NA
2014	14,067	1,479	1,612	171	12,284	NA	NA	NA
2015	13,960	1,505	NA	NA	NA	NA	NA	NA

City Employee Retention Rates 2011-2015
(Not Including Youth Programs Participants)
	Beginning Workforce (%)	Hiring Rates (%)	Leaving Rates (%)	Switching Rates (%)	Remained After One Yr (%)	Remained After Two Yrs (%)	Remained After Three Yrs (%)	Remained After Four Yrs (%)
2011	100	NA	10.71	0.55	88.74	78.42	71.37	64.98
2012	100	11.51	11.52	2.26	86.22	77.36	69.94	NA
2013	100	12.08	12.13	0.5	87.38	77.74	NA	NA
2014	100	10.51	11.46	1.22	87.32	NA	NA	NA
2015	100	10.78	NA	NA	NA	NA	NA	NA

This plot shows a stable City workforce, which is what most of us expect. Specifically, it shows that the City has a one-year retention rate of 87%, a two-year retention rate of 78%, a three-year retention rate of 71%, and a four-year retention rate of 65%. That the attrition rate decreases over the year shows that the workforce stabilizes as worker tenure increases.

3.2.1 Retention Rates by Agency

Let’s break down the aggregate numbers to get a clearer picture of how the City’s workforce changed over the years. Note, I included Youth Programs participants in the following analysis since it is useful to know whether a Youth Program participant stayed with the same program or switched to work for a different City agency.

same_Agency1112 = as.data.frame(table(AgencyOnly1115$Agency12[AgencyOnly1115$same_agency1112==T]),
                                responseName = "11-12", stringsAsFactors = FALSE)
same_Agency1113 = as.data.frame(table(AgencyOnly1115$Agency13[AgencyOnly1115$same_agency1112==T 
                                                              & AgencyOnly1115$same_agency1213==T]),
                                responseName = "11-13", stringsAsFactors = FALSE)
same_Agency1114 = as.data.frame(table(AgencyOnly1115$Agency14[AgencyOnly1115$same_agency1112==T 
                                                              & AgencyOnly1115$same_agency1213==T 
                                                              & AgencyOnly1115$same_agency1314==T]),
                                responseName = "11-14", stringsAsFactors = FALSE)
same_Agency1115 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1112==T 
                                                              & AgencyOnly1115$same_agency1213==T 
                                                              & AgencyOnly1115$same_agency1314==T 
                                                              & AgencyOnly1115$same_agency1415==T]),
                                responseName = "11-15", stringsAsFactors = FALSE)

same_Agency1213 = as.data.frame(table(AgencyOnly1115$Agency13[AgencyOnly1115$same_agency1213==T]),
                                responseName = "12-13", stringsAsFactors = FALSE)
same_Agency1214 = as.data.frame(table(AgencyOnly1115$Agency14[AgencyOnly1115$same_agency1213==T 
                                                              & AgencyOnly1115$same_agency1314==T]),
                                responseName = "12-14", stringsAsFactors = FALSE)
same_Agency1215 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1213==T 
                                                              & AgencyOnly1115$same_agency1314==T 
                                                              & AgencyOnly1115$same_agency1415==T]),
                                responseName = "12-15", stringsAsFactors = FALSE)

same_Agency1314 = as.data.frame(table(AgencyOnly1115$Agency14[AgencyOnly1115$same_agency1314==T]),
                                responseName = "13-14", stringsAsFactors = FALSE)
same_Agency1315 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1314==T 
                                                              & AgencyOnly1115$same_agency1415==T]),
                                responseName = "13-15", stringsAsFactors = FALSE)

same_Agency1415 = as.data.frame(table(AgencyOnly1115$Agency15[AgencyOnly1115$same_agency1415==T]),
                                responseName = "14-15", stringsAsFactors = FALSE)


TotSameAgency1115 = merge(same_Agency1112,same_Agency1113,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1114,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1115,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1213,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1214,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1215,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1314,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1315,by="Var1",all = TRUE)
TotSameAgency1115 = merge(TotSameAgency1115,same_Agency1415,by="Var1",all = TRUE)


names(TotSameAgency1115)[1]="Agency"

# Convert all NA's into 0
for (i in 2:11) {
  idx <- which(is.na(TotSameAgency1115[[i]]))
  TotSameAgency1115[[i]][idx] <- 0
}

# dim(TotSameAgency1115)

# The Agencies "Wage Commissioner" and "Youth Cust" did not retain any employees over the years
RetentionRateByAgency = rbind(TotSameAgency1115, c("Wage Commissioner", numeric(10)), c("Youth Cust", numeric(10))) %>%
  arrange(Agency) 

for (i in 2:11) {
  RetentionRateByAgency[[i]] <- as.numeric(RetentionRateByAgency[[i]])
}


RetentionRateByAgency$X11.12=RetentionRateByAgency$X11.12/TotAgency1115$X2011*100
RetentionRateByAgency$X11.13=RetentionRateByAgency$X11.13/TotAgency1115$X2011*100
RetentionRateByAgency$X11.14=RetentionRateByAgency$X11.14/TotAgency1115$X2011*100
RetentionRateByAgency$X11.15=RetentionRateByAgency$X11.15/TotAgency1115$X2011*100

RetentionRateByAgency$X12.13=RetentionRateByAgency$X12.13/TotAgency1115$X2012*100
RetentionRateByAgency$X12.14=RetentionRateByAgency$X12.14/TotAgency1115$X2012*100
RetentionRateByAgency$X12.15=RetentionRateByAgency$X12.15/TotAgency1115$X2012*100

RetentionRateByAgency$X13.14=RetentionRateByAgency$X13.14/TotAgency1115$X2013*100
RetentionRateByAgency$X13.15=RetentionRateByAgency$X13.15/TotAgency1115$X2013*100

RetentionRateByAgency$X14.15=RetentionRateByAgency$X14.15/TotAgency1115$X2014*100

The following set of plots shows the retention rates of each of the 37 City agencies over one-, two-, three-, and four-year periods.

The above one-year retention rates plot shows some extreme cases. For example, Community Relations, Council Services, Fire and Police retirement Sys Admin, Legislative Reference, Municipal and Zoning Appeals, Special City Services, and War Memorial Commission had been able to retain 100% of their workforce in some years. At the same time, the six agencies on the right hand side of the plot had experienced a complete loss of their workforce at least once over the five years.

Clearly, those agencies that achieved 100% retention rate were doing a good job in retaining their employees. At the same time, these agencies were relatively small, most of them had less than 10 employees (see workforce plot), so it is not entirely fair to compare them with agencies that had thousands of employees, such as the Police Department, the Department of Public Works, or the Fire Department.

Those who experience 0% retention rate can be separated into two groups: In one group the agencies (e.g., Temp Assist for Needy Families) simply no longer existed; in the other group the agencies (e.g., the youth programs) simply did not mean to keep employees on long term basis. As a result, the 0% retention rate was not too surprising.

In this four-year retention rates plot, Legislative Reference stands out to have the highest retention rate across all City agencies. There are at least two factors contributing to this high retention rate: First, Legislative Reference is a relatively small agency, having 6 people in two of the years and 5 in the other three years. Secondly, employee tenure tend to be high at Legislative Reference, as we found from the in-depth analysis of the 2015 data.

The Fire Department had the second highest retention rate over a four-year period. Given that it had over 1500 employees over the years, this is indeed an achievement. The third highest retention rate belongs to the Dept of Public Works, which had over 2300 employees and among whom was the longest-tenure employee in all City agencies.

3.3 Distribution of Gross Pay

First let’s take a look at the pay distribution over the years.

gross_pay1115 <- Agency1115 %>% 
  select(GrossPay11, GrossPay12, GrossPay13, GrossPay14, GrossPay15) %>% 
  gather(Yr, GrossPay, na.rm=TRUE)
lut <- c("GrossPay15"="2015","GrossPay14"="2014","GrossPay13"="2013",
         "GrossPay12"="2012","GrossPay11"="2011")
gross_pay1115$Yr <- lut[gross_pay1115$Yr]

The distributions for all five years clearly skew to the right (the end for higher gross pay), suggesting that a small number of employees received much higher pay than others–we expect them to be the managers, heads of departments/agencies.

In addition, the distributions are bimodal and possibly trimodal. The peaks on the left in 2012 - 2014’s distributions are most likely due to the large number of participants of the youth programs. The antimode/valley at ~$20,000 might be a natural breaking point between part-time and full-time employees. Among those who were paid ~$20,000 and above, the distributions shows two peaks: The main peak with lower pay corresponds to the rank-and-file employees, while the other peak with higher pay represents those in leadership positions. Given that a position’s pay also depends on the nature of the work and tenure of the employee, the division between the two groups is not likely to be clear-cut.

Let’s look at a numerical summary of the pay distributions.

GrossPay1115 <- select(Agency1115, GrossPay11, GrossPay12, GrossPay13, GrossPay14, GrossPay15)
summary(GrossPay1115)

##    GrossPay11       GrossPay12       GrossPay13       GrossPay14       GrossPay15    
##  Min.   :     7   Min.   :   -77   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.: 27609   1st Qu.: 23871   1st Qu.: 24536   1st Qu.: 22065   1st Qu.: 34666  
##  Median : 41100   Median : 43061   Median : 44001   Median : 45267   Median : 52753  
##  Mean   : 42323   Mean   : 43176   Mean   : 44313   Mean   : 45634   Mean   : 55782  
##  3rd Qu.: 58322   3rd Qu.: 63551   3rd Qu.: 65615   3rd Qu.: 67816   3rd Qu.: 75768  
##  Max.   :170423   Max.   :245353   Max.   :238681   Max.   :238772   Max.   :235768  
##  NA's   :14968    NA's   :13373    NA's   :13504    NA's   :13664    NA's   :15675

You may notice the negative pays in the 2012 data. We have discussed them in the anomalies section of this report, so I will not repeat it here. The NA counts in each year’s summary are due to combining all five year’s data into one master data set–not all employees worked for the City for all five years.

The above summary shows that median salary of City employees rose above $50,000 for the first time in 2015. At first, I did not think that this was due to a pay raise given to City employees in 2015. Instead, remembering that 2015 data included zero youth program participant instead of the thousands we saw in previous years, I assumed that removing thousands of lowest pay entries from the 2015 distribution would naturally resulted in a higher median value for the 2015 pay distribution.

To check this hypothesis, I look at the pay summary of City employees who were not youth program participants.

NYGrossPay1115 <- Agency1115[!youth_idx,] %>% select(GrossPay11, GrossPay12, GrossPay13, 
                                                     GrossPay14, GrossPay15)
summary(NYGrossPay1115)

##    GrossPay11       GrossPay12       GrossPay13       GrossPay14       GrossPay15    
##  Min.   :    60   Min.   :   -77   Min.   :     0   Min.   :     0   Min.   :     0  
##  1st Qu.: 28406   1st Qu.: 31899   1st Qu.: 32082   1st Qu.: 33075   1st Qu.: 34862  
##  Median : 41821   Median : 47224   Median : 47859   Median : 50031   Median : 52943  
##  Mean   : 43316   Mean   : 48544   Mean   : 49483   Mean   : 51866   Mean   : 55983  
##  3rd Qu.: 58715   3rd Qu.: 65584   3rd Qu.: 67802   3rd Qu.: 70674   3rd Qu.: 75866  
##  Max.   :170423   Max.   :245353   Max.   :238681   Max.   :238772   Max.   :235768  
##  NA's   :6026     NA's   :5899     NA's   :5916     NA's   :6307     NA's   :6440

Clearly, my assumption was not correct! On the contrary, after removing youth programs participants, the remaining distributions suggest a steady increase in salary for City employees every year since 2011!

Let’s look at the overall pay distributions without youth programs participants. In the following histograms, the solid vertical line in the middle marks the median pay, and the dashed vertical lines mark the 25 and 75 percentiles of pay respectively.

3.4 Salary Spending by Agency

In the following, I aggregate salary data based on the agency of the employee, and examine the relation between number of positions and total salary spending by agency.

SalaryExpend15 <- Agency1115 %>% group_by(Agency15) %>% summarise(TotExpend15 = sum(GrossPay15, na.rm=TRUE)) %>% na.omit()
SalaryExpend14 <- Agency1115 %>% group_by(Agency14) %>% summarise(TotExpend14 = sum(GrossPay14, na.rm=TRUE)) %>% na.omit()
SalaryExpend13 <- Agency1115 %>% group_by(Agency13) %>% summarise(TotExpend13 = sum(GrossPay13, na.rm=TRUE)) %>% na.omit()
SalaryExpend12 <- Agency1115 %>% group_by(Agency12) %>% summarise(TotExpend12 = sum(GrossPay12, na.rm=TRUE)) %>% na.omit()
SalaryExpend11 <- Agency1115 %>% group_by(Agency11) %>% summarise(TotExpend11 = sum(GrossPay11, na.rm=TRUE)) %>% na.omit()

TotSalaryExpend1115 = merge(SalaryExpend11,SalaryExpend12,by.x="Agency11", by.y="Agency12",all = TRUE)
TotSalaryExpend1115 = merge(TotSalaryExpend1115,SalaryExpend13,by.x="Agency11", by.y="Agency13",all = TRUE)
TotSalaryExpend1115 = merge(TotSalaryExpend1115,SalaryExpend14,by.x="Agency11", by.y="Agency14",all = TRUE)
TotSalaryExpend1115 = merge(TotSalaryExpend1115,SalaryExpend15,by.x="Agency11", by.y="Agency15",all = TRUE)

names(TotSalaryExpend1115)[1]="Agency"

# Convert all NA's into 0
for (i in 2:6) {
  idx <- which(is.na(TotSalaryExpend1115[[i]]))
  TotSalaryExpend1115[[i]][idx] <- 0
}

Salary spending of the different departments had been steady, and a few departments even saw a moderate increase over the years. One striking feature, if you recall the number of positions by agency plot earlier, is the almost none-existent spending on the youth programs in comparison with other City agencies. This reinforce my belief that participants of the youth programs were not the traditional employees in other City agencies.

While most of the increase were relatively modest and gradual, two features jump out of the plot. First, spending of the Mayor’s Office was double from 2012 to 2013 and had stayed at that level since. Secondly, spending at the Police Department experienced two stages of large increase: one from 2011 to 2012, and the other from 2014 to 2015. Each of these increases was about $30,000,000.

In comparison with the number of positions by agency plot, which shows a steady decrease of the police workforce since 2012, the rather large salary spending increase by the Police Department in 2015 is even more intriguing. Did the increase impact majority of the personnel in the police workforce? Or, was it the result of hiring a few top managers? I will answer this question in a later section.

Here are the numbers that generated the plot above.

Salary Spending by Agency 2011-2015
	2011 ($)	2012 ($)	2013 ($)	2014 ($)	2015 ($)
Circuit Court	6,992,230	7,418,225	6,905,444	6,930,433	7,600,118
City Council	2,930,632	3,220,308	3,486,672	3,642,744	3,699,890
Civil Rights & Wage Enforce	0	0	641,071	568,643	501,455
Community Relations	431,928	609,175	0	0	0
Comptroller’s Office	4,064,869	4,602,130	4,743,322	4,665,648	5,098,746
Council Services	321,864	382,368	443,624	365,916	388,532
Dept of Finance	10,617,647	12,368,165	12,860,218	13,376,656	14,156,029
Dept of Health	29,572,629	32,256,450	31,131,996	30,085,148	31,313,971
Dept of Public Works	84,146,174	94,566,693	95,628,158	99,802,845	106,487,490
Dept of Transportation	42,344,106	48,484,935	47,636,882	48,785,120	51,615,941
Elections	57,217	143,771	96,500	87,038	74,679
Employee’s Retirement Sys	1,224,019	1,277,099	1,310,384	1,575,017	1,757,640
Enoch Pratt Free Library	14,084,339	16,056,586	16,891,040	17,076,783	17,344,375
Fire & Police Retirement Sys Admin	1,019,820	1,200,031	1,224,966	1,332,487	1,581,667
Fire Department	101,253,987	113,689,987	113,686,926	121,142,250	125,612,177
General Services	16,069,391	18,692,706	19,551,601	19,402,148	17,874,549
Housing & Community Dev	20,302,024	23,216,475	18,515,544	18,877,111	18,598,793
Human Resourcese	2,108,471	2,271,436	2,625,640	3,109,768	3,062,501
Law Department	5,720,770	6,609,865	6,776,727	6,963,661	7,566,482
Legislative Reference	320,193	358,489	274,506	292,461	326,889
Liquor License Board	1,191,696	1,414,959	1,304,304	863,551	1,037,420
Mayor’s Office	14,657,869	15,676,067	29,291,244	31,350,202	34,096,255
Municipal & Zoning Appeals	206,074	290,158	268,809	240,884	245,083
Office of Employment Dev	8,737,769	9,126,557	9,241,597	8,113,639	7,988,847
Orphan’s Court	228,469	317,985	342,348	311,950	328,993
Planning Department	2,110,358	2,513,763	2,470,085	2,444,078	2,919,818
Police Department	199,573,347	229,243,749	230,392,751	228,004,357	256,359,188
Recreation & Parks	13,458,586	15,113,724	15,048,382	16,059,910	17,008,779
Sheriff’s Office	7,325,127	8,561,306	9,110,996	10,199,283	10,839,221
Special City Services	22,831	31,774	41,331	21,215	35,563
States Attorneys Office	19,729,108	21,037,262	21,653,893	21,593,103	21,486,610
Temp Assist for Needy Families	75,982	106,872	0	0	0
Wage Commissioner	154,204	0	0	0	0
War Memorial Commission	144,577	166,438	0	0	0
Youth Cust	502,534	52,877	3,640	0	0
Youth Summer	0	1,749,083	1,635,391	1,719,443	0
Youth Temp Adult	167,336	227,664	269,957	235,644	0

3.5 A Power-Law Relation between The Size of An Agency And Salary Spending

Combining results from the employment and salary analyses above, the following plot explores the relation between the City’s workforce and total salary spending.

In the plot above, the colors of the dots are different for different years, and the sizes of the dots are proportional to the sizes of the workforce. The visualization reinforces some of the results we obtained earlier: The years 2013 and 2014 had particularly large workforce because of the addition of youth programs participants. At the same time, these were not traditional employees since they certainly weren’t paid much–here we see that the total salary spending didn’t increase much from 2012 to 2014 despite the fact that thousands of people were added to the payroll. In 2015, youth programs participants weren’t included in the salary data file, and so the City’s workforce appears to have shrunk considerably from 2014 to 2015. Nonetheless, the overall increase in salary spending from 2014 to 2015 is one of the largest over the years. Here are the numbers.

Employment & Salary Spending 2011-2015
	Size of Workforce (n)	Salary Spending ($)
2011	14,457	611,868,176
2012	16,052	693,055,130
2013	19,140	705,505,949
2014	18,985	719,239,135
2015	14,019	767,007,701

Let’s look at the numbers at a more granular level and examine salary spending vs. number of employees by agency.

The above is a log-log plot. It shows each year’s data in a different color. Visually most of the data points lie along a straight line, so I chose to fit a power law to the data. The best fit lines are displayed in the same colors as the data that were used to derive them. One may notice that the power law describes the 2015 data particularly well because there are few outlier points. The best fit parameter values, RMSE, and the $R^2$ of the fits are given in the following table.

Best-Fitting Parameters
	Coefficient	Exponent	RMSE	R-sq
2011	24.82	1.04	0.85	0.86
2012	27.52	1.02	1.01	0.8
2013	28.26	1.01	1.2	0.75
2014	39.1	0.96	1.05	0.77
2015	29.14	1.08	0.49	0.95

3.5.1 Case study: Workforce reduction that coincided with increased salary spending

The increase in the Police Department’s salary spending from 2014 to 2015 is most intriguing, so I chose it as the subject of my focus study. The aim was to find out where the extra spending went into during this time period. First, let’s extract a subset of the data for those who were in the police workforce in 2014 or 2015 and calculate the relevant statistics.

police1415 = BmoreSalaries1115 %>%
  subset(BmoreSalaries1115$Agency15=="Police Department" 
         | BmoreSalaries1115$Agency14=="Police Department") %>%
  select(Name, HireDate, JobTitle15, JobTitle14, Agency15, Agency14, 
         GrossPay15, GrossPay14, AnnualSalary15, AnnualSalary14)

policeWF14 = sum(police1415$Agency14=="Police Department", na.rm=T)    # [1] 3211
policeWF15 = sum(police1415$Agency15=="Police Department", na.rm=T)    # [1] 3097

LeftIdx = (police1415$Agency14=="Police Department") & ((police1415$Agency15!="Police Department") 
                                                        | is.na(police1415$Agency15))
NLeft = sum(LeftIdx, na.rm=T)    # [1] 328

RetainIdx = (police1415$Agency14=="Police Department") & (police1415$Agency15=="Police Department")
NRetain = sum(RetainIdx, na.rm=T)    # [1] 2883
RetainRate = NRetain/policeWF14    # [1] 0.9

NewHireIdx = (police1415$Agency15=="Police Department") & (is.na(police1415$Agency14) 
                                                           | (police1415$Agency14!="Police Department"))
NNewHire = sum(NewHireIdx)    # [1] 214

The police workforce in 2014 and 2015 were 3,211 and 3,097 respectively. 328 individuals left the department in 2014, and 214 were hired into the department in 2015. Overall, the workforce retention rate from 2014 to 2015 was 90%. The following plot shows the department’s salary distributions.

Clearly, both years’ distributions skew to the right, meaning that a few individuals received much higher paid than most of the other employees. In general, the 2015 distribution appears to have shifted to the right, which indicates that the increase spending had a broad impact. Let’s look more specifically at the distribution of the pay difference in the plot below.

The above plot shows that for those who remained in the police workforce from 2014 to 2015, the distribution of pay difference was nearly normally distributed. The average of the pay difference was $13,700, and the median of the pay difference was $12,000. The 25 and 75 percentiles were at $6,740 and $18,700. Again, this plot reinforces our earlier conclusion that the increase in salary spending had impact on majority of the personnel in the Police Department.

Finally, let’s take a look at the data of those who garnered the largest pay increase from 2014 to 2015.

police1415 <- police1415 %>%
  mutate(PayInc=police1415$GrossPay15-police1415$GrossPay14) %>%
  arrange(desc(PayInc))

police1415 %>%
  select(Name, HireDate, JobTitle15, JobTitle14, GrossPay15, GrossPay14, 
         AnnualSalary15, AnnualSalary14, PayInc) %>%
  head(n=10)

##                  Name   HireDate             JobTitle15             JobTitle14 GrossPay15 GrossPay14
## 1       Banach,Paul F 2014-05-29 Operations Officer III POLICE COMMAND STAFF I     105820       3923
## 2    Swinton,Dwayne L 1990-10-23        POLICE SERGEANT        POLICE SERGEANT     134567      40409
## 3    De Jesus,Michael 1994-02-09         POLICE OFFICER         POLICE OFFICER     100900       8049
## 4       Adams,Damon R 2003-10-23         POLICE OFFICER         POLICE OFFICER      97765       7688
## 5   Gladstone,Keith A 1992-11-20        POLICE SERGEANT        POLICE SERGEANT     132188      52263
## 6  Fleurimond,Richard 1999-10-12   POLICE OFFICER (EID)   POLICE OFFICER (EID)     166056      86326
## 7      Walrath,Mark A 1992-05-05  POLICE LIEUTENANT EID  POLICE LIEUTENANT EID     194656     115639
## 8     Camarote,Marc J 1996-10-03    POLICE SERGEANT EID    POLICE SERGEANT EID     210509     136940
## 9     Paradise,John J 1994-03-28  POLICE LIEUTENANT EID  POLICE LIEUTENANT EID     161310      92518
## 10      Leitch,Eric R 2000-02-10        POLICE SERGEANT        POLICE SERGEANT     182621     119683
##    AnnualSalary15 AnnualSalary14 PayInc
## 1          101200         102000 101897
## 2           90273          81327  94158
## 3           76496          68322  92852
## 4           68013          62380  90078
## 5           88682          79249  79925
## 6           74290          66320  79730
## 7          103808          92706  79017
## 8           86874          77564  73568
## 9          101999          91156  68793
## 10          84130          75288  62938

It appears that those who received the largest pay increase were not new hires. Instead, for some of them, for some reason, their were paid well below the projected salaries in 2014, and in 2015 they received the normal pay for their positions. At the same time, there were also others who received close to $80,000 of pay increase on top of their normal salary, resulting in double the amount of salary for their positions. Could those be overtime pay?

Whatever the reason behind the pay increase, it appears that the pay scale of the Police Department is higher than that of other agencies. The two boxplots below compare the salary distributions of the Police Department and all City agencies in 2014 and 2015. It shows that Police Department had a higher median pay than the rest of the City and, indeed, it got a larger pay increase from 2014 to 2015 than the rest of the City.

4 Conclusion

So far in this project, I have examined the employment and salary statistics of Baltimore City employees based on the 2011-2015 data from the Baltimore Open Data Portal.

In terms of employment statistics, I found that the Police Department had the largest number of employees among all City agencies over the 2011-2015 time period. The size of the police workforce, nonetheless, has declined modestly since 2012. I then examined the City-wide employee retention rates and the retention rates of each City agencies. Overall, Baltimore City had a one-year employee retention rate of 87%, a two-year retention rate of 78%, a three-year retention rate of 71%, and a four-year retention rate of 65%. The number of employees who switched jobs among different City agencies were small, about 1.1%. The result indicates that the majority of employee attrition was due to employees leaving the City workforce all together. It also means that City workers, if they chose to stay working for the City, liked to stay with the same agencies. Analysis at a more granular scale shows that smaller agencies, if they continue to exist over the years, had higher retention rates than larger agencies. Nonetheless, over long-term periods (four years), larger departments such as the Fire Department and the Dept of Public Works have been able to achieve some of the highest retention rates among all City agencies.

The salary distributions of City employees show that every year since 2011 there has been a steady, albeit modest, pay increase. One interesting observation was the Police Department’s workforce reduction that coincided with its increase in salary spending. Upon further examination of the data, I found that the extra spending had impact on majority of the personnel in the Police Department, and was not the results of hiring a few high paying individuals. The data also showed that the Police Department had a higher pay scale than all the other City agencies.

The data products from this analysis are available for download here.

Analyzing Baltimore City Employee Salaries in 2011-2015 Using R

Johann Liang

May 25, 2016

1 Introduction

2 Data Preparation

2.1 Convert Salary Data from Character Strings to Numerics

2.2 Standardize Agency Names

2.2.1 Remove redundant location reference in the `Agency` column

2.2.2 Standardize the spellings of agency names

2.2.3 Group subsidiary departments with their umbrella agencies

2.3 Flag Records with No `HireDate`

2.4 Trim Trailing Space in Data Entries

2.5 Merge Data into One Master Data Set

2.6 Anomalies in Data Entries

2.6.1 Negative gross pay in 2012

2.6.2 Unnamed employees at the police department

3 Analysis & Results

3.1 Distribution of Workforce among Agencies

3.2 Retention of City Employees

3.2.1 Retention Rates by Agency

3.3 Distribution of Gross Pay

3.4 Salary Spending by Agency

3.5 A Power-Law Relation between The Size of An Agency And Salary Spending

3.5.1 Case study: Workforce reduction that coincided with increased salary spending

4 Conclusion

Analyzing Baltimore City Employee Salaries in 2011-2015 Using R

Johann Liang

May 25, 2016

1 Introduction

2 Data Preparation

2.1 Convert Salary Data from Character Strings to Numerics

2.2 Standardize Agency Names

2.2.1 Remove redundant location reference in the Agency column

2.2.2 Standardize the spellings of agency names

2.2.3 Group subsidiary departments with their umbrella agencies

2.3 Flag Records with No HireDate

2.4 Trim Trailing Space in Data Entries

2.5 Merge Data into One Master Data Set

2.6 Anomalies in Data Entries

2.6.1 Negative gross pay in 2012

2.6.2 Unnamed employees at the police department

3 Analysis & Results

3.1 Distribution of Workforce among Agencies

3.2 Retention of City Employees

3.2.1 Retention Rates by Agency

3.3 Distribution of Gross Pay

3.4 Salary Spending by Agency

3.5 A Power-Law Relation between The Size of An Agency And Salary Spending

3.5.1 Case study: Workforce reduction that coincided with increased salary spending

4 Conclusion

2.2.1 Remove redundant location reference in the `Agency` column

2.3 Flag Records with No `HireDate`