Abstract: I analyzed the Baltimore City Employee Salaries FY2015 data set at Baltimore Open Data Portal. The data set recorded 14017 employees working for the City in 2015, and it provided employee information in 7 categories, which are name, job title, Agency ID, Agency, Hire Date, Annual Salary, and Gross Pay. After cleaning and preparing the data set, I analyzed the distributions of gross pay and tenure of City employees. The overall gross pay distribution has a median of $52753.13, and the middle 50% of city workers received pay between $34669.21 and $75771.01. The highest salary was $235767.92. The overall tenure distribution has a median of 9.95 years. The longest tenure was 63.02 years. The middle 50% of people had put in 4.58 to 19.78 years working for the City. While analyzing the distributions of gross pay and tenure with respect to the 30 agencies, I found that the Police Department, with 3097 employees, was the largest employer among different city agencies in 2015. The ratios of highest to median pays among different city agencies were around 3 except Municipal & Zoning Appeals and Recreation & Parks, whose ratios were more than 12. In terms of tenure, Legislative Reference had the longest median tenure (26.56 years) among different agencies.
Tags: data cleaning
data analysis
R
As part of the efforts to provide transparency and build trust in the local government, Baltimore City has given the public online access to a slew of data related to the City’s operation. Among the data sets available at Baltimore Open Data Portal are city employee salaries since 2011. In this project, I will focus on analyzing the salary data from fiscal year 2015. In the follow-up projects, I will look at trends in the salary data from 2011 to 2015 and try to model the pay-scale based on the information available in the data sets.
library(lubridate)
library(dplyr)
library(tidyr)
library(stringr)
library(ggvis)
First, let read in the data file and see what kind of information it contains.
BmoreSalaries2015 <- read.csv("Baltimore_City_Employee_Salaries_FY2015.csv", stringsAsFactors=FALSE)
BmoreSalaries2015 <- tbl_df(BmoreSalaries2015)
glimpse(BmoreSalaries2015)
## Observations: 14,017
## Variables: 7
## $ name (chr) "Aaron,Patricia G", "Aaron,Petra L", "Abaineh,Yohannes T", "Abbene,Anthony M", "Abbe...
## $ JobTitle (chr) "Facilities/Office Services II", "ASSISTANT STATE'S ATTORNEY", "EPIDEMIOLOGIST", "PO...
## $ AgencyID (chr) "A03031", "A29045", "A65026", "A99005", "A40001", "A90005", "A64120", "A99127", "A38...
## $ Agency (chr) "OED-Employment Dev (031)", "States Attorneys Office (045)", "HLTH-Health Department...
## $ HireDate (chr) "10/24/1979", "09/25/2006", "07/23/2009", "07/24/2013", "05/01/2013", "11/28/2014", ...
## $ AnnualSalary (chr) "$55314.00", "$74000.00", "$64500.00", "$46309.00", "$60060.00", "$42702.00", "$6217...
## $ GrossPay (chr) "$53626.04", "$73000.08", "$64403.84", "$59620.16", "$54059.60", "$20250.80", "$8375...
The data set records 14017 employees working for the City in 2015. For each employee, the data set contains information on 7 categories, which are:
names(BmoreSalaries2015)
## [1] "name" "JobTitle" "AgencyID" "Agency" "HireDate" "AnnualSalary" "GrossPay"
We see that all information, including AnnualSalary
, GrossPay
and HireDate
, are read in as character strings. In order to do calculation and make plots using these data, we need to first convert them to the proper data types. In the following, I will first take steps to prepare the data set for analysis, and then use the data to answer the these questions:
These are the major steps I will take to prepare the data set for analysis:
First, let’s convert the name of the name
feature to upper camel case so we don’t have to remember which variable starts with an upper case letter and which one starts with a lower case letter.
names(BmoreSalaries2015)[1] <- "Name"
AnnualSalary
is currently represented as character strings starting with a “$” sign. For analysis, I need to get rid of the “$” signs and convert those values to numeric. Note that AnnualSalary
data do not include cents.
BmoreSalaries2015$AnnualSalary <- str_replace(BmoreSalaries2015$AnnualSalary, "[$]", "")
BmoreSalaries2015$AnnualSalary <- as.numeric(BmoreSalaries2015$AnnualSalary)
Do the same to GrossPay
data. Note that GrossPay
data do include cents.
BmoreSalaries2015$GrossPay <- str_replace(BmoreSalaries2015$GrossPay, "[$]", "")
BmoreSalaries2015$GrossPay <- as.numeric(BmoreSalaries2015$GrossPay)
According to Baltimore Open Data Portal,
Gross salary includes overtime, furloughs, adjustments. Contractual, temp, part-time or summer clerks/fellows, gross reflects actual amounts paid according to terms of employment. Comp & leave time excluded.
On the other hand,
Annual salary is projected over the full fiscal year. Contractual, temp, part-time or summer clerks/fellows, annual salary is the equivalent full-time annual salary. Comp & leave time excluded.
In other words, for someone who worked for the City during the summer months, his/her AnnualSalary
was calculated based on the rate he/she was paid each month in the summer, multiplied with 12 to get the full year salary irrespective of his/her employment situation with the City in months other than the summer. In light of this information, I will use GrossPay
to gauge the actual compensation an employee received.
Looking at the GrossPay
data, we find that 269 individuals with records in the data set did not have GrossPay
information. Upon further investigation, it appears that these individuals’ records did not have information on GrossPay
because they were all hired in the second half of June, leaving them less than a two-week pay cycle before the data were collected.
Not counting the new hires, the following plot shows the distribution of gross pay received by all City employees in 2015. The distribution clearly skews to the right (direction of high gross pay). The median salary was $52753.13. The middle 50% of people received pay between $34669.21 and $75771.01. In the following plot, the red vertical line marks the median pay, and the two blue dashed vertical lines mark the 25 and 75 percentiles of pay respectively.
The highest salary, claimed by Nalewajko Jr,Stephen C of the Police Department (260), was $235767.92. The lowest salary, $0, was recorded for 11 individuals who had job titles such as “SECURITY GUARD CONV. CENTER”, “CONTRACT SERV SPEC II”, “EMT Firefighter Suppression”, “SR COMPANION STIPEND HLTH”, “RECREATION ARTS INSTRUCTOR”, “RECREATION ACTIVITIES ASSISTAN”, “LIFEGUARD I”, and “COMMUNITY AIDE” working at the Convention Center, the Fire Department, the Health Department, and Recreation and Parks. Each of these 11 individuals had at least 2.22 years of service recorded with the City.
I will estimate the tenure of City employees using their HireDate
and the date of data collection.
BmoreSalaries2015$HireDate <- mdy(BmoreSalaries2015$HireDate)
DateOfData15 <- mdy("06-30-2015")
BmoreSalaries2015$YrOfSrv <- round(decimal_date(DateOfData15) - decimal_date(BmoreSalaries2015$HireDate), digits=nDecimalPlace)
The distribution for YrOfSrv
skews to the right, which is consistent with the expectation for an organization that continues to recruit workers to replace those who had either retired or left the organization. The median tenure of City employees was 9.95 years. The longest tenure, 63.02 years, was achieved by Walter,William of the DPW-Water & Waste Water (101). The middle 50% of people had put in 4.58 to 19.78 years working for the City.
The following plot shows the distribution of tenure of City employees in 2015. The red vertical line marks the median tenure, and the two blue dashed vertical lines mark the 25 and 75 percentiles of tenure respectively.
The data set contains three variables JobTitle
, AgencyID
, and Agency
that are related to the position held by a particular individual. Upon close examination, one finds that majority of the entries for Agency
contain a three digit code that repeats the last three digits in AgencyID
.
Given this pattern, I was curious if I can simply use AgencyID
as the job identifier since its first three digits (first one is alphabetical and the second and the third are numeric) seem to identify a particular agency while the last three numeric digits seem to identify something more specific with regard to that agency. For instance, someone who served as “OFFICE SUPERVISOR” in the “TRANS-Traffic Department” was given an AgencyID
of “A90002.” At the same time, someone who served as “CONTRACT SERV SPEC II” in the same department was given the AgencyID
of “A90005.”
Were the last three digits of AgencyID
used to identify job title? If we re-arrange the observations by AgencyID
, we see that AgencyID
does not actually include info of the position. This is because multiple JobTitle
can be found to correspond to the same AgencyID
. As an example, the JobTitle
s within “Mayors Office” with AgencyID
“A01001” are: “Operations Officer I”, “CRIMINAL JUSTICE ASSOCIATE”, “Operations Manager I”, “Senior Criminal Justice Asso”, “Senior Criminal Justice Asso”, etc. This finding rules out the possibility that the last three digits of AgencyID
corresponds to JobTitle
.
If we look further into the data, we will find that a few entries of Agency
’s precede the three digits with the word “Location” (this time the three digits are not inside parenthesis). So, it appears that the last three digits of AgencyID
were used to identify location related info of the agency, not the job title.
So far our investigation suggests that there is no redundancy between JobTitle
and AgencyID
—the two features should be preserved in the data set for modeling. On the other hand, the last three digits in the entries of AgencyID
, which make their appearance in Agency
inconsistently, are redundant. So, in the following I will remove the three digits reference in Agency
(if they appear) and standardize the spellings of entries in Agency
.
Agency
columnSince not all agency names are followed by the three-digit location reference, I will create a new column that consists of agency names only for the ease of grouping agency-wide entries together in later analysis.
The following function removes the three digit location reference (if they appear). It also removes location reference that comes in the form “Location xxx” or simply “(”. The latter case is likely the result of a long entry being truncated in the online data set. Since few departments indicated their employees as part-time, wkly or full-time, I will focus on the inter-departmental differences only. So the function will also remove any “(part-time)” or “(wkly)” designations.
match_paran2b <- function(cstr) {
idx <- str_locate(cstr, "[(]|Location [:digit:]{3}|[:digit:]{3}")
if (is.na(idx[1])) return(cstr)
else {
idx <- idx[1] - 2
return(substring(cstr, 1, idx))
}
}
match_paran3b <- function(cstr_vec) {
unlist(lapply(cstr_vec, match_paran2b))
}
BmoreSalaries2015 <- BmoreSalaries2015 %>% mutate(AgencyDept = match_paran3b(Agency))
Here we check that entries in the new AgencyDept
column match entries in the original Agency
column after we removed location reference:
BmoreSalaries2015 %>%
select(Name, JobTitle, Agency, AgencyDept) %>%
head(n=20) %>%
print.data.frame()
## Name JobTitle Agency AgencyDept
## 1 Aaron,Patricia G Facilities/Office Services II OED-Employment Dev (031) OED-Employment Dev
## 2 Aaron,Petra L ASSISTANT STATE'S ATTORNEY States Attorneys Office (045) States Attorneys Office
## 3 Abaineh,Yohannes T EPIDEMIOLOGIST HLTH-Health Department (026) HLTH-Health Department
## 4 Abbene,Anthony M POLICE OFFICER Police Department (005) Police Department
## 5 Abbey,Emmanuel CONTRACT SERV SPEC II M-R Info Technology (001) M-R Info Technology
## 6 Abbott-Cole,Michelle CONTRACT SERV SPEC II TRANS-Traffic (005) TRANS-Traffic
## 7 Abdal-Rahim,Naim A EMT Firefighter Suppression Fire Department (120) Fire Department
## 8 Abdi,Ezekiel W POLICE SERGEANT Police Department (127) Police Department
## 9 Abdul Adl,Attrice A RADIO DISPATCHER SHERIFF Sheriff's Office (410) Sheriff's Office
## 10 Abdul Aziz,Hajr E LIFEGUARD I R&P-Recreation (part-time) ( R&P-Recreation
## 11 Abdul Aziz,Jennah A LIFEGUARD I R&P-Recreation (part-time) ( R&P-Recreation
## 12 Abdul Aziz,Yaqub M LIFEGUARD I R&P-Recreation (part-time) ( R&P-Recreation
## 13 Abdul Hamid,Umar SECRETARY II Housing & Community Dev (009) Housing & Community Dev
## 14 Abdul Saboor,Jamillah SECRETARY II Enoch Pratt Free Library (054) Enoch Pratt Free Library
## 15 Abdul-Aziz,Muhammad COMMUNITY AIDE R&P-Recreation (part-time) ( R&P-Recreation
## 16 Abdul-Jabbar,Bushra A SOCIAL SERVICES COORDINATOR Housing & Community Dev (015) Housing & Community Dev
## 17 Abdullah,Aisha W OFFICE ASSISTANT III General Services (301) General Services
## 18 Abdullah,Beverly A TYPIST III Housing & Community Dev (004) Housing & Community Dev
## 19 Abdullahi,Sharon M 911 OPERATOR M-R Info Technology (301) M-R Info Technology
## 20 Abebe,Miraf E AUDITOR II COMP-Audits (002) COMP-Audits
To reduce clutter in the data set, I will replace the Agency
column with the AgencyDept
column.
BmoreSalaries2015 <- BmoreSalaries2015 %>%
select(Name, JobTitle, AgencyID, AgencyDept, HireDate, AnnualSalary, GrossPay, YrOfSrv)
It appears that some agencies have multiple designations. For example, “HLTH-Health Department”, “HLTH-Health Dept”, “HLTH-Heatlh Dept.”, and “HLTH-Health Dept.” all seem to reference the same agency. Similarly, “Mayor’s Office” and “Mayors Office” clearly mean the same agency.
BmoreSalaries2015$AgencyDept <- BmoreSalaries2015$AgencyDept %>% str_replace("HLTH-Health Department", "HLTH-Health Dept")
BmoreSalaries2015$AgencyDept <- BmoreSalaries2015$AgencyDept %>% str_replace("HLTH-Health Dept.", "HLTH-Health Dept")
BmoreSalaries2015$AgencyDept <- BmoreSalaries2015$AgencyDept %>% str_replace("HLTH-Heatlh Dept.", "HLTH-Health Dept")
BmoreSalaries2015$AgencyDept <- BmoreSalaries2015$AgencyDept %>% str_replace("Mayors Office", "Mayor's Office")
If we examine the data set at this point, we find 674 unique values of AgencyID
’s and 52 of AgencyDept
’s. For an ordinary citizen, this level of refined distinctions is probably not very intelligible. Indeed, if we look more closely at this large number of agencies, it becomes clear that some of the “agencies” on the list are in fact departments within a larger umbrella agency that ordinary citizens would have recognized. For example, “R&P-Administration”, “R&P-Parks” and “R&P-Recreation” are likely departments within the better known “Recreation & Parks.” Similarly, we expect that “COMP-Audits”, “COMP-Communication Services”, “COMP-Comptroller’s Office”, and “COMP-Real Estate” are all parts of the “Comptroller’s Office.”
In addition to having a large agency being represented by its subdivisions, some agency’s names have been shortened in a way that’s hard to decipher by an untrained eye. For example, “ERS” is meant to present “EMPLOYEE’S RETIREMENT SYSTEM.” Similarly, “FPR Admin” means “FIRE & POLICE RETIREMENT Admin.”
Since the original agency names correspond to different AgencyID
’s, which probably indicates that they are funded separately and follow different pay scales, we don’t want to lose this info. In the following, I will create a new variable that includes only the names of the highest level of agencies identifiable from the City’s online directory. I will also try to convert shortened names of agencies to their corresponding long forms.
get_agency_name <- function(cstr) {
if (str_detect(cstr, "COMP")) cstr <- "Comptroller's Office"
if (str_detect(cstr, "DPW")) cstr <- "Dept of Public Works"
if (str_detect(cstr, "ERS")) cstr <- "Employee's Retirement Sys"
if (str_detect(cstr, "FIN")) cstr <- "Dept of Finance"
if (str_detect(cstr, "FPR")) cstr <- "Fire & Police Retirement Sys Admin"
if (str_detect(cstr, "HLTH")) cstr <- "Dept of Health"
if (str_detect(cstr, "HR")) cstr <- "Human Resourcese"
if (str_detect(cstr, "M-R")) cstr <- "Mayor's Office"
if (str_detect(cstr, "R&P")) cstr <- "Recreation & Parks"
if (str_detect(cstr, "TRANS")) cstr <- "Dept of Transportation"
if (str_detect(cstr, "OED")) cstr <- "Office of Employment Dev"
return(cstr)
}
get_agency_name2 <- function(cstr_vec) {
unlist(lapply(cstr_vec, get_agency_name))
}
BmoreSalaries2015 <- BmoreSalaries2015 %>% mutate(Agency = get_agency_name2(AgencyDept))
After cleaning, here is what the data look like:
print.data.frame(head(BmoreSalaries2015, n=10))
## Name JobTitle AgencyID AgencyDept HireDate
## 1 Aaron,Patricia G Facilities/Office Services II A03031 OED-Employment Dev 1979-10-24
## 2 Aaron,Petra L ASSISTANT STATE'S ATTORNEY A29045 States Attorneys Office 2006-09-25
## 3 Abaineh,Yohannes T EPIDEMIOLOGIST A65026 HLTH-Health Dept 2009-07-23
## 4 Abbene,Anthony M POLICE OFFICER A99005 Police Department 2013-07-24
## 5 Abbey,Emmanuel CONTRACT SERV SPEC II A40001 M-R Info Technology 2013-05-01
## 6 Abbott-Cole,Michelle CONTRACT SERV SPEC II A90005 TRANS-Traffic 2014-11-28
## 7 Abdal-Rahim,Naim A EMT Firefighter Suppression A64120 Fire Department 2011-03-30
## 8 Abdi,Ezekiel W POLICE SERGEANT A99127 Police Department 2007-06-14
## 9 Abdul Adl,Attrice A RADIO DISPATCHER SHERIFF A38410 Sheriff's Office 1999-09-02
## 10 Abdul Aziz,Hajr E LIFEGUARD I P04002 R&P-Recreation 2014-06-18
## AnnualSalary GrossPay YrOfSrv Agency
## 1 55314 53626.04 35.68 Office of Employment Dev
## 2 74000 73000.08 8.76 States Attorneys Office
## 3 64500 64403.84 5.94 Dept of Health
## 4 46309 59620.16 1.93 Police Department
## 5 60060 54059.60 2.16 Mayor's Office
## 6 42702 20250.80 0.59 Dept of Transportation
## 7 62175 83757.48 4.25 Fire Department
## 8 77343 92574.91 8.04 Police Department
## 9 44548 55943.29 15.82 Sheriff's Office
## 10 18408 1051.25 1.03 Recreation & Parks
Now the data is ready. Let’s find answers to our questions.
After decoding and combining sub-agencies, we now have a total of 30 identifiable agencies.
Here is a tally of employees in these agencies:
table(BmoreSalaries2015$Agency)
##
## Circuit Court City Council Civil Rights & Wage Enforce
## 141 96 11
## Comptroller's Office Council Services Dept of Finance
## 91 6 290
## Dept of Health Dept of Public Works Dept of Transportation
## 956 2369 1368
## Elections Employee's Retirement Sys Enoch Pratt Free Library
## 8 32 473
## Fire & Police Retirement Sys Admin Fire Department General Services
## 27 1615 357
## Housing & Community Dev Human Resourcese Law Department
## 384 66 108
## Legislative Reference Liquor License Board Mayor's Office
## 5 24 724
## Municipal & Zoning Appeals Office of Employment Dev Orphan's Court
## 9 162 10
## Planning Department Police Department Recreation & Parks
## 50 3097 966
## Sheriff's Office Special City Services States Attorneys Office
## 210 4 358
Let’s look at the numbers in a plot.
The Police Department, with 3097 employees, stood out to be the largest employer among different city agencies. The other agencies that employed a significant number of people are the Dept of Public Works (2369 employees), the Fire Department (1615 employees), Dept of Transportation (1368 employees), Recreation & Parks (966 employees), and the Dept of Health (956 employees).
It is easiest to answer this question by looking at the data set in descending order of GrossPay
.
BmoreSalaries2015 %>%
select(Name, JobTitle,AgencyDept, YrOfSrv, GrossPay) %>%
arrange(desc(GrossPay)) %>%
head(n=10) %>%
print.data.frame()
## Name JobTitle AgencyDept YrOfSrv GrossPay
## 1 Nalewajko Jr,Stephen C POLICE LIEUTENANT EID Police Department 33.86 235767.9
## 2 Batts,Anthony W Police Commissioner Police Department 2.76 210705.5
## 3 Camarote,Marc J POLICE SERGEANT EID Police Department 18.74 210508.7
## 4 Harris Jr,William POLICE SERGEANT Police Department 14.68 207614.8
## 5 Johnson,Rhonda A PARAMEDIC EMT-P Fire Department 25.03 205999.5
## 6 Walrath,Mark A POLICE LIEUTENANT EID Police Department 23.15 194655.8
## 7 Turrentine,Derrick J PARAMEDIC CRT Fire Department 10.62 193532.7
## 8 Raymond,Henry J Executive Director III FIN-Admin & Budgets 7.14 189892.2
## 9 Marcus Sr,Albert M POLICE OFFICER (EID) Police Department 40.40 186593.4
## 10 Parthemos,Kaliope Executive Director V Mayor's Office 8.51 183085.2
Data show that the Mayor was not among the top 10 highest paid persons in the City Government. In fact, her pay ranked 41st highest.
First let’s look at the pay distributions of the agencies.
Salary2015 <- BmoreSalaries2015 %>%
group_by(Agency) %>%
summarise(tot_pay_per_agency = sum(GrossPay, na.rm=TRUE),
n=n(), mean_pay = round(tot_pay_per_agency/n, digits=nDecimalPlace),
median_pay = median(GrossPay, na.rm=TRUE),
max_pay = max(GrossPay, na.rm=TRUE),
min_pay = min(GrossPay, na.rm=TRUE))
In the plot above, each box presents the middle 50% of gross pays received by employees in a particular agency. The horizontal cut inside the box represents median pay of the corresponding agency. The width of each box varies with respect to the number of employees serving in that agency. That is, a wider box implies more people worked in that agency in 2015. The dots above and below each box represent cases of gross pays that were beyond 1.5 times the interquartile range (the middle 50% pay range) of gross pay of that particular agency.
The following plot is another way to summarize the maximum, median, and minimum gross pays within each agency.
Since there is not sufficient data to differentiate employees who are part-time and full-time, the lowest paid employees are most likely part-time staff.
The following plot gives the ratios of highest to median pays in the agencies.
The above plot shows that the ratios of highest to median pays among different city agencies were pretty uniform (at around 3) except “Municipal & Zoning Appeals” and “Recreation & Parks”, whose ratios were more than 12. So why did these two agencies stand out since their employees weren’t in the highest paid group in the City?
It become clear as we look at the distributions of gross pays for these two agencies.
Municipal & Zoning Appeals had 9 employees in 2015, majority of whom served as members of BMZA, which were likely part-time positions. With the highest pay (likely a full-time position) at $101408.5 and median pay at $8060.78, it is unsurprising that the department ended up with a steep ratio between highest and median salaries.
The story of Recreation & Parks was similar but with a twist. The department had 966 employees in 2015. Based on its median pay, which was $10927.49, it seems like majority of these positions were still part-time. As a result, with the highest pay being $141909.71, Recreation & Park again ended up with a steep ratio between highest and median salaries.
Here we can see who the longest serving city employees were:
BmoreSalaries2015 %>%
arrange(desc(YrOfSrv)) %>%
select(Name, JobTitle, AgencyDept, YrOfSrv, HireDate) %>%
head(n=10) %>%
print.data.frame()
## Name JobTitle AgencyDept YrOfSrv HireDate
## 1 Walter,William CIVIL ENGINEERING DRAFTING SUP DPW-Water & Waste Water 63.02 1952-06-23
## 2 Wright,Howard F CONTRACT ADMINISTRATOR II DPW-Administration 54.03 1961-06-18
## 3 Whitt,Sadie E OFFICE ASST III Enoch Pratt Free Library 53.24 1962-04-03
## 4 Zemon,Wayne SURVEY TECHNICIAN IV TRANS-Highways 51.55 1963-12-10
## 5 Dixon,Joyce A OFFICE SUPERVISOR Police Department 50.42 1965-01-28
## 6 Mason,Beverly A ACCOUNTING ASST III Police Department 49.98 1965-07-08
## 7 Murdock,John HUMAN SERVICES WORKER I M-R Human Services 49.85 1965-08-23
## 8 Moreau,Edwin W Fire Lieutenant Suppression Fire Department 48.78 1966-09-19
## 9 Fisher Jr,James BUILDING PROJECT COORDINATOR General Services 48.64 1966-11-10
## 10 McCray,Brenda P OFFICE ASSISTANT III Housing & Community Dev 48.61 1966-11-21
There are different ways to answer this question because various factors affect retention rate. In the following, I use median tenure to be the retention indicator.
Tenure2015 <- BmoreSalaries2015 %>%
group_by(Agency) %>%
summarise(NumOfEmployees=n(), LongestTenure = max(YrOfSrv, na.rm=TRUE),
MedianTenure = round(median(YrOfSrv, na.rm=TRUE), digits=nDecimalPlace),
MeanTenure = round(mean(YrOfSrv, na.rm=TRUE), digits=nDecimalPlace),
ShortestTenure = min(YrOfSrv, na.rm=TRUE)) %>%
ungroup() %>%
arrange(desc(MedianTenure))
head(Tenure2015)
## Source: local data frame [6 x 6]
##
## Agency NumOfEmployees LongestTenure MedianTenure MeanTenure ShortestTenure
## (chr) (int) (dbl) (dbl) (dbl) (dbl)
## 1 Legislative Reference 5 29.36 26.56 22.02 5.99
## 2 Orphan's Court 10 30.01 13.93 14.92 0.54
## 3 Office of Employment Dev 162 38.93 13.71 15.87 0.02
## 4 Sheriff's Office 210 42.66 13.16 13.93 0.03
## 5 Police Department 3097 50.42 12.92 13.58 0.02
## 6 Fire Department 1615 48.78 12.45 14.62 0.02
Let’s look at the tenure data on a graph.
The plot above shows that “Legislative Reference” wins out for having the longest median tenure (26.56 years) among different agencies. Keep in mind though that this ranking is probably biased by the relatively few number of employees (5) in that agency.
So far in this project, I have analyzed the Baltimore City Employee Salaries FY2015 data set at Baltimore Open Data Portal. The data set recorded 14017 employees working for the City in 2015, and it provided employee information in 7 categories, which are name, job title, Agency ID, Agency, Hire Date, Annual Salary, and Gross Pay.
After cleaning and formatting the data and using existing data to calculate employee tenure, I analyzed the overall distributions of gross pay and tenure of City employees. The gross pay distribution clearly skews to the right (direction of higher salary). It has a median of $52753.13, and the middle 50% of city workers received pay between $34669.21 and $75771.01. The highest salary was $235767.92, and the lowest salary was $0. The tenure distribution also skews to the right (direction of longer tenure), which is consistent with the expectation for an organization that continues to recruit workers to replace those who had either retired or left the organization. The median tenure of City employees was 9.95 years. The longest tenure was 63.02 years. The middle 50% of people had put in 4.58 to 19.78 years working for the City.
I have also analyzed the distributions of gross pay and tenure with respect to the 30 agencies that were recorded in the data set. In this part of the analysis, I found that the Police Department, with 3097 employees, was the largest employer among different city agencies in 2015. The ratios of highest to median pays among different city agencies were around 3 except Municipal & Zoning Appeals and Recreation & Parks, whose ratios were more than 12. Further investigation showed that majority of the employees at these two agencies were part-time workers, implying that the maximum/median pay ratio was essentially comparing full-time and part-time pays. As a result, the ratios were understandably steep. In terms of tenure, Legislative Reference had the longest median tenure (26.56 years) among different agencies.