## Principles of date formatting

Dates are fundamental. Dates come in many formats. One of my biggest bug bears are dates. As dull as it sounds, formatting dates correctly is a fundamental part of data cleaning in healthcare. This is especially true if your data objects are event-driven (which is to say you collect events that happen at time-points such as the time a patient enters an endoscopy room or the day the endoscopy was done etc.). In addition, the correct and standardised formatting of dates will allow you to merge data from separate data sources and still understand a unified sequence of events. Eventing, as I call it, is fundamental to understanding processes and its inefficiences.

Given it is so important, perhaps the correct formatting of dates is not so boring after all.

### Date formats:

Dates come in may forms, but in R the main formats are as follows:

1.POSIXct or POSIXlt - stores dates and times and can manipulate timezones.

2.Date - date without times.

3.Character.

4.Numeric (the usual way it is stored when imported from Excel and refers to the number of days or seconds from an origin date).

5.On top of the above, dates even within the same format can be of different combinations such as the day or the month being first, or an abbreviated year etc.

Most of the time when importing, the dates will be in character format, but not always.

## Conversion of dates

The finest level of granularity I usually require is day. This means that the as.Date function, part of base R, is usually sufficient Occasionally I need hours which requires a slightly different approach. Below are some examples:

Text to Date

dates <- c("05/22/80", "07/06/01")
#Remember to specify the format of the date in the text otherwise it defaults to yyyy-mm-dd
betterDates <- as.Date(dates, format = "%B/%d/%Y")
betterDates
## [1] NA NA
#Or with the date time format
df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
newDate<-as.Date(df$Date, "%m/%d/%Y %H:%M:%S") newDate ## [1] "2009-10-09" "2009-10-15" Numeric to Date (eg from Excel) This normally requires remembering to specify an origin as the number is usually the number of days from that origin: dates <- c(30899, 38567) NewDates <- as.Date(dates, origin = "1899-12-30") POSIXct to Date mydate<-c("2013-01-01 07:00") theDatesinPOSIXct<-as.Date(as.POSIXct(mydate)) NewDates<-as.Date(theDatesinPOSIXct) Date to character # convert dates to character data #chrDate <- as.character(someDate) ## Extracting parts of a date Often dates come as the date with time in hours, mins and seconds. Other times you just want the month of a date, or the year. This is where the ‘lubridate’ package can be so useful as follows. Note, lubridate doesnt take text but can take anything else such as: POSIXct, POSIXlt, Date, Period, chron, yearmon, yearqtr, zoo, zooreg, timeDate, xts, its, ti, jul, timeSeries, and fts objects library(lubridate) some_date <- c("01/02/1979", "03/04/1980") month(as.POSIXlt(some_date, format="%d/%m/%Y")) ## [1] 2 4 #The same thing can be done with: day, month , year, hour, minute, second and many others. See https://rpubs.com/davoodastaraky/lubridate What to do with mixed dates If you are using many data sources, the dates can be very painful to standardise. In reality most will not have H:M:S associated with the date so I would standardise all the dates to %d_%m_%Y. Note I don’t use “/” as the forward slash can be treated oddly in text extraction and particularly regular expressions. A particularly interesting package is ‘anytime’. This claims to be able to take any date and convert it into a date format. This returns a POSIXct object (or a date object if anyDate is used instead of anytime). This can be seen here: https://cran.r-project.org/web/packages/anytime/anytime.pdf ## Time Series objects Time series analysis is worth exploring particularly if you have very numerical data. This type of analysis is often used for the assessment of financial data. The time series object is xts and essentially organises your data with the date as the label of each row. The limiting factor I have found is the ability to group the time series according to a non numerical variable For example if I want to split my data up according to type of endoscopic procedure performed I cannot use intrinsic time series objects to do this. It is however possible as seen on the page: Grouping by dates: so to create a timSeries object you should use the package ‘xts’ as follows library(kableExtra) #input data proc<-sample(c("EMR","RFA","Biopsies"), 100, replace = TRUE) #Sample dates dat<-sample(seq(as.Date('2013/01/01'), as.Date('2017/05/01'), by="day"), 100) #Generate 20 hospital numbers in no particular order: HospNum_Id<-sample(c("P433224","P633443","K522332","G244224","S553322","D0739033","U873352","P223333","Y763634","I927282","P223311","P029834","U22415","U234252","S141141","O349253","T622722","J322909","F630230","T432452"), 100, replace = TRUE) rndm<-sample(seq(0,40),100,replace=T) df<-data.frame(proc,dat,HospNum_Id,rndm) df$proc<-as.character(df$proc) library(xts) Myxts<-xts(df, order.by=df$dat)
kable(head(Myxts,25))
proc dat HospNum_Id rndm
Biopsies 2013-01-23 F630230 38
Biopsies 2013-01-24 D0739033 13
Biopsies 2013-02-02 J322909 25
Biopsies 2013-02-09 D0739033 5
Biopsies 2013-02-24 S141141 18
EMR 2013-02-28 U22415 38
Biopsies 2013-03-12 T622722 26
RFA 2013-03-20 T622722 39
Biopsies 2013-03-23 Y763634 9
Biopsies 2013-05-18 Y763634 26
Biopsies 2013-05-22 P633443 37
EMR 2013-05-24 P029834 19
Biopsies 2013-07-08 G244224 38
RFA 2013-07-19 P223311 36
EMR 2013-07-28 D0739033 23
EMR 2013-08-02 T432452 1
Biopsies 2013-09-10 P433224 29
RFA 2013-09-16 F630230 36
Biopsies 2013-10-02 U234252 29
Biopsies 2013-10-10 K522332 25
EMR 2013-11-01 U22415 14
Biopsies 2013-11-29 P029834 5
EMR 2013-12-19 U22415 3
RFA 2014-01-03 T622722 22
Biopsies 2014-01-29 U22415 5

So you will notice that the dataframe is now called a xts object (get this by typing str(Myxts) into the console) and that the rows are organised with date being used as the index. More on how to analyse this in the Data Analysis section