The ds command in Stata
The ds
command is described as a “hidden gem” in Tip 66 of 119 Stata Tips (3rd edition). Its function is to list variables matching name patterns or other characteristics. For example, we may want to confirm whether a variable has string, integer, or numeric values. Let’s use the ds
command to look at some of the variables from the auto
dataset.
First, we can use ds
to list all the variable to screen. For this purpose, I find ds
to be far simpler than other commands like describe
or codebook
.
. sysuse auto, clear (1978 Automobile Data) . ds make mpg headroom weight turn gear_ratio price rep78 trunk length displacement foreign
Let’s look at the first 10 values of the make
variable.
. list make in 1/10 +---------------+ | make | |---------------| 1. | AMC Concord | 2. | AMC Pacer | 3. | AMC Spirit | 4. | Buick Century | 5. | Buick Electra | |---------------| 6. | Buick LeSabre | 7. | Buick Opel | 8. | Buick Regal | 9. | Buick Riviera | 10. | Buick Skylark | +---------------+
It seems like make
is a string variable; let’s confirm this with the has(type string)
option.
If it is a string variable, then ds
will list the variable name to screen and save it in r(varlist)
.
. ds make, has(type string) make . return list macros: r(varlist) : "make"
Since make
isn’t numeric it is neither listed to screen nor saved in r(varlist)
.
. ds make, has(type numeric) . return list
Importantly, the r(varlist)
result from the ds
call can be passed to other commands. Let’s say we want to summarize all variables that are integers. We first run the ds
command and then pass r(varlist)
as an argument to summarize
.
. qui ds , has(type int) . summarize `r(varlist)', sep(10) Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- price | 74 6165.257 2949.496 3291 15906 mpg | 74 21.2973 5.785503 12 41 rep78 | 69 3.405797 .9899323 1 5 trunk | 74 13.75676 4.277404 5 23 weight | 74 3019.459 777.1936 1760 4840 length | 74 187.9324 22.26634 142 233 turn | 74 39.64865 4.399354 31 51 displacement | 74 197.2973 91.83722 79 425
Note that `r(varlist)’
follows local
macro rules (type help macro
).
I have shown some basic examples, but how can we use ds
to improve our data workflow? Here is a situation I often encounter. Every six months I download updated .dta
datasets from a research organization; however, the updated datasets have inconsistent variable formats. Sometimes a date may be saved in the string format (eg, “2013-06-23”), or the %td
format (eg, 23june2013), or the %tc
format (eg, 23june2013 00:00:00). It would be useful to automate the transformation of date variables into a standard format, say %td
. For this purpose, I wrote a fname.ado
program that uses ds
to identify the date format of a variable and perform the relevant transformation to %td
. Here is the program below:
program fdate
version 12
syntax varlist(min=1) [, sfmt(string)]
foreach var of local varlist {
tempvar tvar
qui ds `var', has(format %td)
if "`r(varlist)'" != "" {
dis as text _n "`var' is already in Stata %td format."
exit
}
qui ds `var', has(type string)
if "`r(varlist)'" != "" {
if "`sfmt'"=="" {
dis in red _n ". `var' is in string format, please provide sfmt argument."
exit
}
gen `tvar'= date(`var', "`sfmt'")
}
qui ds `var', has(format %tc)
if "`r(varlist)'" != "" {
qui gen double `tvar' = dofc(`var')
}
capture confirm `tvar'
if !_rc {
di in red ". `var' cannot be processed by fdate program. "
exit
}
format `tvar' %td
nobreak {
dis as text "Check if transformation of `var' is correct..."
preserve
qui drop if missing(`var')
local Count = cond(_N <= 10, _N, 10)
list `var' `tvar' in 1/`Count', abbrev(20)
restore
if !missing(`var') & missing(`tvar') {
dis in red _n "Warning: potential missing values when there should be date values"
exit
}
drop `var'
rename `tvar' `var'
}
}
end
exit
Let’s look at the code in blocks. In the first block ds
checks if the variable is in
the %td
date format and prints a message if it is.
qui ds `var', has(format %td)
if "`r(varlist)'" != "" {
dis as text _n "`var' is already in Stata %td format."
exit
}
In the second block ds
checks if the variable is a string. If so, it is transformed
into a date variable using the date
command. Because the string value for the date
could have been saved in several different ways, the user is asked to provide a mask argument for
translation (type help datetime translation
).
qui ds `var', has(type string)
if "`r(varlist)'" != "" {
if "`sfmt'"=="" {
dis in red _n ". `var' is in string format, please provide sfmt argument."
exit
}
gen `tvar'= date(`var', "`sfmt'")
}
In the third block ds
checks if the variable is in the %tc
datetime format. If so, it is transformed into the date format using the dofc
command (type help dofc
).
qui ds `var', has(format %tc)
if "`r(varlist)'" != "" {
qui gen double `tvar' = dofc(`var')
}
If neither of these conditions are met, the code prints a message and exits the program. (Note that additional conditions could be included here to suit your needs).
capture confirm `tvar'
if !_rc {
di in red ". `var' cannot be processed by fdate program. "
exit
}
The code does an informal check on the transformation and prints a few of the old and transformed values to screen.
nobreak {
dis as text "Check if transformation of `var' is correct..."
preserve
qui drop if missing(`var')
local Count = cond(_N <= 10, _N, 10)
list `var' `tvar' in 1/`Count', abbrev(20)
restore
if !missing(`var') & missing(`tvar') {
dis in red _n "Warning: potential missing values when there should be date values"
exit
}
Below is a demonstration of fdate
.
. use https://www.ssc.wisc.edu/sscc/pubs/files/dates.dta, clear . list dateString* beginning ending, abbrev(20) +------------------------------------------------------------------------------+ | dateString1 dateString2 dateString3 beginning ending | |------------------------------------------------------------------------------| 1. | November 3, 2010 11/3/2010 2010-11-03 08:35:12 12nov1998 22apr2006 | +------------------------------------------------------------------------------+ . fdate dateString1, sfmt("MDY") Check if transformation of dateString1 is correct... +------------------------------+ | dateString1 __000000 | |------------------------------| 1. | November 3, 2010 03nov2010 | +------------------------------+ . fdate dateString2, sfmt("MDY") Check if transformation of dateString2 is correct... +-------------------------+ | dateString2 __000000 | |-------------------------| 1. | 11/3/2010 03nov2010 | +-------------------------+ . fdate dateString3, sfmt("YMD") (1 missing value generated) Check if transformation of dateString3 is correct... +--------------------------------+ | dateString3 __000000 | |--------------------------------| 1. | 2010-11-03 08:35:12 . | +--------------------------------+ Warning: potential missing values when there should be date values . fdate dateString3, sfmt("YMD###") Check if transformation of dateString3 is correct... +---------------------------------+ | dateString3 __000000 | |---------------------------------| 1. | 2010-11-03 08:35:12 03nov2010 | +---------------------------------+ . fdate beginning ending beginning is already in Stata %td format ending is already in Stata %td format . list dateString* beginning ending, abbrev(20) +-----------------------------------------------------------------+ | dateString1 dateString2 dateString3 beginning ending | |-----------------------------------------------------------------| 1. | 03nov2010 03nov2010 03nov2010 12nov1998 22apr2006 | +-----------------------------------------------------------------+
I have shown some examples of the ds
command. It can be used in many other situations, as described in the documentation. Each time you wish to return the name of a variable, and use this result in a subsequent command, reach for ds
.