The ds command in Stata

The ds command is described as a “hidden gem” in Tip 66 of 119 Stata Tips (3rd edition). Its function is to list variables matching name patterns or other characteristics. For example, we may want to confirm whether a variable has string, integer, or numeric values. Let’s use the ds command to look at some of the variables from the auto dataset.

First, we can use ds to list all the variable to screen. For this purpose, I find ds to be far simpler than other commands like describe or codebook.

. sysuse auto, clear
(1978 Automobile Data)

. ds
make          mpg           headroom      weight        turn          gear_ratio
price         rep78         trunk         length        displacement  foreign

Let’s look at the first 10 values of the make variable.

. list make in 1/10

     +---------------+
     | make          |
     |---------------|
  1. | AMC Concord   |
  2. | AMC Pacer     |
  3. | AMC Spirit    |
  4. | Buick Century |
  5. | Buick Electra |
     |---------------|
  6. | Buick LeSabre |
  7. | Buick Opel    |
  8. | Buick Regal   |
  9. | Buick Riviera |
 10. | Buick Skylark |
     +---------------+

It seems like make is a string variable; let’s confirm this with the has(type string) option. If it is a string variable, then ds will list the variable name to screen and save it in r(varlist).

. ds make, has(type string)
make

. return list

macros:
            r(varlist) : "make"

Since make isn’t numeric it is neither listed to screen nor saved in r(varlist).

. ds make, has(type numeric)

. return list

Importantly, the r(varlist) result from the ds call can be passed to other commands. Let’s say we want to summarize all variables that are integers. We first run the ds command and then pass r(varlist) as an argument to summarize.

. qui ds , has(type int)

. summarize `r(varlist)', sep(10)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425

Note that `r(varlist)’ follows local macro rules (type help macro).

I have shown some basic examples, but how can we use ds to improve our data workflow? Here is a situation I often encounter. Every six months I download updated .dta datasets from a research organization; however, the updated datasets have inconsistent variable formats. Sometimes a date may be saved in the string format (eg, “2013-06-23”), or the %td format (eg, 23june2013), or the %tc format (eg, 23june2013 00:00:00). It would be useful to automate the transformation of date variables into a standard format, say %td. For this purpose, I wrote a fname.ado program that uses ds to identify the date format of a variable and perform the relevant transformation to %td. Here is the program below:

program fdate
version 12
syntax varlist(min=1) [, sfmt(string)]

foreach var of local varlist {
  tempvar tvar 
  qui ds `var', has(format %td)
  if "`r(varlist)'" != "" {
    dis as text _n "`var' is already in Stata %td format."
    exit
  }
  qui ds `var', has(type string)
  if "`r(varlist)'" != "" {
    if "`sfmt'"=="" {
      dis in red _n ". `var' is in string format, please provide sfmt argument." 
      exit
    }
    gen `tvar'= date(`var', "`sfmt'")
  }
  qui ds `var', has(format %tc)
  if "`r(varlist)'" != "" {
    qui gen double `tvar' = dofc(`var')
  }
  capture confirm `tvar'
  if !_rc {
    di in red ". `var' cannot be processed by fdate program. "
    exit
  }
  format `tvar' %td
  nobreak {
    dis as text "Check if transformation of `var' is correct..."
    preserve
      qui drop if missing(`var')
      local Count = cond(_N <= 10, _N, 10)
      list `var' `tvar' in 1/`Count', abbrev(20) 
    restore
    if !missing(`var') & missing(`tvar') {
      dis in red _n "Warning: potential missing values when there should be date values"
      exit
    }
    drop `var'
    rename `tvar' `var'
  }
}

end
exit

Let’s look at the code in blocks. In the first block ds checks if the variable is in the %td date format and prints a message if it is.

qui ds `var', has(format %td)
if "`r(varlist)'" != "" {
  dis as text _n "`var' is already in Stata %td format."
  exit
}

In the second block ds checks if the variable is a string. If so, it is transformed into a date variable using the date command. Because the string value for the date could have been saved in several different ways, the user is asked to provide a mask argument for translation (type help datetime translation).

qui ds `var', has(type string)
if "`r(varlist)'" != "" {
  if "`sfmt'"=="" {
    dis in red _n ". `var' is in string format, please provide sfmt argument." 
    exit
  }
  gen `tvar'= date(`var', "`sfmt'")
}

In the third block ds checks if the variable is in the %tc datetime format. If so, it is transformed into the date format using the dofc command (type help dofc).

qui ds `var', has(format %tc)
if "`r(varlist)'" != "" {
  qui gen double `tvar' = dofc(`var')
}

If neither of these conditions are met, the code prints a message and exits the program. (Note that additional conditions could be included here to suit your needs).

capture confirm `tvar'
if !_rc {
  di in red ". `var' cannot be processed by fdate program. "
  exit
}

The code does an informal check on the transformation and prints a few of the old and transformed values to screen.

  nobreak {
    dis as text "Check if transformation of `var' is correct..."
    preserve
      qui drop if missing(`var')
      local Count = cond(_N <= 10, _N, 10)
      list `var' `tvar' in 1/`Count', abbrev(20) 
    restore
    if !missing(`var') & missing(`tvar') {
      dis in red _n "Warning: potential missing values when there should be date values"
      exit
    }

Below is a demonstration of fdate.

. use https://www.ssc.wisc.edu/sscc/pubs/files/dates.dta, clear

. list dateString* beginning ending, abbrev(20)

     +------------------------------------------------------------------------------+
     |      dateString1   dateString2           dateString3   beginning      ending |
     |------------------------------------------------------------------------------|
  1. | November 3, 2010     11/3/2010   2010-11-03 08:35:12   12nov1998   22apr2006 |
     +------------------------------------------------------------------------------+

. fdate dateString1, sfmt("MDY")
Check if transformation of dateString1 is correct...

     +------------------------------+
     |      dateString1    __000000 |
     |------------------------------|
  1. | November 3, 2010   03nov2010 |
     +------------------------------+

. fdate dateString2, sfmt("MDY")
Check if transformation of dateString2 is correct...

     +-------------------------+
     | dateString2    __000000 |
     |-------------------------|
  1. |   11/3/2010   03nov2010 |
     +-------------------------+

. fdate dateString3, sfmt("YMD")
(1 missing value generated)
Check if transformation of dateString3 is correct...

     +--------------------------------+
     |         dateString3   __000000 |
     |--------------------------------|
  1. | 2010-11-03 08:35:12          . |
     +--------------------------------+

Warning: potential missing values when there should be date values

. fdate dateString3, sfmt("YMD###")
Check if transformation of dateString3 is correct...

     +---------------------------------+
     |         dateString3    __000000 |
     |---------------------------------|
  1. | 2010-11-03 08:35:12   03nov2010 |
     +---------------------------------+

. fdate beginning ending

beginning is already in Stata %td format

ending is already in Stata %td format

. list dateString* beginning ending, abbrev(20)

     +-----------------------------------------------------------------+
     | dateString1   dateString2   dateString3   beginning      ending |
     |-----------------------------------------------------------------|
  1. |   03nov2010     03nov2010     03nov2010   12nov1998   22apr2006 |
     +-----------------------------------------------------------------+

I have shown some examples of the ds command. It can be used in many other situations, as described in the documentation. Each time you wish to return the name of a variable, and use this result in a subsequent command, reach for ds.

Alain Vandormael
Alain Vandormael
Senior Data Scientist, PhD, MSc