The 2022 UK Stata Conference took place in London, UK, on 8 & 9 September 2022.
The two-day international event provides Stata users from all over the world the opportunity to exchange ideas, experiences, and information on new applications of the software.
Experience what happens when new and long-time Stata users from across all disciplines gather to discuss real-world applications of Stata.
Thank you for being a part of the longest-running Stata Conference.
2022 UK Stata Conference | UCL, Torrington Place, 1-19 Torrington Place, Fitzrovia, London, WC1E 7HB, UK
Thursday, 8 September (London time)
10:00 - 10:15
Introduction & Welcome
10:15 - 10:45
Resultssets in resultsframes in Stata 16-plus
Roger Newson, King's College London
A resultsset is a Stata dataset created as output by a Stata command. It may be listed and/or saved in a disk file and/or written over an existing dataset in memory, and/or (in Stata 16 or higher) written to a data frame (or resultsframe) in memory, without damaging any existing data frames. Commands creating resultssets include parmest, parmby, xcontract, xcollapse, descsave, xsvmat, and xdir. Commands useful for processing resultsframes include xframeappend, fraddinby, and invdesc. We survey the ways in which resultsset processing has been changed by resultsframes.
10:45 - 11:05
A suite of Stata programs for analysing simulation studies
Ella Marley-Zagar, University College London
Simulation studies are used in a variety of disciplines to evaluate the properties of statistical methods. Simulation studies involve creating data by random sampling, typically from known probability distributions, with the aim of assessing the robustness and accuracy of new statistical techniques by comparing them to some known truth. We introduce the siman suite for the analysis of simulation results, a set of Stata programs that offer data manipulation, analysis and graphics to process, explore and visualise the results of simulation studies.
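As a rough illustration of the kind of summaries such a suite produces, the Python sketch below (standard library only) computes three common performance measures for one method: bias with its Monte Carlo standard error, the empirical standard error, and coverage of nominal 95% Wald intervals. It is not siman's code or syntax; the function `performance`, its input layout, and the toy study are hypothetical.

```python
import math
import random
import statistics


def performance(estimates, std_errors, true_value, z=1.959964):
    """Summarise one method's estimates across simulation repetitions.

    Hypothetical input layout: one estimate and one standard error per
    repetition, plus the known true parameter value.
    """
    n = len(estimates)
    emp_se = statistics.stdev(estimates)  # SD of the estimates across repetitions
    bias = statistics.fmean(estimates) - true_value
    # Count repetitions whose Wald interval b +/- z*se contains the truth
    covered = sum(
        1 for b, se in zip(estimates, std_errors)
        if b - z * se <= true_value <= b + z * se
    )
    return {
        "bias": bias,
        "mcse_bias": emp_se / math.sqrt(n),  # Monte Carlo SE of the bias
        "empirical_se": emp_se,
        "coverage": covered / n,
    }


# Toy study: estimate the mean of N(1, 2^2) from samples of size 50
random.seed(1)
estimates, std_errors = [], []
for _ in range(1000):
    sample = [random.gauss(1.0, 2.0) for _ in range(50)]
    estimates.append(statistics.fmean(sample))
    std_errors.append(statistics.stdev(sample) / math.sqrt(50))

result = performance(estimates, std_errors, true_value=1.0)
print(result)
```

Reporting the Monte Carlo standard error next to the bias shows whether an apparent bias is larger than the simulation noise itself.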
11:05 - 11:35
Cook’s distance measures for panel data models
David Vincent, David Vincent Economics
Influential observations in regression analysis are data points whose deletion has a large impact on the estimated coefficients. The usual diagnostics for assessing the influence of each data point are designed for least-squares regression with independent observations and are not appropriate when estimating panel data models.
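To make the deletion idea concrete, the Python sketch below (standard library only) computes the textbook Cook's distance for a simple least-squares line by deleting data and refitting, then applies the same formula after deleting all observations of one panel unit at once. The group-deletion variant is only an illustrative assumption about how panel structure might be respected; it is not the measure presented in the talk, and the names `ols` and `cooks_distance` and the toy data are hypothetical.

```python
def ols(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cooks_distance(xs, ys, drop):
    """Deletion-based Cook's distance for removing the indices in `drop`.

    D = sum_j (yhat_j - yhat_j(drop))^2 / (p * s^2), with s^2 from the full fit.
    """
    n, p = len(xs), 2  # p = number of coefficients (intercept + slope)
    b0, b1 = ols(xs, ys)
    s2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - p)
    keep = [i for i in range(n) if i not in drop]
    c0, c1 = ols([xs[i] for i in keep], [ys[i] for i in keep])
    shift = sum(((b0 + b1 * x) - (c0 + c1 * x)) ** 2 for x in xs)
    return shift / (p * s2)


# Toy panel: 3 units observed at x = 1..4; unit 2 has one aberrant value
xs = [1, 2, 3, 4] * 3
ys = [3.1, 4.9, 7.0, 9.1,
      2.9, 5.1, 7.1, 8.9,
      3.0, 5.0, 7.0, 15.0]
unit = [0] * 4 + [1] * 4 + [2] * 4

obs_d = [cooks_distance(xs, ys, {i}) for i in range(len(xs))]
unit_d = {g: cooks_distance(xs, ys, {i for i, u in enumerate(unit) if u == g})
          for g in sorted(set(unit))}
print(obs_d)
print(unit_d)
```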
11:35 - 12:35
Bayesian multilevel modeling
Yulia Marchenko, StataCorp
In multilevel or hierarchical data, which include longitudinal, cross-sectional, and repeated-measures data, observations belong to different groups. Groups may represent different levels of hierarchy such as hospitals, doctors nested within hospitals, and patients nested within doctors nested within hospitals. Multilevel models incorporate group-specific effects in the regression model and assume that they vary randomly across groups according to some a priori distribution, commonly a normal distribution. This assumption makes multilevel models natural candidates for Bayesian analysis. Bayesian multilevel models additionally assume that other model parameters, such as regression coefficients and variance components (the variances of group-specific effects), are random.
12:35 - 13:40
13:40 - 14:00
Bias-corrected estimation of linear dynamic panel data models
Sebastian Kripfganz, University of Exeter Business School; Jörg Breitung, University of Cologne
In the presence of unobserved group-specific heterogeneity, the conventional fixed-effects and random-effects estimators for linear panel data models are biased when the model contains a lagged dependent variable and the number of time periods is small. We present a computationally simple bias-corrected estimator with attractive finite-sample properties, which is implemented in our new xtdpdbc Stata package. The estimator relies neither on instrumental variables nor on specific assumptions about the initial observations. Because it is a method-of-moments estimator, standard errors are readily available from asymptotic theory. Higher-order lags of the dependent variable can be accommodated as well.
14:00 - 14:30
Impact of proximity to gas production activity on birth outcomes across the US
Christopher F. Baum, Hailee Schuele, Philip J. Landrigan, Summer Sherburne Hawkins, Boston College
Despite mounting evidence on the health effects of natural gas development (NGD), including hydraulic fracturing (“fracking”), existing research has been constrained to high-producing states, limiting generalizability. We examined the impacts of prenatal exposure to NGD production activity in all gas-producing US states on birth outcomes overall and by race/ethnicity. Mata routines were developed to link 185,376 NGD production facilities in 28 US states and their distance-weighted monthly output with county population centroids via geocoding. These data were then merged with 2005–2018 county-level microdata natality files on 33,849,409 singleton births from 1,984 counties in 28 states, using nine-month county-level averages of NGD production by both conventional and unconventional production methods, based on month/year of birth.
14:30 - 15:00
Estimating Compulsory Schooling Impacts on Labour Market
Erendira Leon Bravo, University of Westminster
This study estimates the impacts of the 1993 compulsory schooling reform in Mexico on labour market outcomes. A well-known problem in this analysis is the endogeneity between schooling and labour market outcomes due to unobservable characteristics that may jointly determine them. The empirical evidence on the effectiveness of such schooling policies is also heterogeneous across developing and developed countries, perhaps because of differences in context and in the identification strategies used. Some studies use instrumental variables (IV) and difference-in-differences (D-i-D) methods to tackle endogeneity. Most analyses use a regression discontinuity design (RDD) with polynomials of different order in year of birth (e.g., cubic or quartic), whereas few studies use month of birth, which allows more schooling variation within a year and hence more accurate and robust estimates.
15:00 - 15:30
Coffee & Tea Break
15:30 - 16:00
Bias Adjusted Three Step Latent Class Analysis using R and the gsem Command in Stata
Daniel Tompsett and Bianca De Stavola, UCL, UK
In this presentation, we will describe a means to perform bias-adjusted latent class analysis using three-step methodology. This method is often performed using Mplus, Latent GOLD, or specific functions in Stata. Here we will describe a novel means to perform this analysis using the poLCA package in R to perform the first two steps, and the gsem command in Stata to perform the third step.
16:00 - 16:30
Distributed Lag Non-Linear Models (DLNMs) in Stata
Aurelio Tobias, Ben Armstrong, Antonio Gasparrini, Spanish Research Council (CSIC), Barcelona, Spain, and LSHTM, London, UK
Distributed lag non-linear models (DLNMs) are a modelling framework for flexibly describing associations with potentially non-linear and delayed effects in time-series data. The methodology rests on the definition of a crossbasis, a bi-dimensional functional space combining two sets of basis functions that specify the relationship along the predictor and lag dimensions, respectively. DLNMs have been widely used in environmental epidemiology to investigate the short-term associations between environmental exposures, such as weather variables or air pollution, and health outcomes, such as mortality counts or disease-specific hospital admissions.
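As an illustration of the crossbasis idea, the Python sketch below builds the cross-basis columns for one exposure series: each column pairs one exposure basis function with one lag basis function and sums their product over lags 0 to L. The polynomial bases, the function name `crossbasis`, and the toy series are assumptions made for brevity; applications often use spline bases instead, and the resulting columns would then enter a regression model (for example, a Poisson model for daily counts) as covariates.

```python
def crossbasis(x, max_lag, var_basis, lag_basis):
    """Cross-basis matrix for a single exposure series.

    Column (j, k) at time t is the sum over lag = 0..max_lag of
    var_basis[j](x[t - lag]) * lag_basis[k](lag).
    The first max_lag time points have incomplete lag histories and are skipped.
    """
    rows = []
    for t in range(max_lag, len(x)):
        row = []
        for f in var_basis:          # exposure-response dimension
            for g in lag_basis:      # lag-response dimension
                row.append(sum(f(x[t - lag]) * g(lag)
                               for lag in range(max_lag + 1)))
        rows.append(row)
    return rows


# Hypothetical choice: quadratic in exposure, linear in lag
var_basis = [lambda v: v, lambda v: v ** 2]
lag_basis = [lambda lag: 1.0, lambda lag: float(lag)]
exposure = [1, 2, 3, 4, 5]

cb = crossbasis(exposure, max_lag=2, var_basis=var_basis, lag_basis=lag_basis)
for row in cb:
    print(row)
```

With two exposure functions and two lag functions the cross-basis has four columns, so the whole exposure-lag-response surface is described by four regression coefficients.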
16:30 - 17:15
Advanced Data visualizations with Stata: Part III
Asjad Naqvi, Austrian Institute for Economic Research (WIFO), International Institute for Applied Systems Analysis (IIASA), Vienna University of Economics and Business (WU)
The presentation will showcase recent developments in complex data visualizations with Stata. These include various types of polar plots, for example, spider plots, sunburst charts, circular bar graphs, and various visualizations with spatial data, including bivariate maps, gridded waffle charts, and map clippings. Updates for several Stata packages including joyplot, bimap, streamplot, and clipgeo will be presented and suggestions for improving Stata’s graph capabilities will be discussed.
Friday, 9 September
9:00 - 9:10
Welcome & Tribute to Nick Cox
9:10 - 9:40
Grinding axes: Axis scales, labels and ticks
Nick Cox, Durham University
This is a round-up of not quite utterly obvious tips and tricks for graph axes, using both official and community-contributed commands. Ever needed
- a logarithmic scale but found the default labels undesirable?
- a slightly non-standard scale such as logit, reciprocal or root?
- a tick to be suppressed?
- labels between ticks, not at them?
- an automagic choice of “nice” labels under your control?
Community-contributed commands mentioned will include mylabels, myticks, nicelabels, niceloglabels, qplot and transplot.
9:40 - 10:00
Exchangeably weighted bootstrap schemes
Philippe van Kerm, LISER and University of Luxembourg
The exchangeably weighted bootstrap is one of the many variants of bootstrap resampling schemes. Rather than directly drawing observations with replacement from the data, weighted bootstrap schemes generate vectors of replication weights to form bootstrap replications. Various ways to generate the replication weights can be adopted and some choices bring practical computational advantages. This talk demonstrates how easily such schemes can be implemented and where they are particularly useful, and introduces the exbsample command which facilitates their implementation.
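A minimal sketch of the idea in Python (standard library only): each replication draws a weight vector and recomputes a weighted statistic, here the mean. Two weight generators are shown: multinomial resampling counts, which reproduce the classical bootstrap, and rescaled i.i.d. exponential weights, which give a Bayesian-bootstrap-type scheme. The function names are hypothetical and do not reflect the syntax or options of exbsample.

```python
import random
import statistics


def multinomial_weights(n, rng):
    """Classical bootstrap expressed as weights: resampling counts (sum to n)."""
    counts = [0] * n
    for i in rng.choices(range(n), k=n):
        counts[i] += 1
    return counts


def exponential_weights(n, rng):
    """I.i.d. Exp(1) weights rescaled to sum to n (Bayesian-bootstrap-type scheme)."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [n * wi / total for wi in w]


def weighted_mean(data, w):
    return sum(d * wi for d, wi in zip(data, w)) / sum(w)


def bootstrap_se(data, weight_scheme, reps=2000, seed=12345):
    """Standard error of the mean from weighted bootstrap replications."""
    rng = random.Random(seed)
    replications = [weighted_mean(data, weight_scheme(len(data), rng))
                    for _ in range(reps)]
    return statistics.stdev(replications)


data = [float(v) for v in range(1, 21)]
print(bootstrap_se(data, multinomial_weights))
print(bootstrap_se(data, exponential_weights))
```

Because every replication reuses the original data with new weights, no resampled copy of the dataset has to be built, which is one reason weighted schemes can be convenient with estimators that already accept weights.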
10:00 - 10:30
Improving fitting and predictions for flexible parametric survival models
Paul Lambert, University of Leicester, UK and Karolinska Institutet, Sweden
Flexible parametric survival models have been available in Stata since 2000 with Patrick Royston’s stpm command. In 2008 I developed stpm2, which added various extensions. However, the command is now old and does not take advantage of some of the features Stata has added over the years.
10:30 - 11:00
Coffee & Tea Break
11:00 - 11:30
sttex – a new dynamic document command for Stata and LaTeX
Ben Jann, University of Bern
In this talk, I will introduce a new command for processing a dynamic LaTeX document in Stata, i.e., a document containing both LaTeX paragraphs and Stata code. A key feature of the new command is that it tracks changes in the Stata code and executes the code only when needed, allowing for an efficient workflow. The command is useful for creating automated statistical reports, writing articles with data analysis, preparing slides for a methods course or a conference talk, or even writing a complete textbook with examples of applications.
11:30 - 12:30
Custom estimation tables
Jeff Pitblado, StataCorp
This presentation illustrates how to construct custom tables from one or more estimation commands.
I begin by describing what constitutes a collection and how items (numeric and string results) in a collection are tagged (identified), and then present a simple workflow that enables users to build their own custom tables from estimation commands. The presentation motivates the construction of estimation tables and concludes with the convenience command etable.
12:30 - 13:30
13:30 - 14:00
The Impact of a Government Pay Reform in Mexico on the Public Sector Wage Gap
Erendira Leon Bravo, University of Westminster; Barry Reilly, University of Sussex
The 2018 Federal Pay Reform on the Remuneration of Public Servants in Mexico is used to examine its impact on the public-private sector wage gap across the unconditional wage distribution in a developing-country context. The policy combined pay cuts and pay freezes for public sector workers.
14:00 - 14:30
Illuminating the factor and dependence structure in large panel models
Jan Ditzen, Free University of Bozen-Bolzano
In panel models, a precise understanding of the number of common factors and of dependence across the cross-sectional dimension is key for any applied work. This talk will give an overview of how to estimate the number of common factors and how to test for cross-sectional dependence. It does so by presenting two community-contributed commands: xtnumfac and xtcd2. xtnumfac implements 10 different methods to estimate the number of factors, among them the popular methods by Bai & Ng (2002) and Ahn & Horenstein (2013). The degree of cross-sectional dependence is investigated using xtcd2, which implements three different tests for cross-sectional dependence, based on Pesaran (2015), Juodis & Reese (2021) and Pesaran & Xie (2021). The talk includes a review of the theory, a discussion of the commands and empirical examples.
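For orientation, the classic CD statistic for a balanced panel of N units and T periods is CD = sqrt(2T / (N(N-1))) times the sum of the pairwise residual correlations over all pairs i < j. The Python sketch below computes it directly. It is a simplified illustration that ignores unbalanced panels and the alternative tests of Juodis & Reese (2021) and Pesaran & Xie (2021), it is not the xtcd2 implementation, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Sample correlation of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cd_statistic(residuals):
    """CD statistic for a balanced panel.

    residuals: list of N series (one per cross-sectional unit), each of length T.
    CD = sqrt(2T / (N(N-1))) * sum_{i<j} rho_ij; the result is compared with a
    standard normal distribution.
    """
    n, t = len(residuals), len(residuals[0])
    rho_sum = sum(pearson(residuals[i], residuals[j])
                  for i in range(n) for j in range(i + 1, n))
    return math.sqrt(2 * t / (n * (n - 1))) * rho_sum


# Three units driven by one common trend: every pairwise correlation is 1
panel = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2, 3]]
print(cd_statistic(panel))
```

In practice the series would be residuals from a fitted panel model rather than raw data; the toy panel only checks the arithmetic.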
14:30 - 15:00
mixrandregret: A command for fitting mixed random regret minimization models using Stata
Álvaro A. Gutiérrez-Vargas, Ziyue Zhu & Martina Vandebroek, Research Centre for Operations Research and Statistics (ORSTAT), KU Leuven
This presentation describes the mixrandregret command, which extends the randregret command (Gutiérrez-Vargas, Meulders & Vandebroek, 2021, The Stata Journal 21(3), 626–658) by incorporating random coefficients into random regret minimization (RRM) models. The command can fit a mixed version of the classic RRM model introduced in Chorus (2010, European Journal of Transport and Infrastructure Research 10: 181–196). It allows the user to specify a combination of fixed and random coefficients, and the random coefficients can follow normal or log-normal distributions, selected through the command’s options. Finally, the models are estimated by simulated maximum likelihood, using numerical integration to simulate the models’ choice probabilities.
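In the classic RRM model, the regret of alternative i is the sum, over every other alternative j and every attribute m, of ln(1 + exp(β_m (x_jm − x_im))), and choice probabilities take a multinomial logit form in negative regret. The Python sketch below computes these probabilities and, for a mixed version, averages them over pseudo-random draws of independent normal coefficients. It illustrates one way simulated probabilities can be formed and is not mixrandregret's internals; the function names, attribute values, and the independent-normal setup are hypothetical, and log-normal coefficients are not shown.

```python
import math
import random


def rrm_regret(alts, beta):
    """Classic RRM regret for each alternative (alts: list of attribute vectors)."""
    regrets = []
    for i, xi in enumerate(alts):
        r = 0.0
        for j, xj in enumerate(alts):
            if j == i:
                continue
            for m, b in enumerate(beta):
                r += math.log1p(math.exp(b * (xj[m] - xi[m])))
        regrets.append(r)
    return regrets


def rrm_probs(alts, beta):
    """Logit probabilities over negative regret (shifted for numerical stability)."""
    neg = [-r for r in rrm_regret(alts, beta)]
    top = max(neg)
    ex = [math.exp(v - top) for v in neg]
    s = sum(ex)
    return [e / s for e in ex]


def mixed_rrm_probs(alts, mean, sd, draws=500, seed=1):
    """Simulated probabilities: average over draws of independent normal betas."""
    rng = random.Random(seed)
    avg = [0.0] * len(alts)
    for _ in range(draws):
        beta = [rng.gauss(mu, s) for mu, s in zip(mean, sd)]
        for k, p in enumerate(rrm_probs(alts, beta)):
            avg[k] += p / draws
    return avg


# Hypothetical choice set: attributes are (cost, quality)
alts = [[10, 3], [12, 3], [10, 5]]
print(rrm_probs(alts, [-0.2, 0.5]))
print(mixed_rrm_probs(alts, mean=[-0.2, 0.5], sd=[0.1, 0.3]))
```

In a simulated maximum likelihood routine, the log of these averaged probabilities for the chosen alternatives would be summed over choice situations and maximised over the means and standard deviations.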
15:00 - 15:30
Coffee & Tea Break
15:30 - 16:30
Panel discussion with StataCorp developers
Timberlake has been successfully distributing Stata to customers since 1985 and has forty years of experience in providing expert solutions in all fields of data science, statistics and econometrics. Timberlake offers all customers fast, professional, and knowledgeable service and can meet the support requirements of all levels of Stata users through purchase, installation, and technical support as well as through our extensive schedule of classroom, onsite, and online training courses that we offer globally.
Timberlake is the Stata distributor to the United Kingdom and Ireland, France, Spain, Portugal, the Middle East and North Africa, Brazil, and Poland.