Exploring Research Frontiers in Contemporary Statistics and Econometrics, ed. by Ingrid van Keilegom and Paul W. Wilson, Berlin: Springer-Verlag, 2011.

When faced with multiple inputs $X\in\real^p_+$ and outputs $Y\in\real^q_+$, traditional quantile regression of $Y$ conditional on $X=x$ for measuring economic efficiency in the output (input) direction is thwarted by the absence of a natural ordering of Euclidean space for dimensions $q$ ($p$) greater than one. Daouia and Simar (2007) used nonstandard conditional quantiles to address this problem, conditioning on $Y\ge y$ ($X\le x$) in the output (input) orientation, but the resulting quantiles depend on the a priori chosen direction. This paper uses a dimensionless transformation of the $(p+q)$-dimensional production process to develop an alternative formulation of distance from a realization of $(X,Y)$ to the efficient support boundary, motivating a new, unconditional quantile frontier lying inside the joint support of $(X,Y)$, but near the full, efficient frontier. The interpretation is analogous to univariate quantiles and corrects some of the disappointing properties of the conditional quantile-based approach. By contrast with the latter, our approach determines a unique partial-quantile frontier independent of the chosen orientation (input, output, hyperbolic or directional distance). We prove that both the resulting efficiency score and its estimator share desirable monotonicity properties. Simple arguments from extreme-value theory are used to derive the asymptotic distributional properties of the corresponding empirical efficiency scores (both full and partial). The usefulness of the quantile-type estimator is shown from both infinitesimal and global robustness theory viewpoints via a comparison with the previous conditional quantile-based approach. A diagnostic tool is developed to find the appropriate quantile order; in the literature to date, this trimming order has been fixed a priori. The methodology is used to analyze the performance of U.S. credit unions, where outliers are likely to affect traditional approaches.
This paper uses nonparametric methods and some new results on hypothesis testing with nonparametric efficiency estimators and applies these to analyze the effect of locally-available high performance computing (HPC) resources on universities' efficiency in producing research and other outputs. We find that locally-available HPC resources enhance the technical efficiency of research output in Chemistry, Civil Engineering, Physics, and History, but not in Computer Science, Economics, or English; we find mixed results for Biology. Our research results provide a critical first step in a quantitative economic model for investments in HPC.
A rich theory of production and analysis of productive efficiency has developed since pioneering work by Tjalling Koopmans and George Debreu in the 1950s. Michael J. Farrell published the first empirical study, and it appeared in a statistical journal (JRSS), even though the paper provided no statistical theory. The literature in econometrics, management sciences, operations research and mathematical statistics has since been enriched by hundreds of papers trying to develop or implement new tools for analyzing productivity and efficiency of firms. Both parametric and nonparametric approaches have been proposed. The mathematical challenge is to derive estimators of production, cost, revenue, or profit frontiers which represent, in the case of production frontiers, the optimal loci of combinations of inputs (like labor, energy, capital, etc.) and outputs (the products or services produced by the firms). Optimality is defined in terms of various economic considerations. Then the efficiency of a particular unit is measured by its distance to the estimated frontier. The statistical problem can be viewed as the problem of estimating the support of a multivariate random variable, subject to some shape constraints, in multiple dimensions. These techniques are applied in thousands of papers in the economic and business literature. This ``Guided Tour'' reviews the development of various nonparametric approaches since Farrell's early work. Remaining challenges and open issues in this challenging arena are also described.
Data envelopment analysis (DEA) and free disposal hull (FDH) estimators are widely used to estimate efficiencies of production units. In applications, both efficiency scores for individual units as well as average efficiency scores are typically reported. While several bootstrap methods have been developed for making inference about the efficiencies of individual units, until now no methods have existed for making inference about mean efficiency levels. This paper shows that standard central limit theorems do not apply in the case of means of DEA or FDH efficiency scores due to the bias of the individual scores, which is of larger order than either the variance or covariances among individual scores. The main difficulty comes from the fact that such statistics depend on efficiency estimators evaluated at random points. Here, new central limit theorems are developed for means of DEA and FDH scores, and their efficacy for inference about mean efficiency levels is examined via Monte Carlo experiments.
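As an illustration of the kind of estimator involved, the following is a minimal sketch of the input-oriented FDH efficiency estimate and the sample mean of the resulting scores. It is not the paper's code: the function name `fdh_input_efficiency` and the toy data are assumptions made here, and all input quantities are assumed strictly positive. The mean computed at the end is the statistic for which, according to the abstract, standard central limit theorems fail.

```python
def fdh_input_efficiency(x, y, X, Y):
    """Input-oriented FDH efficiency estimate at the point (x, y).

    X and Y are lists of observed input and output vectors. The estimate is
    the minimum, over observations i with Y[i] >= y componentwise, of
    max_j X[i][j] / x[j]. Returns None if no observation produces at least y.
    Assumes every component of x is strictly positive.
    """
    best = None
    for xi, yi in zip(X, Y):
        # Only units producing at least y in every output are comparable.
        if all(a >= b for a, b in zip(yi, y)):
            ratio = max(a / b for a, b in zip(xi, x))
            if best is None or ratio < best:
                best = ratio
    return best


# Toy data (made-up numbers): three units, two inputs, one output.
X = [[2.0, 4.0], [4.0, 2.0], [6.0, 6.0]]
Y = [[1.0], [1.0], [1.0]]

# Scores are evaluated at the (random) sample points themselves.
scores = [fdh_input_efficiency(x, y, X, Y) for x, y in zip(X, Y)]
mean_score = sum(scores) / len(scores)
```

In this toy sample the first two units are undominated and receive a score of 1, while the third unit could cut both inputs to two thirds of their level and still be weakly dominated by an observed unit.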
Nonparametric estimators are widely used to estimate the productive efficiency of firms and other organizations, but often without any attempt to make statistical inference. Recent work has provided statistical properties of these estimators as well as methods for making statistical inference, and a link between frontier estimation and extreme value theory has been established. New estimators that avoid many of the problems inherent with traditional efficiency estimators have also been developed; these new estimators are robust with respect to outliers and avoid the well-known curse of dimensionality. Statistical properties, including asymptotic distributions, of the new estimators have been uncovered. Finally, several approaches exist for introducing environmental variables into production models; both two-stage approaches, in which estimated efficiencies are regressed on environmental variables, and conditional efficiency measures, as well as the underlying assumptions required for either approach, are examined.
Advances in information-processing technology have eroded the advantages of small scale and proximity to customers that traditionally enabled small lenders to thrive. Nonetheless, the membership and market share of U.S. credit unions have increased, though their average size has also risen. We investigate changes in the efficiency and productivity of U.S. credit unions during 1989-2006 by benchmarking the performance of individual firms against an estimated order-α quantile lying "near" the efficient frontier. We construct a cost analog of the Malmquist productivity index, which we decompose to estimate changes in cost and scale efficiency, and changes in technology. We find that cost-productivity fell on average across all credit unions but especially among smaller credit unions. Smaller credit unions confronted a shift in technology that increased the minimum cost required to produce given amounts of output. All but the largest credit unions also became less scale efficient over time.
In productivity and efficiency analysis, the technical efficiency of a production unit is measured through its distance to the efficient frontier of the production set. The most familiar nonparametric methods use Farrell-Debreu, Shephard, or hyperbolic radial measures. These approaches require that inputs and outputs be nonnegative, which can be problematic when using financial data. Recently, Chambers et al. (1996) have introduced directional distance functions which can be viewed as additive (rather than multiplicative) measures of efficiency. Directional distance functions are not restricted to nonnegative input and output quantities; in addition, the traditional input- and output-oriented measures are nested as special cases of directional distance functions. Consequently, directional distances provide greater flexibility. However, until now, only free disposal hull (FDH) estimators of directional distances (and their conditional and robust extensions) have known statistical properties (Simar and Vanhems, 2012). This paper develops the statistical properties of directional DEA estimators, which are especially useful when the production set is assumed convex. We first establish that the directional data envelopment analysis (DEA) estimators share the known properties of the traditional radial DEA estimators. We then use these properties to develop consistent bootstrap procedures for statistical inference about directional distance, estimation of confidence intervals, and bias correction. The methods are illustrated in some empirical examples.
This paper presents new, fully nonparametric estimates of ray-scale and expansion-path scale economies for U.S. banks based on a model of bank costs. Unlike prior studies that use models with restrictive parametric assumptions or limited samples, our methodology uses local polynomial estimators and data on all U.S. banks over the period 1984-2006. Our estimates indicate that as recently as 2006, most U.S. banks faced increasing returns to scale, suggesting that scale economies are a plausible (but not necessarily only) reason for the growth in average bank size and that the tendency toward increasing scale is likely to continue unless checked by government intervention.
U.S. credit unions serve 93 million members, hold 10 percent of U.S. savings deposits, and make 13.2 percent of all nonrevolving consumer loans. Since 1985, the share of U.S. depository institution assets held by credit unions has nearly doubled, and the average (inflation-adjusted) size of credit unions has increased over 600 percent. We use a local-linear estimator, dimension reduction techniques, and bootstrap methods to estimate and make inference about ray-scale and expansion-path scale economies. We find substantial evidence of increasing returns to scale among credit unions of all sizes, suggesting that further consolidation and growth among credit unions are likely.
A hyperbolic measure of technical efficiency was proposed by Färe et al. (1985) wherein efficiency is measured by the simultaneous maximum, feasible reduction in input quantities and increase in output quantities. In cases where returns to scale are not constant, the nonparametric data envelopment analysis (DEA) estimator of hyperbolic efficiency cannot be written as a linear program; consequently, the measure has not been used in empirical studies except where returns to scale are constant, allowing the estimator to be computed by linear programming methods. This paper develops an alternative estimator of the hyperbolic measure proposed by Färe et al. (1985). Statistical consistency and rates of convergence are established for the new estimator. A numerical procedure allowing computation of the original estimator is provided, and this estimator is also shown to be consistent, with the same rate of convergence as the new estimator. In addition, an unconditional, hyperbolic order-m efficiency estimator is developed by extending the ideas of Cazals et al. (2002). Asymptotic properties of this estimator are also given.
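For readers unfamiliar with the hyperbolic measure, its usual textbook form can be written as follows. The notation is an assumption made here, not taken from the paper: $\mathcal{P}$ denotes the production set, $\gamma(x,y)$ the hyperbolic measure, and $\theta(x,y)$ the input-oriented Farrell measure. The second line sketches why constant returns to scale reduce the problem to the input-oriented one, which can be estimated by linear programming.

```latex
\gamma(x,y) = \inf\left\{\gamma > 0 \mid (\gamma x,\, \gamma^{-1} y) \in \mathcal{P}\right\}

% Under constant returns to scale, (\gamma x, \gamma^{-1} y) \in \mathcal{P}
% if and only if (\gamma^2 x, y) \in \mathcal{P}, so that
\gamma(x,y) = \theta(x,y)^{1/2},
\qquad
\theta(x,y) = \inf\left\{\theta > 0 \mid (\theta x,\, y) \in \mathcal{P}\right\}
```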
This paper examines the widespread practice where data envelopment analysis (DEA) efficiency estimates are regressed on some environmental variables in a second-stage analysis. In the literature, only two statistical models have been proposed in which second-stage regressions are well-defined and meaningful. In the model considered by Simar and Wilson (2007), truncated regression provides consistent estimation in the second stage, whereas in the model proposed by Banker and Natarajan (2008), ordinary least squares (OLS) provides consistent estimation. This paper examines, compares, and contrasts the very different assumptions underlying these two models, and makes clear that second-stage OLS estimation is consistent only under very peculiar and unusual assumptions on the data-generating process that limit its applicability. In addition, we show that in either case, bootstrap methods provide the only feasible means for inference in the second stage. We also comment on ad hoc specifications of second-stage regression equations that ignore the part of the data-generating process that yields data used to obtain the initial DEA estimates.
We develop a tractable, consistent bootstrap algorithm for inference about Farrell-Debreu efficiency scores estimated by nonparametric data envelopment analysis (DEA) methods. The algorithm allows for very general situations where the distribution of the inefficiencies in the input-output space may be heterogeneous. Computational efficiency and tractability are achieved by avoiding the complex double-smoothing procedure in the algorithm proposed by Kneip et al. (2008). In particular, we avoid technical difficulties in the earlier algorithm associated with smoothed estimates of a density with unknown, nonlinear, multivariate bounded support requiring complicated reflection methods. The new procedure described here is relatively simple and easy to implement: for particular values of a pair of smoothing parameters, the computational complexity is the same as the (inconsistent) naive bootstrap. The resulting computational speed allows the bootstrap to be iterated in order to optimize the smoothing parameters. From a practical viewpoint, only standard packages for computing DEA efficiency estimates, i.e., solving linear problems, are required for implementation. The performance of the method in finite samples is illustrated through some simulated examples.
It is well-known that the naive bootstrap yields inconsistent inference in the context of data envelopment analysis (DEA) or free disposal hull (FDH) estimators in nonparametric frontier models. For inference about efficiency of a single, fixed point, drawing bootstrap pseudo-samples of size m<n provides consistent inference, although coverages are quite sensitive to the choice of subsample size m. We provide a probabilistic framework in which these methods are shown to be valid for statistics comprised of functions of DEA or FDH estimators. We examine a simple, data-based rule for selecting m suggested by Politis et al. (2001), and provide Monte Carlo evidence on the size and power of our tests. Our methods (i) allow for heterogeneity in the inefficiency process, and unlike previous methods, (ii) do not require multivariate kernel smoothing, and (iii) avoid the need for solutions of intermediate linear programs.
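The idea of drawing pseudo-samples of size m<n can be sketched as below for the input-oriented FDH estimator at a single, fixed point. This is a rough illustration under stated assumptions, not the paper's algorithm: subsamples are drawn without replacement, m is fixed by hand rather than chosen by a data-based rule, the scaling exponent is taken to be the known FDH rate 1/(p+q), and the helper names (`fdh_input`, `subsample_ci`) and simulated data are hypothetical.

```python
import random


def fdh_input(x, y, X, Y):
    """Input-oriented FDH estimate at (x, y); inf if nothing dominates y."""
    ratios = [
        max(a / b for a, b in zip(xi, x))
        for xi, yi in zip(X, Y)
        if all(a >= b for a, b in zip(yi, y))
    ]
    return min(ratios) if ratios else float("inf")


def subsample_ci(x, y, X, Y, m, B=500, level=0.95, seed=0):
    """Confidence interval for efficiency at (x, y) from B subsamples of size m."""
    n = len(X)
    kappa = 1.0 / (len(x) + len(y))  # assumed FDH convergence-rate exponent
    theta_hat = fdh_input(x, y, X, Y)
    if theta_hat == float("inf"):
        raise ValueError("no observation dominates y; estimate undefined")
    rng = random.Random(seed)
    stats = []
    for _ in range(B):
        idx = rng.sample(range(n), m)  # subsample without replacement
        theta_star = fdh_input(x, y, [X[i] for i in idx], [Y[i] for i in idx])
        # Subsample analog of n^kappa * (theta_hat / theta - 1).
        stats.append(m ** kappa * (theta_star / theta_hat - 1.0))
    stats.sort()
    a = 1.0 - level
    c_lo = stats[int((a / 2.0) * (B - 1))]
    c_hi = stats[int((1.0 - a / 2.0) * (B - 1))]
    # Invert the approximation to obtain bounds on the true efficiency.
    lower = theta_hat / (1.0 + c_hi / n ** kappa)
    upper = theta_hat / (1.0 + c_lo / n ** kappa)
    return theta_hat, lower, upper


# Simulated data (made-up): one input, one output, 200 units.
gen = random.Random(42)
X = [[gen.uniform(1.0, 2.0)] for _ in range(200)]
Y = [[gen.uniform(1.0, 2.0)] for _ in range(200)]
theta_hat, lower, upper = subsample_ci([2.0], [1.0], X, Y, m=40)
```

Because each subsample is a subset of the data, every subsample estimate is at least as large as the full-sample estimate, so the interval lies at or below the full-sample estimate; this reflects the one-sided bias of FDH estimators.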
Frontier techniques, including data envelopment analysis (DEA) and stochastic frontier analysis (SFA), have been used to measure health care provider efficiency in hundreds of published studies. Although these methods have the potential to be useful to decision makers, their utility is limited both by methodological questions concerning their application and by some disconnect between the information they provide and the insight sought by decision makers. The articles in this special issue focus on the application of DEA and SFA to hospitals with the hope of making these techniques more accurate and accessible to end users. This introduction to the special issue highlights the importance of measuring the efficiency of health care providers, provides a background on frontier techniques, contains an overview of the articles in the special issue, and suggests a research agenda for DEA and SFA.
Conventional approaches for inference about efficiency in parametric stochastic frontier (PSF) models are based on percentiles of the estimated distribution of the one-sided error term, conditional on the composite error. When used as prediction intervals, coverage is poor when the signal-to-noise ratio is low, but improves slowly as sample size increases. We show that prediction intervals estimated by bagging yield much better coverages than the conventional approach, even with low signal-to-noise ratios. We also present a bootstrap method that gives confidence interval estimates for (conditional) expectations of efficiency, and which have good coverage properties that improve with sample size. In addition, researchers who estimate PSF models typically reject models, samples, or both when residuals have skewness in the ``wrong'' direction, i.e., in a direction that would seem to indicate absence of inefficiency. We show that correctly specified models can generate samples with ``wrongly'' skewed residuals, even when the variance of the inefficiency process is nonzero. Both our bagging and bootstrap methods provide useful information about inefficiency and model parameters irrespective of whether residuals have skewness in the desired direction.
This note corrects an empirical example appearing in Wilson (1993), and provides updated information about the computational burden of the outlier-detection method proposed in Wilson (1993).
This paper uses a new nonparametric, unconditional, hyperbolic order-α quantile estimator to construct a hyperbolic version of the Malmquist index. Unlike traditional nonparametric efficiency estimators, the new estimator is both robust to data outliers and has a root-n convergence rate. We use this estimator to examine changes in the efficiency and productivity of U.S. banks between 1985 and 2004. We find that larger banks experienced larger efficiency and productivity gains than small banks, consistent with the presumption that recent changes in regulation and information technology have favored larger banks.
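For orientation, the textbook Malmquist productivity index between periods $t$ and $t+1$, together with its usual decomposition into efficiency change and technical change, can be written as below. This is the standard form built from generic distance functions $D^t$ and $D^{t+1}$ (notation assumed here); the paper's hyperbolic version presumably substitutes hyperbolic order-α distances, and its exact definition is not reproduced here.

```latex
M = \left[
      \frac{D^{t}(x_{t+1},y_{t+1})}{D^{t}(x_{t},y_{t})}
      \times
      \frac{D^{t+1}(x_{t+1},y_{t+1})}{D^{t+1}(x_{t},y_{t})}
    \right]^{1/2}
  = \underbrace{\frac{D^{t+1}(x_{t+1},y_{t+1})}{D^{t}(x_{t},y_{t})}}_{\text{efficiency change}}
    \times
    \underbrace{\left[
      \frac{D^{t}(x_{t+1},y_{t+1})}{D^{t+1}(x_{t+1},y_{t+1})}
      \times
      \frac{D^{t}(x_{t},y_{t})}{D^{t+1}(x_{t},y_{t})}
    \right]^{1/2}}_{\text{technical change}}
```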
This paper examines the technical efficiency of U.S. Federal Reserve check processing offices over 1980-2003. We extend results from Park et al. (2000) and Daouia and Simar (2007) to develop an unconditional, hyperbolic, α-quantile estimator of efficiency. Our new estimator is fully nonparametric and robust with respect to outliers; when used to estimate distance to quantiles lying close to the full frontier, it is strongly consistent and converges at rate root-n, thus avoiding the curse of dimensionality that plagues data envelopment analysis (DEA) estimators. Our methods could be used by policymakers to compare inefficiency levels across offices or by managers of individual offices to identify peer offices.
Nonparametric data envelopment analysis (DEA) estimators based on linear programming methods have been widely applied in analyses of productive efficiency. The distributions of these estimators remain unknown except in the simple case of one input and one output, and previous bootstrap methods proposed for inference have not been proved consistent, making inference doubtful. This paper derives the asymptotic distribution of DEA estimators under variable returns-to-scale. This result is used to prove consistency of two different bootstrap procedures (one based on subsampling, the other based on smoothing). The smooth bootstrap requires smoothing the irregularly-bounded density of inputs and outputs as well as smoothing the DEA frontier estimate. Both bootstrap procedures allow for dependence of the inefficiency process on output levels and the mix of inputs in the case of input-oriented measures, or on input levels and the mix of outputs in the case of output-oriented measures.
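As a reminder of what such an estimator looks like, the usual input-oriented DEA estimator under variable returns to scale at a point $(x,y)$, given observations $(X_i,Y_i)$, $i=1,\ldots,n$, is the solution of the linear program below. The notation is assumed here for illustration; inequalities between vectors hold componentwise.

```latex
\hat{\theta}_{\mathrm{VRS}}(x,y) = \min_{\theta,\,\lambda_1,\ldots,\lambda_n}
\left\{ \theta > 0 \;\middle|\;
  y \le \sum_{i=1}^{n} \lambda_i Y_i,\;
  \theta x \ge \sum_{i=1}^{n} \lambda_i X_i,\;
  \sum_{i=1}^{n} \lambda_i = 1,\;
  \lambda_i \ge 0 \;\; \forall\, i
\right\}
```

Dropping the constraint $\sum_{i} \lambda_i = 1$ gives the constant returns-to-scale version.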
This paper describes a software package for computing nonparametric efficiency estimates, making inference, and testing hypotheses in frontier models. Commands are provided for bootstrapping as well as computation of some new, robust estimators of efficiency, etc.
Nonparametric estimators based on the idea of enveloping the data (FDH and DEA-type estimators) have been widely used to estimate the productive efficiency of firms and other organizations. Many have claimed that FDH and DEA techniques are nonstatistical, as opposed to econometric approaches where particular parametric expressions are posited to model the frontier. This view is unfounded; statistical models allowing determination of the statistical properties of the nonparametric estimators in general multi-output, multi-input settings can be defined. Recent results provide the asymptotic sampling distributions of both FDH and DEA estimators in multivariate settings. Sampling distributions may also be approximated in very general situations using bootstrap methods. Consequently, statistical inference based on FDH and DEA-type estimators is now possible. In addition, other estimators have recently been developed; these new estimators avoid many of the problems inherent with FDH and DEA estimators, and in particular avoid the well-known curse of dimensionality that plagues most nonparametric estimators. Statistical properties, including asymptotic distributions, of the new estimators have been uncovered. This chapter summarizes the available results, providing a guide to the existing literature for practitioners and others.
Many papers have regressed nonparametric estimates of productive efficiency on environmental variables in two-stage procedures to account for exogenous factors that might affect firms' performance. None of these have described a coherent data-generating process (DGP). Moreover, conventional approaches to inference employed in these papers are invalid due to complicated, unknown serial correlation among the estimated efficiencies. We first describe a sensible DGP for such models. We propose single and double bootstrap procedures; both permit valid inference, and the double bootstrap procedure improves statistical efficiency in the second-stage regression. We examine the statistical performance of our estimators using Monte Carlo experiments.