Statistics and Computing
This page provides some information about the statistical applications and computing resources used in Dr. Rosopa’s research lab.
S-PLUS & R
S-PLUS is a statistical environment built on S, the award-winning language developed at Bell Labs (later part of Lucent Technologies). S-PLUS has a very user-friendly interface with point-and-click features similar to those in other statistical software. It is also flexible and powerful: it offers modern statistical methods and stunning graphics capabilities, can dynamically interface with compiled routines written in C, C++, and FORTRAN as well as with Java methods, and is highly extensible, so users can develop customized applications.
R is an open-source system that can essentially be described as a free version of S; many of its functions are identical or very similar to those in S. Although R does not have a graphical user interface like those of other statistical environments, John Fox (a sociologist and statistician) has developed an add-on package, the R Commander (Rcmdr), that provides one. Dr. Rosopa uses both S-PLUS and R but tends to use R for teaching. Because R is open source, an international community of researchers in statistics, computer science, and other fields contributes to its continued development, and new methods often become available in R much sooner than in other statistical applications.
S-PLUS and R are used in a variety of industries and research disciplines. The following links, grouped by discipline, are a good starting point for becoming more familiar with them:
- Econometrics (UC Berkeley Econometrics Laboratory)
- Environmental Statistics (Philip Dixon)
- Political Science (Jeff Gill)
- Psychology (R for Psychology)
- Quantitative Risk Management (Alexander J. McNeil)
- Sociology (John Fox)
- Statistics (UCLA, Basics of S-PLUS, Robust Methods by Rand Wilcox)
Here are some recommended books:
- Faraway, J. J. (2005). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models.
- Fox, J. (2002). An R and S-PLUS Companion to Applied Regression.
- Krause, A., & Olson, M. (2005). The Basics of S-PLUS (4th ed.).
- Pinheiro, J. C., & Bates, D. M. (2000). Mixed Effects Models in S and S-PLUS.
- Venables, W., & Ripley, B. D. (2000). S Programming.
- Venables, W., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.).
Condor
Condor is a workload management system for compute-intensive tasks. It manages a pool of machines (often hundreds or thousands) and uses their otherwise idle computing cycles to complete jobs. Example jobs include simulations of ground-penetrating radar, machine learning, Monte Carlo simulations (in fields ranging from astrophysics to psychology), and simulations of polymer blends. Here are some interesting statistics:
- At the University of Wisconsin-Madison, Condor manages over 1,000 workstations and, on an average day, completes 650 CPU-days of computation.
- At Clemson University, Condor links over 1,700 machines in the campus grid. Mary Beth Kurz, an assistant professor of industrial engineering, studies genetic algorithms for large-scale optimization in manufacturing. Using the Condor-managed campus grid, she completed the equivalent of 17 years of computing time in just one week.
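The statistics above become more concrete with a little arithmetic: dividing the CPU time delivered by the elapsed (wall-clock) time gives the average number of machines that must have been working in parallel. The sketch below is a back-of-the-envelope check only. It assumes that each machine contributes at most one CPU-day per day (i.e., it ignores multi-core machines, which would lower the machine count) and that a "year" is 365.25 days; neither assumption comes from the original figures, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the Condor figures quoted above.
# Assumptions (not from the source): one machine delivers at most one
# CPU-day per day (single-CPU machines), and a year is 365.25 days.
DAYS_PER_YEAR = 365.25


def average_busy_machines(cpu_days: float, wall_clock_days: float) -> float:
    """Hypothetical helper: machines that must run in parallel, on average,
    to deliver `cpu_days` of computation within `wall_clock_days`."""
    return cpu_days / wall_clock_days


# UW-Madison: 650 CPU-days of work completed in one (24-hour) day.
uw_busy = average_busy_machines(650, 1)

# Clemson: the equivalent of 17 years of computing time in one week.
clemson_busy = average_busy_machines(17 * DAYS_PER_YEAR, 7)

# The pools are described as "over" 1,000 and 1,700 machines,
# so these percentages are upper bounds on the share of each pool in use.
print(f"UW-Madison: ~{uw_busy:.0f} machines busy on average "
      f"(at most {uw_busy / 1000:.0%} of the pool)")
print(f"Clemson:    ~{clemson_busy:.0f} machines busy on average "
      f"(at most {clemson_busy / 1700:.0%} of the grid)")
```

Under these assumptions, roughly 650 UW-Madison workstations and about 887 Clemson machines were busy on average, i.e., at most about two-thirds and one-half of the respective pools.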