

useR! 2011 Live Blog


The R UseR Conference 2011
University of Warwick, UK

15:59. useR! 2012 will be hosted by Vanderbilt University, Nashville, USA.

15:02 FINALLY, the last invited talk by Simon Urbanek (AT&T) on
R Graphics: Supercharged – Recent Advances in Visualization and Analysis of Large Data in R Download Abstract here

  • 85% of the seats in the large lecture hall are filled! >250 people
  • New features in R graphics
  • Live demo on a city map: how can a city be made greener? Estimating the traffic.
  • R graphics integrated with a real map (like Google Maps). Looks nice.
  • Polygons with holes: polypath()
    – regular polygon()s cannot create holes
  • Most recent feature -> screen output control
  • Until now there was no way to tell R when to actually show the graphics on the screen: now or only later?
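
The two features mentioned above can be tried in a few lines. This is only a minimal sketch, not the speaker's demo: the square-with-a-hole shape and the variable names are made up for illustration, and pdf(NULL) is used so that no file or window is needed.

```r
# Draw a square with a square hole using polypath().
# pdf(NULL) opens a device that writes no file, so this runs anywhere.
pdf(NULL)
plot.new()
plot.window(xlim = c(0, 4), ylim = c(0, 4))

# Hypothetical shape: an outer square and a smaller inner square
outer_x <- c(0, 4, 4, 0); outer_y <- c(0, 0, 4, 4)
inner_x <- c(1, 3, 3, 1); inner_y <- c(1, 1, 3, 3)

# An NA separates the sub-paths; with the "evenodd" rule the inner
# square is left unfilled, i.e. it becomes a hole.
path_x <- c(outer_x, NA, inner_x)
path_y <- c(outer_y, NA, inner_y)

dev.hold()    # ask the device to hold back screen updates (no effect where unsupported)
polypath(path_x, path_y, col = "grey", border = "black", rule = "evenodd")
dev.flush()   # release the hold so the finished picture appears at once
invisible(dev.off())
```

On an interactive screen device that supports holding, the dev.hold()/dev.flush() pair is what prevents half-drawn plots from flickering onto the screen.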

Challenges

  • Data size increases
  • Large RAM (>100 GB) and CPU power are affordable
  • Visualisation needs to keep up
    – rendering: the game industry provides solutions (OpenGL + GPUs)
    – visualization methods for large data
    -> interactivity (divide and conquer, shift of focus)
    -> sufficient statistics, aggregation

Proposed solution
– Rendering speed -> use OpenGL back-ends for R devices (qtdevice, iPlots eXtreme)
– The example shows a clear speed-up, from 5 seconds to about a second

  • About iPlots
  • iPlots = interactive plots for data analysis: selection, highlighting, brushing …
    – interactive change of plot parameters
    – queries
    – all essential plots (scatterplots, bar charts, histograms, parallel coordinates)
  • Demo now. The interactivity looks nice; the colours are similar to the new iPod colour range.

    iPlot (credit: Ashley Ford's Facebook)

  • Now we can ask for the selected points in the graph with
    > which(selected(p))
    – DEMO: PCA, where you can select the outliers and find out what they are!
    – DEMO: histogram whose graph updates as its parameters change

Conclusion

  • R: rasterImage(), polypath(), dev.hold/flush()
  • Large data requires fast graphics and interactivity
  • OpenGL graphics devices (idev(), qtdevice, …)
  • iPlots eXtreme: high-performance interactive graphics.
  • Fast (C++, OpenGL: interactivity on > 1 million points)
  • Efficient (no copying, reference semantics)
  • Extensible (custom visuals, statistical objects, plots)
  • CRAN release as “ix” expected next month

References with Link

END! Q&A now

##############################################

12:15 An algorithm for the computation of the power of Monte Carlo tests with guaranteed precision by Patrick Rubin-Delanchy. Download abstract here

  • Statistical formulation. Data = N observable stream.
  • Algorithm that runs in finite time; the example is a permutation test for two Gaussian groups
  • They have other “tricks” to reduce the effort: choice of N, hypothesis test on the remaining stream.
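
For readers who have not met the example in the second bullet, here is a plain Monte Carlo permutation test for two Gaussian groups. It is only a sketch of the basic test, not the speaker's power algorithm; the group sizes, means and number of permutations are arbitrary assumptions.

```r
set.seed(123)

# Two hypothetical Gaussian groups with different means
group1 <- rnorm(20, mean = 0)
group2 <- rnorm(20, mean = 1)

# Test statistic: absolute difference in group means
observed <- abs(mean(group1) - mean(group2))

pooled <- c(group1, group2)
n_perm <- 999   # N, the number of Monte Carlo permutations

# Recompute the statistic on random relabellings of the pooled data
perm_stats <- replicate(n_perm, {
  idx <- sample(length(pooled), length(group1))
  abs(mean(pooled[idx]) - mean(pooled[-idx]))
})

# Monte Carlo p-value, counting the observed statistic itself
p_value <- (1 + sum(perm_stats >= observed)) / (n_perm + 1)
p_value
```

The power of such a test is the probability that p_value falls below the chosen level, which is the quantity the talk's algorithm estimates.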

##############################################

11:57am. Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions by Taylor Arnold. Download abstract here

  • Deals with the Kolmogorov-Smirnov test
  • Discrete K-S test: not implemented in any of the major statistical computing packages, hence they aim to do it in R
  • Decided to require that the discrete null distribution be specified via class stepfun
  • After obtaining the test statistic, the p-value must be calculated
  • Implementation is tedious but relatively straightforward
  • Using Discrete Cramér-von Mises Test
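
To see why stepfun is a natural way to specify a discrete null, the K-S statistic can be computed by hand in base R. This is a sketch under my own assumptions (a fair six-sided die as the null and simulated data); it does not use the speaker's package and it stops before the p-value step.

```r
set.seed(1)

# Hypothetical discrete null: uniform on 1..6 (a fair die), given as a step-function CDF
support  <- 1:6
null_cdf <- stepfun(support, c(0, cumsum(rep(1 / 6, 6))))

# Simulated sample and its empirical CDF
x       <- sample(support, 50, replace = TRUE)
emp_cdf <- ecdf(x)

# Both CDFs are right-continuous and only jump on the support, so the
# largest gap between them is attained at one of the support points
D <- max(abs(emp_cdf(support) - null_cdf(support)))
D
```

Getting a valid p-value for D is the tedious part mentioned above, because the usual continuous-case distribution of the statistic does not apply to discrete nulls.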

##############################################

11:31am. The benchden Package: Benchmark Densities for Nonparametric Density Estimation by Henrike Weinert (Inference Session) download abstract here

  • the package implements 28 benchmark densities
  • from the stats package, e.g., uniform, exponential
  • normal mixtures (Marronite, claw)
  • Support: compact, e.g., infinite peak, uniform scale mixture or sawtooth
  • Support: gaps, e.g., Matterhorn, caliper and trimodal uniform
  • Support: half line, e.g., Maxwell, Pareto and inverse exponential
  • Support: real line, e.g., logistic, double exponential
  • Simulation study to compare two different bandwidth selectors for a kernel estimator
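
A simulation of the kind described in the last bullet can be sketched with base R alone. The bimodal normal mixture below is a stand-in I made up, not one of the benchden densities, and the choice of the "nrd0" and "SJ" selectors is also my assumption.

```r
set.seed(42)

# Hypothetical benchmark density: an equal mixture of N(-1.5, 0.5^2) and N(1.5, 0.5^2)
rmix <- function(n) {
  comp <- rbinom(n, 1, 0.5)
  rnorm(n, mean = ifelse(comp == 1, -1.5, 1.5), sd = 0.5)
}
dmix <- function(x) 0.5 * dnorm(x, -1.5, 0.5) + 0.5 * dnorm(x, 1.5, 0.5)

x <- rmix(500)

# Approximate integrated absolute error of a kernel estimate on a fixed grid
l1_error <- function(bw) {
  est  <- density(x, bw = bw, from = -4, to = 4, n = 512)
  step <- est$x[2] - est$x[1]
  sum(abs(est$y - dmix(est$x))) * step
}

errors <- c(nrd0 = l1_error("nrd0"), SJ = l1_error("SJ"))
errors
```

Repeating this over many samples and over several benchmark densities gives the kind of comparison the talk reported.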

##############################################

 11:15am. Density Estimation Packages in R by Henry Deng Download abstract here (Inference Session) 

  • Review of the packages on CRAN for density estimation
  • Calculation speed on a random set of n normally distributed points: ash is the fastest and pendensity is the slowest
  • Estimation accuracy measured by mean absolute error: results vary with the type of distribution and the number of data points. Interesting.
  • Additional idea: the trade-off between speed and accuracy
  • Well-performing packages seem to be long established in R, with frequent updates.
  • Recommended packages: KernSmooth or ash
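
The speed and accuracy comparison can be reproduced in miniature. This sketch only compares base R's density() with KernSmooth::bkde() on standard normal data; the sample size is my own choice and the numbers will differ from those in the talk.

```r
set.seed(2011)
x <- rnorm(1e4)   # random set of n normally distributed points

# Base R kernel density estimate: elapsed time and mean absolute error
t_base   <- system.time(est_base <- density(x))[["elapsed"]]
mae_base <- mean(abs(est_base$y - dnorm(est_base$x)))

results <- data.frame(method = "stats::density", seconds = t_base, mae = mae_base)

# KernSmooth ships with R as a recommended package, but check before using it
if (requireNamespace("KernSmooth", quietly = TRUE)) {
  t_ks   <- system.time(est_ks <- KernSmooth::bkde(x))[["elapsed"]]
  mae_ks <- mean(abs(est_ks$y - dnorm(est_ks$x)))
  results <- rbind(results,
                   data.frame(method = "KernSmooth::bkde", seconds = t_ks, mae = mae_ks))
}
results
```

Both estimators use their default bandwidths and grids here, so the comparison says as much about the defaults as about the algorithms.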

 

##############################################

18 Aug 2011, The last day (but by no means the least)

10:32am Binomial regression model by Merete Download abstract here

  • Three different methods for extracting residuals: unstandardized, standardized and studentized.
  • Exact deletion residuals, a new type of residual implemented in binomTools
  • approx.deletion (rstudent) residual function
  • Parallel histograms = an exploratory version of the Hosmer-Lemeshow goodness-of-fit test (with fixed cut points)
  • Half-normal plot: uses absolute residual values but is otherwise equivalent to a normal plot. Optional simulated envelopes to support interpretation
  • Profile likelihood from the MASS package -> returns and plots the profile likelihood root, not the profile likelihood
  • Misc = grouping binary or completely grouped data based on specified data, and the empirical area under the ROC curve
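
The three residual types and the half-normal plot can be illustrated with base R only. This sketch does not use binomTools; the mtcars model is an arbitrary example of mine, and the plotting positions are one common choice rather than necessarily the package's.

```r
# A small binomial regression on a built-in data set (arbitrary example)
fit <- glm(am ~ wt, family = binomial, data = mtcars)

res_raw  <- residuals(fit, type = "deviance")  # unstandardized deviance residuals
res_std  <- rstandard(fit)                     # standardized residuals
res_stud <- rstudent(fit)                      # approximate deletion (studentized) residuals

head(cbind(raw = res_raw, standardized = res_std, studentized = res_stud))

# Half-normal plot: sorted absolute residuals against half-normal quantiles
abs_sorted    <- sort(abs(res_std))
n             <- length(abs_sorted)
half_normal_q <- qnorm((seq_len(n) + n - 1 / 8) / (2 * n + 1 / 2))

pdf(NULL)   # no output file; replace with a real device to look at the plot
plot(half_normal_q, abs_sorted,
     xlab = "Half-normal quantiles", ylab = "|standardized residual|")
invisible(dev.off())
```

A simulated envelope would be added by refitting the model to data simulated from the fit and overlaying the range of the resulting sorted residuals.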

##############################################

6:12pm Interval before Conference Dinner!

5:33pm Missing Values in Principal Component Analysis (PCA) by Julie Josse download abstract

  • Starting with… the theoretical stuff
  • Overfitting issues from missing values -> fixed by a shrinkage method
  • Procedures in the missMDA package
  • Step1: Estimation of number of dimensions
  • Step2: Imputation of missing values by ‘imputePCA’ function
  • Step3: PCA on the completed data set, ‘MIPCA’ function
    – Iterative PCA: single imputation method
    – A single value cannot reflect the variability of the prediction
    – Multiple imputation: generating plausible values for each missing value
  • Supplementary projection via ‘plot()’ function
    – Individual position (and variables) with other predictions
  • Between imputation variability too!

##############################################

5:06pm Here is the HIGHLIGHT. John Fox! on Tests for Multivariate Linear Models with the car Package Download abstract

    • Discussion on fitting multivariate linear models (MLMs) in R with the lm function
    • The anova function is flexible, but it calculates sequential (Type I) tests, and performing other common tests, especially for repeated-measures designs, is relatively inconvenient.
    • The Anova function (with a capital A) in the car package (Fox and Weisberg, 2011) can perform partial (Type II or Type III) tests for the terms in a multivariate linear model, including simply specified multivariate and univariate tests for repeated-measures models.
    • The linearHypothesis function in the car package can test arbitrary linear hypotheses for multivariate linear models, including models for repeated measures.
    • Both the Anova and linearHypothesis functions return a variety of information useful in further computation on multivariate linear models
    • Now he’s demonstrating how to use the ‘car’ package with the Anderson-Fisher iris data
Correlation plot, basic box-plot
> mod.iris <- lm(cbind(Sepal.Length, Sepal.Width,
 Petal.Length, Petal.Width) ~ Species, data=iris)
> (manova.iris <- Anova(mod.iris))
Type II MANOVA Tests: ...
> anova(mod.iris)
Anova() gave exactly the same result as the default anova() function
  • Summary
    >summary()
  • Also handles repeated measures, e.g., a single repeated-measures factor. This can be handled by the anova function in R, but it is simpler to get the common tests from the Anova and linearHypothesis functions in the car package.
  • 21% of my MacBook battery left now, please no blackout any time soon.
  • 5 mins left in this presentation!
  • It’s done! Q&A now.
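
The iris example from the talk can be rerun as a script. This is a sketch from my notes rather than the speaker's exact code; the car call is guarded because the package may not be installed.

```r
# Multivariate linear model: four responses, one factor
mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
               data = iris)

# Sequential (Type I) multivariate tests from base R
seq.tests <- anova(mod.iris)
print(seq.tests)

# Type II MANOVA tests from the car package, if it is available
if (requireNamespace("car", quietly = TRUE)) {
  manova.iris <- car::Anova(mod.iris)
  print(manova.iris)
}
```

With a single term in the model the sequential and Type II tests for Species coincide, so the two outputs only start to differ once more terms are added.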

##############################################

4:47pm Multiple choice models: why not the same answer? A comparison among LIMDEP, R, SAS and Stata by Giuseppe Bruno Download Abstract

  • Similar to my presentation, but more technical and focused on comparing R packages for choice models with proprietary software!

##############################################

4:35pm Regression Models for Ordinal Data: Introducing R-package ordinal by Rune Haubo B. Christensen Download Abstract

  • The package offers regression models for ordinal data.
  • Provides various standard model fit indices.
  • Extends the basic model with scale effects, nominal effects, random effects and structured thresholds.
  • Future work: more flexible random effect structures and nested effects.
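
As a reminder of what the basic model looks like, here is a proportional-odds fit. Note that this sketch uses MASS::polr and the housing data from MASS as a stand-in, not the ordinal package presented in the talk, and it covers none of the extensions listed above.

```r
# MASS is a recommended package, but check that it is installed
if (requireNamespace("MASS", quietly = TRUE)) {
  # Satisfaction (Low < Medium < High) modelled by three categorical predictors;
  # the data are grouped, so cell frequencies enter as weights
  fit <- MASS::polr(Sat ~ Infl + Type + Cont, weights = Freq, data = MASS::housing)

  print(coef(fit))   # regression coefficients on the logit scale
  print(fit$zeta)    # estimated thresholds (cut-points) between categories
}
```

The thresholds in fit$zeta are the quantities that the "structured thresholds" extension would constrain.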

##############################################

17 August 2011, 14:02. Invited talk on Modelling Three-dimensional Surfaces in R by Adrian Bowman, University of Glasgow.

  • He is showing an application to people's faces.
  • The three-dimensional point graph he’s presenting looks pretty much like a picture! Cool stuff again.
  • Now, how to model such data.
  • Face3D research consortium: http://www.face3d.ac.uk/wiki/index.php/The_Face3D_project
  • Breast surgery/reconstruction: identifying the breast boundary
    -> Begin at the landmark which represents the most prominent point
    -> Identify the breast boundary by the point of maximum curvature
    -> Subsequent boundary points are then identified by rotation
    -> Fit a principal curve to the points
    -> Then decompose the asymmetry of the surfaces. The components can also be examined for an individual patient
  • Identifying curves
    -> Surface curvature is one of the key issues in the area. We can measure the direction of the curve using this.
  • Change point detection. There are many approaches.
  • He is now showing an example of curve identification for lips, tracking where the lips meet! Such a curve changes dynamically and is not linear.
  • Disclaimer! Image applications in R are not my cup of tea, so these notes may look weird!
  • Principal components for faces now!
  • Then, an application in orthognathic surgery. Comparing before and after!
    – the key issue is the prediction after the surgery
    – Use CT scanning before the surgery, then use the data on your face to predict what will happen after the surgery
    – Taking some measure of uncertainty into the model too.
  • The last topic: magnetoencephalography (MEG)
    – data could be very noisy in this case
    – Showing typical dipole topographies on single dyad data
    – Possible dipole? The result of a single-trial experiment, shown dynamically and in multiple colours, looks nice!
    – Results presented both as pictures and as graphs
    – variation across trials -> all-trials dipole
    – The visualization tool, in the ‘rpanel’ package, is a GUI one!

##############################################

4:40pm Just finished my presentation. Relax time!

4:41pm The presentation before mine was very interesting. It was about inventory, but also dealt with the bullwhip effect and supply chain performance. Nice one. The package creator is also a PhD student, from Brazil. Gotta tell my supervisor.

4:43pm Now in M02. Ortolani Millo is presenting “Integrating R and Excel for Automatic Business Forecasting”. It works as an add-in in Excel, offering options to do the forecasting. ARIMA is in there too!

##############################################

2:46pm Nomograms for visualising relationships between variables by Jonathan Rougier

  • He is showing how to use a nomogram by fitting a hand-drawn picture of a donkey
  • See picture from David Smith http://yfrog.com/kgev6rvj
  • using pynomo package see http://www.pynomo.org

##############################################


2:02pm: Design of Experiment (DoE) in R by  Ulrike Grömping

  •  She is explaining Principles of DoE.
    Block what you can and randomise what you cannot (Box, Hunter and Hunter 1978; 2005).
    Randomisation: Balance out unknown influences.
  • DoE in R: What is there?
    – Task Views, thanks to Achim Zeileis
    – started February 2008
    – currently contains 37 R packages related to DoE
    – Main purposes = pointer to existing functionality, support synergies and avoid duplicated work
    – First package in 2000, conf.design (core), and roughly exponential growth since 2004
  • Key driver for her work on DoE in R
    – Wanted free software solution for industrial experimentation
    – Most often needed: fractional factorial 2-level designs (-> FrF2)
    – Also sometimes needed: orthogonal arrays
  • Mission
    – Free researchers’ and experimenters’ brains
    – from intricate mathematical and/or programming tasks
    – for thinking about the application problem
  • Package suite for industrial DoE in R
    – DoE.base
    – FrF2
    – DoE.wrapper, for wrapping existing functionality
  • DoE is available in Rcmdr (John Fox) as RcmdrPlugin.DoE <- So now it seems not too difficult for me!
  • Call for activities
    – Make R cover a broader range of DoE facilities
    – Write a package, or contribute functionality to an existing package
    – Try to stay close to existing structures

##############################################


RStudio by J.J. Allaire

  • RStudio = R coding tool available on Windows, Mac OS X and Linux, and on the web
  • Screenshots look very similar on any platform.
  • Highlight = Extract function to re-run a chunk of code
  • Conventional R history mechanism = save every command entered, searchable history, code navigation (in the next beta release)
  • In 10 years it will be almost impossible to justify NOT using open source software.
  • Future plan = make the capabilities of R more transparent and accessible

##############################################

9:58am Keynote by Brian D. Ripley

  • A brief Timeline
  • Prehistory – 1997
  • JCGS paper submitted Mar 1995
  • The earliest extant version seems to be Jun 1995 (456KB); 0.1 alpha (842KB)
  • R 2.14.0 Oct 2011
  • R 2.15.0 is scheduled for Mar 2012
  • R 3.0.0 will be a major change, but there is no plan for this yet.

CRAN

  • CRAN: 2 packages in 1997, >100 in 2001 and now ~3200 current packages
  • ~80 successful submissions per week to CRAN
  • 10,000 current packages for Christmas 2016?
  • Infrastructure provided by wu.ac.at and Stefan Theußl

In 2004 ‘CRAN’ was replaced by ‘repos’, with tools provided to support other repositories, but rather few public repositories have emerged.

The R Development Process

  • R is run by the active members of its core team. They meet in person only every couple of years.
  • The day-to-day business is done by email. 3 in NZ, 1 in India, 8 in the EU, 3 in America

How do features get into R?

  • R was principally developed for the benefit of the core team. Only they have votes.
  • Most of what we have seen in R is there because core team members needed or wanted it, e.g., for research (esp. initially), for teaching (early 2000s), to develop R itself or to support other projects they were involved with.

Internationalization

  • The core members are all native speakers of a Western European language which can be written in Latin-1.
  • Japanese statisticians became interested in working in R.

The Future

  • R is heavily dependent on a small group of altruistic people.
  • They do feel that their contributions are not treated with respect.
  • People need to trust the decisions of the core team.

Trend prediction

  • Windows will remain out of step with other OSes.
  • The number of packages will grow inexorably. Whereas they provide a wonderfully comprehensive test suite, they also provide a formidable barrier to change.

##############################################

8:45am Opening session now!

Interesting numbers
440 participants
41 countries
342 EU, 60 N America, 16 Oceania, 13 Asia, 5 Central and South America, 4 Africa
13 conference sponsors and exhibitors

R packages for Structural Equation Model: SEM with R


Structural Equation Modelling (SEM) was first carried out with software called LISREL. Since then, SEM has mainly been run with proprietary software, e.g., Mplus, AMOS, EQS, SAS and the new version of Stata (v.12).

However, you can also run SEM with great free software like R.

To the best of my knowledge, there are now five active packages that you can use to fit SEM. Here they are:

Main Packages (for fitting SEM models) 

  1. sem (John Fox, 2006): The first R package for SEM, “fit by maximum likelihood assuming multinormality, and single-equation estimation for observed-variable models by two-stage least squares.” It was also the first package I tried for running SEM in R. Thanks to Prof. Fox for a very quick response to the question I emailed him.
    See Example of ‘sem’ package here.
  2. OpenMx (Boker et al, 2011)
    A very active package that “is free and open source software for use with R that allows estimation of a wide variety of advanced multivariate statistical models.” contributed by experts in R and SEM.
    See Example of ‘OpenMx’ package here.
  3. lavaan (Yves Rosseel, 2012)
    A promising package for SEM. Its command language is similar to those of Mplus. Hence it is perhaps the most user-friendly package for SEM to date.
    See Example of ‘lavaan’ package here.
    Link to JSS paper
  4. semPLS (Armin Monecke, 2012)
    Fitting Structural Equation Model Using Partial Least Squares
    See: CRAN link, JSS paper
  5. plspm (Gaston Sanchez, 2012)
    R package dedicated to Partial Least Squares (PLS) methods (CRAN, plsmodeling.com)
    by Gaston Sanchez and Laura Trinchera
    A corresponding book titled “PLS Path Modeling with R” can be downloaded here.
My paper at useR! 2011 evaluated R packages against proprietary software, i.e., AMOS and LISREL.

Today (30 May 2012), I was glad to find that there are also complementary packages for SEM in R, as follows.

Complementary packages

  • SEMplusR: Functions, examples and datasets to learn, use and teach Structural Equation Modeling (SEM)  [GitHub]
    by Pairach Piboonrungroj 
  • SEMModComp: Model Comparisons for SEM [CRAN link, Additional Documents]
    by  Roy Levy
  • semGOF: an add-on package which provides fourteen goodness-of-fit indices for structural equation models using the ‘sem’ package. [CRAN]
    by Elena Bertossi 
  • stremo: Functions to help the process of learning structural equation modelling [CRAN link]
    by  Gustavo Carvalho, Marco Batalha, and Owen Petchey
  • FIAR: Functional Integration Analysis in R [CRAN link]
    by  Bjorn Roelstraete
  • semTools: Useful tools for structural equation modeling [CRAN link]
    by  Sunthud Pornprasertmanit, Patrick Miller, Alex Schoemann, Yves Rosseel
  • simsem: SIMulated Structural Equation Modeling [CRAN link]
    by  Sunthud Pornprasertmanit, Patrick Miller, Alexander Schoemann
  • pathmox: R package dedicated to segmentation trees in PLS Path Modeling [CRAN, plsmodeling.com]

Packages for SEM plotting and graphics

  • qgraph: Network representations of relationships in data [CRAN link]
    by  Sacha Epskamp, Angelique O. J. Cramer, Lourens J. Waldorp, Verena D. Schmittmann and Denny Borsboom
  • psych: Procedures for Psychological, Psychometric, and Personality Research [CRAN link]
    by William Revelle

Packages that link R with other software to fit SEM

  • Mplus
    Automating Mplus Model Estimation and Interpretation [CRAN link]
    by  Michael Hallquist
  • EQS
    R/EQS Interface [CRAN link]
    by  Patrick Mair and Eric Wu

More external resources on SEM in R

  • CRAN Task view on ‘Structural Equation Models, Factor Analysis, PCA’ in Psychometrics [url]
    by Patrick Mair
  • A tutorial on the use of sem package  [url]
    by William Revelle
  • A post on ‘Structural Equation Modeling in R‘  [url]
    by Jeromy Anglim

sem package in R: a sample of transaction cost measurement


A sneak peek at the results of a Structural Equation Model fitted with the sem package in R.

The results, which I will present at the useR! conference in Warwick next week, are revealed here first.

The following is the ‘sem’ package code I used in the paper.

You can also view the code for other packages that fit SEM models.

##-------------------------------------------------##
##    Measuring Transaction Cost in Supply Chains  ##
##               Pairach Piboonrungroj             ##
##         R useR conference August 2011           ##
##-------------------------------------------------##
install.packages("sem")
library(sem)

# 1. Load data
hoteldata <- read.csv("http://dl.dropbox.com/u/46344142/useR2011/cleandata.csv")

# 2. Compute the correlation matrix of the data;
#    the model is fitted to this matrix in step 4
data.tc.1 <- cor(hoteldata)

# Have a look at the top of the matrix
# to check that we imported the right data
head(data.tc.1)

# 3. Specify the model: one latent variable (TC) measured by seven items.
#    Columns are: path, parameter name, start value.
#    The blank line after the last path ends the input to specify.model().
model.TC.1 <- specify.model()
    TC   -> TC1,    gamma1,  NA   # measurement items (loadings)
    TC   -> TC2,    gamma2,  NA
    TC   -> TC3,    gamma3,  NA
    TC   -> TC6,    gamma6,  NA
    TC   -> TC7,    gamma7,  NA
    TC   -> TC11,   gamma11, NA
    TC   -> TC13,   gamma13, NA
    TC1  <-> TC1,   e1,      NA   # measurement errors (variances)
    TC2  <-> TC2,   e2,      NA
    TC3  <-> TC3,   e3,      NA
    TC6  <-> TC6,   e6,      NA
    TC7  <-> TC7,   e7,      NA
    TC11 <-> TC11,  e11,     NA
    TC13 <-> TC13,  e13,     NA
    TC   <-> TC,    NA,      1    # variance of the latent variable fixed to 1

model.TC.1

# 4. Fit the model to the correlation matrix; 53 is N, the sample size passed to sem()
sem.TC.1 <- sem(model.TC.1, data.tc.1, 53)
# print result (fit indices, parameters, hypothesis tests)
summary(sem.TC.1)
# standardised coefficients (loadings)
std.coef(sem.TC.1)