
Posts from the ‘Software’ Category

useR! 2011 Live Blog


The R UseR Conference 2011
University of Warwick, UK

15:59. useR! 2012 will be hosted by Vanderbilt University, Nashville, USA.

15:02 FINALLY, the last invited talk by Simon Urbanek (AT&T) on
R Graphics: Supercharged – Recent Advances in Visualization and Analysis of Large Data in R Download Abstract here

  • 85% of the seats in the large lecture hall are filled! >250 people
  • New features in R graphics
  • Real demo on a city map: how can a city be made greener? Estimating the traffic.
  • Integrated R graphics with a real map (like a Google map). Looks nice.
  • Polygons with holes: polypath()
    – regular polygon()s cannot create holes
  • Most recent feature -> screen output control
  • So far there has been no way to tell R when to actually show the graphics on the screen: right now, or only once the plot is complete?
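
A minimal sketch of the two features noted above, using only base R (the coordinates are made up for illustration). polypath() takes several sub-paths separated by NA, and with rule = "evenodd" the inner square is drawn as a hole. dev.hold() and dev.flush() bracket the drawing so that a device which supports buffering shows the result in one go; on other devices they simply do nothing.

```r
# Outer square and inner square; the NA separates the two sub-paths.
x <- c(0, 4, 4, 0, NA, 1, 1, 3, 3)
y <- c(0, 0, 4, 4, NA, 1, 3, 3, 1)

plot.new()
plot.window(xlim = c(0, 4), ylim = c(0, 4), asp = 1)

dev.hold()    # start buffering screen output (no-op if unsupported)
polypath(x, y, col = "grey70", border = "black", rule = "evenodd")
dev.flush()   # release the buffer and show the finished plot
```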

Challenges

  • Data size increases
  • Large RAM (>100 GB) and CPU power is affordable
  • Visualisation needs to keep up
    – rendering: the game industry provides solutions (OpenGL + GPUs)
    – visualization methods for large data
    -> interactivity (divide and conquer, shift of focus)
    -> sufficient statistics, aggregation

Proposed solution
– Rendering speed -> use OpenGL back-ends for R devices (qtdevice, iPlots eXtreme)
– The example shows a big speed-up, from 5 seconds to about a second

  • About iPlots
  • iPlots = interactive plots for data analysis: selection, highlighting, brushing …
    – interactive change of plot parameters
    – queries
    – all essential plots (scatterplots, bar charts, histograms, parallel coordinates)
  • Demo now. The interactivity seems nice, and the colours are similar to the new iPod colour set.

    iPlot (credit: Ashley Ford's Facebook)

  • Now we can ask for the selected points in the graph with
    > which(selected(p))
    – DEMO: PCA, where you can now select the outliers and find out what they are!
    – DEMO: histogram whose graph updates as its parameters change

Conclusion

  • R: rasterImage(), polypath(), dev.hold/flush()
  • Large data requires fast graphics and interactivity
  • OpenGL graphics devices (idev(), qtdevice, …)
  • iPlots eXtreme: high-performance interactive graphics.
  • Fast (C++, OpenGL: interactivity on > 1 million points)
  • Efficient (no copying, reference semantics)
  • Extensible (custom visuals, statistical objects, plots)
  • CRAN release as “ix” expected next month

References with Link

END! Q&A now

##############################################

12:15 An algorithm for the computation of the power of Monte Carlo tests with guaranteed precision by Patrick Rubin-Delanchy. Download abstract here

  • Statistical formulation. Data = N observable stream.
  • Algorithm with finite time, an example of Permutation test for two Gaussian groups
  • They have other “tricks” to reduce the effort: choice of N, hypothesis test on the remaining stream.
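
For readers new to the idea, here is a plain Monte Carlo permutation test for two Gaussian groups in base R. This is only a sketch of the basic test with a fixed number of resamples N, not the speakers' algorithm for computing power with guaranteed precision; the group sizes, means and N are assumptions chosen for illustration.

```r
set.seed(1)
x <- rnorm(20, mean = 0)   # group 1
y <- rnorm(20, mean = 1)   # group 2

obs    <- abs(mean(x) - mean(y))   # observed test statistic
pooled <- c(x, y)
N      <- 999                      # number of random permutations

perm_stat <- replicate(N, {
  idx <- sample(length(pooled), length(x))   # random relabelling
  abs(mean(pooled[idx]) - mean(pooled[-idx]))
})

# Monte Carlo p-value; the +1 counts the observed statistic itself.
p_value <- (1 + sum(perm_stat >= obs)) / (N + 1)
p_value
```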

##############################################

11:57am. Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions by Taylor Arnold. Download abstract here

  • Deals with the Kolmogorov-Smirnov test
  • Discrete K-S test: not implemented in any of the major statistical computing packages, hence they aim to do it in R
  • Decided to require that the discrete null distribution be specified via class stepfun
  • After obtaining the test statistic, the p-value must be calculated
  • Implementation is tedious but relatively straightforward
  • Also using the discrete Cramér-von Mises test
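
To make the stepfun idea concrete, here is a small base-R sketch. It specifies a discrete uniform null on 1..10 as a step function, computes the K-S statistic at the support points, and gets a p-value by simulation. The simulated p-value is my own shortcut for illustration; the package presented in the talk computes the p-value differently, and the sample here is made up.

```r
set.seed(42)
support <- 1:10

# Null CDF of a discrete uniform on 1..10 as a right-continuous step function.
F0 <- stepfun(support, (0:10) / 10)

x <- sample(support, 30, replace = TRUE)   # illustrative sample

# Both CDFs only jump at the support points, so the supremum is attained there.
ks_stat <- function(sample_values) {
  Fn <- ecdf(sample_values)
  max(abs(Fn(support) - F0(support)))
}

d_obs <- ks_stat(x)

# Simulate the statistic under the null to approximate the p-value.
d_sim   <- replicate(2000, ks_stat(sample(support, length(x), replace = TRUE)))
p_value <- mean(d_sim >= d_obs)
c(statistic = d_obs, p.value = p_value)
```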

##############################################

11:31am. The benchden Package: Benchmark Densities for Nonparametric Density Estimation by Henrike Weinert (Inference Session) download abstract here

  • The package implements 28 benchmark densities
  • From the stats package, e.g., uniform, exponential
  • Normal mixtures (Marronite, claw)
  • Support: compact, e.g., infinite peak, uniform scale mixture or sawtooth
  • Support: gaps, e.g., Matterhorn, caliper and trimodal uniform
  • Support: half line, e.g., Maxwell, Pareto and inverse exponential
  • Support: real line, e.g., logistic, double exponential
  • Simulation study to compare two different bandwidth selectors for a kernel estimator
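
A rough idea of what such a simulation study looks like, sketched in base R only. Instead of a benchden density I use a hand-written bimodal normal mixture, and I compare the "nrd0" and "SJ" bandwidth selectors of density() by integrated squared error. The mixture, sample size and number of replications are all assumptions for illustration, not the setup from the talk.

```r
set.seed(123)

# Illustrative target: 0.5 * N(-1.5, 0.5^2) + 0.5 * N(1.5, 0.5^2)
true_density <- function(x) 0.5 * dnorm(x, -1.5, 0.5) + 0.5 * dnorm(x, 1.5, 0.5)

draw_sample <- function(n) {
  comp <- rbinom(n, 1, 0.5)
  rnorm(n, mean = ifelse(comp == 1, 1.5, -1.5), sd = 0.5)
}

# Integrated squared error of a kernel estimate on a fixed grid.
ise <- function(x, bw) {
  est  <- density(x, bw = bw, from = -4, to = 4, n = 512)
  step <- est$x[2] - est$x[1]
  sum((est$y - true_density(est$x))^2) * step
}

results <- replicate(50, {
  x <- draw_sample(200)
  c(nrd0 = ise(x, "nrd0"), SJ = ise(x, "SJ"))
})

rowMeans(results)   # average ISE per bandwidth selector
```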

##############################################

 11:15am. Density Estimation Packages in R by Henry Deng Download abstract here (Inference Session) 

  • Reviews packages on CRAN for density estimation
  • Calculation speed on a random set of n normally distributed values: ash is the fastest and pendensity is the slowest
  • Estimation accuracy using mean absolute error: results vary with the type of distribution and the number of data points. Interesting.
  • Additional idea: trade-off between speed and accuracy
  • Well-performing packages seem to be long established in R, with frequent updates.
  • Recommended packages: KernSmooth or ash

 

##############################################

18 Aug 2011, The last day (but by no means the least)

10:32am Binomial regression model by Merete Download abstract here

  • Three different methods for extracting residuals: unstandardized, standardized and studentized.
  • Exact deletion residuals, a new type of residual implemented in binomTools
  • approx.deletion (rstudent) residual function
  • Parallel histograms = exploratory version of the Hosmer-Lemeshow goodness-of-fit test (with fixed cut points)
  • Half-normal plot: uses absolute residual values but is otherwise equivalent to a normal plot. Optimal simulated envelopes to support interpretation
  • Profile likelihood from the MASS package -> returns and plots the profile likelihood root, not the profile likelihood
  • Misc = functions to group binary or complete grouped data based on a specified data set; empirical area under the ROC curve
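
The residual types and the half-normal plot can be tried out with base R alone. The sketch below does not use binomTools; it fits a logistic regression to the built-in mtcars data (my choice, for illustration only), extracts the three kinds of residuals with stats functions, and draws a half-normal plot with a simple simulated envelope.

```r
# Logistic regression on a built-in data set (illustration only).
fit <- glm(am ~ wt, family = binomial, data = mtcars)

r_raw     <- residuals(fit, type = "deviance")  # unstandardized
r_std     <- rstandard(fit)                     # standardized
r_student <- rstudent(fit)                      # approximate deletion residuals

# Half-normal plot: sorted absolute residuals against half-normal quantiles.
n      <- length(r_student)
half_q <- qnorm(0.5 + 0.5 * ppoints(n))
plot(half_q, sort(abs(r_student)),
     xlab = "Half-normal quantiles", ylab = "|studentized residual|")

# Simulated envelope: refit the model to responses simulated from the fit.
set.seed(1)
sims <- replicate(99, {
  y_sim <- simulate(fit)[[1]]
  refit <- glm(y_sim ~ wt, family = binomial, data = mtcars)
  sort(abs(rstudent(refit)), na.last = TRUE)   # keep length n even if NaN occurs
})
lines(half_q, apply(sims, 1, quantile, probs = 0.025, na.rm = TRUE), lty = 2)
lines(half_q, apply(sims, 1, quantile, probs = 0.975, na.rm = TRUE), lty = 2)
```

Some of the simulated refits may warn about fitted probabilities of 0 or 1; that is expected with such a small data set and does not stop the plot.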

##############################################

6:12pm Interval before Conference Dinner!

5:33pm Missing Values in Principal Component Analysis (PCA) by Julie Josse download abstract

  • Starting with… the theory
  • Overfitting issues from missing values -> fixed by a shrinkage method
  • Procedures in the missMDA package
  • Step1: Estimation of number of dimensions
  • Step2: Imputation of missing values by ‘imputePCA’ function
  • Step3: PCA on the completed data set, ‘MIPCA’ function
    – Iterative PCA: single imputation method
    – A unique value cannot reflect the variability of prediction
    – Multiple imputation: generating plausible values for each missing value
  • Supplementary projection via ‘plot()’ function
    – Individual position (and variables) with other predictions
  • Between imputation variability too!
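
To show what "iterative PCA" imputation means, here is a bare-bones base-R sketch. It is not the missMDA implementation: there is no shrinkage step, the number of dimensions is fixed by hand at 2, and the function name impute_pca is hypothetical. Missing cells start at the column means and are then repeatedly replaced by their rank-2 PCA reconstruction until the values stop changing.

```r
set.seed(7)
X    <- as.matrix(iris[, 1:4])
miss <- matrix(runif(length(X)) < 0.1, nrow(X))   # drop about 10% of cells
X_na <- X
X_na[miss] <- NA

# Hypothetical helper: iterative PCA imputation without regularisation.
impute_pca <- function(X_na, ncp = 2, max_iter = 500, tol = 1e-6) {
  miss  <- is.na(X_na)
  X_hat <- X_na
  X_hat[miss] <- colMeans(X_na, na.rm = TRUE)[col(X_na)[miss]]  # initial fill
  for (iter in seq_len(max_iter)) {
    mu  <- colMeans(X_hat)
    sv  <- svd(sweep(X_hat, 2, mu))
    k   <- seq_len(ncp)
    fit <- sv$u[, k, drop = FALSE] %*% diag(sv$d[k], ncp) %*% t(sv$v[, k, drop = FALSE])
    fit <- sweep(fit, 2, mu, "+")                 # rank-ncp reconstruction
    change <- sum((X_hat[miss] - fit[miss])^2)
    X_hat[miss] <- fit[miss]                      # update only the missing cells
    if (change < tol) break
  }
  X_hat
}

X_imp <- impute_pca(X_na)
sqrt(mean((X_imp[miss] - X[miss])^2))   # RMSE on the cells that were removed
```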

##############################################

5:06pm Here is the HIGHLIGHT. John Fox! on Tests for Multivariate Linear Models with the car Package Download abstract

    • Discussion on fitting multivariate linear models (MLMs) in R with the lm function
    • The anova function is flexible, but it calculates sequential (Type I) tests, and performing other common tests, especially for repeated-measures designs, is relatively inconvenient.
    • The Anova function (with a capital A) in the car package (Fox and Weisberg, 2011) can perform partial (Type II or Type III) tests for the terms in a multivariate linear model, including simply specified multivariate and univariate tests for repeated-measures models.
    • The linearHypothesis function in the car package can test arbitrary linear hypotheses for multivariate linear models, including models for repeated measures.
    • Both the Anova and linearHypothesis functions return a variety of information useful in further computation on multivariate linear models
    • Now he’s demonstrating how to use ‘car’ package using the Anderson-Fisher Iris data
Correlation plot, basic box-plot
> mod.iris <- lm(cbind(Sepal.Length, Sepal.Width,
 Petal.Length, Petal.Width) ~ Species, data=iris)
> (manova.iris <- Anova(mod.iris))
Type II MANOVA Tests: ...
> anova(mod.iris)
gave exactly the same result from the default function
  • Summary
    >summary()
  • Also handles repeated measures, e.g. a single repeated-measures factor. This can be handled by the anova function in R, but it is simpler to get the common tests from the Anova and linearHypothesis functions in the car package.
  • 21% of my MacBook battery left now, please no blackout any time soon.
  • 5 mins left in this presentation!
  • It’s done! Q&A now.

##############################################

4:47pm Multiple choice models: why not the same answer? A comparison among LIMDEP, R, SAS and Stata by Giuseppe Bruno Download Abstract

  • Similar to my presentation, but more technical and focused on applying R packages for choice models alongside other proprietary software!

##############################################

4:35pm Regression Models for Ordinal Data: Introducing R-package ordinal by Rune Haubo B. Christensen Download Abstract

  • The package offers regression models for ordinal data.
  • Provides various standard model fit indices.
  • Extends the basic model with scale effects, nominal effects, random effects and structured thresholds.
  • Future work: more flexible random-effect structures and nested effects.
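
For a quick feel of the basic model, here is a proportional-odds fit in R. Note that this sketch deliberately uses polr() from MASS (which ships with R) rather than the ordinal package from the talk, so it only covers the basic cumulative logit model and none of the extensions listed above. The housing data and the formula come from the MASS documentation.

```r
library(MASS)   # provides polr() and the housing data

# Sat (Low < Medium < High) is an ordered factor; Freq holds the cell counts.
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

coef(fit)   # regression coefficients on the logit scale
fit$zeta    # estimated thresholds between the ordered categories
```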

##############################################

17 August 2011, 14:02. Invited talk on Modelling Three-dimensional Surfaces in R by Adrian Bowman, University of Glasgow.

  • He is showing an application on people's faces.
  • The three-dimensional point graph he’s presenting is pretty much like a picture! Cool stuff again.
  • Now, how to model such data.
  • Face3D research consortium: http://www.face3d.ac.uk/wiki/index.php/The_Face3D_project
    Breast surgery/reconstruction
    -> Identifying the breast boundary
    -> Begin at the landmark which represents the most prominent point
    -> Identify the breast boundary by the point of maximum curvature.
    -> Subsequent boundary points are then identified by rotation
    -> Fit a principal curve to the single points
    -> Then decompose asymmetry of the surfaces. The components can also be examined for an individual patient
  • Identifying curves
    -> Surface curvature is one of the key issues in this area. We can measure the direction of the curve using it.
  • Change point detection. There are many approaches.
  • He is now showing an example of curve identification for lips, tracking where the lips meet! Such a curve changes dynamically and is not linear.
  • Disclaimer! Image applications in R are not my cup of tea, so these notes may look weird!
  • Principal components for faces now!
  • Then, an application in orthognathic surgery. Comparing before and after!
    – the key issue is the prediction after the surgery
    – Use CT scanning before the surgery, then use the data of your face to predict what is going to happen after the surgery
    – Taking some measure of uncertainty into the model too.
  • The last topic: magnetoencephalography (MEG)
    – data can be very noisy in this case
    – Showing typical dipole topographies on single dyad data
    – Possible dipole? The result on a single-trial experiment using dynamic and multiple colours looks nice!
    – Results presented in terms of both pictures and graphs
    – variation across trials -> all-trials dipole
    – A visualization tool in the ‘rpanel’ package is a GUI one!

##############################################

4:40pm Just finished my presentation. Relax time!

4:41pm The presentation before mine was very interesting. It was about inventory but also dealt with the Bullwhip Effect and supply chain performance. Nice one. The package creator is also a PhD student, from Brazil. Gotta tell my supervisor.

4:43pm Now in M02. Ortolani Millo is presenting “Integrating R and Excel for Automatic Business Forecasting”. It works as an add-in in Excel, offering options to do the forecasting. ARIMA is in there too!

##############################################

2:46pm Nomograms for visualising relationships between variables by Jonathan Rougier

  • He is showing how to use a nomogram by fitting a hand-drawn picture of a donkey
  • See picture from David Smith http://yfrog.com/kgev6rvj
  • Using the pynomo package, see http://www.pynomo.org

##############################################


2:02pm: Design of Experiment (DoE) in R by  Ulrike Grömping

  •  She is explaining the principles of DoE.
    Block what you can and randomise what you cannot (Box, Hunter and Hunter 1978; 2005).
    Randomisation: balance out unknown influences.
  • DoE in R: What is there?
    – Task Views, thanks to Achim Zeileis
    – started February 2008
    – currently contains 37 R packages related to DoE
    – Main purposes = pointer to existing functionality and support for synergies; avoid double work
    – First package in 2000, conf.design (core), with a roughly exponential increase since 2004
  • Key driver for her work on DoE in R
    – Wanted a free software solution for industrial experimentation
    – Most often needed: fractional factorial 2-level designs (-> FrF2)
    – Also sometimes needed: orthogonal arrays
  • Mission
    – Free researchers’ and experimenters’ brains
    – from intricate mathematical and/or programming tasks
    – for thinking about the application problem
  • Package suite for industrial DoE in R
    – DoE.base
    – FrF2
    – DoE.wrapper, for wrapping existing functionality
  • DoE available in Rcmdr (John Fox) as RcmdrPlugin.DoE <- So now it seems not too difficult for me!
  • Call for activities
    – Make R cover a broader range of DoE facilities
    – Write a package, or contribute functionality to an existing package
    – Try to stay close to existing structures

##############################################


RStudio by J.J. Allaire

  • RStudio = R coding tool available on Windows, Mac OS X and Linux, and on the web
  • Screenshots look very similar on every platform.
  • Highlight = extract-function feature to re-run a chunk of code
  • Conventional R history mechanism: saves every command entered, with searchable history; code navigation (in the next beta release)
  • In 10 years it will be almost impossible to justify NOT using open source software.
  • Future plan = make the capabilities of R more transparent and accessible

##############################################

9:58am Keynote by Brian D. Ripley

  • A brief Timeline
  • Prehistory – 1997
  • JCGS paper submitted Mar 1995
  • The earliest extant version seems to be from Jun 1995 (456KB); 0.1 alpha (842KB)
  • R 2.14.0 Oct 2011
  • R 2.15.0 is scheduled for Mar 2012
  • R 3.0.0 will be a major change, but there is no plan for it yet.

CRAN

  • CRAN: 2 packages in 1997, >100 in 2001 and now ~3200 current packages
  • ~80 successful submissions per week to CRAN
  • 10,000 current packages for Christmas 2016?
  • Infrastructure provided by wu.ac.at and Stefan Theußl

‘CRAN’ was replaced by ‘repos’ in 2004, with tools provided to support other repositories, but rather few public repositories have emerged.

The R Development Process

  • R is run by the active members of its core team, who meet in person only every couple of years.
  • The day-to-day business is done by email. 3 in NZ, 1 in India, 8 in the EU, 3 in America

How do features get into R?

  • R was principally developed for the benefit of the core team. Only they have votes.
  • Most of what we see in R is there because core team members needed or wanted it, e.g. for research (esp. initially), for teaching (early 2000s), to develop R itself, or to support other projects they were involved with.

Internationalization

  • The core members are all native speakers of a Western European language which can be written in Latin-1.
  • Japanese statisticians became interested in working in R.

The Future

  • R is heavily dependent on a small group of altruistic people.
  • They do feel that their contributions are not treated with respect.
  • People need to trust the decisions of the core team.

Trend prediction

  • Windows will remain out of step with other OSes.
  • The number of packages will grow inexorably. Whereas they provide a wonderfully comprehensive test suite, they also provide a formidable barrier to change.

##############################################

8:45am Opening session now!

Interesting numbers
440 participants
41 countries
342 EU, 60 N America, 16 Oceania, 13 Asia, 5 Central and South America, 4 Africa
13 conference sponsors and exhibitors

R packages for Structural Equation Model: SEM with R


Structural Equation Modelling (SEM) was first implemented in software called LISREL. Since then, SEM has mainly been run in several proprietary packages, i.e., Mplus, AMOS, EQS, SAS and the new version of Stata (v.12).

However, you can also run SEM with great free software like R.

To the best of my knowledge, there are now five active packages that you can use to fit SEM. Here they are:

Main Packages (for fitting SEM models) 

  1. sem (John Fox, 2006): The first R package for SEM, “fit by maximum likelihood assuming multinormality, and single-equation estimation for observed-variable models by two-stage least squares.” It was also the first package I tried for running SEM in R. Thanks to Prof. Fox for his very quick response to a question I emailed him.
    See Example of ‘sem’ package here.
  2. OpenMx (Boker et al, 2011)
    A very active package that “is free and open source software for use with R that allows estimation of a wide variety of advanced multivariate statistical models.” contributed by experts in R and SEM.
    See Example of ‘OpenMx’ package here.
  3. lavaan (Yves Rosseel, 2012)
    A promising package for SEM. Its command language is similar to that of Mplus. Hence it is perhaps the most user-friendly package for SEM to date.
    See Example of ‘lavaan’ package here.
    Link to JSS paper
  4. semPLS (Armin Monecke, 2012)
    Fitting Structural Equation Model Using Partial Least Squares
    See: CRAN link, JSS paper
  5. plspm (Gaston Sanchez, 2012)
    R package dedicated to Partial Least Squares (PLS) methods (CRAN, plsmodeling.com)
    by Gaston Sanchez and Laura Trinchera
    A corresponding book titled “PLS Path Modeling with R” can be downloaded here.
My paper at useR! 2011 evaluated R packages vs. proprietary software, i.e., AMOS and LISREL.

Today (30 May 2012), I gladly found that there are also complementary packages for SEM in R as follows.

Complementary packages

  • SEMplusR: Functions, examples and datasets to learn, use and teach Structural Equation Modeling (SEM)  [GitHub]
    by Pairach Piboonrungroj 
  • SEMModComp: Model Comparisons for SEM [CRAN link, Additional Documents]
    by  Roy Levy
  • semGOF: an add-on package which provides fourteen goodness-of-fit indices for structural equation models using the ‘sem’ package. [CRAN]
    by Elena Bertossi 
  • stremo: Functions to help the process of learning structural equation modelling [CRAN link]
    by  Gustavo Carvalho, Marco Batalha, and Owen Petchey
  • FIAR: Functional Integration Analysis in R [CRAN link]
    by  Bjorn Roelstraete
  • semTools: Useful tools for structural equation modeling [CRAN link]
    by  Sunthud Pornprasertmanit, Patrick Miller, Alex Schoemann, Yves Rosseel
  • simsem: SIMulated Structural Equation Modeling [CRAN link]
    by  Sunthud Pornprasertmanit, Patrick Miller, Alexander Schoemann
  • pathmox: R package dedicated to segmentation trees in PLS Path Modeling [CRAN, plsmodeling.com]

Packages for SEM plotting and graphics

  • qgraph: Network representations of relationships in data [CRAN link]
    by  Sacha Epskamp, Angelique O. J. Cramer, Lourens J. Waldorp, Verena D. Schmittmann and Denny Borsboom
  • psych: Procedures for Psychological, Psychometric, and Personality Research [CRAN link]
    by William Revelle

Packages that link R with other software to fit SEM

  • Mplus
    Automating Mplus Model Estimation and Interpretation [CRAN link]
    by  Michael Hallquist
  • EQS
    R/EQS Interface [CRAN link]
    by  Patrick Mair and Eric Wu

More external resources on SEM in R

  • CRAN Task view on ‘Structural Equation Models, Factor Analysis, PCA’ in Psychometrics [url]
    by Patrick Mair
  • A tutorial on the use of sem package  [url]
    by William Revelle
  • A post on ‘Structural Equation Modeling in R‘  [url]
    by Jeromy Anglim

Pairach.com has now linked to R-Bloggers.com


Finally, in the spirit of contributing more widely, this website is now part of one of the largest R websites, R-Bloggers.com, which I have been following for a while.

For those who use R or are interested in this great free software, R-Bloggers.com is a must!

sem package in R: a sample of transaction cost measurement


A sneak peek at the result of a Structural Equation Model using the sem package in R.

The results I will present at the R User Conference in Warwick next week are revealed here first.

The following is the ‘sem’ package code I used in the paper.

You may also view the code for other packages that run SEM models.

##-------------------------------------------------##
##    Measuring Transaction Cost in Supply Chains  ##
##               Pairach Piboonrungroj             ##
##         R useR conference August 2011           ##
##-------------------------------------------------##
install.packages("sem")
library(sem)

# 1. Load data
hoteldata <- read.csv("http://dl.dropbox.com/u/46344142/useR2011/cleandata.csv")

# Have a look at the top of the data
# to check that we imported the right one
head(hoteldata)

# 2. Create the correlation matrix of the data,
# used as input for fitting the model in the next step
data.tc.1 <- cor(hoteldata)
#  path parameter  start-value
model.TC.1 <- specify.model()
	TC   -> TC1,    gamma1,  NA # measurement item
	TC   -> TC2,    gamma2,  NA
 	TC   -> TC3,    gamma3,  NA
 	TC   -> TC6,    gamma6,  NA
 	TC   -> TC7,    gamma7,  NA
 	TC   -> TC11,   gamma11, NA
 	TC   -> TC13,   gamma13, NA
 	TC1  <-> TC1,	e1,      NA # measurement error
 	TC2  <-> TC2,	e2,      NA
 	TC3  <-> TC3,	e3,      NA
 	TC6  <-> TC6,	e6,      NA
 	TC7  <-> TC7,	e7,      NA
 	TC11 <-> TC11,	e11,     NA
 	TC13 <-> TC13,	e13,     NA
 	TC   <-> TC,    NA,      1

model.TC.1

sem.TC.1 <- sem(model.TC.1, data.tc.1, 53)
# print result (fit indices, parameters, hypothesis tests)
summary(sem.TC.1)
# standardised coefficients (loadings)
std.coef(sem.TC.1)

useR! 2011 full timetable out NOW!


Monday 15th August

Monday timetable.

Opening Mixer, sponsored by CRiSM, 19:30 – 21:00

All conference attendees are welcome to join us for drinks at The Bar on the first floor of the Rootes Building. Beer, wine, non-alcoholic drinks and light snacks will be available for all attendees.

Tuesday 16th August

Tuesday timetable.

Poster Reception, sponsored by Revolution Analytics, 20:00 – 23:00

The evening poster reception will be held at the Panorama Suite on the second floor of the Rootes Building. Beer, wine, non-alcoholic drinks and light snacks will be available for all attendees.

Wednesday 17th August

Wednesday timetable.

Conference Dinner, sponsored by RStudio, 19:30 – 23:00

The conference dinner will be held at the Panorama Suite on the second floor of the Rootes Building. Admission to the conference dinner is via ticket only.

Thursday 18th August

Thursday timetable.


Notes

Printed abstracts will not be provided at the conference. Links to individual abstracts are provided in the timetables above and the abstracts for all contributed talks and posters are collected together in the abstract booklet.

Please note that there are three different formats for contributed talks:

useR! Kaleidoscope:
These sessions will give a broad overview of the many different applications of R and should appeal to a wide audience.
useR! Focus Sessions:
These sessions will focus on topics of special interest and may be more technical.
useR! Lightning Talks:
New for useR! 2011, these sessions, with oral presentations of 5 minutes, provide a platform for participants to speak on any R-related topic and should particularly appeal to R newbies.

Information for Speakers, Session Chairs and Poster Presenters

  • Presenters of Kaleidoscope and Focus contributed talks, please note that your talk is scheduled for 17 minutes, followed by 3 minutes discussion.
  • Presenters of Lightning talks, please note that your talk is scheduled for 5 minutes, with 1 minute question/transition time. A variation of the pecha kucha and ignite formats will be used, in which you must provide 15 slides to accompany your talk and each slide will be shown for 20 seconds. Slides must be provided in PDF format and sent to useR-2011@R-project.org by 17:00 GMT, Friday 12 August, 2011.
  • Each useR! Invited Lecture will last 40 minutes, with 5 minutes at the end of the lecture reserved for questions.
  • Technical details: All lecture rooms are equipped with an LCD projector and a computer or laptop that is connected to the internet. Unless speakers require their own laptop for software demonstrations, speakers are expected to ensure that a PDF of their presentation slides is on the computer provided, before the start of their session. For presenters of Lightning Talks, this is achieved by sending the slides in advance as described above; for presenters of other talks, please see the conference assistant in the relevant lecture room to arrange transfer of your slides. Conference assistants will be available in the lecture rooms at least 10 minutes before the program starts each day and at least 10 minutes before the program recommences after each coffee or lunch break. Please arrive well before the start of your session and introduce yourself to the Session Chair. During your talk, please look out for the countdown cards that the Chair will use to signal that your time is coming to a close.
  • Session Chairs: Please check the News Board in the main atrium for any changes to your session. Please arrive well before the start of your session to meet the speakers and ensure that they have a PDF of their slides on the room computer. Countdown cards will be provided to help you keep the speakers to time. If we are aware that a speaker is missing up to 2 hours prior to the relevant session, then the other talks will be brought forwards and a notice placed on the News Board, otherwise can we please request that Chairs keep to the timetable, and in the event of a missing speaker, advise the audience to go to an alternative talk.
  • For those preparing posters, the poster boards can accommodate posters of size A0 (Portrait) or A1 (Landscape). Poster presenters should arrive at least 15 minutes prior to the start of the poster session to put up their poster (presenters can access the venue from 19:00). Regular posters will have allocated boards as numbered in the schedule; late-breaking posters can be put up on any unnumbered board. Presenters should stay in the vicinity of their poster for the first hour of the session. Please take down your poster at the end of the session; if you are unable to stay until the end (23:00) remaining posters will be removed and can be retrieved from the registration desk the next day.

Source:  The Official R users Conference 2011
PS. My abstract can be found here