## useR! 2011 Live Blog

**The R User Conference 2011
University of Warwick, UK**

15:59. **useR! 2012** will be hosted by Vanderbilt University, Nashville, USA.

- Pre-conference short course
- Tutorial June 12
- http://www.r-project.org/useR-2012/

**15:02 FINALLY, the last invited talk** by **Simon Urbanek (AT&T)** on

“**R Graphics: Supercharged – Recent Advances in Visualization and Analysis of Large Data in R**”

- 85% of the seats in the great lecture hall are filled! More than 250 people
- New features in R graphics
- Real demo on a city map: how can a city become greener? Estimating the traffic
- R graphics integrated with a real map (like Google Maps). Looks nice.
- Polygons with holes: `polypath()`, so shapes drawn like regular `polygon()`s can now have holes
- Most recent feature: screen output control
- So far there has been no way to tell R when to actually show the graphics on the screen
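A minimal sketch of both features in base R graphics. The hole comes from `polypath()`'s fill rule: with `rule = "evenodd"` the inner ring is left empty, while with `rule = "winding"` it gets filled because both rings here run in the same direction. `dev.hold()`/`dev.flush()` bracket the drawing so a screen device that supports buffering shows it in one go; on other devices (such as the PDF used here) the calls should simply have no visible effect. `draw_frame()` and the square coordinates are hypothetical.

```r
# Hypothetical helper: a 10 x 10 square with a 4 x 4 square hole.
draw_frame <- function(rule = c("evenodd", "winding")) {
  rule <- match.arg(rule)
  outer_x <- c(0, 10, 10, 0); outer_y <- c(0, 0, 10, 10)
  inner_x <- c(3, 7, 7, 3);   inner_y <- c(3, 3, 7, 7)   # same (counter-clockwise) direction

  plot.new()
  plot.window(xlim = c(0, 10), ylim = c(0, 10), asp = 1)
  dev.hold()                       # start buffering output on devices that support it
  # NA separates the sub-paths; the fill rule decides whether the inner ring is a hole
  polypath(c(outer_x, NA, inner_x), c(outer_y, NA, inner_y),
           rule = rule, col = "darkgreen", border = "black")
  title(main = paste("rule =", rule))
  dev.flush()                      # release everything drawn since dev.hold()
  invisible(rule)
}

# Usage: render both variants to a temporary PDF file.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
draw_frame("evenodd")   # hole visible
draw_frame("winding")   # hole filled: both rings wind the same way
invisible(dev.off())
```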

**Challenges**

- Data size increases
- Large RAM (>100 GB) and CPU power are affordable
- Visualisation needs to keep up:
  - rendering: the game industry provides solutions (OpenGL + GPUs)
  - visualisation methods for large data:
    - interactivity (divide and conquer, shift of focus)
    - sufficient statistics, aggregation
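To make the “sufficient statistics, aggregation” idea concrete, here is a minimal base-R sketch. It does not reproduce the approach of any package from the talk. It bins a large scatterplot into a fixed grid of counts, so the device draws `nbins × nbins` cells instead of every point. `bin2d()` is a hypothetical helper, and the sample size and grid size are illustrative.

```r
# Hypothetical helper: aggregate (x, y) into an nbins x nbins grid of counts.
bin2d <- function(x, y, nbins = 50) {
  xbrk <- seq(min(x), max(x), length.out = nbins + 1)
  ybrk <- seq(min(y), max(y), length.out = nbins + 1)
  counts <- table(cut(x, xbrk, include.lowest = TRUE),
                  cut(y, ybrk, include.lowest = TRUE))
  list(xbrk = xbrk, ybrk = ybrk, counts = unclass(counts))
}

set.seed(1)
n <- 2e5
x <- rnorm(n)
y <- x + rnorm(n)
b <- bin2d(x, y, nbins = 100)

# Draw 100 x 100 cells instead of 2e5 points (log scale keeps sparse cells visible).
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
image(b$xbrk, b$ybrk, log1p(b$counts), col = grey.colors(64, start = 1, end = 0),
      xlab = "x", ylab = "y")
invisible(dev.off())
```

The counts table is the only thing the plot needs, so the cost of drawing no longer grows with the number of points.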

**Proposed solution**

- Rendering speed -> use OpenGL back-ends for R devices (qtdevice, iPlots eXtreme)
- The example shows a much quicker speed: from 5 seconds to just 1 second

- About iPlots
- iPlots = interactive plots for data analysis:
  - selection, highlighting, brushing …
  - interactive change of plot parameters
  - queries
  - all essential plots (scatterplots, bar charts, histograms, parallel coordinates)
- Demo now. The interactivity looks nice, with a colour set similar to the new iPods.
- Now we can query the points selected in the graph with

`> which(selected(p))`

- DEMO: PCA; now you can select the outliers and find out what they are!
- DEMO: a histogram whose graph changes as its parameter changes

**Conclusion**

- R: `rasterImage()`, `polypath()`, `dev.hold()`/`dev.flush()`
- Large data requires fast graphics and interactivity
- OpenGL graphics devices (`idev()`, qtdevice, …)
- iPlots eXtreme: high-performance interactive graphics
- Fast (C++, OpenGL: interactivity on > 1 million points)
- Efficient (no copying, reference semantics)
- Extensible (custom visuals, statistical objects, plots)
**CRAN release as “ix” expected next month**


**END! Q&A now**

**12:15pm. An algorithm for the computation of the power of Monte Carlo tests with guaranteed precision by Patrick Rubin-Delanchy**

- Statistical formulation: data = an observable stream, N.
- An algorithm that finishes in finite time; example: a permutation test for two Gaussian groups
- They have other “tricks” to reduce the effort: the choice of N, and a hypothesis test on the remaining stream.
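For contrast, the naive way to estimate the power of a Monte Carlo test is to use a fixed number of simulations and report a binomial standard error. Rubin-Delanchy's algorithm instead adapts the effort so the result has guaranteed precision, which this sketch does not attempt. The test is a two-sample permutation test on the difference in means for two Gaussian groups, echoing the talk's example. The function names, sample sizes and simulation counts are illustrative assumptions.

```r
# Two-sample permutation test on |mean(x) - mean(y)|; returns a Monte Carlo p-value.
perm_pvalue <- function(x, y, n_perm = 199) {
  pooled <- c(x, y)
  nx <- length(x)
  obs <- abs(mean(x) - mean(y))
  stat <- replicate(n_perm, {
    idx <- sample.int(length(pooled), nx)          # random relabelling of the groups
    abs(mean(pooled[idx]) - mean(pooled[-idx]))
  })
  (1 + sum(stat >= obs)) / (n_perm + 1)
}

# Naive power estimate: a fixed number of simulated data sets under the alternative.
mc_power <- function(delta, n = 20, n_sim = 200, alpha = 0.05, n_perm = 199) {
  reject <- replicate(n_sim, {
    x <- rnorm(n, mean = 0)
    y <- rnorm(n, mean = delta)
    perm_pvalue(x, y, n_perm) <= alpha
  })
  p_hat <- mean(reject)
  c(power = p_hat, se = sqrt(p_hat * (1 - p_hat) / n_sim))   # binomial standard error
}

set.seed(123)
mc_power(delta = 1, n = 20)
```

The cost here is `n_sim * n_perm` permutations regardless of how clear-cut each simulated test is. Avoiding exactly that waste is what the talk's adaptive approach targets.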

**11:57am. Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions by Taylor Arnold**

- Deals with the Kolmogorov-Smirnov test
- The discrete K-S test is not implemented in any of the major statistical computing packages, hence they aim to do it in R
- Decided to require that the discrete null distribution be specified via class *stepfun*
- After obtaining the test statistic, the *p*-value must be calculated
- Implementation is tedious but relatively straightforward
- Also the discrete Cramér-von Mises test
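A minimal sketch of the discrete K-S statistic with the null distribution given as a `stepfun`, as the talk proposes. Both the null CDF and the empirical CDF jump only at support points, so the supremum can be taken over those points. The talk's exact *p*-value computation is *not* reproduced here. As a swapped-in stand-in, the *p*-value is estimated by simulating from the discrete null. The fair-die null, the loaded die and the function names are hypothetical.

```r
# Null distribution: a fair six-sided die, CDF as a right-continuous step function.
F0 <- stepfun(1:6, c(0, cumsum(rep(1/6, 6))))

# Discrete K-S statistic: sup |F_n(x) - F0(x)|, attained at the support points
# (assumes every observation lies in the support of F0).
ks_discrete_stat <- function(x, F0) {
  support <- knots(F0)
  max(abs(ecdf(x)(support) - F0(support)))
}

# Monte Carlo p-value under the discrete null (a stand-in, not the exact method).
ks_discrete_mc <- function(x, F0, n_sim = 2000) {
  support <- knots(F0)
  probs <- diff(c(0, F0(support)))          # point masses of the null
  d_obs <- ks_discrete_stat(x, F0)
  d_sim <- replicate(n_sim, {
    xs <- sample(support, length(x), replace = TRUE, prob = probs)
    ks_discrete_stat(xs, F0)
  })
  # small tolerance so floating-point noise does not break ties in the discrete statistic
  p <- (1 + sum(d_sim >= d_obs - 1e-12)) / (n_sim + 1)
  list(statistic = d_obs, p.value = p)
}

set.seed(7)
loaded <- sample(1:6, 300, replace = TRUE, prob = c(0.3, 0.2, 0.2, 0.1, 0.1, 0.1))
ks_discrete_mc(loaded, F0)
```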

**11:31am. The benchden Package: Benchmark Densities for Nonparametric Density Estimation by Henrike Weinert (Inference Session)**

- The package implements 28 benchmark densities
- From the stats package, e.g. uniform, exponential
- Normal mixtures (Marronite, claw)
- Compact support: e.g. infinite peak, uniform scale mixture, sawtooth
- Support with gaps: e.g. Matterhorn, caliper, trimodal uniform
- Support on the half-line: e.g. Maxwell, Pareto, inverse exponential
- Support on the real line: e.g. logistic, double exponential
- Simulation study to compare two different bandwidth selectors for a kernel estimator
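A simulation study of this kind can be sketched in base R. This sketch does not load benchden. Instead it hand-codes one benchmark, the Marron-Wand claw density ½N(0,1) + Σ_{l=0}^{4} (1/10)N(l/2 − 1, 0.1²). It then compares two of R's built-in bandwidth selectors, `"nrd0"` (rule of thumb) and `"SJ"` (Sheather-Jones), by Monte Carlo integrated squared error. The notes do not say which two selectors the talk compared, so treat this choice as an assumption.

```r
# Marron-Wand claw density: 0.5 N(0,1) + sum_{l=0}^{4} 0.1 N(l/2 - 1, 0.1^2)
dclaw <- function(x) {
  means <- (0:4) / 2 - 1
  0.5 * dnorm(x) + 0.1 * colSums(outer(means, x, function(m, xx) dnorm(xx, m, 0.1)))
}
rclaw <- function(n) {
  comp <- sample(0:5, n, replace = TRUE, prob = c(0.5, rep(0.1, 5)))
  ifelse(comp == 0, rnorm(n), rnorm(n, mean = (comp - 1) / 2 - 1, sd = 0.1))
}

# Integrated squared error of a density() fit, approximated on its own grid.
ise <- function(fit, f) sum((fit$y - f(fit$x))^2) * diff(fit$x[1:2])

compare_bw <- function(n = 200, reps = 50) {
  res <- replicate(reps, {
    x <- rclaw(n)
    c(nrd0 = ise(density(x, bw = "nrd0"), dclaw),
      SJ   = ise(density(x, bw = "SJ"),   dclaw))
  })
  rowMeans(res)   # mean ISE per selector; smaller is better
}

set.seed(99)
compare_bw()
```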

**11:15am. Density Estimation Packages in R by Henry Deng** (Inference Session)

- Reviews CRAN packages for density estimation
- Speed, measured on random sets of n normally distributed values: ash is the fastest and pendensity the slowest
- Estimation accuracy, measured by mean absolute error: results vary with the type of distribution and the number of data points. Interesting.
- Additional idea: the trade-off between speed and accuracy
- Well-performing packages seem to be long established in R, with frequent updates.
- Recommended packages: KernSmooth or ash
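A minimal sketch of such a speed/accuracy comparison for one of the recommended packages, under assumptions chosen for illustration. It uses a standard normal sample and measures mean absolute error against the true density on each estimator's own grid. It compares `KernSmooth::bkde()` with the `dpik()` plug-in bandwidth against base R's `density()`. `compare_kde()` is a hypothetical helper, and timings will vary by machine.

```r
# Compare base density() with KernSmooth::bkde() on speed and mean absolute error.
# Requires the KernSmooth package (a recommended package, normally installed with R).
compare_kde <- function(x, true_density = dnorm) {
  t_density <- system.time(d1 <- density(x, bw = "nrd0"))[["elapsed"]]
  t_bkde <- system.time(
    d2 <- KernSmooth::bkde(x, bandwidth = KernSmooth::dpik(x))
  )[["elapsed"]]
  mae <- function(fit) mean(abs(fit$y - true_density(fit$x)))
  data.frame(method  = c("density", "bkde"),
             seconds = c(t_density, t_bkde),
             mae     = c(mae(d1), mae(d2)))
}

if (requireNamespace("KernSmooth", quietly = TRUE)) {
  set.seed(2011)
  print(compare_kde(rnorm(1e5)))
}
```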


**18 Aug 2011, the last day (but by no means the least)**

10:32am **Binomial regression** models by Merete

- Three different extractor methods for residuals: unstandardized, standardized and studentized.
- Exact deletion residuals: a new type of residual implemented in binomTools
- Approximate deletion residuals via the `rstudent` residual function
- Parallel histograms = an exploratory version of the Hosmer-Lemeshow goodness-of-fit test (with fixed cut points)
- Half-normal plot: uses absolute residual values but is otherwise equivalent to a normal plot. Simulated envelopes support interpretation
- Profile likelihood, building on the MASS package -> returns and plots the profile likelihood root, not the profile likelihood itself
- Misc: functions to group binary data or complete grouped data based on the specified data; empirical area under the ROC curve
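The residual types listed can be illustrated with base R's `glm()` extractors: Pearson residuals, standardized Pearson residuals via `rstandard()`, and approximate deletion residuals via `rstudent()`. binomTools itself is not used, and its exact deletion residuals are not reproduced here. The second part sketches a half-normal plot with a simulated envelope in Atkinson's style, refitting the model to responses simulated from the fitted model. The data, seed, envelope size (19 simulations) and the helper name `halfnorm_envelope()` are illustrative assumptions. The refit also assumes a simple formula whose terms are plain columns of the model frame.

```r
set.seed(3)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)

res <- data.frame(
  unstandardized = residuals(fit, type = "pearson"),
  standardized   = rstandard(fit, type = "pearson"),
  studentized    = rstudent(fit)          # approximate deletion residuals
)

# Half-normal plot data with a simulated envelope (min/max over nsim refits).
halfnorm_envelope <- function(fit, nsim = 19) {
  n <- nobs(fit)
  r_obs <- sort(abs(rstudent(fit)))
  q <- qnorm((seq_len(n) + n - 1/8) / (2 * n + 1/2))   # half-normal plotting positions
  d <- model.frame(fit)
  sims <- replicate(nsim, {
    d[[1]] <- simulate(fit)[[1]]                       # new response from the fitted model
    refit <- glm(formula(fit), family = binomial, data = d)
    sort(abs(rstudent(refit)))
  })
  data.frame(q = q, observed = r_obs,
             lower = apply(sims, 1, min), upper = apply(sims, 1, max))
}

env <- halfnorm_envelope(fit)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(env$q, env$observed, xlab = "Half-normal quantiles", ylab = "|studentized residual|")
lines(env$q, env$lower, lty = 2)
lines(env$q, env$upper, lty = 2)
invisible(dev.off())
```

Points falling outside the dashed envelope are the ones worth a closer look.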

**6:12pm Interval before the conference dinner!**

**5:33pm Missing Values in Principal Component Analysis (PCA) by Julie Josse**

- Starting with… theoretical stuff
- Overfitting issues caused by missing values -> fixed by a shrinkage method
- **Procedures** in the missMDA package:
  - Step 1: Estimation of the number of dimensions
  - Step 2: Imputation of missing values with the ‘imputePCA’ function
  - Step 3: PCA on the completed data set, ‘MIPCA’ function
- Iterative PCA: a single imputation method
- A unique value cannot reflect the variability of the prediction
- Multiple imputation: generating plausible values for each missing value
- Supplementary projection via the ‘plot()’ function:
  - individual positions (and variables) with the other predictions
  - between-imputation variability too!
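The iterative PCA idea behind single imputation can be sketched in a few lines of base R. Fill the missing cells with column means, then alternate between fitting a rank-`ncp` PCA and replacing only the missing cells with the fitted values, until they stop changing. This is the plain, unregularized version. missMDA's `imputePCA` adds the shrinkage step mentioned above to limit overfitting, and `MIPCA` goes further with multiple imputation. `iterative_pca_impute()` and its arguments are hypothetical.

```r
# Single imputation by (unregularized) iterative PCA.
iterative_pca_impute <- function(X, ncp = 2, max_iter = 1000, tol = 1e-10) {
  X <- as.matrix(X)
  miss <- is.na(X)
  Xhat <- X
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]   # start from column means
  for (iter in seq_len(max_iter)) {
    mu <- colMeans(Xhat)
    s <- svd(sweep(Xhat, 2, mu), nu = ncp, nv = ncp)       # centred rank-ncp fit
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], nrow = ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")
    change <- sum((Xhat[miss] - fitted[miss])^2)
    Xhat[miss] <- fitted[miss]                              # observed cells never change
    if (change < tol) break
  }
  Xhat
}

# Usage: knock out 5% of the iris measurements, then impute them.
set.seed(10)
X_full <- as.matrix(iris[, 1:4])
X_na <- X_full
X_na[sample(length(X_na), round(0.05 * length(X_na)))] <- NA
X_imp <- iterative_pca_impute(X_na, ncp = 2)
```

A single completed matrix like `X_imp` is exactly what “a unique value cannot reflect the variability of prediction” refers to. Multiple imputation repeats the filling with added noise to capture that variability.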

**5:06pm Here is the HIGHLIGHT: John Fox! On Tests for Multivariate Linear Models with the car Package**

- Discussion on fitting multivariate linear models (MLMs) in R with the lm function
- The anova function is flexible, but it calculates sequential (Type I) tests, and performing other common tests, especially for repeated-measures designs, is relatively inconvenient.
- The Anova function (with a capital A) in the car package (Fox and Weisberg, 2011) can perform partial (Type II or Type III) tests for the terms in a multivariate linear model, including simply specified multivariate and univariate tests for repeated-measures models.
- The linearHypothesis function in the car package can test arbitrary linear hypotheses for multivariate linear models, including models for repeated measures.
- Both the Anova and linearHypothesis functions return a variety of information useful in further computations on multivariate linear models.
- Now he’s demonstrating how to use the ‘car’ package with the Anderson-Fisher iris data

Correlation plot, basic box-plot.

`> mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species, data = iris)`

`> (manova.iris <- Anova(mod.iris))`

`Type II MANOVA Tests: ...`

`> anova(mod.iris)` gives the same result as the default function here, since with a single factor the sequential (Type I) and partial (Type II) tests coincide.

- Summary: `> summary()`
- Also handling repeated measures, e.g. a single repeated-measures factor. This can be handled with the anova function in R, but it is simpler to get the common tests from the Anova and linearHypothesis functions in the car package.
- 5 mins left in this presentation!
- It’s done! Q&A now.
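For comparison with the demo, here is a sketch of the same one-factor multivariate model in base R, plus the two car functions discussed. `anova()` on an `mlm` object gives sequential (Type I) tests, which for a single factor coincide with car's Type II tests. The `linearHypothesis()` call tests whether versicolor and virginica share the same mean vector. That hypothesis is an illustrative choice, not one from the talk, and the car calls only run if the package is installed.

```r
mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
               data = iris)

# Base R: sequential (Type I) multivariate tests, Pillai's trace by default.
base_tests <- anova(mod.iris)
print(base_tests)

# The same hypothesis via manova(), reported with Wilks' lambda.
wilks <- summary(manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
                        data = iris), test = "Wilks")
print(wilks)

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod.iris, type = "II"))    # partial (Type II) tests
  # Illustrative hypothesis: versicolor and virginica have equal mean vectors.
  print(car::linearHypothesis(mod.iris, "Speciesversicolor = Speciesvirginica"))
}
```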

**4:47pm Multiple choice models: why not the same answer? A comparison among LIMDEP, R, SAS and Stata by Giuseppe Bruno**

- Similar to my presentation, but focused on comparing R packages for choice models with proprietary software, and more technical!

##############################################

**4:35pm Regression Models for Ordinal Data: Introducing R-package ordinal by Rune Haubo B. Christensen**

- The package offers regression models for ordinal data.
- Provides various standard model fit indices.
- Extends the basic model with scale effects, nominal effects, random effects and structured thresholds.
- Future work: more flexible random-effect structures and nested effects.
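The basic model that ordinal extends is the cumulative link (proportional odds) model. A minimal sketch with swapped-in tooling: it fits that baseline with `MASS::polr()` on MASS's `housing` data. Then, only if ordinal is installed, it refits the model with `ordinal::clm()` and adds a scale effect for `Cont` as an example of one of the extensions listed. The dataset and the choice of scale term are assumptions for illustration.

```r
library(MASS)   # polr() and the housing data; a recommended package shipped with R

# Proportional-odds model: satisfaction (Low < Medium < High) on three factors.
fit_polr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
                 method = "logistic", Hess = TRUE)
summary(fit_polr)

if (requireNamespace("ordinal", quietly = TRUE)) {
  # Same model with clm(); the location coefficients should closely match polr's.
  fit_clm <- ordinal::clm(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
  # Extension: a scale effect, letting the latent variance depend on Cont.
  fit_scale <- ordinal::clm(Sat ~ Infl + Type + Cont, scale = ~ Cont,
                            weights = Freq, data = housing)
  print(anova(fit_clm, fit_scale))   # likelihood-ratio test for the scale effect
}
```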

**17 August 2011, 14:02. Invited talk on Modelling Three-dimensional Surfaces in R by Adrian Bowman, University of Glasgow**

- He is showing an application to people’s faces.
- The three-dimensional point graph he’s presenting looks pretty much like a picture! Cool stuff again.
- Now: how to model such data?
- Face3D research consortium: http://www.face3d.ac.uk/wiki/index.php/The_Face3D_project

- Breast surgery/reconstruction: identifying the breast boundary
  - Begin at the landmark which represents the most prominent point
  - Identify the breast boundary by the point of maximum curvature
  - Subsequent boundary points are then identified by rotation
  - Fit a principal curve to the points
  - Then decompose the asymmetry of the surfaces. The components can also be examined for an individual patient
- Identifying curves
  - Surface curvature is one of the key issues in the area. It can be used to measure the direction of the curve.
- Change point detection. There are many approaches.
- He is now showing an example of curve identification on lips, tracking where the lips meet! Such a curve changes dynamically and is not linear.
- Disclaimer! Image applications in R are not my cup of tea, so these notes may look weird!
- Principal components for faces now!
- Then, an application in orthognathic surgery: comparing before and after!
  - The key issue is the prediction after the surgery
  - Use CT scans taken before the surgery to get data on the face, then predict what will happen after the surgery
  - Taking some measure of uncertainty into the model too
- The last topic: magnetoencephalography (MEG)
  - The data can be very noisy in this case
  - Showing typical dipole topographies on data from a single dyad
  - Possible dipole? The result on a single-trial experiment, shown dynamically and in multiple colours, looks nice!
  - Results presented both as pictures and as graphs
  - Variation across trials -> dipoles for all trials
  - The visualization tool in the ‘rpanel’ package is a GUI!

**4:40pm Just finished my presentation. Relax time!**

4:41pm The presentation before mine was very interesting. It’s about inventory, but also deals with the bullwhip effect and supply chain performance. Nice one. The package creator is also a PhD student, from Brazil. Gotta tell my supervisor.

4:43pm Now in M02. Ortolani Millo is presenting “Integrating R and Excel for Automatic Business Forecasting”. It works as an Excel add-in offering forecasting options. ARIMA is in there too!

**2:46pm Nomograms for visualising relationships between variables by Jonathan Rougier**

- He is showing how to use nomograms, fitting a hand-drawn picture of a donkey
- See picture from David Smith http://yfrog.com/kgev6rvj
- Using the PyNomo package, see http://www.pynomo.org

2:02pm: Design of Experiments (DoE) in R by Ulrike Grömping

- She is explaining the principles of DoE:
  - Block what you can and randomise what you cannot (Box, Hunter and Hunter 1978; 2005).
  - Randomisation: balance out unknown influences.
- DoE in R: what is there?
  - Task Views, thanks to Achim Zeileis
  - started February 2008
  - currently contains 37 R packages related to DoE
  - main purposes: pointer to existing functionality and support for synergies, avoiding double work
  - first package in 2000, conf.design (core), and a roughly exponential increase since 2004
- Key driver for her work on DoE in R:
  - wanted a free software solution for industrial experimentation
  - most often needed: fractional factorial 2-level designs (-> FrF2)
  - also sometimes needed: orthogonal arrays
- Mission:
  - free researchers’ and experimenters’ brains
  - from intricate mathematical and/or programming tasks
  - for thinking about the application problem
- Package suite for industrial DoE in R:
  - DoE.base
  - FrF2
  - DoE.wrapper, for wrapping existing functionality
- DoE is available in Rcmdr (John Fox) as RcmdrPlugin.DoE <- So now it seems not too difficult for me!
- Call for activities:
  - make R cover a broader range of DoE facilities
  - write a package, or contribute functionality to an existing package
  - try to stay close to existing structures
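To make “fractional factorial 2-level designs” concrete, here is a hand-built sketch in base R; it is not FrF2's code. It builds a half fraction 2^(4-1) in eight runs, where factor D is generated from the A×B×C interaction (defining relation I = ABCD). It then randomises the run order, in the spirit of “randomise what you cannot”. The coded ±1 levels, the seed and the generator are illustrative choices. With FrF2 installed, something like `FrF2(nruns = 8, nfactors = 4)` should produce an equivalent design with randomisation built in.

```r
# Full factorial in A, B, C with coded levels -1/+1 (8 runs).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
# Generator D = ABC, so the defining relation is I = ABCD (resolution IV).
design$D <- with(design, A * B * C)

# Aliasing check: with I = ABCD, each two-factor interaction is aliased with another,
# e.g. AB = CD, so these two columns are identical.
stopifnot(all(with(design, A * B == C * D)))

# Randomise the run order.
set.seed(2011)
design <- design[sample(nrow(design)), ]
design$run <- seq_len(nrow(design))
rownames(design) <- NULL
print(design)
```

Four factors are studied in 8 runs instead of 16. The price is that two-factor interactions are confounded in pairs, which is what resolution IV means.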

RStudio by J.J. Allaire

- RStudio = an R coding tool available on Windows, Mac OS X and Linux, and on the web
- Screenshots look very similar on any platform.
- Highlight: “Extract function”, to re-run a chunk of code
- Conventional R history mechanism = saves every command entered, searchable history, code navigation (in the next beta release)
- In 10 years it will be almost impossible to justify NOT using open source software.
- Future plan: make the capabilities of R more transparent and accessible

**9:58am Keynote by Brian D. Ripley**

- A brief timeline
- Prehistory – 1997
- JCGS paper submitted Mar 1995
- The earliest extant version seems to be Jun 1995 (456KB); 0.1 alpha (842KB)
- R 2.14.0 Oct 2011
- R 2.15.0 is scheduled for Mar 2012
- R 3.0.0 will be a major change, but there is no plan for it yet.

**CRAN**

- CRAN: 2 packages in 1997, >100 in 2001 and now ~3200 current packages
- ~80 successful submissions per week to CRAN
- 10,000 current packages for Christmas 2016?
- Infrastructure provided by wu.ac.at and Stefan Theußl

In 2004 the ‘CRAN’ option was replaced by ‘repos’, with tools to support other repositories, but rather few public repositories have emerged.

**The R Development Process**

- R is run by the active members of its core team, who meet in person only every couple of years.
- The day-to-day business is done by email. Members: 3 in NZ, 1 in India, 8 in the EU, 3 in America

**How do features get into R?**

- R was principally developed for the benefit of the core team. Only they have votes.
- Most of what we have seen in R is there because core team members needed or wanted it, e.g. for research (esp. initially), teaching (early 2000s), developing R itself, or supporting other projects they were involved with.

**Internationalization**

- The core members are all native speakers of a Western European language that can be written in Latin-1.
- Japanese statisticians became interested in working in R.

**The Future**

- R is heavily dependent on a small group of altruistic people.
- They do feel that their contributions are not treated with respect.
- People need to trust the decisions of the core team.

**Trend prediction**

- Windows will remain out of step with other OSes.
- The number of packages will grow inexorably. Whereas they provide a wonderfully comprehensive test suite, they also provide a formidable barrier to change.

**8:45am Opening session now!**

Interesting numbers:

- 440 participants
- 41 countries
- By region: 342 EU, 60 N America, 16 Oceania, 13 Asia, 5 Central and South America, 4 Africa (together the 440 participants)
- 13 conference sponsors and exhibitors