# useR! 2011 Live Blog

**The R UseR Conference 2011
University of Warwick, UK**

15:59. **useR! 2012** will be hosted by Vanderbilt University, Nashville, USA.

- Pre-conference short course
- Tutorial June 12
- http://www.r-project.org/useR-2012/

**15:02 Finally, the last invited talk**, by **Simon Urbanek (AT&T)**, on

“**R Graphics: Supercharged – Recent Advances in Visualization and Analysis of Large Data in R**”

- About 85% of the seats in the great lecture hall are filled! More than 250 people
- New features in R graphics
- Real demo on a city map: how can a city become greener? Estimating the traffic
- R graphics integrated with a real map (like Google Maps). Looks nice.
- Polygons with holes: `polypath()` (a regular `polygon()` cannot draw holes)
- Most recent feature -> screen output control
  - so far there has been no way to tell R when to actually show the graphics on the screen (now, or only now?)
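A minimal sketch of the two features noted in the list: a square with a square hole drawn by `polypath()`, with the drawing bracketed by `dev.hold()` and `dev.flush()`. The coordinates and the off-screen `pdf(NULL)` device are my assumptions so that the example runs anywhere; on an interactive device that supports hold levels, the pair delays screen output until the drawing is complete.

```r
# Outer square, then NA, then the inner square that becomes the hole
x <- c(0.1, 0.1, 0.9, 0.9, NA, 0.3, 0.3, 0.7, 0.7)
y <- c(0.1, 0.9, 0.9, 0.1, NA, 0.3, 0.7, 0.7, 0.3)

pdf(NULL)                      # assumption: null device, nothing is written to disk
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))

dev.hold()                     # hold output (no effect on devices without support)
polypath(x, y, col = "grey", border = "black", rule = "evenodd")
level <- dev.flush()           # release the hold; returns the current hold level
dev.off()
```

With `rule = "evenodd"` the inner sub-path is left unfilled regardless of the direction in which its points are listed.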

**Challenges**

- Data size increases
- Large RAM (>100 GB) and CPU power are affordable
- Visualisation needs to keep up
  - rendering: the game industry provides solutions (OpenGL + GPUs)
  - visualization methods for large data
    - interactivity (divide and conquer, shift of focus)
    - sufficient statistics, aggregation

**Proposed solution**

- Rendering speed -> use OpenGL back-ends for R devices (qtdevice, iPlots eXtreme)
  - The example shows much quicker rendering, from 5 seconds down to about a second
- About iPlots
  - iPlots = interactive plots for data analysis: selection, highlighting, brushing …
  - interactive change of plot parameters
  - queries
  - all essential plots (scatterplots, bar charts, histograms, parallel coordinates)
- Demo now. The interactivity seems nice, and the colours are similar to the new iPod colour set.
- Now we can ask for the selected points in the graph with `which(selected(p))`
- DEMO: PCA, where you can now select the outliers and find out what they are!
- DEMO: a histogram whose graph changes as its parameters change

**Conclusion**

- R: rasterImage(), polypath(), dev.hold/flush()
- Large data requires fast graphics and interactivity
- OpenGL graphics devices (idev(), qtdevice, …)
- iPlots eXtreme: high-performance interactive graphics.
- Fast (C++, OpenGL: interactivity on > 1 million points)
- Efficient (no copying, reference semantics)
- Extensible (custom visuals, statistical objects, plots)
**CRAN release as “ix” expected next month**

**References with Link**

**END! Q&A now**

**12:15 An algorithm for the computation of the power of Monte Carlo tests with guaranteed precision by Patrick Rubin-Delanchy**

- Statistical formulation. Data = N observable stream.
- Algorithm with finite running time; the example is a permutation test for two Gaussian groups
- They have other “tricks” to reduce the effort: choice of N, hypothesis test on the remaining stream.
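To make the setting concrete, here is a plain Monte Carlo permutation test for two Gaussian groups together with a naive power estimate from a fixed number of repetitions. This is only a baseline sketch, not the guaranteed-precision algorithm from the talk; the function name `perm_test`, the sample sizes and the effect size are all my assumptions.

```r
# Monte Carlo permutation test on the absolute difference in means
perm_test <- function(x, y, n_perm = 199) {
  observed <- abs(mean(x) - mean(y))
  pooled <- c(x, y)
  n_x <- length(x)
  stats <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)          # random relabelling
    abs(mean(pooled[idx]) - mean(pooled[-idx]))
  })
  (1 + sum(stats >= observed)) / (n_perm + 1)   # Monte Carlo p-value
}

set.seed(1)
# Naive power estimate: share of simulated data sets rejected at the 5% level
rejections <- replicate(200, perm_test(rnorm(20), rnorm(20, mean = 1)) <= 0.05)
power <- mean(rejections)
```

The cost is the number of repetitions times the number of permutations, which is the effort the talk aims to reduce.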

##############################################

**11:57am. Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions by Taylor Arnold**

- Deals with the Kolmogorov-Smirnov test
- The discrete K-S test is not implemented in any of the major statistical computing packages, hence they aim to do it in R
- Decided to require that the discrete null distribution be specified via class *stepfun*
- After obtaining the test statistic, the *p*-value must be calculated
- Implementation is tedious but relatively straightforward
- Using the discrete Cramér-von Mises test
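A base R sketch of the idea, assuming a fair six-sided die as the discrete null: the null CDF is given as a `stepfun` object, the K-S statistic is evaluated at the support points, and the p-value is simulated from the null. This is not the speaker's implementation; the data vector and the helper `ks_stat` are made up for illustration.

```r
support <- 1:6
# Right-continuous null CDF of a fair die, specified as a step function
null_cdf <- stepfun(support, c(0, seq_along(support) / 6))

# Both CDFs jump only at the support points, so the supremum is attained there
ks_stat <- function(x) max(abs(ecdf(x)(support) - null_cdf(support)))

x <- c(1, 2, 2, 3, 6, 6, 6, 5, 4, 6)   # hypothetical observations
D <- ks_stat(x)

set.seed(123)
sim <- replicate(2000, ks_stat(sample(support, length(x), replace = TRUE)))
p_value <- mean(sim >= D - 1e-12)       # small tolerance for floating point ties
```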

##############################################

**11:31am. The benchden Package: Benchmark Densities for Nonparametric Density Estimation by Henrike Weinert (Inference Session)**

- the package implements 28 benchmark densities
- from the stats package, e.g., uniform, exponential
- normal mixtures (Marronite, claw)
- Support: compact, e.g., infinite peak, uniform scale mixture or sawtooth
- Support: gaps, e.g., Matterhorn, caliper and trimodal uniform
- Support: half line, e.g., Maxwell, Pareto and inverse exponential
- Support: real line, e.g., logistic, double exponential
- Simulation study to compare two different bandwidth selectors for a kernel estimator
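A small simulation in the same spirit, using only base R: two bandwidth selectors for `density()` are compared by mean absolute error against a known density. The standard normal target, the sample size and the choice of `"nrd0"` versus `"SJ"` are my assumptions, not the set-up from the talk.

```r
set.seed(42)
x <- rnorm(500)
grid <- seq(-4, 4, length.out = 512)

# Mean absolute error of the kernel estimate against the true density
mae_for <- function(bw) {
  est <- density(x, bw = bw, from = -4, to = 4, n = 512)
  mean(abs(est$y - dnorm(grid)))
}

mae <- c(nrd0 = mae_for("nrd0"), SJ = mae_for("SJ"))
print(mae)
```

Swapping `rnorm()` and `dnorm()` for one of the benchmark densities would turn this into the kind of comparison the package is meant for.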

##############################################

**11:15am. Density Estimation Packages in R by Henry Deng** (Inference Session)

- Review of the packages on CRAN for density estimation
- Calculation speed, on a random set of n normally distributed points: ash is the fastest and pendensity is the slowest
- Estimation accuracy using mean absolute error. Results vary with the type of distribution and the number of data points, interesting.
- Additional idea: the trade-off between speed and accuracy
- Well-performing packages seem to be long established in R, with frequent updates.
- Recommended packages: KernSmooth or ASH
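The speed and accuracy comparison can be sketched for two estimators that ship with a standard R installation: base `density()` and `bkde()` from KernSmooth. The sample size, grid and bandwidth choice (`dpik()`) are my assumptions, and timings will differ between machines.

```r
library(KernSmooth)

set.seed(7)
x <- rnorm(1e4)

# Time each estimator on the same range and grid size
t_base <- system.time(
  d_base <- density(x, from = -5, to = 5, n = 401)
)[["elapsed"]]
t_bkde <- system.time(
  d_bkde <- bkde(x, bandwidth = dpik(x), gridsize = 401L, range.x = c(-5, 5))
)[["elapsed"]]

# Mean absolute error against the true standard normal density
mae <- c(density = mean(abs(d_base$y - dnorm(d_base$x))),
         bkde    = mean(abs(d_bkde$y - dnorm(d_bkde$x))))
print(rbind(seconds = c(t_base, t_bkde), mae = mae))
```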


##############################################

**18 Aug 2011, the last day (but by no means the least)**

10:32am **Binomial regression** model by Merete

- Three different methods for extracting residuals: unstandardized, standardized and studentized.
- Exact deletion residuals, a new type of residual implemented in binomTools
- approx.deletion (rstudent) residual function
- Parallel histograms = exploratory version of the Hosmer-Lemeshow goodness-of-fit test (with fixed cut points)
- Half-normal plot: uses absolute residual values but is otherwise equivalent to a normal plot. Optimal simulated envelopes to support interpretation
- Profile likelihood from the MASS package -> returns and plots the profile likelihood root, not the profile likelihood
- Misc = grouping binary or complete grouped data based on specified data; empirical area under the ROC curve
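For reference, the three residual types are already available for a binomial `glm` in base R. The sketch below uses simulated dose-response data (an assumption of mine) and does not use binomTools; `rstudent()` gives the approximate deletion residuals, while the exact ones are what the package adds.

```r
set.seed(1)
dose <- rep(1:5, each = 4)
n <- 20                                   # trials per observation
y <- rbinom(length(dose), size = n, prob = plogis(-2 + 0.6 * dose))

fit <- glm(cbind(y, n - y) ~ dose, family = binomial)

res <- data.frame(
  unstandardized = residuals(fit, type = "deviance"),
  standardized   = rstandard(fit),        # scaled by the leverage
  studentized    = rstudent(fit)          # approximate deletion residuals
)
head(res)
```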

##############################################

**6:12pm Interval before Conference Dinner!**

**5:33pm Missing Values in Principal Component Analysis (PCA) by Julie Josse**

- Starting with… the theory
- Overfitting issues from missing values -> fixed by a shrinkage method
- **Procedure** in the missMDA package
  - Step 1: estimation of the number of dimensions
  - Step 2: imputation of missing values with the ‘imputePCA’ function
  - Step 3: PCA on the completed data set, ‘MIPCA’ function
- Iterative PCA: a single imputation method
  - A unique value cannot reflect the variability of the prediction
- Multiple imputation: generating plausible values for each missing value
  - Supplementary projection via the ‘plot()’ function
  - Individual positions (and variables) with other predictions
  - Between-imputation variability too!
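The iterative PCA idea can be sketched in base R: fill the gaps with column means, fit a low-rank PCA, replace the gaps with the fitted values, and repeat until the imputed values stop changing. This is a simplified version without the shrinkage step mentioned in the talk and is not the package's implementation; the function name `iterative_pca_impute` and its arguments are hypothetical.

```r
iterative_pca_impute <- function(X, ncp = 2, max_iter = 200, tol = 1e-6) {
  miss <- is.na(X)
  X_hat <- X
  # Start from column means in the missing cells
  X_hat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  for (i in seq_len(max_iter)) {
    mu <- colMeans(X_hat)
    s <- svd(sweep(X_hat, 2, mu), nu = ncp, nv = ncp)
    # Rank-ncp reconstruction of the centred data, then add the means back
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")
    change <- sum((X_hat[miss] - fitted[miss])^2)
    X_hat[miss] <- fitted[miss]            # observed cells are never touched
    if (change < tol) break
  }
  X_hat
}

set.seed(10)
X <- as.matrix(iris[, 1:4])
X_miss <- X
X_miss[sample(length(X), 30)] <- NA        # knock out 30 cells at random
X_imp <- iterative_pca_impute(X_miss, ncp = 2)
```

Because it returns a single completed matrix, this sketch shows exactly the limitation in the notes: one imputed value per cell says nothing about the uncertainty of that value.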

##############################################

**5:06pm Here is the HIGHLIGHT. John Fox! on Tests for Multivariate Linear Models with the car Package**

- Discussion on fitting multivariate linear models (MLMs) in R with the lm function
- The anova function is flexible, but it calculates sequential (Type I) tests, and performing other common tests, especially for repeated-measures designs, is relatively inconvenient.
- The Anova function (with a capital A) in the car package (Fox and Weisberg, 2011) can perform partial (Type II or Type III) tests for the terms in a multivariate linear model, including simply specified multivariate and univariate tests for repeated-measures models.
- The linearHypothesis function in the car package can test arbitrary linear hypotheses for multivariate linear models, including models for repeated measures.
- Both the Anova and linearHypothesis functions return a variety of information useful in further computation on multivariate linear models
- Now he’s demonstrating how to use the ‘car’ package with the Anderson-Fisher iris data
  - Correlation plot, basic box-plot
  - `mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species, data=iris)`
  - `(manova.iris <- Anova(mod.iris))` prints "Type II MANOVA Tests: ..."
  - `anova(mod.iris)` gave the same result as the default function
- Summary: `summary()`
- Also handles repeated measures, e.g. a single repeated-measures factor. This can be handled by the anova function in R, but it is simpler to get common tests from the Anova and linearHypothesis functions in the car package.
- 21% of my MacBook battery now, please no blackout any time soon.
- 5 mins left in this presentation!
- It’s done! Q&A now.
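The iris example from the demo can be reproduced along these lines. The base R part runs on any installation; the `car::Anova()` call is guarded because the car package may not be installed. The choice of the Pillai statistic in `anova()` is my assumption.

```r
# Multivariate linear model: four responses, one factor
mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
               data = iris)

# Sequential (Type I) multivariate tests from base R
tab <- anova(mod.iris, test = "Pillai")
print(tab)

# Type II MANOVA tests from car, if the package is available
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod.iris))
}
```

With a single term in the model the sequential and partial tests for Species coincide, which matches the remark in the notes that both functions gave the same result.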

##############################################

**4:47pm Multiple choice models: why not the same answer? A comparison among LIMDEP, R, SAS and Stata by Giuseppe Bruno**

- Similar to my presentation, but focused on comparing R packages for choice models with other proprietary software, and more technical!

##############################################

**4:35pm Regression Models for Ordinal Data: Introducing R-package ordinal by Rune Haubo B. Christensen**

- The package offers regression models for ordinal data.
- Provides various standard model fit indices.
- Extends the basic model with scale effects, nominal effects, random effects and structured thresholds.
- Future work: more flexible random effect structures and nested effects.
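For readers who have not met the basic model, here is a cumulative logit (proportional odds) fit. Note that this sketch swaps in `polr()` from MASS, which ships with R, and not the ordinal package from the talk; the `housing` data set is the standard example from MASS.

```r
library(MASS)

# Proportional odds model for housing satisfaction (Low < Medium < High)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

coef(fit)    # regression coefficients on the logit scale
fit$zeta     # estimated thresholds between adjacent response categories
```

The extensions listed in the notes (scale effects, nominal effects, random effects, structured thresholds) go beyond what this basic fit provides.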

##############################################

**17 August 2011, 14:02. Invited talk on Modelling Three-dimensional Surfaces in R by Adrian Bowman, University of Glasgow.**

- He is showing an application to people's faces.
- The three-dimensional point graph he’s presenting looks pretty much like a picture! Cool stuff again.
- Now, how to model such data?
- Face3D research consortium: http://www.face3d.ac.uk/wiki/index.php/The_Face3D_project
- Breast surgery/reconstruction: identifying the breast boundary
  - Begin at the landmark which represents the most prominent point
  - Identify the breast boundary by the point of maximum curvature
  - Subsequent boundary points are then identified by rotation
  - Fit a principal curve to the single points
  - Then decompose asymmetry (surfaces). The components can also be examined for an individual patient
- Identifying curves
  - Surface curvature is one of the key issues in this area. We can measure the direction of the curve using it.
- Change point detection. There are many approaches.
- He is now showing an example of curve identification on lips, tracking where the lips meet! Such a curve changes dynamically and is not linear.
- Disclaimer! Image applications in R are not my cup of tea, so these notes may look weird!
- Principal components for faces now!
- Then, an application in orthognathic surgery. Comparing before and after!
  - The key issue is the prediction after the surgery
  - Use CT scanning before the surgery to get data on your face and predict what is going to happen after the surgery
  - Taking some measure of uncertainty into the model too.
- The last topic: magnetoencephalography (MEG)
  - data can be very noisy in this case
  - Showing typical dipole topographies on single dyad data
  - Possible dipole? The result on a single-trial experiment, shown dynamically in multiple colours, looks nice!
  - Results presented as both pictures and graphs
  - variation across trials -> all-trials dipole
- A visualization tool in the ‘rpanel’ package is a GUI one!

##############################################

**4:40pm Just finished my presentation. Relax time!**

4:41pm The presentation before mine was very interesting. It was about inventory but also dealt with the bullwhip effect and supply chain performance. Nice one. The package creator is also a PhD student, from Brazil. Gotta tell my supervisor.

4:43pm Now in M02. Ortolani Millo is presenting “Integrating R and Excel for Automatic Business Forecasting”. It works as an add-in in Excel offering options to do the forecasting. ARIMA is in there too!

##############################################

**2:46pm Nomograms for visualising relationships between variables by Jonathan Rougier**

- He is showing how to use a nomogram by fitting a hand-drawn picture of a donkey
- See the picture from David Smith: http://yfrog.com/kgev6rvj
- Uses the pynomo package, see http://www.pynomo.org

##############################################

2:02pm: Design of Experiments (DoE) in R by Ulrike Grömping

- She is explaining the principles of DoE.
  - Block what you can and randomise what you cannot (Box, Hunter and Hunter 1978; 2005).
  - Randomisation: balance out unknown influences.
- DoE in R: what is there?
  - Task Views, thanks to Achim Zeileis
  - started February 2008
  - currently contains 37 R packages related to DoE
  - Main purposes = point to existing functionality and support synergies, avoid duplicated work
  - First package in 2000, conf.design (core), and a roughly exponential increase since 2004
- Key drivers for her work on DoE in R
  - Wanted a free software solution for industrial experimentation
  - Most often needed: fractional factorial 2-level designs (-> FrF2)
  - Also sometimes needed: orthogonal arrays
- Mission
  - Free researchers’ and experimenters’ brains
    - from intricate mathematical and/or programming tasks
    - for thinking about the application problem
- Package suite for industrial DoE in R
  - DoE.base
  - FrF2
  - DoE.wrapper, for wrapping existing functionality
- DoE is available in Rcmdr (John Fox) as RcmdrPlugin.DoE <- So now it seems not too difficult for me!
- Call for activities
  - Make R cover a broader range of DoE facilities
  - Write a package, or contribute functionality to an existing package
  - Try to stay close to existing structures
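To show what a fractional factorial 2-level design looks like, here is a hand-built half fraction of a four-factor design in base R, with the run order randomised as the principles above ask. The factor names and the generator D = ABC are my choices for illustration; the packages named in the talk automate this kind of construction and much more.

```r
# Full factorial in three factors, coded -1 / +1
base_design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Half fraction of a 2^4 design: the fourth factor is aliased with ABC
design <- transform(base_design, D = A * B * C)

# Randomise the run order to balance out unknown influences
set.seed(2011)
design <- design[sample(nrow(design)), ]
print(design)

# Columns are balanced and mutually orthogonal
crossprod(as.matrix(design))
```

Eight runs instead of sixteen is the saving; the price is that D cannot be separated from the ABC interaction.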

##############################################

RStudio by J.J. Allaire

- RStudio = R coding tool available on Windows, Mac OS X and Linux, and on the web
- Screenshots look very similar on every platform.
- Highlight = extract function, to re-run a chunk of code
- Conventional R history mechanism = saves every command entered, searchable history, code navigation (in the next beta release)
- In 10 years it will be almost impossible to justify NOT using open source software.
- Future plan = make the capabilities of R more transparent and accessible

##############################################

**9:58am Keynote by Brian D. Ripley**

- A brief Timeline
- Prehistory – 1997
- JCGS paper submitted Mar 1995
- The earliest extant version seems to be Jun 1995 (456KB); 0.1 alpha (842KB)
- R 2.14.0 Oct 2011
- R 2.15.0 is scheduled for Mar 2012
- R 3.0.0 will be a major change, but there is no plan for it yet.

**CRAN**

- CRAN: 2 packages in 1997, >100 in 2001 and now ~3200 current packages
- ~80 successful submissions per week to CRAN
- 10,000 current packages for Christmas 2016?
- Infrastructure provided by wu.ac.at and Stefan Theußl
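The Christmas 2016 question can be turned into a rough growth rate. The sketch below assumes constant exponential growth from about 3200 packages at the time of the keynote; both the growth model and the start date of 16 August 2011 are my simplifications, not claims from the talk.

```r
start_date  <- as.Date("2011-08-16")   # assumed date of the keynote
target_date <- as.Date("2016-12-25")
years <- as.numeric(difftime(target_date, start_date, units = "days")) / 365.25

# Continuous annual growth rate needed to go from ~3200 to 10,000 packages
rate <- log(10000 / 3200) / years
annual_pct <- 100 * (exp(rate) - 1)

# Projected package count at a given date under that constant rate
projected <- function(date) {
  t <- as.numeric(difftime(as.Date(date), start_date, units = "days")) / 365.25
  3200 * exp(rate * t)
}
round(projected(c("2012-12-25", "2014-12-25", "2016-12-25")))
```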

In 2004 ‘CRAN’ was replaced by ‘repos’, with tools provided to support other repositories, but rather few public repositories have emerged.

**The R Development Process**

- R is run by the active members of its core team. They meet in person only every couple of years.
- The day-to-day business is done by email. 3 in NZ, 1 in India, 8 in the EU, 3 in America

**How do features get into R?**

- R was principally developed for the benefit of the core team. Only they have votes.
- Most of what we have seen in R is there because core team members needed or wanted it, e.g., for research (esp. initially), for teaching (early 2000s), to develop R itself, or to support other projects they were involved with.

**Internationalization**

- The core members are all native speakers of a Western European language which can be written in Latin-1.
- Japanese statisticians became interested in working in R.

**The Future**

- R is heavily dependent on a small group of altruistic people.
- They do feel that their contributions are not treated with respect.
- People need to trust the decisions of the core team.

**Trend prediction**

- Windows will remain out of step with other OSes.
- The number of packages will grow inexorably. Whereas they provide a wonderfully comprehensive test suite, they also provide a formidable barrier to change.

##############################################

**8:45am Opening session now!**

Interesting numbers

440 participants

41 countries

342 EU, 60 N America, 16 Oceania, 13 Asia, 5 Central and South America, 4 Africa

13 conference sponsors and exhibitors
