
useR! 2011 Live Blog


The R UseR Conference 2011
University of Warwick, UK

15:59. useR! 2012 will be hosted by Vanderbilt University, Nashville, USA.

15:02 FINALLY, the last invited talk, by Simon Urbanek (AT&T), on
R Graphics: Supercharged – Recent Advances in Visualization and Analysis of Large Data in R. Download abstract here

  • 85% of the seats in the great lecture hall are filled! >250 people
  • New features in R graphics
  • Real demo on a city map: how can a city be made greener? Estimating the traffic.
  • R graphics integrated with a real map (like Google Maps). Looks nice.
  • Polygons with holes: polypath()
    – regular polygon()s cannot create holes
  • Most recent feature -> screen output control
  • Until now there was no way to tell R when to actually show the graphics on screen: right away, or only once the whole plot is ready?
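
As a rough illustration of the two features noted above, the sketch below draws a square with a square hole using polypath(), and wraps the drawing in dev.hold()/dev.flush() so that a device supporting it only refreshes once the shape is complete. The helper name draw_ring and the coordinates are made up for this example; they are not from the talk.

```r
# Hypothetical helper (not from the talk): a square with a square hole.
draw_ring <- function() {
  # Two sub-paths separated by NA: the outer square, then the hole.
  x <- c(0, 4, 4, 0, NA, 1, 1, 3, 3)
  y <- c(0, 0, 4, 4, NA, 1, 3, 3, 1)

  plot.new()
  plot.window(xlim = c(0, 4), ylim = c(0, 4), asp = 1)

  dev.hold()    # ask the device to postpone screen updates
  polypath(x, y, col = "grey70", border = "black", rule = "evenodd")
  dev.flush()   # release the hold; the device may now redraw

  invisible(sum(is.na(x)) + 1)  # number of sub-paths drawn
}
```

Calling draw_ring() on an interactive device should show a grey frame with an unfilled centre. On devices without hold support the two dev.* calls do nothing.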

Challenges

  • Data size increases
  • Large RAM (>100 GB) and CPU power is affordable
  • Visualisation needs to keep up
    – rendering: the game industry provides solutions (OpenGL + GPUs)
    – visualization methods for large data
    -> interactivity (divide and conquer, shift of focus)
    -> sufficient statistics, aggregation

Proposed solution
– Rendering speed -> use OpenGL back-ends for R devices (qtdevice, iPlots eXtreme)
– The example shows a clear speed-up, from 5 secs to about a sec

  • About iPlots
  • iPlots = interactive plots for data analysis: selection, highlighting, brushing …
    – interactive change of plot parameters
    – queries
    – all essential plots (scatterplots, bar charts, histograms, parallel coordinates)
  • Demo now. The interactivity seems nice, and the colours resemble the new iPod colour set.

    iPlot (credit: Ashley Ford's Facebook)

  • Now we can query the points selected in the graph with
    >which(selected(p))
    – DEMO: PCA, where you can select the outliers and find out what they are!
    – DEMO: histogram where changing a parameter updates the graph

Conclusion

  • R: rasterImage(), polypath(), dev.hold()/dev.flush()
  • Large data requires fast graphics and interactivity
  • OpenGL graphics devices (idev(), qtdevice, …)
  • iPlots eXtreme: high-performance interactive graphics.
  • Fast (C++, OpenGL: interactivity on > 1 million points)
  • Efficient (no copying, reference semantics)
  • Extensible (custom visuals, statistical objects, plots)
  • CRAN release as “ix” expected next month

References with Link

END! Q&A now

##############################################

12:15 An algorithm for the computation of the power of Monte Carlo tests with guaranteed precision by Patrick Rubin-Delanchy. Download abstract here

  • Statistical formulation. Data = N observable streams.
  • Algorithm with finite running time; the example is a permutation test for two Gaussian groups
  • They have other “tricks” to reduce the effort: choice of N, hypothesis test on the remaining stream.
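
To make the setting concrete, here is a minimal sketch of a Monte Carlo permutation test for two Gaussian groups, with its power estimated by plain simulation. This is the naive approach and not the guaranteed-precision algorithm from the talk; the function names, the defaults (N = 199 permutations, 200 replications) and the test statistic are my own assumptions.

```r
# Monte Carlo permutation p-value for a difference in means
# (illustrative only; not the algorithm presented in the talk).
perm_pvalue <- function(x, y, N = 199) {
  observed <- abs(mean(x) - mean(y))
  pooled <- c(x, y)
  nx <- length(x)
  simulated <- replicate(N, {
    idx <- sample(length(pooled), nx)      # random relabelling of the groups
    abs(mean(pooled[idx]) - mean(pooled[-idx]))
  })
  (1 + sum(simulated >= observed)) / (N + 1)
}

# Naive power estimate: fraction of simulated data sets that are rejected.
estimate_power <- function(delta, n = 20, alpha = 0.05, reps = 200, N = 199) {
  rejected <- replicate(reps, {
    perm_pvalue(rnorm(n), rnorm(n, mean = delta), N) <= alpha
  })
  mean(rejected)
}

set.seed(2011)
print(estimate_power(delta = 1))
```

The estimate is itself random, since both the data sets and the permutations are sampled, which is exactly the difficulty the talk addresses.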

##############################################

11:57am. Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions by Taylor Arnold. Download abstract here

  • Deals with the Kolmogorov-Smirnov test
  • Discrete K-S test: not implemented in any of the major statistical computing packages, hence they aim to do it in R
  • Decided to require that the discrete null distribution be specified via class stepfun
  • After obtaining the test statistic, the p-value must be calculated
  • Implementation is tedious but relatively straightforward
  • Also using the discrete Cramér-von Mises test
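
A small base-R sketch of the idea, assuming the null distribution is given as a stepfun CDF that is 0 to the left of its first knot and that the sample lies on the support. The statistic is the largest gap between the empirical and null CDFs at the support points. For the p-value I use plain simulation from the null as a stand-in; the speakers compute it differently, and the function names and the die example are mine.

```r
# Null distribution: a fair six-sided die, given as a right-continuous CDF.
null_cdf <- stepfun(1:6, c(0, (1:6) / 6))

# K-S statistic for a discrete null: both CDFs only jump at the support
# points, so the largest gap can be found by checking those points.
ks_stat_discrete <- function(x, cdf) {
  support <- knots(cdf)
  max(abs(ecdf(x)(support) - cdf(support)))
}

# Simulated p-value (an assumption of this sketch, not the talk's method).
ks_test_discrete_mc <- function(x, cdf, B = 2000) {
  support <- knots(cdf)
  probs <- diff(c(0, cdf(support)))        # point masses implied by the CDF
  observed <- ks_stat_discrete(x, cdf)
  simulated <- replicate(B, {
    draw <- sample(support, length(x), replace = TRUE, prob = probs)
    ks_stat_discrete(draw, cdf)
  })
  list(statistic = observed,
       p.value = (1 + sum(simulated >= observed - 1e-12)) / (B + 1))
}

set.seed(1)
rolls <- c(1, 1, 1, 1, 1, 1, 2, 2, 3, 6)   # made-up sample
print(ks_test_discrete_mc(rolls, null_cdf))
```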

##############################################

11:31am. The benchden Package: Benchmark Densities for Nonparametric Density Estimation by Henrike Weinert (Inference Session) download abstract here

  • The package implements 28 benchmark densities
  • From the stats package, e.g., uniform, exponential
  • Normal mixtures (Marronite, claw)
  • Support: compact, e.g., infinite peak, uniform scale mixture or sawtooth
  • Support: gaps, e.g., Matterhorn, caliper and trimodal uniform
  • Support: half line, e.g., Maxwell, Pareto and inverse exponential
  • Support: real line, e.g., logistic, double exponential
  • Simulation study to compare two different bandwidth selectors for a kernel estimator

##############################################

11:15am. Density Estimation Packages in R by Henry Deng (Inference Session) Download abstract here

  • Review of the CRAN packages for density estimation
  • Calculation speed on a random set of n normally distributed points: ash is the fastest and pendensity is the slowest
  • Estimation accuracy measured by mean absolute error: results vary with the type of distribution and the number of data points. Interesting.
  • Additional idea: the trade-off between speed and accuracy
  • Well-performing packages seem to be long established in R, with frequent updates.
  • Recommended packages: KernSmooth or ash
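
The comparison described above can be imitated in miniature with base R alone. The sketch below uses stats::density() with two built-in bandwidth selectors as stand-ins for the packages reviewed in the talk, and measures the mean absolute error against the true standard normal density on the evaluation grid. The sample size, seed and choice of selectors are my own assumptions.

```r
set.seed(2011)
x <- rnorm(1000)   # simulated data with a known true density

# Mean absolute error of a kernel estimate against dnorm() on its own grid.
mae_density <- function(x, bw) {
  est <- density(x, bw = bw, n = 512)
  mean(abs(est$y - dnorm(est$x)))
}

selectors <- c("nrd0", "SJ")
mae <- sapply(selectors, function(bw) mae_density(x, bw))
secs <- sapply(selectors, function(bw) {
  system.time(density(x, bw = bw))[["elapsed"]]
})

# One row for accuracy, one for speed: the trade-off mentioned in the talk.
print(rbind(mae = mae, seconds = secs))
```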

 

##############################################

18 Aug 2011, The last day (but by no means the least)

10:32am Binomial regression model by Merete. Download abstract here

  • Three different residual extractors: unstandardized, standardized and studentized.
  • Exact deletion residuals, a new type of residual implemented in binomTools
  • Approximate deletion residuals (rstudent)
  • Parallel histograms = exploratory version of the Hosmer-Lemeshow goodness-of-fit test (with fixed cut points)
  • Half-normal plot: uses absolute residual values but is otherwise equivalent to a normal plot. Optimal simulated envelopes to support interpretation
  • Profile likelihood from the MASS package -> returns and plots the profile likelihood root, not the profile likelihood
  • Misc = grouping binary or completely grouped data based on specified data; empirical area under the ROC curve
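
For reference, the three standard residual types can already be extracted with base R, without binomTools. The sketch below fits a binomial GLM to simulated grouped data (the data, seed and coefficients are made up) and collects deviance residuals, standardized residuals and the approximate deletion residuals from rstudent(). The exact deletion residuals from the talk are not reproduced here.

```r
set.seed(42)
# Simulated grouped binomial data: 12 groups of 30 trials each.
d <- data.frame(x = seq(-2, 2, length.out = 12), n = 30)
d$y <- rbinom(nrow(d), size = d$n, prob = plogis(0.5 + d$x))

fit <- glm(cbind(y, n - y) ~ x, family = binomial, data = d)

res <- data.frame(
  unstandardized = residuals(fit, type = "deviance"),
  standardized   = rstandard(fit),   # deviance residual / sqrt(1 - leverage)
  studentized    = rstudent(fit)     # approximate deletion residuals
)
print(round(res, 3))
```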

##############################################

6:12pm Interval before Conference Dinner!

5:33pm Missing Values in Principal Component Analysis (PCA) by Julie Josse download abstract

  • Starting with… the theory
  • Overfitting issues from missing values -> fixed by a shrinkage method
  • Procedure in the missMDA package
  • Step 1: Estimation of the number of dimensions
  • Step 2: Imputation of missing values with the ‘imputePCA’ function
  • Step 3: PCA on the completed data set, ‘MIPCA’ function
    – Iterative PCA: single imputation method
    – A single value cannot reflect the variability of the prediction
    – Multiple imputation: generating plausible values for each missing value
  • Supplementary projection via the ‘plot()’ function
    – Individual positions (and variables) with other predictions
  • Between-imputation variability too!
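
The single-imputation step can be sketched in base R as a plain iterative PCA: fill the gaps with column means, fit a low-rank PCA, replace the gaps with the fitted values, and repeat until the imputed values stop changing. This is the unregularised version, so it does not include the shrinkage mentioned above and is not the missMDA implementation. The function name impute_pca and its defaults are assumptions of this sketch.

```r
# Unregularised iterative PCA imputation (illustrative sketch only).
impute_pca <- function(X, ncp = 2, max_iter = 500, tol = 1e-8) {
  X <- as.matrix(X)
  miss <- is.na(X)
  filled <- X
  # Start from the column means of the observed values.
  filled[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]

  for (i in seq_len(max_iter)) {
    mu <- colMeans(filled)
    centred <- sweep(filled, 2, mu)
    s <- svd(centred, nu = ncp, nv = ncp)
    # Rank-ncp reconstruction, moved back to the original location.
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")

    updated <- filled
    updated[miss] <- fitted[miss]       # only missing cells are overwritten
    change <- sum((updated - filled)^2)
    filled <- updated
    if (change < tol) break
  }
  filled
}

set.seed(1)
X <- as.matrix(iris[, 1:4])
X[sample(length(X), 30)] <- NA          # knock out 30 cells at random
completed <- impute_pca(X, ncp = 2)
print(head(completed))
```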

##############################################

5:06pm Here is the HIGHLIGHT. John Fox! on Tests for Multivariate Linear Models with the car Package. Download abstract

    • Discussion of fitting multivariate linear models (MLMs) in R with the lm function
    • The anova function is flexible, but it calculates sequential (Type I) tests, and performing other common tests, especially for repeated-measures designs, is relatively inconvenient.
    • The Anova function (with a capital A) in the car package (Fox and Weisberg, 2011) can perform partial (Type II or Type III) tests for the terms in a multivariate linear model, including simply specified multivariate and univariate tests for repeated-measures models.
    • The linearHypothesis function in the car package can test arbitrary linear hypotheses for multivariate linear models, including models for repeated measures.
    • Both the Anova and linearHypothesis functions return a variety of information useful in further computation on multivariate linear models
    • Now he’s demonstrating how to use the ‘car’ package with the Anderson-Fisher iris data
Correlation plot, basic box-plot
> library(car)
> mod.iris <- lm(cbind(Sepal.Length, Sepal.Width,
+   Petal.Length, Petal.Width) ~ Species, data=iris)
> (manova.iris <- Anova(mod.iris))
Type II MANOVA Tests: ...
> anova(mod.iris)
The default anova function gave exactly the same result here.
  • Summary
    >summary()
  • Also handling repeated measures, e.g. a single repeated-measures factor: this can be handled by the anova function in R, but it is simpler to get the common tests from the Anova and linearHypothesis functions in the car package.
  • 21% of my MacBook battery left now. Please, no blackout any time soon.
  • 5 mins left in this presentation!
  • It’s done! Q&A now.
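
A self-contained version of the iris demo, for anyone who wants to try it. The base-R part needs nothing extra; the car::Anova() call is guarded so the script still runs if car is not installed. With a single term in the model, the sequential and Type II tests for Species should coincide, which matches what was observed in the demo. The object name seq.tests is mine.

```r
# Multivariate linear model on the iris data, as in the demo.
mod.iris <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
               data = iris)

# Base R: sequential (Type I) multivariate tests, Pillai statistic by default.
seq.tests <- anova(mod.iris)
print(seq.tests)

# car: Type II tests, run only when the package is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod.iris))
}
```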

##############################################

4:47pm Multiple choice models: why not the same answer? A comparison among LIMDEP, R, SAS and Stata by Giuseppe Bruno Download Abstract

  • Similar to my presentation, but focused on comparing R packages for choice models with proprietary software, and more technical!

##############################################

4:35pm Regression Models for Ordinal Data: Introducing R-package ordinal by Rune Haubo B. Christensen Download Abstract

  • The package offers regression models for ordinal data.
  • Provides various standard model fit indices.
  • Extends the basic model with scale effects, nominal effects, random effects and structured thresholds.
  • Future work: more flexible random-effect structures and nested effects.

##############################################

17 August 2011, 14:02. Invited talk on Modelling Three-dimensional Surfaces in R by Adrian Bowman, University of Glasgow.

  • He is showing an application to people's faces.
  • The three-dimensional point graph he’s presenting looks pretty much like a picture! Cool stuff again.
  • Now, how to model such data.
  • Face3D research consortium: http://www.face3d.ac.uk/wiki/index.php/The_Face3D_project
    Breast surgery/reconstruction -> Identifying the breast boundary
    -> Begin at the landmark which represents the most prominent point
    -> Identify the breast boundary by the point of maximum curvature.
    -> Subsequent boundary points are then identified by rotation
    -> Fit a principal curve to the single points
    -> Then decompose asymmetry – surfaces. The components can also be examined for an individual patient
  • Identifying curves
    -> Surface curvature is one of the key issues in the area. We can measure the direction of the curve using this.
  • Change point detection. There are many approaches.
  • He is now showing an example of curve identification for lips, tracking where the lips meet! Such a curve changes dynamically and is not linear.
  • Disclaimer! Image applications in R are not my cup of tea, so the notes may look weird!
  • Principal components for faces now!
  • Then, an application in orthognathic surgery. Comparing before and after!
    – the key issue is the prediction after the surgery
    – Use CT scanning before the surgery, then use the data of your face to predict what’s gonna happen after the surgery
    – Taking some measure of uncertainty into the model too.
  • The last topic: magnetoencephalography (MEG)
    – data could be very noisy in this case
    – Showing typical dipole topographies on single dyad data
    – Possible dipole? The result of a single-trial experiment, shown dynamically in multiple colours, looks nice!
    – Results presented in terms of both pictures and graphs
    – variation across trials -> all-trials dipole
    – The visualization tool in the ‘rpanel’ package is a GUI one!

##############################################

4:40pm Just finished my presentation. Relax time!

4:41pm The presentation before mine was very interesting. It was about inventory, but also dealt with the Bullwhip Effect and supply chain performance. Nice one. The package creator is also a PhD student, from Brazil. Gotta tell my supervisor.

4:43pm Now in M02. Ortolani Millo is presenting “Integrating R and Excel for Automatic Business Forecasting”. It works as an add-in in Excel offering options to do the forecasting. ARIMA is in there too!

##############################################

2:46pm Nomograms for visualising relationships between variables by Jonathan Rougier

  • He is showing how to use a nomogram by fitting a hand-drawn picture of a donkey
  • See the picture from David Smith http://yfrog.com/kgev6rvj
  • Using the pynomo package, see http://www.pynomo.org

##############################################


2:02pm: Design of Experiments (DoE) in R by Ulrike Grömping

  • She is explaining the principles of DoE.
    Block what you can and randomise what you cannot (Box, Hunter and Hunter 1978; 2005).
    Randomisation: balance out unknown influences.
  • DoE in R: What is there?
    – Task View, thanks to Achim Zeileis
    – started February 2008
    – currently contains 37 R packages related to DoE
    – Main purposes = pointer to existing functionality and support for synergies; avoid duplicated work
    – First package in 2000, conf.design (core), and roughly exponential increase since 2004
  • Key drivers for her work on DoE in R
    – Wanted a free software solution for industrial experimentation
    – Most often needed: fractional factorial 2-level designs (-> FrF2)
    – Also sometimes needed: orthogonal arrays
  • Mission
    – Free researchers’ and experimenters’ brains from intricate mathematical and/or programming tasks, for thinking about the application problem
  • Package suite for industrial DoE in R
    – DoE.base
    – FrF2
    – DoE.wrapper, for wrapping existing functionality
  • DoE available in Rcmdr (John Fox) as RcmdrPlugin.DoE <- So now it seems not too difficult for me!
  • Call for activities
    – Make R cover a broader range of DoE facilities
    – Write a package, or contribute functionality to an existing package
    – Try to stay close to existing structures
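
To show what a fractional factorial 2-level design looks like, here is a base-R sketch that builds a half fraction of a four-factor design by hand and randomises the run order. The packages above (FrF2 in particular) do this far more conveniently; the factor names, the generator D = ABC and the seed are my own choices for illustration.

```r
# Full 2^3 factorial in coded units (-1 / +1) for factors A, B and C.
full <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Half fraction of a 2^4 design: the fourth factor is aliased with the
# three-way interaction (generator D = ABC, defining relation I = ABCD).
half <- transform(full, D = A * B * C)

# Randomise the run order to balance out unknown influences.
set.seed(2011)
design <- half[sample(nrow(half)), ]
rownames(design) <- NULL
print(design)
```

The result has 8 runs instead of 16, and every pair of factor columns is orthogonal.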

##############################################


RStudio by J.J. Allaire

  • RStudio = R coding tool available on Windows, Mac OS X and Linux, and on the web
  • Screenshots look very similar on any platform.
  • Highlight = Extract function to re-run a chunk of code
  • Conventional R history mechanism = save every command entered, searchable history, code navigation (in the next beta release)
  • In 10 years it will be almost impossible to justify NOT using open source software.
  • Future plan = make the capabilities of R more transparent and accessible

##############################################

9:58am Keynote by Brian D. Ripley

  • A brief Timeline
  • Prehistory – 1997
  • JCGS paper submitted Mar 1995
  • The earliest extant version seems to be Jun 1995 (456KB); 0.1 alpha (842KB)
  • R 2.14.0 Oct 2011
  • R 2.15.0 is scheduled in Mar 2012
  • R 3.0.0 will be a Major change but no plan for this yet.

CRAN

  • CRAN: 2 packages in 1997, >100 in 2001 and now ~3200 current packages
  • ~80 successful submissions per week to CRAN
  • 10,000 current packages for Christmas 2016?
  • Infrastructure provided by wu.ac.at and Stefan Theußl

‘CRAN’ was replaced by ‘repos’ in 2004, with tools provided to support other repositories, but rather few public repositories have emerged.

The R Development Process

  • R is run by the active members of its core team, who meet in person only every couple of years.
  • The day-to-day business is done by email. 3 in NZ, 1 in India, 8 in the EU, 3 in America

How do features get into R?

  • R was principally developed for the benefit of the core team. Only they have votes.
  • Most of what we have seen in R is there because core team members needed or wanted it, e.g., for research (esp. initially), for teaching (early 2000s), to develop R itself or to support other projects they were involved with.

Internationalization

  • The core members are all native speakers of a Western European language which can be written in Latin-1.
  • Japanese statisticians became interested in working in R.

The Future

  • R is heavily dependent on a small group of altruistic people.
  • They do feel that their contributions are not treated with respect.
  • People need to trust the decisions of the core team.

Trend prediction

  • Windows will remain out of step with other OSes.
  • The number of packages will grow inexorably. Whereas they provide a wonderfully comprehensive test suite, they also provide a formidable barrier to change.

##############################################

8:45am Opening session now!

Interesting numbers
440 participants
41 countries
342 EU, 60 N America, 16 Oceania, 13 Asia, 5 Central and South America, 4 Africa
13 conference sponsors and exhibitors
