Software | Pairach Piboonrungroj, PhD

Jun 18

Reproducible reports & research with knitr in R Studio

Arguably, knitr (CRAN link) is the most outstanding R package of this year, and its creator, Yihui Xie, is the star of the useR! 2012 conference. The main reason is that it is much easier to use than Sweave for making reproducible reports. The integration of knitr and R Studio has made reproducible research much more convenient and intuitive.

R Studio: A user friendly and cross platform IDE for R

Screenshot of R Studio (Windows PC)

This post is an example based on the demo by Yihui Xie himself. I will show how to create a reproducible report containing R code in a LaTeX style at the Cardiff R User Group session at the Cardiff Business School (CARBS) Research Fair tomorrow (19 June 2012).

knitr option for sweave document in R Studio

In the code

Lines 001-043 are just the normal LaTeX preamble, which I took from the template for useR! conference abstracts.
Lines 044-096 are the R code and descriptions.

A chunk of R code is wrapped in the following code:

<<chunk1, echo=TRUE, results='hide'>>=
Put your R code here
@

chunk1 is the name of the chunk

echo = TRUE to show your R code in the chunk, = FALSE if you do not want to show the code.

results = 'markup' to show the result, = 'hide' to hide it
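If most chunks in a document share the same settings, the defaults can be set once in an early chunk instead of being repeated in every chunk header. A minimal sketch, assuming the knitr package is installed; opts_chunk$set() is knitr's own function for this, while the values chosen here are only an illustration:

```r
# Set document-wide defaults for the chunk options described above.
# Assumes the knitr package is installed; the values are illustrative only.
library(knitr)

opts_chunk$set(
  echo    = TRUE,      # show the R source of each chunk
  results = "markup"   # show the output; use "hide" to suppress it
)

# Individual chunks can still override these defaults in their own header.
opts_chunk$get("echo")
```

Options given in a chunk header take precedence over these defaults, so a single chunk can still use echo=FALSE.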

The code of the whole document is as follows:

\documentclass[11pt, a4paper]{article}
\usepackage{amsfonts, amsmath, hanging, hyperref, natbib, parskip, times}
\hypersetup{
 colorlinks,
 linkcolor=blue,
 urlcolor=blue
}
\setlength{\topmargin}{-15mm}
\setlength{\oddsidemargin}{-2mm}
\setlength{\textwidth}{165mm}
\setlength{\textheight}{250mm}

\let\section=\subsubsection
\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
\let\proglang=\textit
\let\code=\texttt
\renewcommand{\title}[1]{\begin{center}{\bf \LARGE #1}\end{center}}
\newcommand{\affiliations}{\footnotesize}
\newcommand{\keywords}{\paragraph{Keywords:}}

\begin{document}
\pagestyle{empty}

\title{Using knitR (R + \LaTeX) in R Studio: A Demo}

\begin{center}
 {\bf Pairach Piboonrungroj$^{1,2,^\star}$}
\end{center}

\begin{affiliations}
1. Logistics Systems Dynamics Group, Cardiff Business School, Cardiff University, United Kingdom \\[-2pt]
2. Chiang Mai School of Economics, Chiang Mai University, Thailand \\[-2pt]
%3. Second affiliation of author B \\[-2pt]
$^\star$Email: \href{mailto:me@pairach.com}{me@pairach.com}
\end{affiliations}
\vskip -0.5cm
%%%%%%%%%%%%%%%%%%
%Add Breaking Line
\begin{center}
\linethickness{1mm}
\line(1,0){480}
\end{center}
%%%%%%%%%%%%%%%%%%
1. Show only R source code
<<chunk1, echo=TRUE, results='hide'>>=
1 + 1
@

2. Show only output
<<chunk2, ref.label='chunk1', echo=FALSE, results='markup'>>=
@

3. Show both source code and output
<<chunk3, echo=TRUE, results='markup'>>=
1 + 1
@

4. Show the source code in a grey shade but the output as is (unshaded)
<<chunk4, echo=TRUE, results='asis'>>=
1 + 1
@

5. Now, testing a linear model
<<chunk5, echo=TRUE, results='markup'>>=
# generate values for the x variable from 1 to 100
x <- 1:100
# create the error term
e <- rnorm(100, mean = 5, sd = 10000)
# compute y as 10 plus 100 times x plus the random error
y <- 10 + 100*x + e
@

Define a custom PDF device with a fixed size, to be used for the plots
<<custom-dev2>>=
my_pdf = function(file, width, height) {pdf(file, width = 5, height = 5, pointsize = 10)}
@

6. See the scatter plot
<<chunk6, echo=TRUE, results='markup', dev='my_pdf', fig.ext='pdf'>>=
plot(x, y)
@

7. Let's build a linear model by regressing y on x
<<chunk7, echo=TRUE, results='markup'>>=
# creating a linear model by regressing y on x as 'lm1' object
lm1 <- lm(y ~ x)
# calling a summary of linear model result
summary(lm1)
@

8. Now we can create post-hoc plots to check the assumptions of the regression
<<chunk8, echo=TRUE, results='markup', dev='my_pdf', fig.ext='pdf'>>=
# Creating post-hoc plot for lm1
par(mfrow=c(2,2))
plot(lm1)
@
\end{document}
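Outside R Studio, the same file can also be compiled from the R console. A minimal sketch, assuming knitr and a LaTeX installation are available; compile_report() is a hypothetical helper and the file name in the example call is made up:

```r
# Compile a knitr .Rnw file to PDF from the R console.
# Assumes knitr and a LaTeX installation are available; the file name
# used in the example call is hypothetical.
compile_report <- function(rnw_file) {
  if (!file.exists(rnw_file)) {
    stop("File not found: ", rnw_file)
  }
  # knit2pdf() runs the R chunks, writes the .tex file, then calls LaTeX
  knitr::knit2pdf(rnw_file)
}

# Example call (uncomment once the file exists):
# compile_report("knitr-demo.Rnw")
```

Either route should produce the same PDF.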

And this is the Output

 

List of Free Online R Tutorials

Following my post on FREE online R tutorials from universities, I have received many emails suggesting more tutorials. However, some of these tutorials are not hosted by academic institutions, so I decided to create this post to list them. If you know of other tutorials, please suggest them by email to me@pairach.com or post the link in the comment section.

A list of R tutorials hosted on the webpages of academic institutions can be found here.

The tutorials are listed in no particular order but are categorised by subject and/or topic.

General guides

R Wiki: documentation about R contributed by the R community
Quick-R by Rob Kabacoff
R Programming by Wikibooks
How to use R by Wikiversity
R4stats – R for SAS and SPSS Users – R for Stata Users by Bob Muenchen
Programming in R by Vincent Zoonekynd
R programming for those coming from other languages by John D. Cook
Cookbook for R by Winston Chang (Not related to Paul Teetor’s R Cookbook!)
A short introduction to the R programming language by Leibniz-Rechenzentrum
The Guerilla Guide to R by Nikhil Gopal

Online Tutorial

Introduction to R by Data Camp
Introduction to statistical programming in R by Leada
R Coder by José Carlos

Video/Audio tutorials

Twotorials: more than 90 two-minute tutorials on several topics by Anthony Damico
The R-Podcast by Eric Nantz
R language for Statistical Computing by Sentiment Mining Research Center
A list of Videos on Data Analysis with R: Introductory, Intermediate, and Advanced Resources by Jeromy Anglim

Subject specific

Economics & Econometrics

Forecasting: principles and practice (Online Book) by Rob J Hyndman and George Athanasopoulos
Econometrics in R (CRAN) by Grant V. Farnsworth

Ecology

R in Ecology and Evolution
Ecology and Epidemiology in R by Various authors

Psychology

Using R for Personality Research – Introduction to R [slide] – R: Statistics for all of us [slide] by William Revelle
Learning R for Researchers in Psychology by Jeromy Anglim

Using R in/for Governments

Recently, the British government (through the Office for National Statistics, ONS) published its own R manual for analysing government surveys. The links to the PDF and MS Word versions of the manual, including the R syntax, are below.

Note: The R syntax link is not working now. I am contacting the ONS and hope they will fix it soon.

The R Guide to ESDS Large-Scale Government Surveys
PDF, Word

In the US government, there is an emerging awareness and recognition of the power of R in its Big Data Initiative. David Smith (Revolution Analytics) has summarised the application of R in the US government in his post here.

Jun 14

How to post R code on WordPress blogs

Most WordPress BloggeRs already use this syntax-highlighting markup, but some do not.
I hope that this post will serve as a reference for new WordPress BloggeRs who want to post R code on their blogs.

According to the official WordPress.com guide on "Posting Source Code", to post R code on WordPress.com, just wrap it as follows (without the "#" in both wrappers):

######################################################

[#sourcecode language="r"]
Your R code and comments
x <- rnorm(100)
y <- x + 10
[#/sourcecode]

######################################################

As shown above, put the opening tag from line 1, [#sourcecode language="r"] without the #, before your R code.
Then place your R code (lines 2-4).
End the code box with the closing tag from line 5, again without the #, i.e. [/sourcecode].

Then the code will appear as follows.

Your R code and comments
x <- rnorm(100)
y <- x + 10

Several further options can be configured to control how the code is displayed.

autolinks (true/false)
TRUE: Makes all URLs in your posted code clickable.
Default: TRUE
collapse (true/false)
TRUE: The code box will be collapsed when the page loads, requiring the visitor to click to expand it.
Comment: Good for large code posts.
Default: FALSE
firstline (number)
Comment: Use this to change the number at which the line numbering starts.
Default: 1
gutter (true/false)
TRUE: Show the line numbering on the left-hand side.
FALSE: The line numbering on the left-hand side will be hidden.
Default: TRUE
highlight (comma-separated list of numbers)
Comment: List the line numbers you want to be highlighted.
Example: "4,7,19"
htmlscript (true/false)
TRUE: Any HTML/XML in your code will be highlighted.
Comment: This is useful when you are mixing code into HTML, such as PHP inside of HTML.
Default: FALSE (only works with certain code languages)
light (true/false)
TRUE: The gutter (line numbering) and toolbar (see below) will be hidden.
Comment: This is helpful when posting only one or two lines of code.
Default: FALSE
padlinenumbers (true/false/integer)
TRUE: Automatic padding.
FALSE: No padding.
Integer: Forces a specific amount of padding.
Comment: Allows you to control the line number padding.
toolbar (true/false)
FALSE: The toolbar containing the helpful buttons that appears when you hover over the code will not be shown.
Default: TRUE
wraplines (true/false)
FALSE: Line wrapping will be disabled. This will cause a horizontal scrollbar to appear for long lines of code.
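For bloggers who prepare their posts from R, the wrapper and its options can be assembled programmatically. The following is only a sketch: wrap_sourcecode() is a hypothetical helper written for this post, not part of WordPress or any R package, and it simply pastes whatever options it is given into the opening tag.

```r
# Build a WordPress.com [sourcecode] wrapper around some R code.
# wrap_sourcecode() is a hypothetical helper, not part of any package.
wrap_sourcecode <- function(code, language = "r", ...) {
  options <- list(...)
  # Render logical values as lowercase true/false, as in the option list
  values <- vapply(options, function(v) {
    if (is.logical(v)) tolower(as.character(v)) else as.character(v)
  }, character(1))
  attrs <- c(
    sprintf('language="%s"', language),
    if (length(values) > 0) sprintf('%s="%s"', names(values), values)
  )
  paste0(
    "[sourcecode ", paste(attrs, collapse = " "), "]\n",
    paste(code, collapse = "\n"),
    "\n[/sourcecode]"
  )
}

# Hide the gutter and highlight the second line of the posted code
cat(wrap_sourcecode(c("x <- rnorm(100)", "y <- x + 10"),
                    gutter = FALSE, highlight = "2"))
```

The helper does not check that an option name is valid, so any misspelled option would be passed through unchanged.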

If you are using WordPress.org, here is a plugin.

Update: I just found a nice post by William K. Morris on How to update your WordPress.com blog from R

Jun 8

EURO 2012 Forecast: Spain will beat Germany in the Final again!? predicted Economists

(Feature Image from The New York Times)

EURO 2012 (European Football Championship) kicks off today (in seconds)!

Recently, a working paper from the Faculty of Economics and Statistics, University of Innsbruck, predicted that the winner of the tournament will be Spain (again). Achim Zeileis (achim.zeileis@r-project.org), Christoph Leitner (christoph.leitner@wu.ac.at) and Kurt Hornik (kurt.hornik@wu.ac.at) used odds provided by 23 online bookmakers (as the experts' opinion) to simulate each match from the group stage to the final, which is predicted to be Spain vs Germany. I think that they used R in their study as well.

EURO 2012 winning probabilities from the bookmaker consensus rating.

The figure below shows the probability that Team i will beat Team j, calculated by this formula:

Pr (Team i beats Team j) = (Ability of Team i) / (Ability of Team i + Ability of Team j)
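In R, this pairwise probability takes the form ability_i / (ability_i + ability_j), so the two probabilities for any pair of teams sum to one. A minimal sketch; the team names and ability values are made-up placeholders, not the estimates from the paper:

```r
# Pairwise winning probability: ability_i / (ability_i + ability_j).
# The ability values below are made-up placeholders, not the paper's estimates.
win_prob <- function(ability_i, ability_j) {
  ability_i / (ability_i + ability_j)
}

abilities <- c(TeamA = 0.30, TeamB = 0.20, TeamC = 0.10)

# Matrix of Pr(row team beats column team) for every pair of teams
prob_matrix <- outer(abilities, abilities, win_prob)
diag(prob_matrix) <- NA  # a team does not play itself
round(prob_matrix, 3)
```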

Winning probabilities, that Team i will beat Team j, in pairwise comparisons of all EURO 2012 teams

As an econometrician and a football fan, I really like this paper and wish I could replicate their work for a tournament involving my home country's team, Thailand.

Probability for each team to survive in EURO 2012, i.e., to proceed from the group phase to the quarter-finals, the semi-finals and the final, and to win the tournament.

The paper details are as follows.

Achim Zeileis, Christoph Leitner, Kurt Hornik (2012). History Repeating: Spain Beats Germany in the EURO 2012 Final. Working Paper 2012-09, Working Papers in Economics and Statistics, Research Platform Empirical and Experimental Economics, Universität Innsbruck.

Abstract

Four years after the last European football championship (EURO) in Austria and Switzerland, the two finalists of the EURO 2008 – Spain and Germany – are again the clear favorites for the EURO 2012 in Poland and the Ukraine. Using a bookmaker consensus rating – obtained by aggregating winning odds from 23 online bookmakers – the forecast winning probability for Spain is 25.8% followed by Germany with 22.2%, while all other competitors have much lower winning probabilities (The Netherlands are in third place with a predicted 11.3%). Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, we can infer that the probability for a rematch between Spain and Germany in the final is 8.9% with the odds just slightly in favor of Spain for prevailing again in such a final (with a winning probability of 52.9%). Thus, one can conclude that – based on bookmakers’ expectations – it seems most likely that history repeats itself and Spain defends its European championship title against Germany. However, this outcome is by no means certain and many other courses of the tournament are not unlikely as will be presented here.

All forecasts are the result of an aggregation of quoted winning odds for each team in the EURO 2012: These are first adjusted for profit margins (“overrounds”), averaged on the log-odds scale, and then transformed back to winning probabilities. Moreover, team abilities (or strengths) are approximated by an “inverse” procedure of tournament simulations, yielding estimates of all pairwise probabilities (for matches between each pair of teams) as well as probabilities to proceed to the various stages of the tournament. This technique correctly predicted the EURO 2008 final (Leitner, Zeileis, Hornik 2008), with better results than other rating/forecast methods (Leitner, Zeileis, Hornik 2010a), and correctly predicted Spain as the 2010 FIFA World Champion (Leitner, Zeileis, Hornik 2010b). Compared to the EURO 2008 forecasts, there are many parallels but two notable differences: First, the gap between Spain/Germany and all remaining teams is much larger. Second, the odds for the predicted final were slightly in favor of Germany in 2008 whereas this year the situation is reversed.
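The aggregation described in the abstract can be sketched roughly in R. This is a simplified illustration, not the authors' code: the odds are made-up numbers for three hypothetical bookmakers, and the overround is removed by simply rescaling each bookmaker's implied probabilities to sum to one, which may differ from the exact adjustment used in the paper.

```r
# Made-up decimal odds: rows = teams, columns = hypothetical bookmakers.
quoted_odds <- matrix(
  c(1.80, 1.85, 1.75,
    3.00, 2.90, 3.10,
    6.00, 6.50, 6.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TeamA", "TeamB", "TeamC"), c("bm1", "bm2", "bm3"))
)

# 1. Implied probabilities, rescaled per bookmaker to remove the overround
#    (simple rescaling is an assumption made for this sketch)
implied  <- 1 / quoted_odds
adjusted <- sweep(implied, 2, colSums(implied), "/")

# 2. Average across bookmakers on the log-odds scale
consensus_logit <- rowMeans(qlogis(adjusted))

# 3. Transform back to winning probabilities
#    (after back-transformation these need not sum exactly to one)
consensus_prob <- plogis(consensus_logit)
round(consensus_prob, 3)
```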

Link in EconPapers [url]
Paper [pdf]
Presentation [pdf]
Interview of Achim Zeileis on EURO 2012 forecast in ORF: ZIB 24
(2012-06-06, 00:10) (in German, I guess)

Will history repeat itself?

11 Jun 2012 Update: The Diffuseprior blog is updating the odds on EURO 2012 weekly.
