R | Pairach Piboonrungroj, PhD

Jun 8

EURO 2012 Forecast: Spain will beat Germany in the Final again!? predicted Economists

(Feature image from The New York Times)

EURO 2012 (the European Football Championship) kicks off today (in seconds)!

A recent working paper from the Faculty of Economics and Statistics, University of Innsbruck, predicts that Spain will win the tournament (again). Achim Zeileis (achim.zeileis@r-project.org), Christoph Leitner (christoph.leitner@wu.ac.at) and Kurt Hornik (kurt.hornik@wu.ac.at) used odds from 23 online bookmakers (as the experts’ opinion) to simulate each match from the group round to the final, which is predicted to be Spain vs Germany. I think that they used R in their study as well.

EURO 2012 winning probabilities from the bookmaker consensus rating.

The figure below shows the probability that Team i will beat Team j, calculated with this formula:

Pr (Team i beats Team j) = (Ability of Team i) / (Ability of Team i + Ability of Team j)
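
For readers who want to play with the idea, here is a minimal sketch in R. The pairwise formula has the Bradley-Terry form: one team's ability divided by the sum of both teams' abilities. The ability values and the helper name `win_prob` are made up for illustration and are not taken from the paper.

```r
# Hypothetical ability values, for illustration only (not from the paper)
ability <- c(Spain = 3.0, Germany = 2.7, Netherlands = 1.8)

# Pr(team i beats team j) = ability_i / (ability_i + ability_j)
win_prob <- function(a_i, a_j) a_i / (a_i + a_j)

# Matrix of all pairwise probabilities: row team beats column team
pairwise <- outer(ability, ability, win_prob)
diag(pairwise) <- NA  # a team does not play itself

round(pairwise, 3)
```

By construction, the two entries for any pair of teams add up to one, so only the ratio of the abilities matters, not their absolute scale.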

Winning probabilities, that Team i will beat Team j, in pairwise comparisons of all EURO 2012 teams

As an econometrician and a football fan, I really like this paper and wish I could replicate their work for a tournament involving my home country's team, Thailand.

Probability for each team to survive in EURO 2012, i.e., to proceed from the group phase to the quarter-finals, the semi-finals and the final, and to win the tournament.

The paper details are as follows.

Achim Zeileis, Christoph Leitner, Kurt Hornik (2012). History Repeating: Spain Beats Germany in the EURO 2012 Final. Working Paper 2012-09, Working Papers in Economics and Statistics, Research Platform Empirical and Experimental Economics, Universität Innsbruck.

Abstract

Four years after the last European football championship (EURO) in Austria and Switzerland, the two finalists of the EURO 2008 – Spain and Germany – are again the clear favorites for the EURO 2012 in Poland and the Ukraine. Using a bookmaker consensus rating – obtained by aggregating winning odds from 23 online bookmakers – the forecast winning probability for Spain is 25.8% followed by Germany with 22.2%, while all other competitors have much lower winning probabilities (The Netherlands are in third place with a predicted 11.3%). Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, we can infer that the probability for a rematch between Spain and Germany in the final is 8.9% with the odds just slightly in favor of Spain for prevailing again in such a final (with a winning probability of 52.9%). Thus, one can conclude that – based on bookmakers’ expectations – it seems most likely that history repeats itself and Spain defends its European championship title against Germany. However, this outcome is by no means certain and many other courses of the tournament are not unlikely as will be presented here.

All forecasts are the result of an aggregation of quoted winning odds for each team in the EURO 2012: These are first adjusted for profit margins (“overrounds”), averaged on the log-odds scale, and then transformed back to winning probabilities. Moreover, team abilities (or strengths) are approximated by an “inverse” procedure of tournament simulations, yielding estimates of all pairwise probabilities (for matches between each pair of teams) as well as probabilities to proceed to the various stages of the tournament. This technique correctly predicted the EURO 2008 final (Leitner, Zeileis, Hornik 2008), with better results than other rating/forecast methods (Leitner, Zeileis, Hornik 2010a), and correctly predicted Spain as the 2010 FIFA World Champion (Leitner, Zeileis, Hornik 2010b). Compared to the EURO 2008 forecasts, there are many parallels but two notable differences: First, the gap between Spain/Germany and all remaining teams is much larger. Second, the odds for the predicted final were slightly in favor of Germany in 2008 whereas this year the situation is reversed.
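
The aggregation steps described in the abstract can be sketched roughly as follows. This is only a toy version under my own assumptions: the decimal odds are invented, there are three teams and three bookmakers instead of 16 and 23, and the overround is removed by simple proportional normalisation, which may differ from the adjustment used in the paper.

```r
# Hypothetical decimal odds (rows: bookmakers, columns: teams)
quoted <- rbind(
  bookie1 = c(A = 2.0, B = 3.0, C = 4.5),
  bookie2 = c(A = 2.1, B = 2.9, C = 4.4),
  bookie3 = c(A = 1.9, B = 3.2, C = 4.6)
)

# Step 1: adjust for the overround. The raw implied probabilities of each
# bookmaker sum to more than 1, so rescale every row to sum to 1
# (a simplifying assumption, not necessarily the paper's adjustment).
raw <- 1 / quoted
p <- raw / rowSums(raw)

# Step 2: average across bookmakers on the log-odds scale
consensus_logit <- colMeans(qlogis(p))

# Step 3: transform back to winning probabilities
consensus <- plogis(consensus_logit)
round(consensus, 3)
```

Because the averaging happens on the log-odds scale, the consensus probabilities need not sum to exactly one, although they should be close.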

Link in EconPapers [url]
Paper [pdf]
Presentation [pdf]
Interview of Achim Zeileis on EURO 2012 forecast in ORF: ZIB 24
(2012-06-06, 00:10) (in German, I guess)

Will history repeat itself?

11 Jun 2012 Update: Diffuseprior Blog is updating the odds on EURO 2012 weekly.

Jun 6

10 Comments

R Style Guide

A systematic and consistent style guide is important in programming, and R is no exception.

Here are some recommended style guides for programming in R.

Pick one of them and strictly follow it.

[update] There is research suggesting that underscores make code easier for programmers to follow. (link)

1. Google’s R Style Guide

Link to Google’s R Style Guide (official URL, pdf)
Provides 14 R style rules covering filenames, identifiers, indentation, spacing, etc.
“These rules were designed in collaboration with the entire R user community at Google”

2. Henrik Bengtsson’s R Coding Conventions

Link to R Coding Conventions
In development since 2002; the current version is 0.9 (January 2009).
Please read this message in the introduction part before applying this coding convention.

“Please note that this document is under construction since mid October 2002 and should still be seen as a first rough draft. There is no well defined coding recommendations for the R language [1] and neither is there a de facto standard. This document will give some recommendations, which are very similar to the ones in the Java programming style [2][3], which have found to be helpful for both the developer as well as the end user of packages and functions written in R.”

3. Hadley Wickham’s Style Guide

Link to his R style guide on the GitHub wiki of the devtools package and on had.co.nz
Mainly based on Google’s style.

Here is what he mentions in the introduction.
“Good coding style is like using correct punctuation when writing: you can manage without it, but it sure makes things easier to read. As with punctuation, there are many possible variations, and the main thing is to be consistent. The following guide describes the style that I use – you don’t have to use it, but you need to have some consistent style that you do follow. My style is based on Google’s R style guide, with a few tweaks. Good style is important because while your code only has one author, it will usually have multiple readers, and when you know you will be working with multiple people on the same code, it’s a good idea to agree on a common style up-front.”

4. R Core’s coding standard

Official R coding standard in R Internals Manual

5. formatR package

Links to the package’s CRAN and GitHub sites
One can also use Yihui’s formatR package to tidy R code with the function tidy.source().
The package description is as follows:

“The formatR package was designed to reformat R code to improve readability; the main workhorse is the function tidy.source(). Features include:

long lines of code and comments are reorganized into appropriately shorter ones

spaces and indent are added where necessary

comments are preserved in most cases

the number of spaces to indent the code (i.e. tab width) can be specified (default is 4)

an else statement in a separate line without the leading } will be moved one line back

= as an assignment operator can be replaced with <-”
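
As a quick illustration of these features, here is a small sketch. It assumes formatR is installed and uses the name `tidy_source()`, which is what recent releases of the package call the function quoted above as `tidy.source()`. The messy input lines are made up.

```r
# Assumes the formatR package is installed: install.packages("formatR")
if (requireNamespace("formatR", quietly = TRUE)) {
  # Made-up, badly formatted input
  messy <- c("x=c(1,2,3)", "if(sum(x)>5){y=TRUE}else{y=FALSE}")

  # arrow = TRUE replaces `=` assignments with `<-`;
  # indent sets the number of spaces per indentation level;
  # output = FALSE returns the result instead of printing it
  tidy <- formatR::tidy_source(text = messy, arrow = TRUE, indent = 2,
                               output = FALSE)
  cat(tidy$text.tidy, sep = "\n")
}
```

The `requireNamespace()` guard only keeps the snippet from failing on a machine without the package; in an interactive session one would normally just call `library(formatR)`.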

6. The State of Naming Conventions in R

by Rasmus Bååth in The R Journal (pdf)
(Added on 15 December 2012)

Testing R Markdown with R Studio and posting it on RPubs.com

Today RStudio announced RPubs.com, a new platform for publishing R Markdown documents in HTML format.

RStudio seems to be a very fast-growing IDE, not only for data analysis in R but also for reproducible reports in PDF and HTML using R and the knitr package.

So I tried creating an R Markdown file in RStudio from my recent visualisation of Thai tourist data.

And here is my first published HTML report using R Markdown with RStudio.

The following is the code in the R Markdown file.


Visualising International Tourist Arrival
========================================================

by **Pairach Piboonrungroj**
email: <me@pairach.com>
twitter: [@piboonrungroj](https://twitter.com/#!/Piboonrungroj)

This is a test for an R Markdown document using my analysis of *tourist data of Thailand*.
You may see the original post in [my website](https://pairach.com/2012/05/31/using-r-to-analysis-tourism-data-1/)

Tourism is an important sector of the global economy. In many countries, including Thailand, it is the main source of revenue. However, tourism is a fast-moving sector that is sensitive and vulnerable to various factors. The tourism markets for each destination (country) are also very diverse. Tourism data are available and updated frequently. Among the most important national tourism statistics are the number of tourist arrivals from each country of origin, their average length of stay, and total receipts or expenditure. These statistics are important but are often reported separately because of the limitations of the software used by analysts.

The following steps produce a comprehensive profile of international tourists in Thailand in 2005 with the ggplot2 package in R.

### 1: Import data into R
```{r}
exp05 <- read.csv("http://dl.dropbox.com/u/46344142/thai_tour_2005.csv", header = TRUE)
```

### 2: Load 'ggplot2' package for plotting elegant data visualisation
```{r}
library(ggplot2)
```

### 3: Specify x and y axis, label, size of the bubbles and colour of the region
```{r fig.width=7, fig.height=6}
exp <- ggplot(exp05, aes(x = number, y = length, label = country, size = receipt, colour = region))
```

### 4: Create a plot and add texts to x and y axis
```{r fig.width=11, fig.height=6}
exp + geom_point() + geom_text(hjust = 0.7, vjust = 2) +
  labs(x = "Number of Tourist Arrivals", y = "Length of Stay (days)") +
  # scale_area() is no longer in ggplot2; scale_size() scales point area
  scale_size("Receipt (M. USD)") + scale_colour_hue("Region")
```

Jun 3

3 Comments

Resources for Structural Equation Model (SEM)

This post lists some resources for learning SEM that are available online.
I am still adding to and updating the list, so if you know of other useful SEM resources, please leave them in the comments. Thank you 🙂

Tutorials

SEM tutorial
by David A. Kenny
Wikia-Psychology
Structural Equation Modeling including steps in performing SEM
An Introduction to Structural Equation Modeling for Ecology and Evolutionary Biology
by Jarrett Byrnes

Software to fit SEM

List of R packages for Structural Equation Model [url]
SEM
– Uses of packages in R (sem, OpenMx)
Edinburgh R user group

Miscellaneous

Prof. Karl Joreskog’s story by David Burns [url]

May 31

2 Comments

Using R to Analyse Tourism Data – Part 1: Visualising Tourist Profile

Tourism is an important sector of the global economy. In many countries, including Thailand, it is the main source of revenue. However, tourism is a fast-moving sector that is sensitive and vulnerable to various factors. The tourism markets for each destination (country) are also very diverse. Tourism data are available and updated frequently. Among the most important national tourism statistics are the number of tourist arrivals from each country of origin, their average length of stay, and total receipts or expenditure. These statistics are important but are often reported separately because of the limitations of the software used by analysts.

The following graph shows the profile of international tourists in Thailand in 2005.

The picture above was produced in R with package ‘ggplot2’ using the code below.


# Step 1: Import data into R
exp05 <- read.csv("http://dl.dropbox.com/u/46344142/thai_tour_2005.csv", header = TRUE)
# Step 2: Load 'ggplot2' package for plotting elegant data visualisation
library(ggplot2)
# Step 3: Specify x and y axis, label, size of the bubbles and colour of the region
exp <- ggplot(exp05, aes(x = number, y = length, label = country, size = receipt, colour = region))
# Step 4: Create a plot and add texts to x and y axis
exp + geom_point() + geom_text(hjust = 0.7, vjust = 2) +
  labs(x = "Number of Tourist Arrivals", y = "Length of Stay (days)") +
  # scale_area() is no longer in ggplot2; scale_size() scales point area
  scale_size("Receipt (M. USD)") + scale_colour_hue("Region")
