Posts from the ‘R’ Category

Oct 18

Econometrics with R – Part2

R is a program well suited to analysing data, so data entry is usually done in another program first, such as a spreadsheet (MS Excel) or SPSS. The data are then imported into R for analysis using the commands below.

1. From a .csv file

Suppose we want to import the file data.csv,
whose first row contains the variable names.

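# header = TRUE: the first row holds the variable names
# row.names = "id": use the column named id as row names (the file must contain it)
data <- read.table("c:/data.csv",
                   header = TRUE,
                   sep = ",",
                   row.names = "id")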
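The import step can be tried without a real c:/data.csv. The sketch below writes a small made-up table to a temporary file and reads it back; the column names (id, income, age) and the values are illustrative assumptions, not part of the original example.

```r
# Hypothetical example data; the column names are illustrative only.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,income,age",
             "a,100,30",
             "b,250,41",
             "c,180,25"), tmp)

# read.csv() is read.table() with header = TRUE and sep = "," as defaults
csv_data <- read.csv(tmp, row.names = "id")

str(csv_data)        # two integer columns, three rows
rownames(csv_data)   # the id column has become the row names
unlink(tmp)
```

Using read.csv() here is only a convenience; the read.table() call above does the same job once header and sep are set.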

2. From MS Excel

Export the file as .csv, then follow step 1.

3. From SPSS or PASW (IBM)

First export the file as .csv
using the “Save as” command and choosing .csv as the file type.
Then import the data into R as in step 1.

4. From Stata

library(foreign)
data <- read.dta("c:/data.dta")
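To check the Stata route without an existing .dta file, the following sketch writes a small made-up data frame with write.dta() and reads it back with read.dta(). The variable names and values are assumptions for illustration only. Note that read.dta() in the foreign package reads the older Stata file formats (versions 5 to 12); files saved by newer Stata releases may need a different reader, such as the haven package.

```r
library(foreign)  # ships with standard R installations as a recommended package

# Write a small hypothetical data frame to a temporary .dta file, then read it back
tmp <- tempfile(fileext = ".dta")
toy <- data.frame(id = 1:3, wage = c(10.5, 12.0, 9.75))
write.dta(toy, tmp)

stata_data <- read.dta(tmp)
str(stata_data)
unlink(tmp)
```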

Example: selecting variables to build a subset (sub_data) from the main data set (data)

Variables can be selected by clicking together with Shift (to select a range of variables) or Ctrl (to select them one at a time).

data <- data.frame(replicate(26,list(rnorm(5))))
names(data) <- LETTERS
sub_data <- data[select.list(names(data), multiple=TRUE)]
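select.list() opens an interactive chooser, so it does not suit scripts that run unattended. A minimal non-interactive sketch of the same subsetting is shown here; the chosen variable names (A, C, X and the range D to G) are arbitrary assumptions.

```r
set.seed(123)                                   # make the random data repeatable
data <- data.frame(replicate(26, list(rnorm(5))))
names(data) <- LETTERS

# Pick columns by name instead of through the select.list() dialog
wanted   <- c("A", "C", "X")
sub_data <- data[wanted]

# Or pick a contiguous range of variables, like a Shift-click would
range_data <- data[, match("D", names(data)):match("G", names(data))]

names(sub_data)
names(range_data)
```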

Source: Adapted from an answer by Josh O’Brien on Stack Overflow to a question from Jeromy Anglim


Oct 7


Econometrics with R – Part 1


Why have I written this manual

I love R. R is great software for statistical analysis and for producing beautiful and reliable graphics from data. I initially taught myself R and found the learning curve steep when tackling it alone. However, I have used R successfully in my research thanks to the useR community. Whenever I have had a problem in R, I have been able to find an answer (usually more than one) freely available on the Internet, or by emailing the package creator and getting an answer with CODE within minutes. Hence I owe the useRs a great deal. Once I believed that I could give something back to the R community, I thought that writing my own version of an R manual (from the perspective of a non-programmer and econometrician) would make a small contribution to the R ecosystem. I am not an expert in R, but I believe in its power and potential for others in research or teaching. If you find any bugs or errors in this manual, please do not hesitate to let me know, and I will correct them ASAP.

Why econometricians should use R

Free
Anyone can use R without any financial cost.
Reliable
R has been accepted as a lingua franca for statistical analysis. It is widely and intensively used by both academics and practitioners.
Available on Windows PC, Mac and Linux (with high compatibility)
I like this point a lot because I personally use a Mac, but the machines at school are Windows PCs.
Using R means I have no trouble moving between computers for an analysis; it works on both Mac and Windows PC.
Fresh and Up-to-date
Because R is open source, many people contribute to its development by writing packages. Many well-known researchers and lecturers write packages for new statistical tools, and these are free to use as well.
Worldwide community of useRs
R has a great many users around the world, and many of them write manuals and tips on the Internet. If you run into a problem, try searching the Internet and you will find plenty of answers. There are also R-bloggers, Twitter (#rstats) and Stack Overflow, where we can easily contact R users and developers and ask them questions.
Reproducibility
R keeps a record of the commands used in an analysis, so we do not have to remember which buttons we pressed. This makes it easy to go back to old analyses and repeat them.
Synergy with LaTeX
R can be used efficiently together with professional document preparation systems such as TeX or LaTeX through Sweave.
For those who must repeatedly run the same analysis on new data as it arrives, such as an annual financial report, using R saves a great deal of time.

Continue to Part 2

Other posts about R

Aug 16


My presentation at The R useR! Conference 2011

At useR! 2011, I talked about using R (with the packages sem, lavaan and OpenMx) for Structural Equation Modeling, comparing it with commercial software, namely AMOS, LISREL and Mplus.

In this study, I compared R and the other software by running the same model of “Transaction Costs in Supply Chain”.

The presentation slides, R code and abstract follow.

useR! 2011 slide

Click here to download the slide.

The R Codes

Abstract


Aug 16


useR! 2011 Live Blog

The R UseR Conference 2011
University of Warwick, UK

15:59. useR! 2012 will be hosted by Vanderbilt University, Nashville, USA.

Pre-conference short course
Tutorial June 12
http://www.r-project.org/useR-2012/

15:02 FINALLY, the last invited talk, by Simon Urbanek (AT&T), on
“R Graphics: Supercharged – Recent Advances in Visualization and Analysis of Large Data in R”. Download Abstract here

85% of the seats in the great lecture hall are filled! >250 people
New features in R graphics
Real demo on a city map: how can a city be made greener? Estimating the traffic.
R graphics integrated with a real map (like Google Maps). Looks nice.
Polygons with holes: polypath()
– regular polygon()s cannot create holes
Most recent feature -> screen output control
– so far there has been no way to tell R when to actually show graphics on the screen: right away, or only once the drawing is complete
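As a small illustration of the polypath() point, the sketch below draws a square with a triangular hole. The coordinates are made up, and the output goes to a temporary PDF file (an assumption, so that the code also runs without a screen device).

```r
# Draw to a temporary PDF so the sketch also runs without a screen device
out <- tempfile(fileext = ".pdf")
pdf(out, width = 4, height = 4)

plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))

# Outer square, then NA to start a new sub-path, then an inner triangle (the hole)
x <- c(0.1, 0.1, 0.9, 0.9, NA, 0.3, 0.5, 0.7)
y <- c(0.1, 0.9, 0.9, 0.1, NA, 0.3, 0.7, 0.3)
polypath(x, y, col = "grey", border = "black", rule = "evenodd")

dev.off()
```

With rule = "evenodd", the region enclosed by both sub-paths is left unfilled, which is what produces the hole.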

Challenges

Data size increases
Large RAM (>100 GB) and CPU power are affordable
Visualisation needs to keep up
– rendering: the game industry provides solutions: OpenGL + GPUs
– visualization methods for large data
-> interactivity (divide and conquer, shift of focus)
-> sufficient statistics, aggregation

Proposed solution
– Rendering speed -> use OpenGL back-ends for R devices (qtdevice, iPlots eXtreme)
– The example shows a speed-up from 5 secs to just a sec

About iPlots
iPlots = interactive plots for data analysis: selection, highlighting, brushing …
– interactive change of plot parameters
– queries
– all essential plots (scatterplots, bar charts, histograms, parallel coordinates)
Demo now. The interactivity looks nice; the colours resemble the new iPod colour range.
iPlot (credit: Ashley Ford's Facebook)
Now we can query the selected points in the graph with
> which(selected(p))
– DEMO: PCA; you can now select the outliers and find out what they are!
– DEMO: a histogram whose graph updates as its parameters change

Conclusion

R: rasterImage(), polypath(), dev.hold/flush()
Large data requires fast graphics and interactivity
OpenGL graphics devices (idev(), qtdevice, …)
iPlots eXtreme: high-performance interactive graphics.
Fast (C++, OpenGL: interactivity on > 1 million points)
Efficient (no copying, reference semantics)
Extensible (custom visuals, statistical objects, plots)
CRAN release as “ix” expected next month

References with Link

END! Q&A now

##############################################

12:15 An algorithm for the computation of the power of Monte Carlo tests with guaranteed precision by Patrick Rubin-Delanchy. Download abstract here

Statistical formulation. Data = N observable streams.
Algorithm with finite running time; the example is a permutation test for two Gaussian groups
They have other “tricks” to reduce the effort: choice of N, hypothesis test on the remaining stream.
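For readers unfamiliar with the example used in the talk, here is a plain permutation test for a difference in means between two Gaussian groups. It is only a sketch with a fixed number of permutations, not the speaker's algorithm for estimating power with guaranteed precision; the sample sizes, the mean shift and the helper name perm_test are assumptions.

```r
set.seed(42)
x <- rnorm(20, mean = 0)     # group 1
y <- rnorm(20, mean = 1)     # group 2 (shifted mean, so the null is false)

perm_test <- function(x, y, n_perm = 999) {
  observed <- abs(mean(x) - mean(y))
  pooled   <- c(x, y)
  n_x      <- length(x)
  perm_stats <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)          # random relabelling of the groups
    abs(mean(pooled[idx]) - mean(pooled[-idx]))
  })
  # add-one correction keeps the Monte Carlo p-value strictly positive
  (1 + sum(perm_stats >= observed)) / (n_perm + 1)
}

p_value <- perm_test(x, y)
p_value
```

Repeating such a test over many simulated data sets and counting rejections gives a Monte Carlo estimate of its power, which is the quantity the talk is concerned with.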

##############################################

11:57am. Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions by Taylor Arnold. Download abstract here

Deals with the Kolmogorov-Smirnov test
Discrete K-S test: not implemented in any of the major statistical computing packages, hence they aim to do it in R
Decided to require that the discrete null distribution be specified via class stepfun
After obtaining the test statistic, the p-value must be calculated
Implementation is tedious but relatively straightforward
Using the discrete Cramér-von Mises test
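The idea of giving a discrete null distribution as a stepfun object can be sketched in base R as follows. This is not the speaker's implementation: it computes the K-S statistic by hand and approximates the p-value by Monte Carlo simulation rather than exactly. The fair-die null, the loaded-die sample and the helper ks_stat are assumptions, and the data are assumed to lie on the support of the null.

```r
set.seed(1)

# Null distribution: a fair six-sided die, given as a right-continuous step CDF
support  <- 1:6
null_cdf <- stepfun(support, c(0, (1:6) / 6))

# KS statistic: both CDFs only jump on the support, so checking those points suffices
ks_stat <- function(x, cdf) {
  pts <- knots(cdf)
  max(abs(ecdf(x)(pts) - cdf(pts)))
}

x   <- sample(support, 50, replace = TRUE, prob = c(1, 1, 1, 1, 1, 3) / 8)  # loaded die
obs <- ks_stat(x, null_cdf)

# Monte Carlo p-value: simulate the statistic under the null
sims    <- replicate(2000, ks_stat(sample(support, length(x), replace = TRUE), null_cdf))
p_value <- (1 + sum(sims >= obs)) / (length(sims) + 1)
c(statistic = obs, p.value = p_value)
```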

##############################################

11:31am. The benchden Package: Benchmark Densities for Nonparametric Density Estimation by Henrike Weinert (Inference Session) download abstract here

The package implements 28 benchmark densities
from the stats package, e.g., uniform, exponential
normal mixtures (Marronite, claw)
Support: compact, e.g., infinite peak, uniform scale mixture or sawtooth
Support: gaps, e.g., Matterhorn, caliper and trimodal uniform
Support: half line, e.g., Maxwell, Pareto and inverse exponential
Support: real line, e.g., logistic, double exponential
Simulation study to compare two different bandwidth selectors for a kernel estimator
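A comparison of two bandwidth selectors can be sketched with base R alone. The code below does not use benchden; it draws from an exponential density with rexp() and compares two selectors that ship with R (bw.nrd0 and bw.SJ). The choice of selectors, the sample size and the grid are assumptions, not those of the talk.

```r
set.seed(2011)
x <- rexp(500)                      # sample from an exponential density

bw_rot <- bw.nrd0(x)                # Silverman's rule of thumb (the density() default)
bw_sj  <- bw.SJ(x)                  # Sheather-Jones plug-in selector

grid <- seq(0.01, 6, length.out = 400)
step <- grid[2] - grid[1]

# Approximate integrated squared error against the true density on the grid
ise <- function(bw) {
  est <- density(x, bw = bw, from = min(grid), to = max(grid), n = length(grid))
  sum((est$y - dexp(grid))^2) * step
}

c(rule_of_thumb = ise(bw_rot), sheather_jones = ise(bw_sj))
```

A single sample says little; a simulation study would repeat this over many samples and several benchmark densities.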

##############################################

11:15am. Density Estimation Packages in R by Henry Deng Download abstract here (Inference Session)

Reviews the packages on CRAN for density estimation
Calculation speed, on a random set of n normally distributed points: ash is the fastest and pendensity is the slowest
Estimation accuracy using mean absolute error: results vary with the type of distribution and the number of data points, interesting.
Additional idea: the trade-off between speed and accuracy
Well-performing packages seem to be long established in R, with frequent updates.
Recommended packages: KernSmooth or ASH
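The two criteria from the talk, speed and mean absolute error, can be tried on a small scale. This sketch compares base density() with bkde() from KernSmooth on one simulated normal sample; the sample size, the grid limits and the use of dpik() to pick the bandwidth are assumptions, and a single run is no substitute for the speaker's benchmark.

```r
library(KernSmooth)   # recommended package, ships with standard R installations

set.seed(1)
x <- rnorm(1e4)

# Time both estimators on the same sample and the same evaluation range
t_base <- system.time(d_base <- density(x, from = -5, to = 5, n = 401))
t_ks   <- system.time(d_ks   <- bkde(x, bandwidth = dpik(x),
                                     range.x = c(-5, 5), gridsize = 401L))

# Mean absolute error against the true standard normal density
mae <- c(density = mean(abs(d_base$y - dnorm(d_base$x))),
         bkde    = mean(abs(d_ks$y   - dnorm(d_ks$x))))
mae
c(density = t_base[["elapsed"]], bkde = t_ks[["elapsed"]])
```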

##############################################

18 Aug 2011, The last day (but by no means the least)

10:32am Binomial regression model by Merete Download abstract here

Three different methods for extracting residuals: unstandardized, standardized and studentized.
Exact deletion residuals, a new type of residual implemented in binomTools
approx.deletion (rstudent) residual function
Parallel histograms = an explorative version of the Hosmer-Lemeshow goodness-of-fit test (with fixed cut points)
Half-normal plot: uses absolute residual values but is otherwise equivalent to a normal plot. Optimal simulated envelopes to support interpretation
Profile likelihood from the MASS package -> returns and plots the profile likelihood root, not the profile likelihood
Misc: group binary data or complete grouped data based on a specified data set; empirical area under the ROC curve
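The three kinds of residuals listed above can be extracted with base R functions, without binomTools. The dose-response counts below are invented for illustration, and the exact deletion residuals from the talk are not reproduced here; rstudent() gives the approximate version.

```r
# Hypothetical grouped binomial data: 20 subjects per dose level
dose <- c(1, 2, 4, 8, 16, 32)
n    <- rep(20, 6)
dead <- c(1, 4, 9, 13, 18, 20)

fit <- glm(cbind(dead, n - dead) ~ log2(dose), family = binomial)

res <- data.frame(
  deviance     = residuals(fit, type = "deviance"),  # unstandardized
  standardized = rstandard(fit),                     # scaled using the leverages
  studentized  = rstudent(fit)                       # approximate deletion residuals
)
round(res, 3)
```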

##############################################

6:12pm Interval before Conference Dinner!

5:33pm Missing Values in Principal Component Analysis (PCA) by Julie Josse download abstract

Starting with… theoretical stuff
Overfitting issues from missing values -> fixed by a shrinkage method
Procedures in the missMDA package
Step1: Estimation of the number of dimensions
Step2: Imputation of missing values by the ‘imputePCA’ function
Step3: PCA on the completed data set, ‘MIPCA’ function
– Iterative PCA: single imputation method
– A single value cannot reflect the variability of prediction
– Multiple imputation: generating plausible values for each missing value
Supplementary projection via the ‘plot()’ function
– Individual positions (and variables) with other predictions
Between-imputation variability too!
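The single-imputation idea (iterative PCA) can be sketched in base R. This is a plain, unregularised version written for illustration: it omits the shrinkage step mentioned above and is not the missMDA implementation. The function name iterative_pca_impute, its arguments and the use of the iris measurements are assumptions.

```r
iterative_pca_impute <- function(X, ncp = 2, max_iter = 200, tol = 1e-8) {
  X    <- as.matrix(X)
  miss <- is.na(X)
  # Start by filling each missing cell with its column mean
  Xhat <- X
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  for (i in seq_len(max_iter)) {
    mu <- colMeans(Xhat)
    s  <- svd(sweep(Xhat, 2, mu), nu = ncp, nv = ncp)
    # Rank-ncp reconstruction of the centred data, then add the means back
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")
    Xnew <- Xhat
    Xnew[miss] <- fitted[miss]          # only the missing cells are updated
    change <- sum((Xnew - Xhat)^2)
    Xhat <- Xnew
    if (change < tol) break
  }
  Xhat
}

set.seed(7)
X      <- as.matrix(iris[, 1:4])
X_miss <- X
X_miss[sample(length(X), 30)] <- NA     # knock out 30 cells at random
X_imp  <- iterative_pca_impute(X_miss, ncp = 2)

# Compare imputed values with the values that were removed
summary(abs(X_imp[is.na(X_miss)] - X[is.na(X_miss)]))
```

Because each missing cell receives one value only, this reflects none of the imputation uncertainty, which is the motivation for the multiple-imputation step in the talk.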

##############################################

5:06pm Here is the HIGHLIGHT. John Fox! on Tests for Multivariate Linear Models with the car Package Download abstract

Discussion on fitting multivariate linear models (MLMs) in R with the lm function
The anova function is flexible, but it calculates sequential (Type I) tests, and performing other common tests, especially for repeated-measures designs, is relatively inconvenient.
The Anova function (with a capital A) in the car package (Fox and Weisberg, 2011) can perform partial (Type II or Type III) tests for the terms in a multivariate linear model, including simply specified multivariate and univariate tests for repeated-measures models.
The linearHypothesis function in the car package can test arbitrary linear hypotheses for multivariate linear models, including models for repeated measures.
Both the Anova and linearHypothesis functions return a variety of information useful in further computation on multivariate linear models
Now he’s demonstrating how to use the ‘car’ package with the Anderson-Fisher iris data

Correlation plot, basic box-plot
> mod.iris <- lm(cbind(Sepal.Length, Sepal.Width,
 Petal.Length, Petal.Width) ~ Species, data=iris)
> (manova.iris <- Anova(mod.iris))
Type II MANOVA Tests: ...
> anova(mod.iris)
gave exactly the same result as the default function

Summary
> summary()
Also handles repeated measures, e.g., a single repeated-measures factor. This can be handled by the anova function in R, but it is simpler to get the common tests from the Anova and linearHypothesis functions in the car package.
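The iris model from the demo can be fitted and tested with base R alone, as in the sketch below. Base anova() gives sequential (Type I) tests; the car::Anova() call is included only if the car package happens to be installed, which is an assumption about the reader's setup.

```r
mod.iris <- lm(cbind(Sepal.Length, Sepal.Width,
                     Petal.Length, Petal.Width) ~ Species, data = iris)

# Base R: sequential (Type I) multivariate tests, Pillai's trace by default
anova(mod.iris)

# Other test statistics are available through the test argument
anova(mod.iris, test = "Wilks")

# With the car package installed, Type II tests come from Anova()
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod.iris))
}
```

With a single term (Species) in the model, sequential and partial tests address the same hypothesis, so the distinction only starts to matter once the model has several terms.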
(21% of my MacBook battery left now; please, no blackout any time soon.)
5 mins left in this presentation!
It’s done! Q&A now.

##############################################

4:47pm Multiple choice models: why not the same answer? A comparison among LIMDEP, R, SAS and Stata by Giuseppe Bruno Download Abstract

Similar to my presentation, but focused on applying R packages to choice models in comparison with other proprietary software, and more technical!

##############################################

4:35pm Regression Models for Ordinal Data: Introducing R-package ordinal by Rune Haubo B. Christensen Download Abstract

The package offers regression models for ordinal data.
Provides various standard model fit indices.
Extends the basic model with scale effects, nominal effects, random effects and structured thresholds.
Future work: more flexible random-effect structures and nested effects.
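For a feel of what an ordinal regression looks like in R, here is a proportional-odds model fitted with polr() from MASS, which ships with R. This is a swapped-in technique, not the ordinal package presented in the talk; the housing data and the formula are taken from MASS, and the guarded clm() call assumes the ordinal package is installed.

```r
library(MASS)   # recommended package; provides polr() and the housing data

# Satisfaction (Low < Medium < High) as a function of influence, housing type and contact
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

summary(fit)   # coefficients plus the estimated thresholds ("intercepts")

# If the ordinal package is available, clm() fits the analogous cumulative link model
if (requireNamespace("ordinal", quietly = TRUE)) {
  fit_clm <- ordinal::clm(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
  print(coef(fit_clm))
}
```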

##############################################

17 August 2011, 14:02. Invited talk on Modelling Three-dimensional Surfaces in R by Adrian Bowman, University of Glasgow.

He is showing an application to people's faces.
The three-dimensional point graph he’s presenting looks pretty much like a picture! Cool stuff again.
Now, how to model such data.
Face3D research consortium: http://www.face3d.ac.uk/wiki/index.php/The_Face3D_project
Breast surgery/reconstruction
-> Identifying the breast boundary
-> Begin at the landmark which represents the most prominent point
-> Identify the breast boundary by the point of maximum curvature.
-> Subsequent boundary points are then identified by rotation
-> Fit a principal curve to the single point
-> Then decompose asymmetry – surfaces. The components can also be examined for an individual patient
Identifying curves
-> Surface curvature is one of the key issues in the area. We can use it to measure the direction of the curve.
Change point detection. There are many approaches.
He is now showing an example of curve identification for lips, tracking where the lips meet! Such a curve changes dynamically and is not linear.
Disclaimer! Image applications in R are not my cup of tea, so these notes may look weird!
Principal components for faces now!
Then, an application in orthognathic surgery. Comparing before and after!
– the key issue is the prediction after the surgery
– Use CT scanning before the surgery, then use the data on your face to predict what is going to happen after the surgery
– Taking some measure of uncertainty into the model too.
The last topic: magnetoencephalography (MEG)
– data could be very noisy in this case
– Showing typical dipole topographies on single-dyad data
– Possible dipole? The result of a single-trial experiment, shown dynamically in multiple colours, looks nice!
– Results presented both as pictures and as graphs
– variation across trials -> all-trials dipole
– A visualization tool in the ‘rpanel’ package is a GUI one!

##############################################

4:40pm Just finished my presentation. Relax time!

4:41pm The presentation before mine was very interesting. It is about inventory but also deals with the bullwhip effect and supply chain performance. Nice one. The package creator is also a PhD student, from Brazil. Gotta tell my supervisor.

4:43pm Now in M02. Ortolani Millo is presenting “Integrating R and Excel for Automatic business Forecasting”. It works as an add-in in Excel offering options for doing the forecasting. ARIMA is in there too!
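The Excel add-in itself is not shown here, but the kind of ARIMA forecast it offers can be sketched in base R. The model order and the built-in AirPassengers series below are assumptions chosen for illustration, not what the presenters used.

```r
# Classic "airline" model on the log scale: ARIMA(0,1,1)(0,1,1)[12]
fit <- arima(log(AirPassengers),
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

fc <- predict(fit, n.ahead = 12)     # forecasts and standard errors on the log scale

# Back-transform point forecasts and an approximate 95% interval
forecast_tbl <- data.frame(
  point = as.numeric(exp(fc$pred)),
  lower = as.numeric(exp(fc$pred - 1.96 * fc$se)),
  upper = as.numeric(exp(fc$pred + 1.96 * fc$se))
)
round(forecast_tbl)
```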

##############################################

2:46pm Nomograms for visualising relationships between variables by Jonathan Rougier

He is showing how to use a nomogram by fitting a hand-drawn picture of a donkey
See picture from David Smith http://yfrog.com/kgev6rvj
using the pynomo package, see http://www.pynomo.org

##############################################

2:02pm: Design of Experiment (DoE) in R by Ulrike Grömping

She is explaining the principles of DoE.
Block what you can and randomise what you cannot (Box, Hunter and Hunter 1978; 2005).
Randomisation: balance out unknown influences.
DoE in R: What is there?
– Task Views, thanks to Achim Zeileis
– started February 2008
– currently contains 37 R packages related to DoE
– Main purposes = pointer to existing functionality and support for synergies; avoid duplicated work
– First package in 2000, conf.design (core), and roughly exponential growth since 2004
Key driver for her work on DoE in R
– Wanted a free software solution for industrial experimentation
Most often needed: fractional factorial 2-level designs (-> FrF2)
– Also sometimes needed: orthogonal arrays
Mission
– Free researchers’ and experimenters’ brains
– from intricate mathematical and/or programming tasks
– for thinking about the application problem
Package suite for industrial DoE in R
– DoE.base
– FrF2
– DoE.wrapper, for wrapping existing functionality
DoE available in Rcmdr (John Fox) as RcmdrPlugin.DoE <- So now it seems not too difficult for me!
Call for activities
– Make R cover a broader range of DoE facilities
– Write a package, or contribute functionality to an existing package
– Try to stay close to existing structures
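To make the idea of a fractional factorial 2-level design concrete, the sketch below builds a half fraction of a four-factor design by hand in base R. It does not use FrF2 or DoE.base; the factor names A to D and the generator D = ABC are the usual textbook choice and are assumptions here.

```r
# Full 2^3 factorial in the base factors A, B, C (coded -1 / +1)
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Half fraction 2^(4-1): the fourth factor is generated as D = ABC
design$D <- with(design, A * B * C)

# Randomise the run order to balance out unknown influences
set.seed(2011)
design <- design[sample(nrow(design)), ]
rownames(design) <- NULL
design

# Orthogonality check: off-diagonal entries are zero for a balanced orthogonal design
crossprod(as.matrix(design))
```

Eight runs instead of sixteen is the saving; the price is that D is aliased with the ABC interaction.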

##############################################

RStudio by J.J. Allaire

RStudio = R coding tool available on Windows, Mac OS X and Linux, and on the web
Screenshots look very similar on every platform.
Highlight = Extract function to re-run a chunk of code
Conventional R history mechanism = saves every command entered; searchable history; code navigation (in the next beta release)
In 10 years it will be almost impossible to justify NOT using open source software.
Future plan = make the capabilities of R more transparent and accessible

##############################################

9:58am Keynote by Brian D. Ripley

A brief Timeline
Prehistory – 1997
JCGS paper submitted Mar 1995
The earliest extant version seems to be Jun 1995 (456KB); 0.1 alpha (842KB)
R 2.14.0 Oct 2011
R 2.15.0 is scheduled for Mar 2012
R 3.0.0 will be a major change, but there is no plan for this yet.

CRAN

CRAN: 2 packages in 1997, >100 in 2001 and now ~3200 current packages
~80 successful submissions per week to CRAN
10,000 current packages by Christmas 2016?
Infrastructure provided by wu.ac.at and Stefan Theußl

In 2004 ‘CRAN’ was replaced by ‘repos’, with tools provided to support other repositories, but rather few public repositories have emerged.

The R Development Process

R is run by the active members of its core team, who meet in person only every couple of years.
The day-to-day business is done by email. 3 in NZ, 1 India, 8 EU, 3 America

How do features get into R?

R was principally developed for the benefit of the core team. Only they have votes.
Most of what we see in R is there because core team members needed or wanted it, e.g., for research (esp. initially), for teaching (early 2000s), to develop R itself, or to support other projects they were involved with.

Internationalization

The core members are all native speakers of a Western European language that can be written in Latin-1.
Japanese statisticians became interested in working in R.

The Future

R is heavily dependent on a small group of altruistic people.
They do feel that their contributions are not treated with respect.
People need to trust the decisions of the core team.

Trend prediction

Windows will remain out of step with other OSes.
The number of packages will grow inexorably. Whereas they provide a wonderfully comprehensive test suite, they also provide a formidable barrier to change.

##############################################

8:45am Opening session now!

Interesting numbers
440 participants
41 countries
342 EU, 60 N America, 16 Oceania, 13 Asia, 5 Central and South America, 4 Africa
13 conference sponsors and exhibitors

Aug 13


R packages for Structural Equation Model: SEM with R

Structural Equation Modeling (SEM) was first implemented in a software package called LISREL. Since then, SEM has mainly been run in proprietary software such as Mplus, AMOS, EQS, SAS and the new version of Stata (v.12).

However, you can also run SEM in great and free software like R.

To the best of my knowledge, there are now several active packages that you can use to fit SEM. Here they are:

Main Packages (for fitting SEM models)

sem (John Fox, 2006)
The first R package for SEM, “fit by maximum likelihood assuming multinormality, and single-equation estimation for observed-variable models by two-stage least squares.” It was also the first package I tried for running SEM in R, thanks to a very quick response from Prof. Fox to a question I emailed him.
See Example of ‘sem’ package here.
OpenMx (Boker et al, 2011)
A very active package that “is free and open source software for use with R that allows estimation of a wide variety of advanced multivariate statistical models”, contributed by experts in R and SEM.
See Example of ‘OpenMx’ package here.
lavaan (Yves Rosseel, 2012)
A promising package for SEM. Its command language is similar to that of Mplus, so it is perhaps the most user-friendly package for SEM to date.
See Example of ‘lavaan’ package here.
Link to JSS paper
semPLS (Armin Monecke, 2012)
Fitting Structural Equation Models Using Partial Least Squares
See: CRAN link, JSS paper
plspm (Gaston Sanchez, 2012)
R package dedicated to Partial Least Squares (PLS) methods (CRAN, plsmodeling.com)
by Gaston Sanchez and Laura Trinchera
A corresponding book titled “PLS Path Modeling with R” can be downloaded here.
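To give a flavour of the lavaan command language, here is a small sketch using the PoliticalDemocracy data set that ships with lavaan. The model is a commonly used teaching example for that data set, not one from my own research, and the fitting step runs only if lavaan is installed (an assumption about the reader's setup).

```r
# Model syntax in lavaan's formula-like language
model <- '
  # measurement model (latent variables)
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  # structural regressions
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
'

# Fit only if lavaan is installed; PoliticalDemocracy ships with the package
if (requireNamespace("lavaan", quietly = TRUE)) {
  fit <- lavaan::sem(model, data = lavaan::PoliticalDemocracy)
  print(lavaan::fitMeasures(fit, c("chisq", "df", "cfi", "rmsea")))
}
```

The =~ operator defines a latent variable by its indicators and ~ defines a regression, which is what makes the syntax read much like Mplus input.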

My paper at useR! 2011 evaluated R packages against proprietary software, namely AMOS and LISREL.

Today (30 May 2012), I was glad to find that there are also complementary packages for SEM in R, as follows.

Complementary packages

SEMplusR: Functions, examples and datasets to learn, use and teach Structural Equation Modeling (SEM) [GitHub]
by Pairach Piboonrungroj
SEMModComp: Model Comparisons for SEM [CRAN link, Additional Documents]
by Roy Levy
semGOF: an add-on package which provides fourteen goodness-of-fit indices for structural equation models fitted with the ‘sem’ package. [CRAN]
by Elena Bertossi
stremo: Functions to help the process of learning structural equation modelling [CRAN link]
by Gustavo Carvalho, Marco Batalha, and Owen Petchey
FIAR: Functional Integration Analysis in R [CRAN link]
by Bjorn Roelstraete
semTools: Useful tools for structural equation modeling [CRAN link]
by Sunthud Pornprasertmanit, Patrick Miller, Alex Schoemann, Yves Rosseel
simsem: SIMulated Structural Equation Modeling [CRAN link]
by Sunthud Pornprasertmanit, Patrick Miller, Alexander Schoemann
pathmox: R package dedicated to segmentation trees in PLS Path Modeling [CRAN, plsmodeling.com]

Packages for SEM plotting and graphics

qgraph: Network representations of relationships in data [CRAN link]
by Sacha Epskamp, Angelique O. J. Cramer, Lourens J. Waldorp, Verena D. Schmittmann and Denny Borsboom
psych: Procedures for Psychological, Psychometric, and Personality Research [CRAN link]
by William Revelle

Packages that link R with other software to fit SEM

Mplus
Automating Mplus Model Estimation and Interpretation [CRAN link]
by Michael Hallquist
EQS
R/EQS Interface [CRAN link]
by Patrick Mair and Eric Wu

More external resources on SEM in R

CRAN Task view on ‘Structural Equation Models, Factor Analysis, PCA’ in Psychometrics [url]
by Patrick Mair
A tutorial on the use of sem package [url]
by William Revelle
A post on ‘Structural Equation Modeling in R‘ [url]
by Jeromy Anglim
