Using Growl with R

Posted in code on July 2nd, 2010 by Michael Ewens – 1 Comment

If you have a long R script and would like to be notified when computation is complete, follow these directions to get Growl notifications:

  1. Install Growl and leave the disk image open.
  2. Open Terminal.
  3. Run “cd /Volumes/Growl-1.2/Extras/growlnotify”
  4. Run “./install.sh”

Then get your R script to talk to Growl by adding something like:
system(paste("growlnotify -a R -t \"R is done\" -m", "\"Inserted the data\"", sep=" "))

Stata, ODBC and date columns

Posted in code on May 23rd, 2010 by Michael Ewens – 2 Comments

Stata has a ton of flexibility for creating and manipulating dates. However, if you want to save Stata data to an external database (e.g. Access, PostgreSQL, MySQL, etc.), the numeric date format in Stata is difficult to interpret outside the program. My use case involved using Stata to merge and clean some data, which was then pushed to a MySQL database with odbc and later loaded in R with its odbc functionality. It was in the last step that I learned of Stata’s dating conventions when writing to external databases. So I asked the Stata list.

A very helpful subscriber presented a solution very similar to one that I had mocked up. With some of the code posted on the Statalist and some new additions, I present odbc2create. UPDATE: I fixed an issue when dealing with a database with no dates and had to add a loop.

This modified odbc command does the following:

  • searches all your variables for dates (they must be formatted as dates, otherwise Stata cannot detect them)
  • converts those dates to the YYYY-MM-DD format
  • inserts the dates into your MySQL database as strings
  • re-types those date columns in the newly created database as DATEs

The best part: when you load a table created this way back into Stata, it immediately recognizes the DATEs as dates. I hope they build this functionality into Stata in the future. One caveat (which may explain why they haven’t built it internally) is that the ALTER command in the ado file is specific to MySQL. Someone should generalize the code to recognize the datasource engine and modify the ALTER command accordingly.
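
For readers who have not met Stata's numeric dates: a daily date is stored as the number of days since January 1, 1960, which is why the raw values mean little to other programs. Below is a minimal Python sketch of the conversion to YYYY-MM-DD strings and of the kind of MySQL statement that re-types a column. It is not the code in odbc2create; the function names and the table and column names are made up for illustration.

```python
from datetime import date, timedelta

# Stata stores a daily date as the number of days since 01jan1960.
STATA_EPOCH = date(1960, 1, 1)

def stata_daily_to_iso(days):
    """Convert a Stata daily date (integer days since 1960-01-01) to 'YYYY-MM-DD'."""
    return (STATA_EPOCH + timedelta(days=days)).isoformat()

def alter_to_date_sql(table, column):
    """Build a MySQL statement that re-types a string column as DATE.
    The names passed in here are hypothetical, not taken from the ado file."""
    return "ALTER TABLE `{}` MODIFY `{}` DATE".format(table, column)

print(stata_daily_to_iso(0))        # 1960-01-01
print(stata_daily_to_iso(18263))    # 2010-01-01
print(alter_to_date_sql("deals", "close_date"))
```

Other database engines use different syntax for changing a column type, which is the caveat mentioned above.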

LaTeX regression and summary tables in Stata

Posted in code on April 9th, 2010 by Michael Ewens – 1 Comment

A clean, well-organized LaTeX table is difficult to build. If you do a lot of analysis in Stata, there are several tools to output LaTeX tables of your regressions or summary statistics. These packages do not always work perfectly with the standard options. Below I present two example code snippets: one produces a LaTeX table of a set of regressions that includes an IV estimator, and the other produces a summary statistics table that compares two groups in a database. Each uses the eststo command from the estout package.

Regression Table with Multiple Equations and Stages

Here I run a couple of limited dependent variable models and a two-stage bivariate probit with an IV. The output isn’t perfect, but it works for pre-submission distribution.

The output looks like this:

[Figure: LaTeX regression output from eststo]

Summary statistics with a by variable

Next, consider summarizing the characteristics of two groups in your data. For example, I want to compare the age, number of boards seats and other features of venture capital spinoff founders to everyone else.

The output will look like this:

[Figure: summary statistics table comparing the two groups]

The full documentation for esttab, estpost and eststo covers many more options and includes a lot of examples.

Running sums in Stata

Posted in code on February 21st, 2010 by Michael Ewens – 2 Comments

Perhaps it is bad that I didn’t know this before, but the following code for Stata would have saved a week off of my dissertation work.  Suppose that you have data structured like so:

firm_id,date,amount

and you want to create a new variable that is the total amount as of each date for each firm.  In Stata, you simply type:

* sort by firm and date, then take the running sum of amount within each firm
bysort firm_id (date): gen total_t = sum(amount)

Note the use of ‘gen’ rather than ‘egen’. The sum() function behaves differently under the two commands: with gen it returns a running sum, while with egen it returns the group total on every row. Had I known this, about 500 lines of loops written in Stata code could have been condensed into a few lines. Stata needs to fix the ‘egen’ and ‘gen’ distinction or I need to port more of my projects to R.
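
To make the difference concrete outside Stata, here is a small Python sketch of the two behaviors: a running sum within each firm versus a single total repeated on every row. The rows and function names are hypothetical, and the data is assumed to be sorted by firm and date already.

```python
from collections import defaultdict

# Hypothetical rows of (firm_id, date, amount), already sorted by firm and date.
rows = [
    ("A", "2009-01-01", 10),
    ("A", "2009-02-01", 5),
    ("B", "2009-01-15", 7),
    ("B", "2009-03-01", 3),
]

def running_sums(rows):
    """Running total of amount within each firm, one value per row."""
    seen = defaultdict(int)
    out = []
    for firm, _, amount in rows:
        seen[firm] += amount
        out.append(seen[firm])
    return out

def group_totals(rows):
    """Overall total of amount for each firm, repeated on every row of that firm."""
    totals = defaultdict(int)
    for firm, _, amount in rows:
        totals[firm] += amount
    return [totals[firm] for firm, _, _ in rows]

print(running_sums(rows))  # [10, 15, 7, 10]
print(group_totals(rows))  # [15, 15, 10, 10]
```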

Map of jobs for economists

Posted in economics, visualization on October 31st, 2009 by Michael Ewens – 2 Comments

The AEA’s JOE postings present the near-population of jobs available for newly-minted economics PhDs. I used the XML data available for download to create a mash-up of job locations on Google Maps. I break the postings down into US full-time academic, international full-time academic and non-academic. Here is how I create the maps:

  1. Select the subset of the data you want (e.g. US academic) and download the XML file.
  2. Fix some validation errors: take out the “<” and “>” within the text of nodes (I use TextMate for this).
  3. Parse the XML file with a custom PHP script that creates a csv file with school, position, location and url to posting. Here is my simple script for the academic XML file.
  4. Save the csv file produced by the script in step 3 as an Excel spreadsheet (Google Docs doesn’t like csv’s).  Add a “Latitude” and “Longitude” column to the spreadsheet.
  5. Upload the Excel file to Google docs.
  6. Follow these directions to populate the latitude and longitude of each position+location.
  7. Publish the Google spreadsheet and save the unique id in the url that Google gives you.
  8. Sign up for a Google Maps API account.
  9. Follow these directions to produce a Google map of your postings.
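
Step 3 above used a custom PHP script. As a rough illustration of the same idea, here is a Python sketch that parses a cleaned XML file and writes a csv with school, position, location and url. The element names and the sample record are invented for this example; the real JOE export uses its own schema, so the names in FIELDS would need to be adjusted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample record; the element names are assumptions, not the JOE schema.
SAMPLE_XML = """<positions>
  <position>
    <institution>Example University</institution>
    <title>Assistant Professor</title>
    <location>Pittsburgh, PA</location>
    <url>http://example.com/job/1</url>
  </position>
</positions>"""

# Hypothetical element names, in the order of the csv columns below.
FIELDS = ["institution", "title", "location", "url"]

def postings_to_csv(xml_text):
    """Return csv text with one row per <position> element."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["school", "position", "location", "url"])
    for node in root.iter("position"):
        # A missing element becomes an empty cell rather than raising an error.
        writer.writerow([(node.findtext(f) or "").strip() for f in FIELDS])
    return buffer.getvalue()

print(postings_to_csv(SAMPLE_XML))
```

The csv module quotes fields that contain commas (such as "Pittsburgh, PA"), so locations survive the trip into a spreadsheet intact.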

UPDATE: This service may make this process a bit easier, produce cleaner maps and allow the incorporation of more information.

Maybe the AEA can follow these directions to produce these maps after this year.  Contact me with any suggestions or questions.

Publication quality images in Matlab

Posted in code on September 15th, 2009 by Michael Ewens – 2 Comments

Matlab has some great plotting tools, but exporting with “Save As” rarely produces consistent and clean results. Enter export_fig. The best feature is its anti-aliasing, which produces clean, crisp fonts. Just download the package and add

addpath('export_fig')

to your m file and you can use the function. I had the best luck (see example below) with the following command:

export_fig('figures/updates/risk_over_size.png', '-png', '-nocrop');

[Figure: selection_to_mixture]

Cell arrays and strings in Matlab

Posted in code on September 2nd, 2009 by Michael Ewens – Be the first to comment

Suppose you want to create a dynamic matrix of strings in Matlab. For example, you might want the legend of your graph to depend on the data (which changes on a daily basis). Cell arrays are your best bet. However, be careful how you access the elements of these arrays. Suppose I have a cell array constructed as follows:

names = cell(3,2);
names(high_regime,: ) = [{'Probability of a home run'} {'Home Run'}];
names(low_regime, : ) = [{'Probability of bankruptcy'} {'Bankruptcy'}];
names(middle_regime, : ) = [{'Probability break-even'} {'Break-even'}];

If you want to access a particular element of this cell array as a string, you must use the curly brackets like so:

set(plot1(1),'LineStyle','-.','DisplayName',names{1,2});

If you try the standard names(1,2), you get a 1-by-1 cell rather than a string, and the function set() will not treat the result as a string.

Create a dated log file in Matlab

Posted in Uncategorized on July 23rd, 2009 by Michael Ewens – 1 Comment

If you change your datasets or code a lot in Matlab, it is smart to keep track of the results over time (trust me….). The ‘diary’ function allows you to record all output of your scripts to a file. Add the following code to your main .m file to create a diary/log file whose name is stamped with the time (to the minute) that you ran the script:

% Build a zero-padded timestamp such as 2009_07_23_1405 so each minute gets a distinct name
date_now = datestr(now, 'yyyy_mm_dd_HHMM');
diary(strcat('log', date_now, '.log'));
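
One thing to watch with timestamped file names is zero padding: if the day, hour and minute are concatenated without padding or separators, two different times can produce the same name. The Python sketch below (Python rather than Matlab, with hypothetical function names) shows the collision and a padded alternative.

```python
from datetime import datetime

def log_name(moment):
    """Return a log file name such as 'log2009_07_23_1405.log'.
    Every field is zero-padded, so each minute maps to exactly one name."""
    return "log" + moment.strftime("%Y_%m_%d_%H%M") + ".log"

def unpadded(moment):
    """Unpadded concatenation, shown only to illustrate the ambiguity."""
    return "{}_{}_{}{}{}".format(
        moment.year, moment.month, moment.day, moment.hour, moment.minute
    )

a = datetime(2009, 7, 1, 12, 5)    # July 1, 12:05
b = datetime(2009, 7, 11, 2, 5)    # July 11, 02:05

print(unpadded(a) == unpadded(b))  # True: both give 2009_7_1125
print(log_name(a))                 # log2009_07_01_1205.log
print(log_name(b))                 # log2009_07_11_0205.log
```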

Setting $PATH in Mac OS X

Posted in Uncategorized on June 21st, 2009 by Michael Ewens – Be the first to comment

The myriad of Google searches did not help me set a new PATH variable on my Mac. Here is how I did it:
1. cd to /etc
2. Edit the file ‘profile’
3. Append the path you want to the end of the /usr/local … line.

Stata ODBC and MySQL collations

Posted in Uncategorized on June 9th, 2009 by Michael Ewens – Be the first to comment

What I learned today

Make sure that you set your column collations to latin1_swedish_ci if you want to load a MySQL table not created by Stata.