Posted in code on July 2nd, 2010 by Michael Ewens – 1 Comment
If you have a long R script and would like to be notified when computation is complete, follow these directions to get Growl notifications:
- Install Growl and leave the disk image open.
- Open Terminal.
- Run “cd /Volumes/Growl-1.2/Extras/growlnotify”
- Run “./install.sh”
Then follow these directions to get your R script to talk to Growl by adding something like:
system(paste("growlnotify -a R -t \"R is done\" -m", "\"Inserted the data\"", sep=" "))
Posted in code on May 23rd, 2010 by Michael Ewens – 2 Comments
Stata has a ton of flexibility for creating and manipulating dates. However, if you want to save Stata data to an external database (e.g. Access, PostgreSQL, MySQL, etc.), the numeric date format in Stata will be difficult to interpret outside the program. My use case involved working with Stata to merge and clean some data that was pushed to a MySQL database with odbc and later loaded in R with its odbc functionality. It was in the last step that I learned of Stata’s date conventions when writing to external databases. So I asked the Stata list.
A very helpful subscriber presented a solution very similar to one that I mocked up. With some of the code posted on the Statalist and some new additions, I present odbc2create. UPDATE: I fixed an issue when dealing with a database with no dates and had to add a loop.
This modified odbc command does the following:
- searches all your variables for dates (they must carry a date format, otherwise Stata cannot detect them)
- converts those dates to the YYYY-MM-DD format
- inserts the dates into your MySQL database as strings
- re-types those date columns in the newly created database as DATEs
The best part: when you load a table created this way back into Stata, it immediately recognizes the DATEs as dates. I hope they build this functionality into Stata in the future. One caveat (which may explain why they haven’t built it internally) is that the ALTER command in the ado file is specific to MySQL. Someone should generalize the code to recognize the datasource engine and modify the ALTER command accordingly.
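To make the conversion step concrete, here is a minimal Python sketch of the idea, not the code in the ado file. It relies on the fact that Stata daily dates count days since 1 January 1960. The function names `stata_td_to_iso` and `alter_to_date_sql` and the table and column names are hypothetical, and the ALTER statement is written in MySQL style only.

```python
from datetime import date, timedelta

# Stata daily dates (%td) are stored as the number of days since 1 January 1960.
STATA_EPOCH = date(1960, 1, 1)


def stata_td_to_iso(days):
    """Convert a Stata daily date number to a 'YYYY-MM-DD' string (None stays None)."""
    if days is None:
        return None
    return (STATA_EPOCH + timedelta(days=int(days))).isoformat()


def alter_to_date_sql(table, column):
    """Build a MySQL-style statement that re-types a string column as DATE.

    Hypothetical helper: other database engines use a different ALTER syntax.
    """
    return "ALTER TABLE `{}` MODIFY `{}` DATE".format(table, column)


print(stata_td_to_iso(0))        # 1960-01-01
print(stata_td_to_iso(18405))    # 2010-05-23
print(alter_to_date_sql("deals", "deal_date"))
```

The strings go into the database first; the ALTER statement is then run once per date column.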
Posted in code on April 9th, 2010 by Michael Ewens – 1 Comment
Posted in code on February 21st, 2010 by Michael Ewens – 2 Comments
Perhaps it is bad that I didn’t know this before, but the following code for Stata would have shaved a week off my dissertation work. Suppose that you have data structured like so:
firm_id,date,amount
and you want to create a new variable that is the total amount as of each date for each firm. In Stata, you simply type:
* running (cumulative) sum of amount within each firm, ordered by date
bysort firm_id (date): gen total_t = sum(amount)
Note the use of ‘gen’ rather than ‘egen’. The ‘sum’ function behaves differently under the two commands: with ‘gen’ it returns a running sum, while with ‘egen’ it returns the group total on every row. Knowing this, about 500 lines of loops written in Stata code could be condensed into a few lines. Stata needs to fix the ‘egen’ and ‘gen’ distinction or I need to port more of my projects to R.
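For readers outside Stata, the same running total can be sketched in a few lines of Python. This is an illustration only: the sample rows and the function name `running_total_by_firm` are made up, and the columns follow the firm_id, date, amount layout above.

```python
from collections import defaultdict

# Hypothetical sample data in the layout firm_id, date, amount.
rows = [
    ("A", "2009-01-01", 10.0),
    ("B", "2009-02-01", 5.0),
    ("A", "2009-03-01", 2.5),
    ("B", "2009-01-15", 1.0),
]


def running_total_by_firm(rows):
    """Return (firm_id, date, amount, total_t) with total_t the cumulative amount per firm."""
    running = defaultdict(float)
    out = []
    # Sort by firm and then by date so the running sum follows time within each firm.
    for firm_id, day, amount in sorted(rows, key=lambda r: (r[0], r[1])):
        running[firm_id] += amount
        out.append((firm_id, day, amount, running[firm_id]))
    return out


for row in running_total_by_firm(rows):
    print(row)
```

ISO-formatted date strings sort correctly as text, which is why no date parsing is needed here.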
Posted in code on September 15th, 2009 by Michael Ewens – 2 Comments
Matlab has some great plotting tools, but exporting a figure or using “Save As” rarely produces consistent and clean results. Enter export_fig. The best feature is its anti-aliasing that produces clean, crisp fonts. Just download the package and add
addpath('export_fig')
to your m file and you can use the function. I had the best luck (see example below) with the following command:
export_fig('figures/updates/risk_over_size.png', '-png', '-nocrop');

Posted in code on September 2nd, 2009 by Michael Ewens – Be the first to comment
Suppose you want to create a dynamic matrix of strings in Matlab. For example, you might want the legend of your graph to depend on the data (which changes on a daily basis). Cell arrays are your best bet. However, be careful how you access the elements of said arrays. Suppose I have a cell array constructed as follows:
names = cell(3,2);
names(high_regime,: ) = [{'Probability of a home run'} {'Home Run'}];
names(low_regime, : ) = [{'Probability of bankruptcy'} {'Bankruptcy'}];
names(middle_regime, : ) = [{'Probability break-even'} {'Break-even'}];
If you want to access a particular element of this cell array as a string, you must use the curly brackets like so:
set(plot1(1),'LineStyle','-.','DisplayName',names{1,2});
If you try the standard names(1,2), you get back a 1x1 cell rather than its contents, so the function set() will not treat the result as a string.
Posted in code on May 19th, 2009 by Michael Ewens – 1 Comment
UPDATE: this code is stupid. You can do this in a few lines.
Although Stata has powerful lag operators for both time series and panel settings, it is difficult to construct lagged sums of variables across an unbalanced panel. Suppose you have repeated observations over time for a set of individuals. Each time period you observe some value like investments, hours worked, etc. The example I discuss below simply counts the number of times an event has occurred up to time $latex t$ for each observation. Here an event is “IPO” where I need the total number of IPOs prior to $latex t$. First, I construct an indicator for the event and generate a simple date variable that ignores the unbalanced nature of the dataset:
gen ipo = (outcome == "IPO")
sort panelid datevar
bysort panelid: gen date = _n
sum date
local total = r(max)
keep panelid date ipo
I save this as a temporary dataset for the unfortunate for loop:
save temp, replace
and reshape for lagged summing:
reshape wide ipo, i(panelid) j(date)
The following for loop exploits the wide nature of the reshaped data and computes a sum of the IPO variable as of date $latex t$:
forvalues x = 2(1)`total'{
if(`x'==2){
gen total_ipo2 = ipo1
}
else {
local minx = `x'-1
gen total_ipo`x' = total_ipo`minx' + ipo`minx'
}
}
Now we have to reshape back:
drop ipo1-ipo`total'
reshape long total_ipo, i(panelid) j(date)
drop if total_ipo == .
sort panelid date
The sums in the for loop above generate one too many observations at the end of each panel’s time series, which we need to drop:
bysort panelid: gen n = _n
bysort panelid: gen nn = _N
drop if n == nn
drop n nn
keep panelid date total_*
sort panelid date
And it is done. I am sure that there is a more elegant solution….
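The same count of prior events can be sketched without any reshaping. The Python below is a minimal illustration, not a translation of the Stata code above: the function name `prior_event_counts` and the sample records are hypothetical, and it assumes each record is a (panelid, datevar, outcome) tuple.

```python
def prior_event_counts(records, event="IPO"):
    """For each (panelid, datevar, outcome) record, count events strictly before it.

    Returns a list of (panelid, datevar, n_prior), sorted by panel and date.
    """
    counts = {}
    out = []
    for panelid, datevar, outcome in sorted(records, key=lambda r: (r[0], r[1])):
        n_prior = counts.get(panelid, 0)
        # Record the count before updating it, so the current event is excluded.
        out.append((panelid, datevar, n_prior))
        if outcome == event:
            counts[panelid] = n_prior + 1
    return out


# Hypothetical unbalanced panel: panel 1 has four observations, panel 2 has two.
records = [
    (1, 2001, "IPO"),
    (1, 2003, "Failed"),
    (1, 2004, "IPO"),
    (1, 2007, "IPO"),
    (2, 2002, "Acquired"),
    (2, 2005, "IPO"),
]

for row in prior_event_counts(records):
    print(row)
```

Because the count is stored before the current outcome is added, every observation keeps its row and nothing has to be dropped at the end of a panel.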
Posted in code on March 25th, 2009 by Michael Ewens – 1 Comment
Let x be your matrix of data (observations in rows) which has to be bootstrapped. For each instance of the loop write:
xb = x(floor(size(x,1)*rand(size(x,1),1))+1,:); % bootstrap observations (rows drawn with replacement)
Now xb is a matrix with the same number of rows and columns as x, but with rows re-sampled with replacement. Assigning to xb rather than x keeps the original data intact for the next draw. A one line bootstrap! (Modified code from John Cochrane)
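The same row resampling can be sketched in Python with only the standard library. This is an illustration under stated assumptions: the data is a list of rows, and the function name `bootstrap_rows` and the sample values are hypothetical.

```python
import random


def bootstrap_rows(x, rng=random):
    """Return a new list with len(x) rows drawn from x with replacement."""
    n = len(x)
    # Draw n row indices uniformly from 0..n-1; the same row may appear more than once.
    return [x[rng.randrange(n)] for _ in range(n)]


# Hypothetical data: each row is one observation.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

rng = random.Random(0)  # seeded generator so repeated runs give the same draws
for _ in range(3):
    sample = bootstrap_rows(x, rng)  # x itself is never modified
    print(sample)
```

Each call draws from the original data, so successive bootstrap samples stay independent of one another.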
Posted in code on March 24th, 2009 by Michael Ewens – 1 Comment
If you have an FTP server set up somewhere with ample space and bandwidth, Matlab can store and retrieve its files and data remotely. Just use these simple commands to connect and disconnect:
% Connect to the FTP server where my Matlab files and data reside
f = ftp('yourdomain.com', 'user', 'password');
% change the remote directory
cd(f, 'matlab');
% change the local directory that the remote directory will be downloaded into
cd('/');
% download the remote directory
mget(f, 'remote_dir');
% *****************
% INSERT PROGRAM HERE
% *****************
% when done, move up one local directory
cd ..
% now upload the directory back to the server
mput(f, 'remote_dir');
disp('Files have been put back on the server');
% close the connection
close(f);
Posted in code on March 23rd, 2009 by Michael Ewens – 4 Comments
If you want to compare kernel density estimates across years for a particular variable, putting each estimate on one graph makes the comparison easy. The process is fairly straightforward in Stata (and even easier in Matlab…). First, we start with the simple ‘kdensity’ command:
kdensity income if year == 1990
Next, we extend this command with the ‘addplot’ option:
kdensity income if year == 1990, addplot(kdensity income if year == 1991)
and we can add even more with the ‘||’ syntax:
kdensity income if year == 1990, addplot(kdensity income if year == 1991 || kdensity income if year == 1992)
If we could use the ‘by’ option, this process would be much cleaner. Finally, we add a legend:
kdensity income if year == 1990, addplot(kdensity income if year == 1991 || kdensity income if year == 1992) legend(ring(0) pos(2) label(1 "1990") label(2 "1991") label(3 "1992"))
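To show what is being overlaid, here is a minimal Python sketch that computes one density estimate per year on a common grid. It is an illustration only and will not reproduce Stata’s numbers: it uses a Gaussian kernel with a hand-picked bandwidth, the income values are made up, and the function name `gaussian_kde` is hypothetical.

```python
import math


def gaussian_kde(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of sample at each grid point."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in sample)
        for g in grid
    ]


# Hypothetical incomes (in thousands) for each year.
income_by_year = {
    1990: [22.0, 25.0, 31.0, 38.0, 40.0, 47.0],
    1991: [24.0, 28.0, 33.0, 41.0, 45.0, 52.0],
    1992: [26.0, 30.0, 36.0, 44.0, 50.0, 58.0],
}

step = 0.5
grid = [i * step for i in range(201)]  # common grid from 0 to 100

# One curve per year on the same grid, ready to be drawn on a single graph.
densities = {
    year: gaussian_kde(values, grid, bandwidth=5.0)
    for year, values in income_by_year.items()
}

for year, curve in densities.items():
    peak = max(range(len(grid)), key=lambda i: curve[i])
    print(year, "mode near", grid[peak])
```

Evaluating every year on the same grid is what makes the curves directly comparable, which is the role addplot plays in the Stata command.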