<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Michael Ewens &#187; lag operators</title>
	<atom:link href="http://michaelewens.com/tag/lag-operators/feed/" rel="self" type="application/rss+xml" />
	<link>http://michaelewens.com</link>
	<description>Tepper School of Business, Carnegie Mellon Univ.</description>
	<lastBuildDate>Tue, 10 Jan 2012 19:00:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Create a lagged sum variable in Stata</title>
		<link>http://michaelewens.com/2009/05/19/create-a-lagged-sum-variable-in-stata/</link>
		<comments>http://michaelewens.com/2009/05/19/create-a-lagged-sum-variable-in-stata/#comments</comments>
		<pubDate>Tue, 19 May 2009 09:03:11 +0000</pubDate>
		<dc:creator>Michael Ewens</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[lag operators]]></category>
		<category><![CDATA[stata]]></category>

		<guid isPermaLink="false">http://michaelewens.com/?p=367</guid>
		<description><![CDATA[UPDATE: this code is stupid. You can do this in a few lines. Although Stata has powerful lag operators for both time series and panel settings, it is difficult to construct lagged sums of variables across an unbalanced panel.  Suppose you have observations for individuals.   Each time period you observe some value like investments, [...]]]></description>
			<content:encoded><![CDATA[<p>UPDATE: this code is stupid.  <a href="http://michaelewens.com/2010/02/21/running-sums-in-stata/">You can do this in a few lines</a>.</p>
<p>Although <a href="http://www.stata.com/help.cgi?tsvarlist">Stata has powerful lag operators</a> for both time series and panel settings, it is difficult to construct lagged sums of a variable across an unbalanced panel.  Suppose you have <img src='http://s.wordpress.com/latex.php?latex=T_i&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='T_i' title='T_i' class='latex' /> observations for each of <img src='http://s.wordpress.com/latex.php?latex=i%3D1%2C%5Cldots%2CN&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='i=1,\ldots,N' title='i=1,\ldots,N' class='latex' /> individuals.  Each time period you observe some value like investments, hours worked, etc.  The example I discuss below simply counts the number of times an event has occurred before time <img src='http://s.wordpress.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t' title='t' class='latex' /> for each individual.  Here the event is &#8220;IPO&#8221;, and I need the total number of IPOs prior to <img src='http://s.wordpress.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t' title='t' class='latex' />.  First, I construct an indicator for the event and generate a within-individual observation counter that ignores the unbalanced nature of the dataset (calendar gaps are discarded, so date 1 is simply each individual's first observation):</p>
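<p><code>gen ipo = (outcome == "IPO")<br />
* number each individual's observations 1, 2, ... in datevar order<br />
bysort panelid (datevar): gen date = _n<br />
* length of the longest series, used to set the loop range later<br />
quietly summarize date<br />
local total = r(max)<br />
keep panelid date ipo</code></p>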
<p>I save this as a temporary dataset for the unfortunate for loop:</p>
<p><code>save temp, replace</code></p>
<p>and reshape for lagged summing:</p>
<p><code>reshape wide ipo, i(panelid) j(date)</code></p>
<p>The following for loop exploits the wide nature of the reshaped data and computes a sum of the IPO variable as of date <img src='http://s.wordpress.com/latex.php?latex=t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='t' title='t' class='latex' />:</p>
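<p><code>* run one step past the longest series so every individual gets exactly one extra value<br />
local last = `total' + 1<br />
gen total_ipo2 = ipo1<br />
forvalues x = 3/`last' {<br />
local minx = `x' - 1<br />
* IPOs strictly before date x<br />
gen total_ipo`x' = total_ipo`minx' + ipo`minx'<br />
}</code></p>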
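<p>To see the recursion on one row, here is a hand-unrolled sketch for a single made-up individual with three observations (the values are hypothetical, chosen only for illustration):</p>
<p><code>clear<br />
set obs 1<br />
gen ipo1 = 1<br />
gen ipo2 = 0<br />
gen ipo3 = 1<br />
* each total is the previous total plus the previous period's indicator<br />
gen total_ipo2 = ipo1<br />
gen total_ipo3 = total_ipo2 + ipo2<br />
gen total_ipo4 = total_ipo3 + ipo3<br />
list total_ipo*</code></p>
<p>The totals come out as 1, 1 and 2.  The last one, <code>total_ipo4</code>, is dated one step past this individual's final observation.</p>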
<p>Now reshape back to long form:</p>
<p><code>drop ipo1-ipo`total'<br />
reshape long total_ipo, i(panelid) j(date)<br />
drop if total_ipo == .<br />
sort panelid date</code></p>
<p>The sums in the for loop above generate one too many observations at the end of each individual's time series, which we need to drop:</p>
<p><code>* drop the extra trailing row for each individual<br />
bysort panelid (date): drop if _n == _N<br />
keep panelid date total_ipo<br />
sort panelid date</code></p>
<p>And it is done.  I am sure that there is a more elegant solution&#8230;.</p>
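<p>For comparison, here is a minimal sketch of a shorter route, assuming the same <code>panelid</code>, <code>datevar</code> and <code>ipo</code> variables as above and starting from the original long data (it is not necessarily the code in the linked post).  Stata's <code>sum()</code> function returns a running sum, so subtracting the current value leaves the count of IPOs strictly before each observation.  The five rows typed in here are made up purely for illustration:</p>
<p><code>clear<br />
input panelid datevar ipo<br />
1 2001 1<br />
1 2003 0<br />
1 2004 1<br />
2 2002 0<br />
2 2005 1<br />
end<br />
* running sum within individual, minus the current row, gives prior IPOs<br />
bysort panelid (datevar): gen total_ipo = sum(ipo) - ipo<br />
list, sepby(panelid)</code></p>
<p>For the rows above this gives 0, 1, 1 for individual 1 and 0, 0 for individual 2.  Unlike the reshape approach, it keeps every row, including each individual's first observation with a count of zero.</p>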
]]></content:encoded>
			<wfw:commentRss>http://michaelewens.com/2009/05/19/create-a-lagged-sum-variable-in-stata/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

