Create a lagged sum variable in Stata

UPDATE: this code is stupid. You can do this in a few lines.

Although Stata has powerful lag operators for both time-series and panel settings, it is difficult to construct lagged sums of variables across an unbalanced panel. Suppose you have $T_i$ observations for individuals $i = 1, \ldots, N$. In each time period you observe some value such as investment or hours worked. The example below simply counts, for each individual, the number of times an event has occurred before time $t$. Here the event is an “IPO”, and I need the total number of IPOs prior to $t$. First, I construct an indicator for the event and generate a simple date variable that numbers each individual's observations 1, 2, …, $T_i$, ignoring calendar gaps and the differing panel lengths:

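gen ipo = (outcome == "IPO")
* number each individual's observations 1, 2, ..., T_i in datevar order
bysort panelid (datevar): gen date = _n
* length of the longest panel, used as the loop's upper bound
quietly summarize date
local total = r(max)
keep panelid date ipo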
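A minimal sketch of this first step on a hypothetical toy panel (the identifiers, years, and outcomes are invented for illustration), assuming datevar is numeric and outcome is a string. Firm 1 is observed three times, firm 2 twice, and firm 3 once:

```stata
* Hypothetical toy panel: all values are invented for illustration
clear
input panelid datevar str10 outcome
1 2001 "none"
1 2003 "IPO"
1 2004 "none"
2 2002 "IPO"
2 2005 "IPO"
3 2001 "none"
end

* Event indicator and a within-panel period counter 1, ..., T_i
gen ipo = (outcome == "IPO")
bysort panelid (datevar): gen date = _n

* Longest panel length (3 here), the upper bound for the loop
quietly summarize date
local total = r(max)
display `total'

keep panelid date ipo
list, sepby(panelid)
```

Because date only counts observations, firm 1's gap between 2001 and 2003 disappears: its dates are simply 1, 2, 3.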

I save this as a temporary dataset for the unfortunate for loop:

save temp, replace

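and reshape to wide format, with one row per panelid and one variable ipo1, ipo2, … per period, so that the lagged sums can be built column by column: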

reshape wide ipo, i(panelid) j(date)
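The wide layout makes the unbalanced structure explicit. A minimal sketch, using hypothetical long data shaped like the output of the first step (values invented for illustration):

```stata
* Hypothetical long data: panelid, period counter date, IPO indicator
clear
input panelid date ipo
1 1 0
1 2 1
1 3 0
2 1 1
2 2 1
3 1 0
end

* One row per panel, one ipo# column per period
reshape wide ipo, i(panelid) j(date)
list
```

Firm 2 gets a missing ipo3, and firm 3 gets missing ipo2 and ipo3. Those missing values are what later mark the periods past each individual's history.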

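The following for loop exploits the wide layout of the reshaped data. It builds total_ipo2, total_ipo3, …, where total_ipo$t$ is the sum of ipo over periods $1, \ldots, t-1$, that is, the number of IPOs strictly before $t$. Each new variable adds the previous period's indicator to the previous running total: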

* total_ipo<t> holds the number of IPOs in periods 1, ..., t-1
gen total_ipo2 = ipo1
forvalues x = 3/`total' {
    local minx = `x' - 1
    gen total_ipo`x' = total_ipo`minx' + ipo`minx'
}
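In equation form, writing $\mathrm{ipo}_{i,s}$ for the indicator of individual $i$ in period $s$, the loop computes the number of events strictly before period $t$ by a first-order recursion:

$$
\mathrm{total\_ipo}_{i,t} = \sum_{s=1}^{t-1} \mathrm{ipo}_{i,s}, \qquad
\mathrm{total\_ipo}_{i,2} = \mathrm{ipo}_{i,1}, \qquad
\mathrm{total\_ipo}_{i,t} = \mathrm{total\_ipo}_{i,t-1} + \mathrm{ipo}_{i,t-1} \quad (t \ge 3).
$$

Because Stata's + returns missing when either operand is missing, and $\mathrm{ipo}_{i,s}$ is missing for $s > T_i$, the recursion is nonmissing only for $t = 2, \ldots, T_i + 1$ (capped at the longest panel length).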

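Now reshape back to long format. Sums more than one period past an individual's last observation are missing, because Stata's + returns missing when an operand is missing, so drop them: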

drop ipo1-ipo`total'
reshape long total_ipo, i(panelid) j(date)
drop if total_ipo == .
sort panelid date
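To see which rows survive the reshape, here is a minimal sketch on a hypothetical two-firm panel already in wide form (values invented for illustration): firm 1 is observed in all three periods, so it is the longest panel, and firm 2 in only two. It runs the same recursion as the loop above, followed by the reshape-back step:

```stata
* Hypothetical wide data: firm 1 has 3 periods, firm 2 only 2
clear
input panelid ipo1 ipo2 ipo3
1 0 1 0
2 1 1 .
end
local total = 3

* Same recursion as the loop in the text
gen total_ipo2 = ipo1
forvalues x = 3/`total' {
    local minx = `x' - 1
    gen total_ipo`x' = total_ipo`minx' + ipo`minx'
}

* Back to long, dropping missing sums
drop ipo1-ipo`total'
reshape long total_ipo, i(panelid) j(date)
drop if total_ipo == .
sort panelid date
list, sepby(panelid)
```

Firm 2 keeps a row at date 3, that is $T_2 + 1$, whose sum of 2 counts both of its IPOs but matches no real observation. Firm 1's last row, also at date 3, is genuine. Neither firm gets a row for date 1, where the count would simply be 0.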

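Sums exactly one period past an individual's last observation are not missing: they add the last real indicator to the running total. This leaves one spurious observation, at date $T_i + 1$, for every individual observed for fewer periods than the longest panel. Dropping the last observation of every panel would also delete a genuine observation from the longest panels, which have no spurious row, so instead keep only the panelid-date pairs that appear in the temp dataset saved earlier:

* rows at date T_i + 1 have no match in temp.dta and are dropped
merge 1:1 panelid date using temp, keep(match) nogenerate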
keep panelid date total_*
sort panelid date

And it is done. I am sure there is a more elegant solution…
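The update at the top says this can be done in a few lines. One compact approach, not necessarily the one the update had in mind, uses Stata's running-sum function sum() under bysort. With by, sum() restarts at the start of each panel, so sum(ipo) - ipo counts the IPOs strictly before the current observation. A minimal sketch on hypothetical toy data (values invented for illustration):

```stata
* Hypothetical toy panel: all values are invented for illustration
clear
input panelid datevar str10 outcome
1 2001 "none"
1 2003 "IPO"
1 2004 "none"
2 2002 "IPO"
2 2005 "IPO"
3 2001 "none"
end

gen ipo = (outcome == "IPO")

* Running sum within each panel in datevar order, minus the current
* observation's own indicator = number of IPOs strictly before it
bysort panelid (datevar): gen total_ipo = sum(ipo) - ipo
list panelid datevar ipo total_ipo, sepby(panelid)
```

Unlike the reshape-and-loop version, this also returns the first observation of each panel (with a count of 0) and needs no temp file, reshape, or loop.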