Data and code

Here are the data and code releases from my research and other projects. My GitHub repositories have the same listings as below.

Intangible valuations from Purchase Price Allocations (Ewens, Peters and Wang, 2023)

This data (GitHub) is the baseline set of acquired intangible valuations from Ewens, Peters, and Wang (2023). When a public firm acquires a sufficiently large target, it must disclose the assets and liabilities it acquires. The resulting purchase price allocation (PPA), reported in a 10-K, lists the acquired intangible assets. For our paper, we collected and parsed these disclosures manually.

Citation: Ewens, M., Peters, R.H. and Wang, S., 2023. “Measuring intangible capital with market prices”. Forthcoming, Management Science.

Intangible Capital Stocks (Ewens, Peters, and Wang, 2023) 

This repository (GitHub) contains the parameter estimates for intangible capital accumulation and the estimated knowledge and organization capital stocks from Ewens, Peters, and Wang’s (2023) paper “Measuring Intangible Capital with Market Prices.” If you need estimated intangibles for publicly traded firms, the paper shows that our stocks outperform those built from alternative parameter estimates and cover many more industries.

Citation: Ewens, M., Peters, R.H. and Wang, S., 2023. “Measuring intangible capital with market prices”. Forthcoming, Management Science.

Non-Compete Law Changes and Court Decisions (Ewens and Marx, 2023)

Ewens and Marx (2017) build a series of U.S. state-level law changes from 1995 to 2016 to study the impact of founder replacement on startup outcomes. The data and code (GitHub) allow others to incorporate the law changes into their research.

Citation: Ewens, Michael, and Matt Marx. “Founder replacement and startup performance.” The Review of Financial Studies 31.4 (2017): 1532-1565.

Mapping SDC to Compustat, 1996-2023 (Ewens, Peters, and Wang, 2024) 

A mapping file between SDC’s “sdc_dealno” and Compustat’s “gvkey” (GitHub). Connecting SDC’s M&A database to Compustat requires gvkeys for both the acquirer and the target (if they are publicly traded). Gordon Phillips and Alexei Zhdanov (RFS, 2013) created the first major mapping between firms in SDC and Compustat using a combination of name and date matching. We filled the gaps using a fuzzy string search, aided by manual checks, that replicates the gvkey search in WRDS: either a research assistant or I searched all M&A deals from 1996-2016 that did not have a gvkey in the Phillips and Zhdanov data, comparing names and dates by hand. The data was used in Ewens, Peters, and Wang (2024).
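For readers who want to extend the mapping to deals outside the file, the fuzzy name search described above can be approximated as follows. This is only a minimal sketch, not the procedure used to build the data: it assumes the standard-library `difflib` as the string matcher, and the function names, the suffix list, the 0.85 cutoff, and the sample gvkeys and company names are all hypothetical. Candidates it returns would still need the manual name-and-date check.

```python
import difflib
import re

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "ltd", "llc", "plc"}


def normalize(name: str) -> str:
    """Lowercase a company name, drop punctuation and common suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def candidate_matches(sdc_name, compustat_names, cutoff=0.85, n=3):
    """Return up to n (gvkey, name, score) candidates for manual review.

    compustat_names maps gvkey -> company name. The cutoff is an
    assumption chosen for illustration, not a value from the paper.
    """
    target = normalize(sdc_name)
    scored = []
    for gvkey, conm in compustat_names.items():
        score = difflib.SequenceMatcher(None, target, normalize(conm)).ratio()
        if score >= cutoff:
            scored.append((gvkey, conm, round(score, 3)))
    return sorted(scored, key=lambda item: item[2], reverse=True)[:n]


# Made-up identifiers and names, for illustration only.
compustat_sample = {
    "001234": "Acme Widgets Inc.",
    "005678": "Apex Holdings Corp",
}
print(candidate_matches("ACME WIDGETS, INC", compustat_sample))
```

Returning several scored candidates rather than a single best match keeps a human in the loop, which mirrors the by-hand comparison of names and dates.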

Citation: Ewens, M., Peters, R.H. and Wang, S., 2023. “Measuring intangible capital with market prices”. Management Science, 2024.

Venture Capital Revenue Reported by US Banks (Chen and Ewens, 2023) 

This repository (GitHub) contains data (and the code to create it) on U.S. bank “venture capital (VC) revenue.” The data plays an important role in the paper “Venture Capital and Startup Agglomeration” (Chen and Ewens, 2023), which uses it to assess the importance of banking institutions as limited partners in venture capital. The zip file contains the CSV file, and the Stata do-file allows you to recreate it. To do so, you will need to download the raw Call Reports data from the FDIC.

Citation: Chen, Jun and Michael Ewens. “Venture Capital and Startup Agglomeration”, Working Paper, 2023.

Regulatory Public Float and 10-K information (Ewens, Xiao and Xu, 2023) 

We collect public float data from firms’ 10-K filings (GitHub). These filings disclose the market value of all outstanding common equity (voting and non-voting) held by non-affiliates at the end of the second fiscal quarter. The data (public_float.csv) contains the cik (SEC identifier), the year (publicfloat_year), and the public float in millions (publicfloat_mil). For the set of public firms with a public float of $25m or less, we also collected all the 10-K and 10-KSB filings from 1994-2007. We then parsed each file (after removing HTML and line breaks) to compute the character length of each of the four parts of the annual report. The goal of this exercise was to measure the real differences in total disclosure (in characters) for “small business issuers” vs. other firms during the sample period.
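As a rough illustration of how the file might be used, the sketch below selects firm-years at or below the $25m threshold and measures text length after stripping HTML tags and line breaks. It assumes only the three column names listed above; the function names, the tag-stripping regular expression, and the sample rows are hypothetical, and the original parser may handle HTML differently.

```python
import csv
import io
import re


def small_float_firms(csv_file, threshold_mil=25.0):
    """Yield (cik, year, public float) for rows at or below the threshold.

    csv_file is any open text file with the columns cik,
    publicfloat_year and publicfloat_mil. Rows with a blank
    public float are skipped.
    """
    for row in csv.DictReader(csv_file):
        raw = (row.get("publicfloat_mil") or "").strip()
        if not raw:
            continue
        value = float(raw)
        if value <= threshold_mil:
            yield row["cik"], int(row["publicfloat_year"]), value


def text_length(raw_html: str) -> int:
    """Character count after removing HTML tags and line breaks.

    The regex is a simple stand-in for a real HTML parser.
    """
    no_tags = re.sub(r"<[^>]+>", "", raw_html)
    return len(no_tags.replace("\r", "").replace("\n", ""))


# Made-up rows for illustration; real use would open public_float.csv.
sample = io.StringIO(
    "cik,publicfloat_year,publicfloat_mil\n"
    "1000,2005,12.5\n"
    "2000,2005,300\n"
    "3000,2006,\n"
)
for cik, year, value in small_float_firms(sample):
    print(cik, year, value)
```

With the actual data, `open("public_float.csv", newline="")` would replace the in-memory sample.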

Citation: Ewens, Michael, Kairong Xiao, and Ting Xu. 2023. “Regulatory Costs of Being Public: Evidence from Bunching Estimation.” doi:10.31235/

10-K MD&A Sections and Parser of 10-Ks (Ewens, Peters and Wang, 2023) 

This repository (GitHub) contains an “index” file that can be used to process the raw MD&A data parsed using this code, which is a fork/update of this repo. After following the repository’s instructions, you can build your own panel database of public-firm MD&A text.

Citation: Ewens, M., Peters, R.H. and Wang, S., 2023. “Measuring intangible capital with market prices”. Forthcoming, Management Science.

Form D filing events, 1992-2008 (Ewens and Farre-Mensa, 2020)

The data (GitHub) on Form D filings was obtained through a FOIA request to the Securities and Exchange Commission (SEC) in 2018. The data is used in the paper “The Deregulation of the Private Equity Markets and the Decline in IPOs” by Ewens and Farre-Mensa (2020) to study the impact of regulatory changes to Regulation D. It was originally part of an analysis (Figure 2, page 10) in the SEC report “Capital Raising in the U.S.: An Analysis of the Market for Unregistered Securities Offerings, 2009-2014” (pdf). Vladimir Ivanov was instrumental in helping us acquire the data.

Citation: Ewens, Michael, and Joan Farre-Mensa. “The deregulation of the private equity markets and the decline in IPOs.” The Review of Financial Studies 33, no. 12 (2020): 5463-5509.