Worked example: supply chain disruption

This page walks one construct through the full research-idea-to-panel workflow. All queries, sentences, and numbers below are illustrative; they show the shape of the work, not real dataset results.

The construct

Research idea: firms differ in how exposed they are to supply chain disruption, and that exposure varies over time and across sectors.

Construct statement before opening any tool:

Supply chain disruption exposure: the extent to which a firm's management and analysts discuss shortages, delays, logistics bottlenecks, and supplier failures affecting the firm's supply chain on earnings calls.

Target sample: listed manufacturers and retailers, 2018–2025, with enough availability in North America, Europe, and East Asia to compare regions.

First-pass query

Translate the construct into terms. A deliberately narrow start:

supply chain OR supplier OR logistics

Run this in the Keyword Tool. Among the suggestions, accept terms like shortage, lead times, and bottleneck; reject terms like value chain (too broad for this construct) after checking their example sentences.

Snippet review

Run an exploratory Risk Tool search for 2018–2025 and read matched sentences in the Snippet Tool. Three kinds of findings typically appear. Illustrative examples of what such sentences look like:

A false positive. A sentence such as "we continue to monitor money supply and rate expectations" matches supply-derived terms but is about monetary conditions, not supply chains. Fix: keep the phrase supply chain rather than the bare term supply.
A missing term. Sentences such as "congestion at the ports added three weeks to delivery" describe the construct without containing any first-pass term. Fix: add port congestion OR freight OR shipping delays.
An ambiguous term. logistics sometimes refers to the firm's own logistics business segment rather than to disruption. Decision: keep it, but record the ambiguity and check whether results change without it.
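The first two findings can be reproduced offline with a small matcher. This is a minimal sketch, assuming case-insensitive whole-phrase matching; the Risk Tool's actual matching rules may differ. The helper `matches` is hypothetical, and the third sentence is made up for illustration.

```python
import re

def matches(sentence, terms):
    """Return the terms that occur in the sentence as whole words or phrases."""
    text = sentence.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]

# Illustrative sentences: the first two come from the findings above,
# the third is invented for this sketch.
sentences = [
    "We continue to monitor money supply and rate expectations.",
    "Congestion at the ports added three weeks to delivery.",
    "Supply chain delays pushed shipments into the next quarter.",
]

bare_terms = ["supply", "supplier", "logistics"]          # bare term: too loose
phrase_terms = ["supply chain", "supplier", "logistics"]  # phrase: first-pass query

for s in sentences:
    print(matches(s, bare_terms), matches(s, phrase_terms), s)
```

The bare term supply catches the monetary sentence, the phrase supply chain does not, and the port-congestion sentence matches neither list, which is why it has to be found by reading rather than by searching.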

Refined query

(supply chain OR supplier OR logistics OR shortage OR lead times OR bottleneck OR port congestion OR freight OR shipping delays) AND NOT money supply

The parentheses matter: AND NOT binds before OR, so without them the exclusion would apply only to the last term. See precedence.
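The effect of the parentheses can be seen with plain booleans. This is a sketch, not the tool's parser: it relies on Python's own operator precedence (`not` before `and` before `or`), which happens to mirror the rule described here. The helper `has` and the test sentence are hypothetical, and the query is cut down to two terms to keep it readable.

```python
def has(sentence, term):
    """Case-insensitive substring check (an assumption about matching)."""
    return term in sentence.lower()

# Invented sentence that contains both a query term and the excluded phrase.
s = "Supplier credit terms tightened as money supply contracted."

# (supplier OR freight) AND NOT money supply
with_parens = (has(s, "supplier") or has(s, "freight")) and not has(s, "money supply")

# supplier OR freight AND NOT money supply
# parses as: supplier OR (freight AND NOT money supply)
without_parens = has(s, "supplier") or has(s, "freight") and not has(s, "money supply")

print(with_parens, without_parens)
```

With parentheses the sentence is excluded; without them the exclusion attaches only to freight, and the sentence slips through on supplier.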

Rerun and re-inspect. When the matches consistently describe the construct, record the final query and the rejected alternatives.

Decide whether to continue

Check the results overview and sample availability. In this illustration, suppose matches are frequent for manufacturers and retailers but sparse for financial firms: the right call is to narrow the sample to the sectors with enough availability, not to broaden the query until everything matches.

Interpret the measures

For each call, exposure counts sentences matching the refined query; risk counts the subset that also contains risk or uncertainty language. See measure definitions. An illustrative slice of firmlevel.csv looks like this:

`earningscallID`	`company_name`	`date`	`exposure`	`risk`	`nr_of_sentences`
(id)	Alpha Manufacturing	2021-11-04	14	6	412
(id)	Beta Retail Group	2021-11-12	9	2	388
(id)	Gamma Components	2021-11-18	0	0	351

The zero row stays in the panel: Gamma's call had no matching sentences in this illustration, which is information, not a missing observation. See zero rows.
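A sketch of how such rows could be built from sentences, to make the zero-row point concrete. Everything here is an assumption for illustration: the function `call_row`, the call IDs, the sentences, substring matching, and above all `RISK_WORDS`, which stands in for whatever risk and uncertainty vocabulary the measure definitions actually specify.

```python
# Placeholder vocabulary, NOT the tool's actual risk word list.
RISK_WORDS = ("risk", "uncertain")

def call_row(call_id, sentences, query_terms, excluded=()):
    """Count matching sentences for one call; always returns a row."""
    exposure = 0
    risk = 0
    for sentence in sentences:
        text = sentence.lower()
        if any(x in text for x in excluded):
            continue  # AND NOT: drop sentences containing an excluded phrase
        if any(t in text for t in query_terms):
            exposure += 1
            if any(w in text for w in RISK_WORDS):
                risk += 1  # risk is a subset of exposure
    return {
        "earningscallID": call_id,
        "exposure": exposure,
        "risk": risk,
        "nr_of_sentences": len(sentences),
    }

# Hypothetical calls with invented sentences.
calls = {
    "call-A": [
        "Freight costs rose again this quarter.",
        "We see risk of further supplier delays.",
        "Revenue grew four percent.",
    ],
    "call-B": ["Revenue grew four percent.", "Margins were stable."],
}
terms = ["supply chain", "supplier", "freight"]
panel = [call_row(cid, sents, terms, excluded=["money supply"])
         for cid, sents in calls.items()]
for row in panel:
    print(row)
```

The second call produces a row with zero counts and a non-zero sentence total, so it stays in the panel as an observed zero.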

Export, join, normalize

Export the panel and join it to the downstream firm data. This example uses Compustat via gvkeys.csv, with match quality checked on a sample.
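A minimal sketch of the join and the match-quality check, using only the standard library and in-memory stand-ins for the two files. The crosswalk layout is an assumption: it is taken here to map `earningscallID` to `gvkey_compustat`, and the IDs and gvkey values are placeholders, not real identifiers.

```python
import csv
import io

# In-memory stand-ins for firmlevel.csv and gvkeys.csv (placeholder values).
panel_csv = """earningscallID,company_name,exposure,nr_of_sentences
c1,Alpha Manufacturing,14,412
c2,Beta Retail Group,9,388
c3,Gamma Components,0,351
"""
crosswalk_csv = """earningscallID,gvkey_compustat
c1,000001
c2,000002
"""

def left_join(panel_rows, crosswalk_rows, key="earningscallID"):
    """Keep every panel row; gvkey_compustat is None when no match exists."""
    lookup = {r[key]: r["gvkey_compustat"] for r in crosswalk_rows}
    return [{**row, "gvkey_compustat": lookup.get(row[key])} for row in panel_rows]

panel = list(csv.DictReader(io.StringIO(panel_csv)))
crosswalk = list(csv.DictReader(io.StringIO(crosswalk_csv)))
joined = left_join(panel, crosswalk)

# Match quality: list what failed to join before analysing anything.
unmatched = [r["company_name"] for r in joined if r["gvkey_compustat"] is None]
match_rate = 1 - len(unmatched) / len(joined)
print(f"matched {match_rate:.0%}; unmatched: {unmatched}")
```

A left join is used so that unmatched calls stay visible; an inner join would silently drop them and hide the match rate.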

Because long calls mechanically allow more matches, normalize with exposure / nr_of_sentences, or keep raw counts and control for nr_of_sentences. This example uses no section or speaker restrictions, so the full-transcript denominator is the right one. See the normalization caveat for when it is not.
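Applied to the illustrative rows from the table above, the normalization is a single division per call. This is a sketch; the helper `exposure_share` is a hypothetical name, and the guard for an empty transcript is an added assumption.

```python
# Illustrative rows from the firmlevel.csv slice above.
rows = [
    {"company_name": "Alpha Manufacturing", "exposure": 14, "nr_of_sentences": 412},
    {"company_name": "Beta Retail Group", "exposure": 9, "nr_of_sentences": 388},
    {"company_name": "Gamma Components", "exposure": 0, "nr_of_sentences": 351},
]

def exposure_share(row):
    """exposure / nr_of_sentences, or None if the transcript has no sentences."""
    n = row["nr_of_sentences"]
    return row["exposure"] / n if n else None

for row in rows:
    share = exposure_share(row)
    print(f'{row["company_name"]}: {share:.4f}')
```

The zero row normalizes to 0.0 rather than to a missing value, which keeps it in the panel as an observation.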

The reproducibility record

What gets stored with the exported files, per the checklist:

Final query string and date range; no section, speaker, country, sector, or adjacent-sentence filters.
Keyword Tool protocol: accepted and rejected suggestions, with notes on logistics.
Snippet-review notes: the false-positive pattern, added terms, the exclusion.
Export date and the corpus statement shown on the corpus availability page that day.
Join key (gvkey_compustat via the crosswalk, verified on a sample) and the normalization choice.
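One way to keep this record next to the exported files is a small JSON sidecar. The sketch below is an assumption about layout, not a format the tools prescribe: the field names are hypothetical, and the values in angle brackets are placeholders to fill in at export time.

```python
import json

record = {
    "query": "<final query string, copied verbatim from the tool>",
    "date_range": {"start_year": 2018, "end_year": 2025},
    # None means the filter was not used.
    "filters": {
        "section": None,
        "speaker": None,
        "country": None,
        "sector": None,
        "adjacent_sentences": None,
    },
    "keyword_protocol": {
        "accepted": ["shortage", "lead times", "bottleneck"],
        "rejected": ["value chain"],
        "notes": {"logistics": "can name the firm's own logistics segment; kept, "
                               "results checked with and without it"},
    },
    "snippet_review": {
        "false_positive_patterns": ["money supply"],
        "added_terms": ["port congestion", "freight", "shipping delays"],
        "exclusions": ["money supply"],
    },
    "export": {
        "date": "<export date>",
        "corpus_statement": "<text shown on the corpus availability page that day>",
    },
    "join": {"key": "gvkey_compustat", "verified_on_sample": True},
    "normalization": "exposure / nr_of_sentences",
}

# Stable key order makes later diffs between exports easier to read.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```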

What this would look like in a paper

A data section built from this record needs only a few sentences, along the lines of:

We measure supply chain disruption exposure as the number of sentences in a firm's earnings calls matching a curated keyword set (listed in Appendix A, with rejected alternatives), normalized by transcript length. Counts are sentence-level keyword matches, not model scores; we validated the keyword set by manual review of matched sentences and report the false-positive patterns and exclusions in Appendix A. Calls with no matching sentences remain in the panel with a count of zero.

Every claim in that paragraph is backed by an artifact recorded above. That is the point of the workflow.