Worked example: supply chain disruption

This page walks one construct through the full research-idea-to-panel workflow. All queries, sentences, and numbers below are illustrative — they show the shape of the work, not real search results.

The construct

Research idea: firms differ in how exposed they are to supply chain disruption, and that exposure varies over time and across sectors.

Construct statement before opening any tool:

Supply chain disruption exposure: the extent to which a firm's management and analysts discuss disruptions to the firm's supply chain — shortages, delays, logistics bottlenecks, supplier failures — on its earnings calls.

Target sample: listed manufacturers and retailers, 2018–2025, with enough coverage in North America, Europe, and East Asia to compare regions.

First-pass query

Translate the construct into terms. A deliberately narrow start:

supply chain OR supplier OR logistics

Run this in the Keyword Tool and read each suggestion's example sentences before deciding on it. Here, accept terms like shortage, lead times, and bottleneck; reject terms like value chain, which is too broad for this construct.

Snippet review

Run an exploratory Risk Tool search for 2018–2025 and read the matched sentences in the Snippet Tool. Three kinds of findings typically appear; the example sentences below are illustrative:

  • A false positive. A sentence such as "we continue to monitor money supply and rate expectations" matches supply-derived terms but is about monetary conditions, not supply chains. Fix: keep the phrase supply chain rather than the bare term supply.
  • A missing term. Sentences such as "congestion at the ports added three weeks to delivery" describe the construct without containing any first-pass term. Fix: add port congestion OR freight OR shipping delays.
  • An ambiguous term. logistics sometimes refers to the firm's own logistics business segment rather than to disruption. Decision: keep it, but record the ambiguity and check whether results change without it.
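
The first two findings can be reproduced in miniature. This sketch assumes the tool matches terms case-insensitively as whole words or whole phrases; that is an assumption, since the real tool may tokenize, stem, or expand terms differently. The helper `matches` and the constant `FIRST_PASS` are hypothetical names.

```python
import re

def matches(term: str, sentence: str) -> bool:
    """True if `term` appears in `sentence` as a whole word or phrase.

    Assumption: case-insensitive whole-word matching, with any run of
    whitespace allowed between the words of a phrase.
    """
    pattern = r"\b" + r"\s+".join(map(re.escape, term.split())) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

FIRST_PASS = ["supply chain", "supplier", "logistics"]

false_positive = "We continue to monitor money supply and rate expectations."
missing_term = "Congestion at the ports added three weeks to delivery."

# The bare term hits the monetary sentence; the phrase does not.
print(matches("supply", false_positive))         # True
print(matches("supply chain", false_positive))   # False
# No first-pass term matches a sentence that clearly describes the construct.
print(any(matches(t, missing_term) for t in FIRST_PASS))  # False
```

Reading matched sentences catches the first kind of problem; the second only shows up when you also read sentences the query did not match.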

Refined query

(supply chain OR supplier OR logistics OR shortage OR lead times OR bottleneck OR port congestion OR freight OR shipping delays) AND NOT money supply

The parentheses matter: AND NOT binds before OR, so without them the exclusion would apply only to the last term — see precedence.
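
To see why the parentheses matter, here is a minimal recursive-descent evaluator for this style of query. It assumes, hypothetically, that operators are written in uppercase, that AND and AND NOT bind tighter than OR (as the precedence note states), and that terms match case-insensitively as whole words or phrases. The names `tokenize`, `parse`, and `evaluate` are illustrative, not the tool's API.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}

def matches(term: str, sentence: str) -> bool:
    """Case-insensitive whole-word/phrase match (assumed matching rule)."""
    pattern = r"\b" + r"\s+".join(map(re.escape, term.split())) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

def tokenize(query: str) -> list[str]:
    """Split a query into parentheses, operators, and multi-word phrases."""
    tokens, phrase = [], []
    for tok in re.findall(r"\(|\)|[^\s()]+", query):
        if tok in OPERATORS or tok in ("(", ")"):
            if phrase:
                tokens.append(" ".join(phrase))
                phrase = []
            tokens.append(tok)
        else:
            phrase.append(tok)
    if phrase:
        tokens.append(" ".join(phrase))
    return tokens

def parse(tokens: list[str]):
    """OR has the lowest precedence, so AND (and AND NOT) groups first."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_unary()
        while peek() == "AND":
            take()
            node = ("AND", node, parse_unary())
        return node

    def parse_unary():
        tok = take()
        if tok == "NOT":
            return ("NOT", parse_unary())
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        return ("TERM", tok)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return tree

def evaluate(node, sentence: str) -> bool:
    kind = node[0]
    if kind == "TERM":
        return matches(node[1], sentence)
    if kind == "NOT":
        return not evaluate(node[1], sentence)
    if kind == "AND":
        return evaluate(node[1], sentence) and evaluate(node[2], sentence)
    return evaluate(node[1], sentence) or evaluate(node[2], sentence)  # "OR"

sentence = "Tighter money supply slowed orders, and supply chain delays persisted."
unparenthesized = "supply chain OR freight AND NOT money supply"
parenthesized = "(supply chain OR freight) AND NOT money supply"

print(evaluate(parse(tokenize(unparenthesized)), sentence))  # True: exclusion guards only freight
print(evaluate(parse(tokenize(parenthesized)), sentence))    # False: exclusion covers the group
```

Without parentheses the query parses as supply chain OR (freight AND NOT money supply), so a sentence mentioning both supply chains and money supply slips through.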

Rerun and re-inspect. When the matches consistently describe the construct, record the final query and the rejected alternatives.
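
The logistics decision from the snippet review implies a concrete check once the query is stable: recompute exposure with and without the ambiguous term and see which calls move. A minimal sketch, assuming whole-word matching and ignoring the AND NOT exclusion for brevity; the call IDs, sentences, and helper names are hypothetical.

```python
import re

def matches(term: str, sentence: str) -> bool:
    """Case-insensitive whole-word/phrase match (assumed matching rule)."""
    pattern = r"\b" + r"\s+".join(map(re.escape, term.split())) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

def exposure(sentences: list[str], terms: list[str]) -> int:
    """Number of sentences matching at least one term (an OR over `terms`)."""
    return sum(any(matches(t, s) for t in terms) for s in sentences)

# Hypothetical calls: call_a mentions the firm's own logistics segment.
calls = {
    "call_a": ["Our logistics segment grew revenue 8%.", "Lead times improved."],
    "call_b": ["Freight costs doubled.", "Supplier failures hit output."],
}
with_logistics = ["supply chain", "supplier", "lead times", "freight", "logistics"]
without_logistics = [t for t in with_logistics if t != "logistics"]

changed = {}
for call_id, sentences in calls.items():
    a = exposure(sentences, with_logistics)
    b = exposure(sentences, without_logistics)
    if a != b:
        changed[call_id] = (a, b)

print(f"{len(changed)} of {len(calls)} calls change without logistics: {changed}")
```

On a real panel, the more informative comparison is usually whether downstream results (rankings, regression estimates) change, not only whether individual counts do.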

Decide whether to continue

Check the results overview and coverage. In this illustration, suppose matches are frequent for manufacturers and retailers but sparse for financial firms: the right call is to narrow the sample to the sectors with coverage, not to broaden the query until everything matches.
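
The sector part of this check reduces to a per-sector match rate. A minimal sketch that complements, rather than replaces, the coverage page; the sector labels, counts, and the 0.5 cutoff are all hypothetical.

```python
from collections import defaultdict

def match_rate_by_sector(calls: list[tuple[str, int]]) -> dict[str, float]:
    """Share of calls in each sector with at least one matching sentence."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for sector, exposure in calls:
        total[sector] += 1
        hits[sector] += int(exposure > 0)
    return {sector: hits[sector] / total[sector] for sector in total}

# Hypothetical (sector, exposure) pairs from an exploratory run.
calls = [
    ("Manufacturing", 14), ("Manufacturing", 3), ("Manufacturing", 0),
    ("Retail", 9), ("Retail", 5),
    ("Financials", 0), ("Financials", 0), ("Financials", 1),
]
MIN_MATCH_RATE = 0.5  # hypothetical cutoff; choose one and record it

rates = match_rate_by_sector(calls)
keep = sorted(s for s, r in rates.items() if r >= MIN_MATCH_RATE)
print(rates)
print("keep:", keep)  # narrow the sample; do not broaden the query
```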

Interpret the metrics

For each call, exposure counts sentences matching the refined query; risk counts the subset that also contains risk or uncertainty language — see metric definitions. An illustrative slice of what firmlevel.csv rows look like:

earningscallID  company_name         date        exposure  risk  nr_of_sentences
(id)            Alpha Manufacturing  2021-11-04  14        6     412
(id)            Beta Retail Group    2021-11-12  9         2     388
(id)            Gamma Components     2021-11-18  0         0     351

The zero row stays in the panel: Gamma's call had no matching sentences in this illustration, which is information, not a missing observation — see zero rows.
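
The metric definitions can be sketched per call. This assumes case-insensitive whole-word matching and uses a made-up risk vocabulary (`RISK_TERMS`), because the tool's actual risk word list lives in the metric definitions, not on this page; the exposure terms are a subset of the refined query. All names and transcripts are hypothetical. Computing metrics for every call, rather than only for calls that produced matches, is what keeps a zero row like Gamma's in the panel.

```python
import re

def contains(term: str, sentence: str) -> bool:
    """Case-insensitive whole-word/phrase match (assumed matching rule)."""
    pattern = r"\b" + r"\s+".join(map(re.escape, term.split())) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

# A subset of the refined query's terms, plus its exclusion.
EXPOSURE_TERMS = ["supply chain", "supplier", "shortage", "lead times",
                  "bottleneck", "port congestion", "freight", "shipping delays"]
EXCLUDED_TERMS = ["money supply"]
# Hypothetical stand-in for the tool's risk/uncertainty vocabulary.
RISK_TERMS = ["risk", "risks", "uncertain", "uncertainty"]

def is_exposure(sentence: str) -> bool:
    return (any(contains(t, sentence) for t in EXPOSURE_TERMS)
            and not any(contains(t, sentence) for t in EXCLUDED_TERMS))

def call_metrics(sentences: list[str]) -> dict[str, int]:
    """exposure, risk (a subset of exposure), and nr_of_sentences for one call."""
    exposure_hits = [s for s in sentences if is_exposure(s)]
    risk_hits = [s for s in exposure_hits
                 if any(contains(t, s) for t in RISK_TERMS)]
    return {"exposure": len(exposure_hits), "risk": len(risk_hits),
            "nr_of_sentences": len(sentences)}

# Hypothetical transcripts keyed by call ID. Iterating over every call,
# not over matched sentences, keeps zero rows in the panel.
transcripts = {
    "call_1": ["Port congestion added three weeks to delivery.",
               "We see continued uncertainty around supplier capacity.",
               "Revenue grew 4% year over year."],
    "call_2": ["We continue to monitor money supply and rate expectations.",
               "Margins were stable."],
}
panel = {call_id: call_metrics(s) for call_id, s in transcripts.items()}
for call_id, row in panel.items():
    print(call_id, row)
```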

Export, join, normalize

Export the panel and join it to the downstream firm data — Compustat via gvkeys.csv in this example, with match quality checked on a sample.
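
A minimal standard-library sketch of the join and a first match-quality check. It assumes, hypothetically, that firmlevel.csv and gvkeys.csv share a `company_id` column; check the actual column names in your exports. The file contents and gvkey values below are placeholders. A left join keeps unmatched calls visible instead of silently dropping them, and the unmatched list complements, rather than replaces, spot-checking a sample of matched firms by hand.

```python
import csv
import io

def left_join(rows, crosswalk, key):
    """Attach gvkey_compustat to each row; unmatched rows get None, not dropped."""
    keys = [r[key] for r in crosswalk]
    if len(keys) != len(set(keys)):
        raise ValueError("crosswalk has duplicate keys; resolve them before joining")
    lookup = {r[key]: r["gvkey_compustat"] for r in crosswalk}
    return [dict(r, gvkey_compustat=lookup.get(r[key])) for r in rows]

# Hypothetical file contents; the join column `company_id` is an assumption.
FIRMLEVEL = """earningscallID,company_id,date,exposure,risk,nr_of_sentences
c1,101,2021-11-04,14,6,412
c2,102,2021-11-12,9,2,388
c3,103,2021-11-18,0,0,351
"""
GVKEYS = """company_id,gvkey_compustat
101,000001
102,000002
"""

# csv reads every field as a string, so gvkeys keep their leading zeros.
panel = list(csv.DictReader(io.StringIO(FIRMLEVEL)))
crosswalk = list(csv.DictReader(io.StringIO(GVKEYS)))
joined = left_join(panel, crosswalk, "company_id")

unmatched = [r["earningscallID"] for r in joined if r["gvkey_compustat"] is None]
print(f"matched {len(joined) - len(unmatched)} of {len(joined)} calls; unmatched: {unmatched}")
```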

Because long calls mechanically allow more matches, either normalize (exposure / nr_of_sentences) or keep raw counts and control for nr_of_sentences. This example used no section or speaker restrictions, so the full-transcript sentence count is the right denominator; see the normalization caveat for when it is not.
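
The normalized measure is a single division. This sketch returns None for a transcript with no sentences rather than dividing by zero; that convention and the function name are assumptions, not tool behavior. It uses the illustrative counts from the firmlevel.csv slice above.

```python
from typing import Optional

def exposure_share(exposure: int, nr_of_sentences: int) -> Optional[float]:
    """Matching sentences per transcript sentence; None for an empty transcript."""
    if nr_of_sentences == 0:
        return None  # undefined, not zero
    return exposure / nr_of_sentences

rows = [("Alpha Manufacturing", 14, 412),
        ("Beta Retail Group", 9, 388),
        ("Gamma Components", 0, 351)]
for name, exposure, n in rows:
    print(f"{name}: {exposure_share(exposure, n):.4f}")
```

Gamma's share is a genuine 0.0, which keeps it distinct from a call whose share cannot be computed.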

The reproducibility record

What gets stored with the exported files, per the checklist:

  • Final query string and date range, noting that no section, speaker, or adjacent-sentence options were used.
  • Keyword Tool protocol: accepted and rejected suggestions, with notes on logistics.
  • Snippet-review notes: the false-positive pattern, added terms, the exclusion.
  • Export date and the coverage statement shown on the coverage page that day.
  • Join key (gvkey_compustat via the crosswalk, verified on a sample) and the normalization choice.
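
One way to keep this checklist from drifting is to serialize it next to the exported files and fail loudly when an item is missing. A minimal sketch; the field names and the `build_record` helper are hypothetical, and values in angle brackets are placeholders to fill in at export time.

```python
import json

REQUIRED_FIELDS = [
    "query", "date_range", "options", "keyword_protocol", "snippet_review",
    "export_date", "coverage_statement", "join_key", "normalization",
]

def build_record(**fields) -> str:
    """Serialize a reproducibility record, refusing to omit any checklist item."""
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        raise ValueError(f"record is missing: {missing}")
    return json.dumps(fields, indent=2, sort_keys=True)

record = build_record(
    query="<final query string>",
    date_range=["2018", "2025"],
    options={"section": None, "speaker": None, "adjacent_sentences": None},
    keyword_protocol={
        "accepted": ["shortage", "lead times", "bottleneck"],
        "rejected": ["value chain"],
        "notes": {"logistics": "may refer to the firm's own logistics segment"},
    },
    snippet_review={
        "false_positives": ["money supply"],
        "added_terms": ["port congestion", "freight", "shipping delays"],
        "exclusions": ["money supply"],
    },
    export_date="<YYYY-MM-DD>",
    coverage_statement="<text from the coverage page on the export date>",
    join_key="gvkey_compustat via gvkeys.csv, verified on a sample",
    normalization="exposure / nr_of_sentences",
)
print(record)
```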

What this would look like in a paper

A data section built from this record needs only a few sentences, along the lines of:

We measure supply chain disruption exposure as the number of sentences in a firm's earnings calls matching a curated keyword set (listed in Appendix A, with rejected alternatives), normalized by transcript length. Counts are sentence-level keyword matches, not model scores; we validated the keyword set by manual review of matched sentences and report the false-positive patterns and exclusions in Appendix A. Calls with no matching sentences remain in the panel with a count of zero.

Every claim in that paragraph is backed by an artifact recorded above — which is the point of the workflow.
