Corpus scope and coverage

NL Analytics searches English earnings-call transcripts. This page describes what the corpus contains and how to check whether it covers your sample.

Current scope

The corpus consists of English earnings-call transcripts from LSEG that are available to NL Analytics. You cannot upload your own documents or search other text sources.

Coverage snapshot

These figures summarize the corpus-level metadata that this documentation draws on.

Coverage statistic	Snapshot value
Earnings-call transcripts	461,849
Date coverage	2002-01-14 to 2026-06-15
Covered entities	15,952
Headquarters countries	94
Total sentences	177.6 million
Median transcript length	378 sentences
Latest complete year	27,529 calls in 2025
Largest country share	United States, 59.3% of calls
RIC coverage	98.4% of call rows
GVKey coverage	96.7% of call rows
PermID coverage	86.9% of call rows
CIK coverage	81.3% of call rows
ISIN coverage	75.3% of call rows

Snapshot generated from corpus metadata on June 20, 2026.

The covered-entities count is based on the internal company key, falling back to the company name. Identifier coverage is row-level: the share of call rows in the corpus metadata that carry the given identifier. Use these numbers as a starting point, not as proof that a specific sample is covered.
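
For a sense of scale, the row-level percentages can be turned into approximate counts of call rows that lack each identifier. The sketch below uses only the snapshot figures from the table above. Because the percentages are rounded to one decimal place, the resulting counts are approximate, and the function name is made up for this illustration.

```python
# Snapshot figures copied from the coverage table above.
TOTAL_CALLS = 461_849
IDENTIFIER_COVERAGE_PCT = {
    "RIC": 98.4,
    "GVKey": 96.7,
    "PermID": 86.9,
    "CIK": 81.3,
    "ISIN": 75.3,
}


def approx_missing_rows(total_calls: int, coverage_pct: float) -> int:
    """Approximate number of call rows that do not carry the identifier."""
    return round(total_calls * (100.0 - coverage_pct) / 100.0)


missing = {
    name: approx_missing_rows(TOTAL_CALLS, pct)
    for name, pct in IDENTIFIER_COVERAGE_PCT.items()
}

for name, count in missing.items():
    print(f"{name}: roughly {count:,} call rows without this identifier")
```

An identifier with lower coverage leaves more call rows unmatched if it is the only key used to link the corpus to other data.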

Check coverage for your sample

Aggregate coverage can hide gaps in exactly the firms, countries, sectors, or years your design needs. Before relying on a measure:

1. Define the target sample: the firms, countries, sectors, and time period the analysis is about.
2. Run an exploratory Risk Tool search over the relevant date range.
3. Read the results overview: which keywords, companies, and sectors drive the matches, and how many firms and transcripts were searched.
4. Export the call-level output and tabulate calls by headquarterscountry, economic_sector, and date_q in your analysis tool.
5. Compare those tabulations against the target sample. Keep zero rows: calls with no matches are part of the sample, and dropping them overstates coverage.
6. Record what you find: the date range, the sample definition, the export date, any missing coverage, and any restrictions you introduce downstream.
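
The tabulation and comparison steps above can be sketched in Python with the standard library alone. This is a minimal illustration under stated assumptions, not the product's export format: the columns headquarterscountry, economic_sector, and date_q are named in the steps above, while the company and n_matches columns, the sample rows, the quarter format, and the target country list are all hypothetical.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for an exported call-level file. Only headquarterscountry,
# economic_sector, and date_q are column names taken from the documentation;
# "company", "n_matches", and every value below are hypothetical.
SAMPLE_EXPORT = """company,headquarterscountry,economic_sector,date_q,n_matches
Alpha,United States,Technology,2024Q1,3
Alpha,United States,Technology,2024Q2,0
Beta,Germany,Industrials,2024Q1,0
Gamma,United States,Financials,2024Q2,1
"""


def tabulate_calls(rows, column):
    """Return {value: (all calls, calls with at least one match)} for a column.

    Zero-match calls stay in the first count, so coverage is not overstated.
    """
    total = Counter()
    with_matches = Counter()
    for row in rows:
        key = row[column]
        total[key] += 1
        if int(row["n_matches"]) > 0:
            with_matches[key] += 1
    return {key: (total[key], with_matches[key]) for key in sorted(total)}


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
tables = {
    column: tabulate_calls(rows, column)
    for column in ("headquarterscountry", "economic_sector", "date_q")
}

# Compare against the target sample (hypothetical list of countries).
TARGET_COUNTRIES = {"United States", "Germany", "Japan"}
missing_countries = sorted(TARGET_COUNTRIES - set(tables["headquarterscountry"]))

for column, table in tables.items():
    print(column)
    for value, (n_calls, n_with_matches) in table.items():
        print(f"  {value}: {n_calls} calls, {n_with_matches} with matches")
print("Target countries with no calls in the export:", missing_countries)
```

Counting all calls and calls with matches side by side keeps the zero rows visible, which is what makes a gap in the target sample distinguishable from a topic that firms simply did not discuss.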

A topic can appear often in the broad corpus and still be too sparse, or too concentrated in a few firms, to support variation in the target sample. If coverage is thin, consider widening the date range, broadening the query, or narrowing the research question.
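
One rough way to see whether matches are too sparse or too concentrated is to compare the number of firms with any match to the number of firms in the sample, and to compute the share of all matches that comes from the top few firms. The sketch below assumes call-level data reduced to hypothetical (company, n_matches) pairs; the names, values, and the choice of top_k are illustrative, and no particular threshold is implied.

```python
from collections import Counter

# Hypothetical call-level rows: (company, number of matches in that call).
calls = [
    ("Alpha", 12),
    ("Alpha", 9),
    ("Beta", 1),
    ("Gamma", 0),
    ("Delta", 2),
    ("Epsilon", 0),
]


def concentration_summary(calls, top_k=1):
    """Summarize how matches are spread across firms."""
    per_firm = Counter()
    for company, n_matches in calls:
        per_firm[company] += n_matches  # firms with zero matches are kept
    total_matches = sum(per_firm.values())
    top_matches = sum(n for _, n in per_firm.most_common(top_k))
    return {
        "n_firms": len(per_firm),
        "firms_with_matches": sum(1 for n in per_firm.values() if n > 0),
        "total_matches": total_matches,
        "top_share": top_matches / total_matches if total_matches else 0.0,
    }


summary = concentration_summary(calls, top_k=1)
print(summary)
```

A high top_share or a low count of firms with matches suggests that variation in the measure is driven by a handful of firms rather than by the target sample as a whole.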

Refresh timing

The corpus refreshes roughly every two weeks, typically every 13 to 16 days. Treat the timing as approximate rather than guaranteed.

When publishing research, record the coverage statement and export date used for the analysis.
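
A small machine-readable record stored next to the export is one way to keep this information. The sketch below is a suggestion only: the class, its field names, and all example values are hypothetical, and the 16-day threshold simply reuses the upper end of the approximate refresh interval described above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class CoverageRecord:
    """Hypothetical record of the coverage facts worth keeping with an export."""
    date_range_start: str
    date_range_end: str
    sample_definition: str
    export_date: str  # ISO format, YYYY-MM-DD
    missing_coverage: list = field(default_factory=list)
    downstream_restrictions: list = field(default_factory=list)


def days_since_export(export_date: str, today: date) -> int:
    """Number of days between the export date and `today`."""
    return (today - date.fromisoformat(export_date)).days


# All values below are made-up examples.
record = CoverageRecord(
    date_range_start="2010-01-01",
    date_range_end="2025-12-31",
    sample_definition="Listed firms headquartered in the United States or Germany",
    export_date="2026-06-20",
    missing_coverage=["No calls found for two target firms before 2012"],
    downstream_restrictions=["Dropped firms with fewer than four calls"],
)

record_json = json.dumps(asdict(record), indent=2)
print(record_json)

# Refresh timing is approximate, so this is a hint rather than a guarantee.
age_days = days_since_export(record.export_date, today=date(2026, 7, 10))
probably_refreshed = age_days > 16
print(f"Export is {age_days} days old; corpus probably refreshed since: {probably_refreshed}")
```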

Future data sources

NL Analytics materials discuss possible future expansion to annual reports, quarterly reports, job postings, patent filings, or user-owned documents. None of these are currently available in the product; the corpus today is earnings calls only.