Corpus scope and coverage
NL Analytics searches English earnings-call transcripts. This page describes what the corpus contains and how to check whether it covers your sample.
Current scope
The corpus consists of English earnings-call transcripts from LSEG that are available to NL Analytics. You cannot upload your own documents or search other text sources.
Coverage snapshot
These figures summarize corpus-level metadata as of the snapshot date noted under the table.
| Coverage statistic | Snapshot value |
|---|---|
| Earnings-call transcripts | 461,849 |
| Date coverage | 2002-01-14 to 2026-06-15 |
| Covered entities | 15,952 |
| Headquarters countries | 94 |
| Total sentences | 177.6 million |
| Median transcript length | 378 sentences |
| Latest complete year | 27,529 calls in 2025 |
| Largest country share | United States, 59.3% of calls |
| RIC coverage | 98.4% of call rows |
| GVKey coverage | 96.7% of call rows |
| PermID coverage | 86.9% of call rows |
| CIK coverage | 81.3% of call rows |
| ISIN coverage | 75.3% of call rows |
Snapshot generated from corpus metadata on June 20, 2026.
Covered entities are counted by internal company key, falling back to company name. Identifier coverage is the share of call rows in the corpus metadata that carry each identifier. Use these numbers as a starting point, not as proof that a specific sample is covered.
Check coverage for your sample
Aggregate coverage can hide gaps in exactly the firms, countries, sectors, or years your design needs. Before relying on a measure:
- Define the target sample: the firms, countries, sectors, and time period the analysis is about.
- Run an exploratory Risk Tool search over the relevant date range.
- Read the results overview: which keywords, companies, and sectors drive the matches, and how many firms and transcripts were searched.
- Export the call-level output and tabulate calls by headquarterscountry, economic_sector, and date_q in your analysis tool.
- Compare those tabulations against the target sample. Keep zero rows: calls with no matches are part of the sample, and dropping them overstates coverage.
- Record what you find: the date range, the sample definition, the export date, any missing coverage, and any restrictions you introduce downstream.
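The tabulation and comparison steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the product's export schema: the rows are made up, the columns company and n_matches are assumed names, the quarter format "2024Q1" is a guess, and the grouping columns are spelled as they appear in the text. A real export could be loaded with csv.DictReader instead of the inline list.

```python
from collections import Counter

# Hypothetical call-level rows. "company" and "n_matches" are assumed column
# names, and the "2024Q1" quarter format is an assumption as well.
calls = [
    {"company": "A", "headquarterscountry": "US", "economic_sector": "Energy",
     "date_q": "2024Q1", "n_matches": 3},
    {"company": "B", "headquarterscountry": "US", "economic_sector": "Energy",
     "date_q": "2024Q1", "n_matches": 0},  # zero row: kept on purpose
    {"company": "C", "headquarterscountry": "DE", "economic_sector": "Utilities",
     "date_q": "2024Q2", "n_matches": 1},
]

# Hypothetical target sample: the (country, quarter) cells the design needs.
target_cells = [("US", "2024Q1"), ("DE", "2024Q1"), ("DE", "2024Q2")]


def tabulate(rows, key):
    """Count calls per value of `key`, counting zero-match calls too."""
    return Counter(row[key] for row in rows)


def missing_cells(rows, target, keys):
    """Return target cells that have no calls at all in the export."""
    seen = {tuple(row[k] for k in keys) for row in rows}
    return sorted(set(target) - seen)


by_country = tabulate(calls, "headquarterscountry")
by_sector = tabulate(calls, "economic_sector")
by_quarter = tabulate(calls, "date_q")
gaps = missing_cells(calls, target_cells, ("headquarterscountry", "date_q"))

print(dict(by_country))  # {'US': 2, 'DE': 1}
print(gaps)              # [('DE', '2024Q1')]
```

Because the counts are taken over all exported calls rather than only calls with matches, firm B still contributes to the US total. The gaps list is what would go into the record of missing coverage.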
A topic can appear often in the broad corpus and still be too sparse, or too concentrated in a few firms, to support variation in the target sample. If coverage is thin, consider widening the date range, broadening the query, or narrowing the research question.
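One rough way to see whether matches are concentrated in a few firms is to compare the number of firms with any match against the share of all matches that comes from the top few firms. The sketch below assumes the call-level export has a firm identifier and a per-call match count; company and n_matches are hypothetical column names, and the rows are invented for illustration.

```python
from collections import Counter


def match_concentration(rows, top_k=1):
    """Return (firms with any match, share of matches from the top_k firms)."""
    per_firm = Counter()
    for row in rows:
        per_firm[row["company"]] += row["n_matches"]
    total = sum(per_firm.values())
    firms_with_matches = sum(1 for n in per_firm.values() if n > 0)
    top = sum(n for _, n in per_firm.most_common(top_k))
    return firms_with_matches, (top / total if total else 0.0)


# Hypothetical call-level rows: four firms, one of which dominates the matches.
rows = [
    {"company": "A", "n_matches": 18},
    {"company": "A", "n_matches": 0},
    {"company": "B", "n_matches": 1},
    {"company": "C", "n_matches": 1},
    {"company": "D", "n_matches": 0},
]

firms, top_share = match_concentration(rows, top_k=1)
print(firms, top_share)  # 3 0.9
```

Here three of four firms have at least one match, but a single firm accounts for 90% of them, which is the kind of pattern that may not support cross-firm variation. What counts as too concentrated is a judgment for the research design, not something this check decides.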
Refresh timing
The corpus refreshes roughly every two weeks, typically every 13 to 16 days. Treat the timing as approximate rather than guaranteed.
When publishing research, record the coverage statement and export date used for the analysis.
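A small record kept next to the analysis outputs makes the export date easy to report later. The sketch below is one possible shape for such a record, not a format the product defines: the field names are hypothetical, and the 16-day threshold is simply the upper end of the typical interval stated above, so the flag is a hint rather than a guarantee.

```python
from datetime import date

# Upper end of the typical refresh interval stated above, in days.
TYPICAL_MAX_REFRESH_DAYS = 16


def coverage_record(export_date, date_range, sample, today):
    """Build a plain dict to keep alongside the analysis outputs."""
    age_days = (today - export_date).days
    return {
        "export_date": export_date.isoformat(),
        "date_range": date_range,
        "sample_definition": sample,
        "export_age_days": age_days,
        # True means at least one refresh has probably happened since export.
        "likely_refreshed_since_export": age_days > TYPICAL_MAX_REFRESH_DAYS,
    }


# Hypothetical values for illustration only.
record = coverage_record(
    export_date=date(2026, 6, 20),
    date_range=("2015-01-01", "2025-12-31"),
    sample="Hypothetical: US and German utilities",
    today=date(2026, 7, 10),
)
print(record["export_age_days"], record["likely_refreshed_since_export"])  # 20 True
```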
Future data sources
NL Analytics materials discuss possible future expansion to annual reports, quarterly reports, job postings, patent filings, or user-owned documents. None of these are currently available in the product; the corpus today is earnings calls only.