How to Cite SEC Filings for Reproducible Financial Analysis
For structured-finance analysts and data engineers, citing an SEC filing correctly requires four key elements: the registrant's name, the form type (e.g., 10-K, 424B5), the filing date, and a unique identifier such as the Accession Number. Together, these elements establish a verifiable data lineage, linking any analysis or model directly back to its source document in the SEC's EDGAR database. This practice is fundamental for building reproducible, auditable financial models, especially when dealing with remittance data or monitoring deal performance. Platforms like Dealcharts are designed to visualize and cite this data, embedding lineage directly into the analytical workflow.
Why Verifiable Citations Matter in Credit Markets
In quantitative analysis and credit markets, a financial model is only as reliable as its underlying data. An unsourced number is a liability. Learning how to cite SEC filings is not an academic exercise; it's a critical discipline for ensuring that models for asset-backed securities (ABS) or commercial mortgage-backed securities (CMBS) are transparent, reproducible, and defensible. Without precise, machine-readable citations, an analyst's work becomes a "black box," impossible to validate or audit.
The primary technical challenge is ensuring that every data point—from a loan's LTV in a prospectus to a servicer's advance in a 10-D report—is tied to a permanent, verifiable source. Vague citations introduce unacceptable risks: a model might use data from an outdated 10-K instead of the amended 10-K/A, or a data pipeline could fail when it can't programmatically locate the correct document version. In structured finance, where a single deal can involve dozens of filings, this precision is paramount for accurate risk monitoring and due diligence.
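The version problem described above can be made concrete with a small sketch. The `Filing` record and the `latest_version` helper below are hypothetical names introduced for illustration, and the accession numbers are made-up placeholders. The sketch also assumes that the most recently filed document for a period is the one to cite, which may not hold when an amendment restates only part of the original.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    """Hypothetical minimal record for one EDGAR submission."""
    accession_no: str
    form_type: str    # e.g. "10-K" or "10-K/A"
    period_end: date  # fiscal period the filing reports on
    filed: date       # date the submission was filed


def base_form(form_type: str) -> str:
    """Strip the amendment suffix so '10-K/A' groups with '10-K'."""
    return form_type[:-2] if form_type.endswith("/A") else form_type


def latest_version(filings: list[Filing], form: str, period_end: date) -> Filing:
    """Return the most recently filed original or amendment for one period."""
    candidates = [
        f for f in filings
        if base_form(f.form_type) == form and f.period_end == period_end
    ]
    if not candidates:
        raise LookupError(f"no {form} found for period ending {period_end}")
    return max(candidates, key=lambda f: f.filed)


# Placeholder data: these accession numbers are invented for the example.
filings = [
    Filing("0000000000-24-000001", "10-K", date(2023, 12, 31), date(2024, 2, 15)),
    Filing("0000000000-24-000002", "10-K/A", date(2023, 12, 31), date(2024, 4, 1)),
]
chosen = latest_version(filings, "10-K", date(2023, 12, 31))
print(chosen.form_type, chosen.accession_no)
```

Because the function returns the full record rather than just the data, the Accession Number of the version actually used travels with whatever is extracted from it.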
The Data Source: Core SEC Filing Identifiers
The bedrock of any verifiable citation is a set of unique, machine-readable keys that allow programmatic access to the exact source document within the EDGAR database. This moves beyond simple hyperlinking; it’s about embedding structured metadata directly with the data it supports, ensuring a clear chain of custody from source to insight.
- Central Index Key (CIK): A permanent 10-digit number the SEC assigns to each filing entity. It serves as the primary key for aggregating all documents filed by a single company (e.g., Apple Inc.'s CIK is 0000320193). For developers, the CIK is essential for linking filings to counterparties and issuers.
- Accession Number: A unique identifier assigned to every individual submission to EDGAR (e.g., 0001193125-24-012345). This is the key to version control. An original 10-K and its amended 10-K/A will share a CIK but have distinct Accession Numbers. Citing the Accession Number removes all ambiguity about which version of a document was used.
- File Number: This identifier groups related filings under a single registration, such as a shelf registration (333-234567). In structured finance, the File Number is crucial for tracking the complete lifecycle of a transaction, from the initial 424B5 prospectus to ongoing 10-D servicer reports.
Mastering these identifiers is the first step toward building automated, explainable data pipelines. According to the National Archives, over 1.5 million documents were filed in 2023 alone, making programmatic access via these keys a necessity.
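As a minimal sketch of treating these identifiers as machine-readable keys, the helpers below normalize a CIK, check the shape of an Accession Number, and build a link to a filing's index page. The function names are hypothetical, and the URL layout is an assumption based on how EDGAR archive paths are commonly structured; verify it against the SEC's own documentation before depending on it in a pipeline.

```python
import re

# Accession Numbers follow a 10-2-6 digit pattern, e.g. 0000320193-23-000106.
ACCESSION_RE = re.compile(r"^[0-9]{10}-[0-9]{2}-[0-9]{6}$")
CIK_RE = re.compile(r"^[0-9]{1,10}$")


def normalize_cik(cik) -> str:
    """Zero-pad a CIK (int or str) to its 10-digit form."""
    digits = str(cik).strip()
    if not CIK_RE.match(digits):
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def validate_accession(accession_no: str) -> str:
    """Return the Accession Number unchanged if it has the expected shape."""
    if not ACCESSION_RE.match(accession_no):
        raise ValueError(f"not a valid Accession Number: {accession_no!r}")
    return accession_no


def filing_index_url(cik, accession_no: str) -> str:
    """Build the (assumed) EDGAR archive URL for a filing's index page."""
    cik_path = str(int(normalize_cik(cik)))  # archive paths drop leading zeros
    acc = validate_accession(accession_no)
    folder = acc.replace("-", "")            # folder name omits the dashes
    return f"https://www.sec.gov/Archives/edgar/data/{cik_path}/{folder}/{acc}-index.htm"


print(normalize_cik(320193))
print(filing_index_url("0000320193", "0000320193-23-000106"))
```

Validating identifiers at ingestion means a malformed key fails loudly at the boundary instead of producing a citation that cannot be resolved later.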
Example Workflow: Programmatic Citation Generation
Manually creating citations is inefficient and prone to error. A modern workflow embeds data lineage programmatically from the moment data is ingested. This ensures every metric is automatically linked to its source document, making the entire analysis transparent and auditable.
The following Python snippet demonstrates this data lineage mindset. It uses a simple API call to fetch filing metadata for a given Accession Number and constructs a clean, standardized citation. This workflow transforms a raw identifier into a structured, verifiable reference.
Source → Transform → Insight
- Source: The unique Accession Number for a specific SEC filing.
- Transform: An API call retrieves structured metadata (Company Name, CIK, Form Type, Filing Date).
- Insight: A machine-readable citation string is generated and attached to the data extracted from the filing.
```python
import requests


def generate_sec_citation(accession_no: str) -> str:
    """
    Fetches filing metadata from the Dealcharts API and constructs a citation.
    Source -> Transform -> Insight
    """
    try:
        # 1. Source: Fetch structured metadata using the accession number
        api_url = f"https://dealcharts.org/api/v1/sec/filings/{accession_no}"
        response = requests.get(api_url)
        response.raise_for_status()  # Raise HTTPError for bad responses
        data = response.json()

        # 2. Transform: Construct a standardized citation string
        citation = (
            f"{data['company_name']} ({data['cik']}). "
            f"Form {data['form_type']}, filed {data['filing_date']}. "
            f"Accession No. {data['accession_no']}."
        )

        # 3. Insight: Return the verifiable citation
        return citation
    except requests.exceptions.RequestException as e:
        return f"API Error: {e}"


# Example: Apple's 2023 10-K
accession_number = "0000320193-23-000106"
filing_citation = generate_sec_citation(accession_number)
print(filing_citation)
# Expected Output: Apple Inc. (320193). Form 10-K, filed 2023-11-03. Accession No. 0000320193-23-000106.
```
This programmatic approach ensures that data lineage is not an afterthought but an integral part of the data ingestion process. The Dealcharts API provides endpoints for this and other critical metadata, supporting the development of explainable data pipelines.
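One way to keep the citation attached to the data it supports is to store both in the same record. The sketch below is an illustration only: `FilingRef` and `CitedValue` are hypothetical types, not part of any Dealcharts API, and the metric value is a placeholder rather than a figure taken from the filing.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FilingRef:
    """Hypothetical container for the metadata a citation needs."""
    company_name: str
    cik: str
    form_type: str
    filing_date: str   # ISO date string, YYYY-MM-DD
    accession_no: str

    def citation(self) -> str:
        return (
            f"{self.company_name} ({self.cik}). "
            f"Form {self.form_type}, filed {self.filing_date}. "
            f"Accession No. {self.accession_no}."
        )


@dataclass(frozen=True)
class CitedValue:
    """A data point that carries its own source reference."""
    metric: str
    value: float
    unit: str
    source: FilingRef

    def to_record(self) -> dict:
        # asdict() converts the nested FilingRef into a plain dict as well.
        record = asdict(self)
        record["citation"] = self.source.citation()
        return record


apple_10k = FilingRef(
    company_name="Apple Inc.",
    cik="0000320193",
    form_type="10-K",
    filing_date="2023-11-03",
    accession_no="0000320193-23-000106",
)
# Placeholder metric and value, NOT extracted from the filing.
point = CitedValue(metric="placeholder_metric", value=0.0, unit="USD", source=apple_10k)
record = point.to_record()
print(json.dumps(record, indent=2))
```

Serializing the structured fields alongside the human-readable string lets downstream code filter or join on the Accession Number without parsing the citation text.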
Implications for AI and Modeling
The practice of embedding structured, verifiable citations has profound implications for advanced financial modeling and the use of Large Language Models (LLMs). When every data point carries its own source metadata, it creates a "model-in-context." This means a risk model or an AI agent can not only process a number but also understand its origin, timeliness, and the regulatory context in which it was reported.
This level of explainability is critical for building trustworthy systems. For example, an LLM trained on financial data with embedded citations can provide sourced answers to complex queries, such as comparing delinquency rates across a 2025 CMBS vintage. Instead of giving a generic answer, it can point to the specific 10-D filings from which the data was derived. This transforms AI from a "black box" into an explainable context engine, a core principle of CMD+RVL. The result is a more robust, transparent, and auditable approach to quantitative finance.
How Dealcharts Enforces Data Lineage
This entire workflow—linking data points to verifiable sources—is the core principle behind Dealcharts. The platform is engineered to maintain data lineage by default, connecting filings, deals, shelves, tranches, and counterparties into a structured graph. When viewing a CMBS transaction overview or analyzing remittance data for a deal like the BMARK 2025-V17 CMBS transaction, the underlying source documents are directly linked. Dealcharts connects these datasets so analysts can publish and share verified charts without rebuilding data pipelines from scratch.
Citation Templates for Common Filings
For those building their own systems, here are standardized templates for key filings:
| Form Type | Structure | Example |
|---|---|---|
| 10-K/10-Q | [Company Name] ([CIK]). Form [Type], filed [YYYY-MM-DD]. Accession No. [Number]. | Apple Inc. (0000320193). Form 10-K, filed 2023-11-03. Accession No. 0000320193-23-000106. |
| 8-K | [Company Name] ([CIK]). Form 8-K, filed [YYYY-MM-DD]. Accession No. [Number]. | Microsoft Corporation (0000789019). Form 8-K, filed 2024-01-26. Accession No. 0000950170-24-007252. |
| 424B5 | [Issuer] ([CIK]). Form 424B5, filed [YYYY-MM-DD]. Accession No. [Number]. File No. [Number]. | Morgan Stanley Capital I Inc. (0001018292). Form 424B5, filed 2024-05-10. Accession No. 0001193125-24-129683. File No. 333-278065. |
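The templates in the table can be rendered by a single function. This is a minimal sketch: `format_citation` is a hypothetical helper, and treating the File Number as an optional trailing field is an assumption drawn from the 424B5 row.

```python
from datetime import date
from typing import Optional


def format_citation(
    name: str,
    cik: str,
    form_type: str,
    filed: date,
    accession_no: str,
    file_no: Optional[str] = None,
) -> str:
    """Render a citation in the table's template; File No. is appended if given."""
    parts = [
        f"{name} ({cik}).",
        f"Form {form_type}, filed {filed.isoformat()}.",  # YYYY-MM-DD
        f"Accession No. {accession_no}.",
    ]
    if file_no:
        parts.append(f"File No. {file_no}.")
    return " ".join(parts)


# Values copied from the table's example column.
ten_k = format_citation(
    "Apple Inc.", "0000320193", "10-K", date(2023, 11, 3), "0000320193-23-000106"
)
prospectus = format_citation(
    "Morgan Stanley Capital I Inc.", "0001018292", "424B5", date(2024, 5, 10),
    "0001193125-24-129683", file_no="333-278065",
)
print(ten_k)
print(prospectus)
```

Taking `filed` as a `date` rather than a string keeps the YYYY-MM-DD format consistent regardless of how the date was parsed upstream.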
Conclusion
Properly citing SEC filings is foundational to modern quantitative analysis. By embedding structured, machine-readable citations directly into data workflows, analysts and developers create a verifiable data lineage that enhances model accuracy, explainability, and trust. This systematic approach transforms raw data into reliable, context-rich insights, paving the way for more robust and auditable financial systems. Frameworks like CMD+RVL champion this vision of reproducible, explainable finance.