Datasets

Every number on a Delphi dashboard traces back to a dataset. Connectors produce them, documents can become them, KPIs read from them, and visualizations render them. Before you can make sense of the rest of the Data tab, it helps to know what a dataset actually is in Delphi and what you can do with one.

What a dataset is

A dataset is a named, described collection of observations with a known source. The metadata is deliberately plain — a short metric name that doubles as a chart title, the source it came from (NWS, NOAA, your own warehouse, a connector you configured), a description of why it’s relevant to your command center, and an optional provenance link so anyone looking at the number can trace it back to where it came from.

The observations themselves are points with a date, a value, and — where applicable — a human-readable label, a place identifier, or latitude and longitude for map layers. The shape is intentionally generic, so one dataset model can carry streamflow readings, ticket counts, seismic events, reservoir levels, and quarterly financials without special cases.
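
As a rough illustration of the model described above, the metadata and observations could be sketched like this. The class and field names (`Dataset`, `Observation`, `metric_name`, `place_id`, `provenance_url`) are assumptions made for the example, not Delphi's actual schema, and the sample values are made up.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """One data point. Only the date and value are always present."""
    date: dt.date
    value: float
    label: Optional[str] = None        # human-readable label, where applicable
    place_id: Optional[str] = None     # place identifier, where applicable
    latitude: Optional[float] = None   # used by map layers
    longitude: Optional[float] = None  # used by map layers


@dataclass
class Dataset:
    """Plain metadata plus the observations it describes (hypothetical names)."""
    metric_name: str                       # short name, doubles as a chart title
    source: str                            # e.g. "NWS", "NOAA", a warehouse
    description: str                       # why it matters to the command center
    provenance_url: Optional[str] = None   # optional link back to the origin
    observations: list[Observation] = field(default_factory=list)


# Illustrative values only; not real measurements.
reservoir = Dataset(
    metric_name="Reservoir level",
    source="NOAA",
    description="Example dataset for illustration.",
    observations=[
        Observation(dt.date(2024, 1, 1), 101.5),
        Observation(dt.date(2024, 1, 2), 102.0, label="after rain"),
    ],
)
```

Because every optional field defaults to `None`, the same two classes can hold a ticket count with only dates and values or a seismic event with coordinates, which is the "no special cases" property the text describes.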

How datasets get created

There are three common shapes:

Inline datasets ship with their data embedded. Useful for demos, one-off analyses, and anything Delphi fetched once and saved for you. Inline datasets are self-contained — the observations travel with the dataset itself.

Connector-backed datasets load observations from a live connector feed. The dataset points at the connector and carries a small mapping that tells Delphi which connector fields become the date, value, label, latitude, and longitude for each observation. When the connector pulls fresh data, the dataset renders it — no copy step required.

Sample-backed datasets read from a specific sample on a connector. Samples are lightweight representative slices of a larger connector feed, useful when you want to watch a filtered or aggregated view without paying the cost of streaming the whole thing. The mapping works the same way as it does for connector-backed datasets.

All three live side by side on a dashboard. You rarely pick the shape yourself — Delphi chooses when you ask it to add a data source in chat. See chatting with Delphi for the request patterns that work best.
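
The field mapping used by connector-backed and sample-backed datasets can be pictured as a small lookup from observation fields to connector fields. This is a minimal sketch under assumptions: the mapping format, the `map_record` helper, and the connector field names (`ts`, `flow_cfs`, `site`) are hypothetical and only illustrate the idea.

```python
from typing import Any

# Hypothetical mapping format: observation field -> connector field name.
FieldMapping = dict[str, str]

# The observation fields the text says a mapping can populate.
TARGET_FIELDS = ("date", "value", "label", "latitude", "longitude")


def map_record(record: dict[str, Any], mapping: FieldMapping) -> dict[str, Any]:
    """Build one observation from one connector record.

    Fields that are absent from the mapping (or from the record)
    come back as None instead of raising.
    """
    return {
        target: (record.get(mapping[target]) if target in mapping else None)
        for target in TARGET_FIELDS
    }


# Made-up connector record and mapping, for illustration only.
record = {"ts": "2024-01-01", "flow_cfs": 312.0, "site": "Gauge A"}
mapping = {"date": "ts", "value": "flow_cfs", "label": "site"}
observation = map_record(record, mapping)
```

Applying the mapping at read time, rather than copying rows into the dataset, is what lets fresh connector data show up without a copy step.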

Data freshness

Every dataset carries a freshness tag so the agent and the UI both know how much weight to put on the number:

Realtime — live, pulled on demand or streamed from a connector.
Near-realtime — lagged behind live but still current enough for most decisions.
Historic — a static or frozen window. Good for trend analysis, not the current number.
Sample — a representative slice, typically for demos or scraped once for shape.
Projection — scenario output. Not observed; derived from assumptions. Carries its own scenario metadata and an expiry.
Field report — human-entered observation, pending review or deliberately unverified ground truth.

The provenance badge on any KPI card that reads from a dataset picks its colour directly from this tag, so the freshness of the underlying data propagates to the visible surface.

Classification and scope

Like every first-class object in Delphi, a dataset can be tagged with a classification: public, internal, confidential, or restricted. The classification determines who can see it regardless of whether they have access to the dashboard itself. See roles and permissions for the full scope matrix and how classification gates combine with role.

Datasets without an explicit classification are treated as public for backward compatibility, but unknown or misspelled classifications are denied outright.
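
The two rules above (missing means public, unrecognized means denied) can be sketched as a single resolution step. The function name and the use of `None` as the "deny" signal are assumptions for illustration, as is the choice to match tier names exactly and in lowercase.

```python
from typing import Optional

KNOWN_CLASSIFICATIONS = ("public", "internal", "confidential", "restricted")


def resolve_classification(raw: Optional[str]) -> Optional[str]:
    """Return the effective classification, or None to signal "deny".

    - missing (None)        -> "public", for backward compatibility
    - a known tier          -> that tier
    - unknown or misspelled -> None, i.e. denied outright
    """
    if raw is None:
        return "public"
    return raw if raw in KNOWN_CLASSIFICATIONS else None
```

For example, `resolve_classification("confidental")` (note the typo) returns `None` rather than falling back to a permissive default, so a misspelled tag cannot accidentally expose a dataset.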

Origin and credibility

Freshness tells you when the data was captured; origin tells you whether it’s real at all. Every dataset is tagged with one of three origins:

real — observed signal pulled from a connector, a sample, or a verified upload. The default for any dataset wired to a live source.
modeled — output of a model, simulation, or scenario projection. Plausible but synthesized; tied to the assumptions of whatever produced it.
synthetic — sample data, mock data, or anything fabricated for demos and stubs. Useful for shape, never to be confused with ground truth.

Origin matters because a confident-looking number on top of synthetic data is the single fastest way to lose trust. Delphi tracks origin explicitly so the UI and the agent can both signal it loudly. Synthetic datasets render with a SAMPLE label stamped diagonally across the card and a soft gaussian blur over the chart so you cannot mistake them for real numbers at a glance — even if a screenshot escapes into a slide deck. Modeled datasets carry a lighter affordance that flags them as derived rather than observed.

Origin also sets a credibility ceiling. Synthetic data is capped at a credibility score of 0.1, and modeled data at 0.4. Real data has no ceiling beyond whatever its underlying source supports.
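
The origin caps can be sketched as a simple clamp. The cap values come from the text above; the function name and the assumption that scores run from 0 to 1 are illustrative.

```python
from typing import Optional

# Caps from the text: synthetic 0.1, modeled 0.4, real uncapped.
ORIGIN_CAPS: dict[str, Optional[float]] = {
    "synthetic": 0.1,
    "modeled": 0.4,
    "real": None,
}


def capped_credibility(origin: str, source_score: float) -> float:
    """Limit a source-derived score to the cap for the dataset's origin."""
    cap = ORIGIN_CAPS[origin]  # raises KeyError for an unrecognized origin
    return source_score if cap is None else min(source_score, cap)
```

Under this sketch a synthetic dataset whose source would otherwise score 0.9 ends up at 0.1, while a modeled dataset scoring 0.3 stays at 0.3 because it is already below its cap.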

When you graduate a command center from demo data to real connectors, ask Delphi in chat to retag the affected datasets — the agent flips the origin to real (or modeled if the data is still derived) and the inherited values on every downstream KPI and visualization update automatically.

Source classification

For real data, the credibility score isn’t pulled from thin air. Delphi inspects the source URL of each dataset and classifies it across four dimensions:

Institution — government, academic, non-profit, commercial, personal. A .gov source scores differently than a personal blog.
Content — primary research, secondary analysis, news, opinion, marketing.
Authorship — credentialed expert, journalist, anonymous, organizational byline.
Domain — the structural signal of the URL itself (.gov, .edu, .org, .com).

Each dimension contributes a weighted score. Dimensions with no data are left out of the calculation, so they don’t drag the result down: a .gov source with no authorship signal scores well, not mid. The result is the dataset’s credibility number, grounded in what is known about the source rather than a fixed default.
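
One way to get the "missing dimensions don't drag the result down" behaviour is to renormalize over the dimensions that are present. The sketch below assumes that approach; the weights and per-dimension scores are hypothetical placeholders, since the actual values are not given here.

```python
from typing import Optional

# Hypothetical weights for illustration; the real weighting is not documented here.
WEIGHTS = {"institution": 0.4, "content": 0.3, "authorship": 0.2, "domain": 0.1}


def credibility(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Weighted average over the dimensions that have a score.

    Dimensions scored None are skipped and the remaining weights are
    renormalized, so a missing signal is neutral instead of counting as zero.
    Returns None when no dimension has data.
    """
    known = {d: s for d, s in scores.items() if d in WEIGHTS and s is not None}
    if not known:
        return None
    total_weight = sum(WEIGHTS[d] for d in known)
    return sum(WEIGHTS[d] * s for d, s in known.items()) / total_weight


# A .gov source with no authorship signal (made-up per-dimension scores).
gov_source = {"institution": 0.9, "content": 0.9, "authorship": None, "domain": 0.9}
gov_score = credibility(gov_source)
```

With these placeholder numbers the score comes out at roughly 0.9. Treating the missing authorship signal as zero would instead give about 0.72, which is the "mid" outcome the text says Delphi avoids.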

How sensitivity and credibility propagate

A KPI or visualization that reads from one dataset is straightforward: it takes on that dataset’s classification and credibility. One that reads from several has to decide which source’s values to honor. Delphi resolves both fields automatically:

Sensitivity propagates upward as the maximum. A KPI built on one public dataset and one confidential dataset is treated as confidential. A missing classification on any source counts as restricted — fail-closed. The inherited classification feeds the same role-based access checks as the underlying datasets.
Credibility propagates upward as the minimum. A KPI’s inherited credibility is the worst score across its sources, and the source classification of that weakest source travels with it so the badge can render the same four-bar meter the dataset itself would.

Both fields update automatically as upstream data changes. You don’t need to recompute lineage by hand or schedule a sweep — the inherited values stay live.
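
The two propagation rules reduce to a maximum and a minimum. This is a minimal sketch: the function names are hypothetical, and two behaviours are assumptions rather than documented facts, namely that an unrecognized classification is treated like a missing one, and that a KPI with no sources falls back to restricted.

```python
from typing import Optional

# Ordered from least to most sensitive.
TIERS = ["public", "internal", "confidential", "restricted"]
RESTRICTED_RANK = len(TIERS) - 1


def inherited_sensitivity(classifications: list[Optional[str]]) -> str:
    """Most sensitive tier across all sources (fail-closed).

    A missing classification counts as restricted, per the rule above.
    Assumption: unrecognized values and an empty source list do too.
    """
    ranks = [
        TIERS.index(c) if c in TIERS else RESTRICTED_RANK
        for c in classifications
    ]
    return TIERS[max(ranks, default=RESTRICTED_RANK)]


def inherited_credibility(scores: list[float]) -> Optional[float]:
    """Worst credibility score across all sources, or None with no sources."""
    return min(scores, default=None)
```

So `inherited_sensitivity(["public", "confidential"])` yields `"confidential"`, and `inherited_credibility([0.9, 0.4])` yields `0.4`: a KPI is only as open as its most sensitive source and only as credible as its weakest one.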

The inherited classification badge

Every KPI card, chart, and connector card in Delphi renders a compact inherited classification badge in its header: a small tier pill showing the propagated sensitivity (PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED) next to a four-bar credibility meter that mirrors the one on Cited Sources. Hovering the badge reveals the precise classification and credibility score; you don’t have to dig into a side panel to see how trustworthy a number is.

If a card has neither a known classification nor a known credibility, the badge stays out of the way. If both are present, the badge gives you the answer at a glance — the same answer your auditor will get when they look at the same screen.

What reads a dataset

Once a dataset exists, the rest of the dashboard can attach to it:

KPIs reduce observations to a hero number with a formula like latest, sum, average, count, min, or max.
Visualizations render observations as charts, maps, or metric cards.
Scenarios read datasets as baselines and write projected datasets back.
Reports cite datasets inline so every claim has a source chain.
Chat can query, summarize, or compare datasets on demand.
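
The KPI formulas listed above are ordinary reductions over a dataset's values. The sketch below shows one plausible reading; the `reduce_kpi` helper and the representation of observations as (ISO date string, value) pairs are assumptions for illustration.

```python
from statistics import mean


def reduce_kpi(formula: str, observations: list[tuple[str, float]]) -> float:
    """Reduce (ISO date, value) pairs to a single hero number.

    "latest" picks the value with the most recent date; ISO 8601 date
    strings sort chronologically, so plain string comparison is enough.
    Raises on an empty list for every formula except "sum" and "count".
    """
    if formula == "latest":
        return float(max(observations, key=lambda obs: obs[0])[1])
    values = [value for _, value in observations]
    reducers = {"sum": sum, "average": mean, "count": len, "min": min, "max": max}
    return float(reducers[formula](values))  # KeyError for an unknown formula


# Made-up observations, deliberately out of date order.
obs = [("2024-01-02", 5.0), ("2024-01-01", 3.0), ("2024-01-03", 4.0)]
latest_value = reduce_kpi("latest", obs)
```

Note that `latest` depends on the observation dates rather than on list order, which matters when a connector delivers records out of sequence.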

If you want to change what a dataset represents — add a source, swap a connector, adjust the freshness tag, tighten the classification — ask in chat. The agent understands the dataset model and will wire up the update without you having to edit anything by hand.