Open Methodology · Version 3.1
The Data Board
"Given a good enough set of semantics — can we use language to represent data?"

Data has always required an intermediary to reach human thought — visualization to make patterns visible, statistics to surface relationships. Both are bottom-up: they start from numbers and work toward meaning.

Large language models may be the pivotal moment that changes this. Trained on the accumulated written knowledge of human civilisation, they encode domain semantics at a scale that has never existed before. The Data Board tests a top-down alternative: start with synthesized semantics — a vocabulary that, under the right conditions, is good enough to represent the data in question.

AI generates or human proposes. AI evaluates. Human decides.
AuthorRuth Aharon
Version3.1 · 2026
LicenseMIT · Open Source
Sitethedataboard.ai
The Method

From raw data
to inevitable narrative

The method is top-down. It begins with the question and the domain, not the columns. AI seeks concepts that express more than a basic variable — named mechanisms, not labels. Two parallel mechanisms feed the board: pseudo-antonym pairs that create structural tension, and evidence checking that grounds each concept in what is known about the domain. Both feed into a cohesiveness check across all candidates, producing a minimal board and an expandable board.

Data Board · Method Flow
RAW DATA Dataset · corpus · question SEMANTIC SYNTHESIS AI seeks concepts that express more than a basic variable PSEUDO- ANTONYMS© tension pairs create the story EVIDENCE CHECK grounded in corpus and domain knowledge COHESIVENESS CHECK MINIMAL BOARD EXPANDABLE BOARD WHAT MAKES A GOOD CONCEPT Not just a column — a named mechanism. "Wealthy Surcharge" not "rich countries" EXAMPLE PAIR Social Cohesion ↔ Atomized Autonomy TWO EVALUATION DIMENSIONS Relevance — density in LLM corpus Cohesiveness — fit with other concepts MINIMAL vs EXPANDABLE Minimal: core deducible space Expandable: + shadow / edge concepts
Glossary

Key concepts

Deducible Space

The minimal set of grounded, coherent, tension-bearing concepts from which consistent narrative conclusions follow inevitably. Not a list of variables — the conceptual foundation that makes reasoning possible and narrative non-arbitrary.

Pseudo-Antonyms©© Ruth Aharon

Concept pairs occupying opposite ends of the same analytical dimension. Not logical opposites — structurally opposing concepts within a shared domain. The mechanism that makes deductions inevitable rather than inferred. Without tension, there is no story — only a report. Attribution required when citing.

Goldilocks Handle

A concept at the right level of abstraction: precise enough to be grounded in evidence, general enough to reason from. The Wealthy Surcharge names a mechanism (systematic price premium driven by the Balassa-Samuelson effect), compresses a pattern across 50+ countries, and creates structural tension against its pseudo-antonym. Countries that are rich is not a Goldilocks handle — it describes a category, implies no mechanism, and generates no analytical direction.

Verification Shift

When vocabulary is supplied, the AI moves from invention to verification — checking whether concepts are descriptive, domain-coherent, and evidentially grounded rather than generating labels freely. The AI stops guessing meaning and starts checking it.

Semantic Weight

Centrality of a concept in the evidence base. Three levels:

DOMINANT — primary causal driver PRESENT — real but not decisive EDGE CASE — marginal or structural outlier
Minimal vs Expandable Board

The minimal board contains the core deducible space — the smallest coherent set of concepts from which the global story follows. The expandable board adds shadow concepts and edge cases: the structural tensions that challenge or complicate the dominant narrative.

Evaluation

What the color means

Every tile on the board is colored. The color is not aesthetic — it is the result of the logic audit. Here is what earns each color.

Dominant
The concept
drives the story
Earns green when
Descriptive — names a condition, not a conclusion
Grounded — supported by domain evidence
Coherent — fits the board and creates tension with its pseudo-antonym
"Social Cohesion" — passes all three. Gallup data, upstream of life satisfaction, contrasts with Atomized Autonomy.
Present
The concept
complements the story
Earns yellow when
Descriptive — names a condition, not a conclusion
Grounded — supported by domain evidence
~ Supplementary — real and evidenced, enriches the narrative but is not essential to it
"Generosity" — grounded, descriptive, adds texture to the happiness story but the story holds without it.
Edge Case
The concept
carries its own narrative
Earns red when
Descriptive — names a condition, not a conclusion
Grounded — supported by domain evidence
! Isolated or marginal narrative — anomaly, exception, or pattern that exists outside the dominant story. Essential for detecting outliers and understanding boundaries.
"Atomized Autonomy" — a real narrative in individualistic cultures, but marginal globally. Flags where the dominant story breaks down.
Rejection is insight — a concept that fails any test is not discarded silently. The failure names the assumption the analyst was making without knowing it.
Logic Layer

The YAML audit trail

Every accepted concept has a formal machine-readable representation — the YAML logic block. This is what separates the Data Board from a sticky-note exercise. The YAML makes each concept auditable: it documents the mechanism, the evidence, the scope conditions, the fidelity score, and the pseudo-antonym relationship. It is the reproducible, citable record of every analytical decision the board makes.

The contrasts_with field is the pseudo-antonym link. The valid_when conditions define when the concept holds and when it breaks down. The fidelity score (0–1) measures how well the concept survived the logic audit.

YAML Logic Block — Social Cohesion
concept "Social Cohesion" is a: driver context: "Social support systems" mechanism: "trusted social networks provide emotional and material safety nets" evidence: "Gallup World Poll social support metrics" covers: explains: [national_happiness_variance] aggregates: [social_support_score] contrasts_with: "Atomized Autonomy" ← pseudo-antonym link fidelity: 0.92 ← survives the logic audit fidelity_basis: empirical_test valid_when: - "strong community ties" - "institutional stability" ← scope conditions
For Practitioners

The system prompt

Copy this into any LLM (Claude, ChatGPT, Gemini) to activate the Data Board methodology before analysis begins.

Data Board System Prompt · v3.1
You are applying the Data Board methodology, created by Ruth Aharon (thedataboard.ai). Your role: Paradigm Generator, not Author. AI generates or human proposes vocabulary. You evaluate it. Core directives: 1. Naming is analysis. Treat every concept as a type that carries analytical weight. 2. Verification shift: check whether each concept is descriptive, coherent, and evidentially supported in this domain. 3. Pseudo-Antonyms©: for each accepted concept, identify its structural opposite. Tension pairs are where the non-trivial narrative lives. 4. Semantic weight: assign Dominant, Present, or Edge Case based on centrality in the evidence. 5. Rejection is insight: when you reject a concept, explain why. Workflow: 1. Review the raw data and question. 2. Generate or evaluate a vocabulary board (Dominant, Present, Edge Case). 3. Audit causal tension — identify pseudo-antonym© pairs. 4. Synthesize the global story based ONLY on the established board.
Worked Example

World Happiness 2025

World Happiness Report 2025
"What structural conditions explain why high GDP does not guarantee high happiness?"
Dominant
Economic Security
Sharpness
Dominant
Social Cohesion
Sharpness
Dominant
Healthy Life Expectancy
Sharpness
Present
Institutional Trust
Sharpness
Present
Individual Freedom
Sharpness
Present
Generosity
Sharpness
Edge Case
Systemic Distress
Sharpness
Edge Case
Atomized Autonomy
Sharpness
Edge Case
The Freedom Gap
Sharpness
Pseudo-antonyms©
Social CohesionAtomized Autonomy
Institutional TrustSystemic Distress
"Global well-being is a structural outcome of the balance between Institutional Trust and Individual Freedom. High GDP is necessary but not sufficient — Atomized Autonomy is the shadow of Individual Freedom that GDP cannot measure."
Cohesion 88
Coverage 92
Sharpness 90
Entropy 45

The non-trivial finding: the same freedom that produces the highest happiness scores in Nordic nations produces the highest loneliness rates in individualistic cultures without strong social infrastructure. The tension between Individual Freedom and Atomized Autonomy is the mechanism. The board makes it visible. A regression finds the correlation and calls it "freedom." The board names what is inside it.

Theoretical Anchors

Where this connects

Pearl, J. & Mackenzie, D. (2018). The Book of Why. Basic Books. — The ladder of causation. The Data Board addresses the prerequisite Pearl assumes: knowing which concepts belong before building the causal model.
Glaser, B. & Strauss, A. (1967). The Discovery of Grounded Theory. Aldine. — Open coding and axial coding are the qualitative precedents. The Data Board operationalises these steps computationally — weeks compressed into a session.
Wittgenstein, L. (1922). Tractatus Logico-Philosophicus. — "The limits of my language are the limits of my world." The Data Board is a formal process for extending the analytical vocabulary — and therefore the analytical world.
Luhn, H.P. (1958). "A Business Intelligence System." IBM Journal of Research and Development. — Intelligence as guiding action toward a desired goal. The Data Board formalises the naming step that makes the goal speakable.
License & Attribution

Open source.
Protected rights.

The Data Board methodology is released under the MIT License — free for commercial and non-commercial use, modification, and distribution globally.

The term Pseudo-Antonyms© is a proprietary conceptual framework created by Ruth Aharon. Attribution is required when using or citing this concept.

Cite as: Aharon, R. (2026). The Data Board: A Methodology for Language-Based Data Analysis. thedataboard.ai