The Data Board — Methodology

Open Methodology · Version 3.1

The Data Board

"Given a good enough set of semantics — can we use language to represent data?"

Data has always required an intermediary to reach human thought — visualization to make patterns visible, statistics to surface relationships. Both are bottom-up: they start from numbers and work toward meaning.

Large language models may be the pivotal moment that changes this. Trained on the accumulated written knowledge of human civilisation, they encode domain semantics at a scale that has never existed before. The Data Board tests a top-down alternative: start with synthesized semantics — a vocabulary that, under the right conditions, is good enough to represent the data in question.

AI generates or human proposes. AI evaluates. Human decides.

AuthorRuth Aharon

Version3.1 · 2026

LicenseMIT · Open Source

Sitethedataboard.ai

The Method

From raw data
to inevitable narrative

The method is top-down. It begins with the question and the domain, not the columns. AI seeks concepts that express more than a basic variable — named mechanisms, not labels. Two parallel mechanisms feed the board: pseudo-antonym pairs that create structural tension, and evidence checking that grounds each concept in what is known about the domain. Both feed into a cohesiveness check across all candidates, producing a minimal board and an expandable board.

Data Board · Method Flow

Glossary

Key concepts

Deducible Space

The minimal set of grounded, coherent, tension-bearing concepts from which consistent narrative conclusions follow inevitably. Not a list of variables — the conceptual foundation that makes reasoning possible and narrative non-arbitrary.

Pseudo-Antonyms©© Ruth Aharon

Concept pairs occupying opposite ends of the same analytical dimension. Not logical opposites — structurally opposing concepts within a shared domain. The mechanism that makes deductions inevitable rather than inferred. Without tension, there is no story — only a report. Attribution required when citing.

Goldilocks Handle

A concept at the right level of abstraction: precise enough to be grounded in evidence, general enough to reason from. The Wealthy Surcharge names a mechanism (systematic price premium driven by the Balassa-Samuelson effect), compresses a pattern across 50+ countries, and creates structural tension against its pseudo-antonym. Countries that are rich is not a Goldilocks handle — it describes a category, implies no mechanism, and generates no analytical direction.

Verification Shift

When vocabulary is supplied, the AI moves from invention to verification — checking whether concepts are descriptive, domain-coherent, and evidentially grounded rather than generating labels freely. The AI stops guessing meaning and starts checking it.

Semantic Weight

Centrality of a concept in the evidence base. Three levels:

DOMINANT — primary causal driver PRESENT — real but not decisive EDGE CASE — marginal or structural outlier

Minimal vs Expandable Board

The minimal board contains the core deducible space — the smallest coherent set of concepts from which the global story follows. The expandable board adds shadow concepts and edge cases: the structural tensions that challenge or complicate the dominant narrative.

Evaluation

What the color means

Every tile on the board is colored. The color is not aesthetic — it is the result of the logic audit. Here is what earns each color.

Dominant

The concept
drives the story

Earns green when

✓ Descriptive — names a condition, not a conclusion

✓ Grounded — supported by domain evidence

✓ Coherent — fits the board and creates tension with its pseudo-antonym

"Social Cohesion" — passes all three. Gallup data, upstream of life satisfaction, contrasts with Atomized Autonomy.

Present

The concept
complements the story

Earns yellow when

✓ Descriptive — names a condition, not a conclusion

✓ Grounded — supported by domain evidence

~ Supplementary — real and evidenced, enriches the narrative but is not essential to it

"Generosity" — grounded, descriptive, adds texture to the happiness story but the story holds without it.

Edge Case

The concept
carries its own narrative

Earns red when

✓ Descriptive — names a condition, not a conclusion

✓ Grounded — supported by domain evidence

! Isolated or marginal narrative — anomaly, exception, or pattern that exists outside the dominant story. Essential for detecting outliers and understanding boundaries.

"Atomized Autonomy" — a real narrative in individualistic cultures, but marginal globally. Flags where the dominant story breaks down.

Rejection is insight — a concept that fails any test is not discarded silently. The failure names the assumption the analyst was making without knowing it.

Logic Layer

The YAML audit trail

Every accepted concept has a formal machine-readable representation — the YAML logic block. This is what separates the Data Board from a sticky-note exercise. The YAML makes each concept auditable: it documents the mechanism, the evidence, the scope conditions, the fidelity score, and the pseudo-antonym relationship. It is the reproducible, citable record of every analytical decision the board makes.

The contrasts_with field is the pseudo-antonym link. The valid_when conditions define when the concept holds and when it breaks down. The fidelity score (0–1) measures how well the concept survived the logic audit.

YAML Logic Block — Social Cohesion

concept "Social Cohesion" is a: driver context: "Social support systems" mechanism: "trusted social networks provide emotional and material safety nets" evidence: "Gallup World Poll social support metrics" covers: explains: [national_happiness_variance] aggregates: [social_support_score] contrasts_with: "Atomized Autonomy" ← pseudo-antonym link fidelity: 0.92 ← survives the logic audit fidelity_basis: empirical_test valid_when: - "strong community ties" - "institutional stability" ← scope conditions

For Practitioners

The system prompt

Copy this into any LLM (Claude, ChatGPT, Gemini) to activate the Data Board methodology before analysis begins.

Data Board System Prompt · v3.1

You are applying the Data Board methodology, created by Ruth Aharon (thedataboard.ai). Your role: Paradigm Generator, not Author. AI generates or human proposes vocabulary. You evaluate it. Core directives: 1. Naming is analysis. Treat every concept as a type that carries analytical weight. 2. Verification shift: check whether each concept is descriptive, coherent, and evidentially supported in this domain. 3. Pseudo-Antonyms©: for each accepted concept, identify its structural opposite. Tension pairs are where the non-trivial narrative lives. 4. Semantic weight: assign Dominant, Present, or Edge Case based on centrality in the evidence. 5. Rejection is insight: when you reject a concept, explain why. Workflow: 1. Review the raw data and question. 2. Generate or evaluate a vocabulary board (Dominant, Present, Edge Case). 3. Audit causal tension — identify pseudo-antonym© pairs. 4. Synthesize the global story based ONLY on the established board.

Worked Example

World Happiness 2025

World Happiness Report 2025

worldhappiness.report ↗ · Kaggle ↗

"What structural conditions explain why high GDP does not guarantee high happiness?"

Dominant — primary causal drivers

Dominant

Economic Security

Sharpness

Dominant

Social Cohesion

Sharpness

Dominant

Healthy Life Expectancy

Sharpness

Present — supporting concepts

Present

Institutional Trust

Sharpness

Present

Individual Freedom

Sharpness

Present

Generosity

Sharpness

Edge case — structural tensions

Edge Case

Systemic Distress

Sharpness

Edge Case

Atomized Autonomy

Sharpness

Edge Case

The Freedom Gap

Sharpness

Social Cohesion↔Atomized Autonomy

Institutional Trust↔Systemic Distress

Global story

"Global well-being is a structural outcome of the balance between Institutional Trust and Individual Freedom. High GDP is necessary but not sufficient — Atomized Autonomy is the shadow of Individual Freedom that GDP cannot measure."

Cohesion 88

Coverage 92

Sharpness 90

Entropy 45

The non-trivial finding: the same freedom that produces the highest happiness scores in Nordic nations produces the highest loneliness rates in individualistic cultures without strong social infrastructure. The tension between Individual Freedom and Atomized Autonomy is the mechanism. The board makes it visible. A regression finds the correlation and calls it "freedom." The board names what is inside it.

Theoretical Anchors

Where this connects

Pearl, J. & Mackenzie, D. (2018). The Book of Why. Basic Books. — The ladder of causation. The Data Board addresses the prerequisite Pearl assumes: knowing which concepts belong before building the causal model.

Glaser, B. & Strauss, A. (1967). The Discovery of Grounded Theory. Aldine. — Open coding and axial coding are the qualitative precedents. The Data Board operationalises these steps computationally — weeks compressed into a session.

Wittgenstein, L. (1922). Tractatus Logico-Philosophicus. — "The limits of my language are the limits of my world." The Data Board is a formal process for extending the analytical vocabulary — and therefore the analytical world.

Luhn, H.P. (1958). "A Business Intelligence System." IBM Journal of Research and Development. — Intelligence as guiding action toward a desired goal. The Data Board formalises the naming step that makes the goal speakable.

License & Attribution

Open source.
Protected rights.

The Data Board methodology is released under the MIT License — free for commercial and non-commercial use, modification, and distribution globally.

The term Pseudo-Antonyms© is a proprietary conceptual framework created by Ruth Aharon. Attribution is required when using or citing this concept.

Cite as: Aharon, R. (2026). The Data Board: A Methodology for Language-Based Data Analysis. thedataboard.ai

From raw datato inevitable narrative

Key concepts

What the color means

The YAML audit trail

The system prompt

World Happiness 2025

Where this connects

Open source.Protected rights.

From raw data
to inevitable narrative

Open source.
Protected rights.