"Given a good enough set of semantics — can we use language to represent data?"
Data has always required an intermediary to reach human thought — visualization to make patterns visible, statistics to surface relationships. Both are bottom-up: they start from numbers and work toward meaning.
The arrival of large language models may be the moment that changes this. Trained on the accumulated written knowledge of human civilisation, they encode domain semantics at a scale that has never existed before. The Data Board tests a top-down alternative: start with synthesized semantics, a vocabulary that, under the right conditions, is good enough to represent the data in question.
AI generates or human proposes. AI evaluates. Human decides.
Author: Ruth Aharon
Version: 3.1 · 2026
License: MIT · Open Source
Site: thedataboard.ai
The Method
From raw data to inevitable narrative
The method is top-down. It begins with the question and the domain, not the columns. AI seeks concepts that express more than a basic variable: named mechanisms, not labels. Two parallel checks shape the candidates: pseudo-antonym pairing, which creates structural tension, and evidence checking, which grounds each concept in what is known about the domain. Both results feed a cohesiveness check across all candidates, which produces a minimal board and an expandable board.
[Figure: Data Board · Method Flow]
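The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Data Board's implementation: the `Candidate` class and the check functions are hypothetical, the evidence check is reduced to "cites at least one source", and the pseudo-antonym check only asks that a concept name an opposing concept that also survives. Splitting the result into minimal and expandable boards is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record for one AI-generated or human-proposed concept."""
    name: str
    pseudo_antonym: Optional[str] = None               # the opposing pole, if any
    evidence: List[str] = field(default_factory=list)  # cited domain sources


def is_grounded(c: Candidate) -> bool:
    # Evidence check, reduced here to "cites at least one source".
    return len(c.evidence) > 0


def has_pair(c: Candidate) -> bool:
    # Pseudo-antonym check: the concept must name an opposing concept.
    return c.pseudo_antonym is not None


def cohesiveness_check(candidates: List[Candidate]) -> List[Candidate]:
    # Both checks are applied to every candidate independently.
    kept = [c for c in candidates if is_grounded(c) and has_pair(c)]
    # Then prune until every surviving concept's pseudo-antonym also survives:
    # a concept whose opposite was dropped has lost its tension.
    while True:
        names = {c.name for c in kept}
        pruned = [c for c in kept if c.pseudo_antonym in names]
        if len(pruned) == len(kept):
            return kept
        kept = pruned


board = cohesiveness_check([
    Candidate("Social Cohesion", "Atomized Autonomy", ["hypothetical source"]),
    Candidate("Atomized Autonomy", "Social Cohesion", ["hypothetical source"]),
    Candidate("Countries that are rich"),  # no pair, no evidence
])
print([c.name for c in board])  # ['Social Cohesion', 'Atomized Autonomy']
```

The pruning loop is there because the check runs across all candidates: dropping one concept can break the pair of another, so the board is re-checked until it is stable.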
Glossary
Key concepts
Deducible Space
The minimal set of grounded, coherent, tension-bearing concepts from which consistent narrative conclusions follow inevitably. Not a list of variables — the conceptual foundation that makes reasoning possible and narrative non-arbitrary.
Pseudo-Antonym
Concept pairs occupying opposite ends of the same analytical dimension. They are not logical opposites but structurally opposing concepts within a shared domain. Pseudo-antonyms are the mechanism that makes deductions inevitable rather than merely inferred. Without tension, there is no story, only a report. Attribution required when citing.
Goldilocks Handle
A concept at the right level of abstraction: precise enough to be grounded in evidence, general enough to reason from. "The Wealthy Surcharge" is a Goldilocks handle: it names a mechanism (a systematic price premium driven by the Balassa-Samuelson effect), compresses a pattern across 50+ countries, and creates structural tension against its pseudo-antonym. "Countries that are rich" is not: it describes a category, implies no mechanism, and generates no analytical direction.
Verification Shift
When vocabulary is supplied, the AI moves from invention to verification — checking whether concepts are descriptive, domain-coherent, and evidentially grounded rather than generating labels freely. The AI stops guessing meaning and starts checking it.
Semantic Weight
Centrality of a concept in the evidence base. Three levels:
DOMINANT: primary causal driver
PRESENT: real but not decisive
EDGE CASE: marginal or structural outlier
Minimal vs Expandable Board
The minimal board contains the core deducible space — the smallest coherent set of concepts from which the global story follows. The expandable board adds shadow concepts and edge cases: the structural tensions that challenge or complicate the dominant narrative.
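The three weight levels and the two boards can be expressed together in a short Python sketch. The `Weight` enum and `split_boards` function are hypothetical, and the rule that the minimal board keeps only DOMINANT concepts while the expandable board keeps every accepted concept is an assumption: the description above does not say where PRESENT concepts belong. The example weights follow the tiles discussed in the Evaluation section.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Weight(Enum):
    """Semantic weight levels, labelled with their glossary descriptions."""
    DOMINANT = "primary causal driver"
    PRESENT = "real but not decisive"
    EDGE_CASE = "marginal or structural outlier"


def split_boards(weights: Dict[str, Weight]) -> Tuple[List[str], List[str]]:
    # Assumption: the global story follows from the DOMINANT concepts alone,
    # so they form the minimal board; everything accepted is expandable.
    minimal = [name for name, w in weights.items() if w is Weight.DOMINANT]
    expandable = list(weights)  # dicts keep insertion order (Python 3.7+)
    return minimal, expandable


minimal, expandable = split_boards({
    "Social Cohesion": Weight.DOMINANT,
    "Generosity": Weight.PRESENT,
    "Atomized Autonomy": Weight.EDGE_CASE,
})
print(minimal)     # ['Social Cohesion']
print(expandable)  # ['Social Cohesion', 'Generosity', 'Atomized Autonomy']
```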
Evaluation
What the color means
Every tile on the board is colored. The color is not aesthetic — it is the result of the logic audit. Here is what earns each color.
Dominant
The concept drives the story
Earns green when
✓Descriptive — names a condition, not a conclusion
✓Grounded — supported by domain evidence
✓Coherent — fits the board and creates tension with its pseudo-antonym
"Social Cohesion" — passes all three: grounded in Gallup data, upstream of life satisfaction, and in tension with Atomized Autonomy.
Present
The concept complements the story
Earns yellow when
✓Descriptive — names a condition, not a conclusion
✓Grounded — supported by domain evidence
~Supplementary — real and evidenced, enriches the narrative but is not essential to it
"Generosity" — grounded, descriptive, adds texture to the happiness story but the story holds without it.
Edge Case
The concept carries its own narrative
Earns red when
✓Descriptive — names a condition, not a conclusion
✓Grounded — supported by domain evidence
!Isolated or marginal narrative — anomaly, exception, or pattern that exists outside the dominant story. Essential for detecting outliers and understanding boundaries.
"Atomized Autonomy" — a real narrative in individualistic cultures, but marginal globally. Flags where the dominant story breaks down.
Rejection is insight — a concept that fails any test is not discarded silently. The failure names the assumption the analyst was making without knowing it.
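The color rules above reduce to a small decision procedure. The Python sketch below is one reading of them, not the Data Board's own audit: `Audit`, `audit_tile`, and the role labels `driver`, `supplement`, and `outlier` are hypothetical, and it assumes each test has already been judged by the AI or the analyst. It returns either a color or the list of failed tests, so a rejection keeps its reasons instead of disappearing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical role labels mapped to tile colors.
ROLE_COLORS = {"driver": "green", "supplement": "yellow", "outlier": "red"}


@dataclass
class Audit:
    """Hypothetical audit result for one concept (judgements made elsewhere)."""
    descriptive: bool  # names a condition, not a conclusion
    grounded: bool     # supported by domain evidence
    coherent: bool     # fits the board, in tension with its pseudo-antonym
    role: str          # "driver", "supplement", or "outlier"


def audit_tile(a: Audit) -> Tuple[Optional[str], List[str]]:
    failures: List[str] = []
    if not a.descriptive:
        failures.append("names a conclusion, not a condition")
    if not a.grounded:
        failures.append("not supported by domain evidence")
    # Assumption: coherence is required only for green tiles, since the
    # yellow and red criteria do not list it.
    if a.role == "driver" and not a.coherent:
        failures.append("claims to drive the story but has no tension "
                        "with its pseudo-antonym")
    if failures:
        return None, failures  # rejected, with the failed tests named
    return ROLE_COLORS[a.role], []  # KeyError for an unknown role label


print(audit_tile(Audit(True, True, True, "driver")))       # ('green', [])
print(audit_tile(Audit(True, True, False, "supplement")))  # ('yellow', [])
print(audit_tile(Audit(False, True, True, "driver")))
# (None, ['names a conclusion, not a condition'])
```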
Logic Layer
The YAML audit trail
Every accepted concept has a formal machine-readable representation — the YAML logic block. This is what separates the Data Board from a sticky-note exercise. The YAML makes each concept auditable: it documents the mechanism, the evidence, the scope conditions, the fidelity score, and the pseudo-antonym relationship. It is the reproducible, citable record of every analytical decision the board makes.
The contrasts_with field is the pseudo-antonym link. The valid_when conditions define when the concept holds and when it breaks down. The fidelity score (0–1) measures how well the concept survived the logic audit.
YAML Logic Block — Social Cohesion
concept: "Social Cohesion"
is_a: driver
context: "Social support systems"
mechanism: "trusted social networks provide emotional and material safety nets"
evidence: "Gallup World Poll social support metrics"
covers:
  explains: [national_happiness_variance]
  aggregates: [social_support_score]
contrasts_with: "Atomized Autonomy"   # pseudo-antonym link
fidelity: 0.92                        # survives the logic audit
fidelity_basis: empirical_test
valid_when:                           # scope conditions
  - "strong community ties"
  - "institutional stability"
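To make a logic block checkable in code, one option is to mirror its fields in a typed record. The sketch below uses only the Python standard library; `LogicBlock`, `problems`, and `holds` are hypothetical names, the `is_a` field assumes that spelling of the key, and reading valid_when as "every scope condition must be observed" is an assumption about how the conditions combine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LogicBlock:
    """Hypothetical typed mirror of one YAML logic block."""
    concept: str
    is_a: str
    context: str
    mechanism: str
    evidence: str
    covers: Dict[str, List[str]]
    contrasts_with: str           # pseudo-antonym link
    fidelity: float               # 0-1, how well the concept survived the audit
    fidelity_basis: str
    valid_when: List[str] = field(default_factory=list)  # scope conditions

    def problems(self) -> List[str]:
        # Structural checks only; they say nothing about whether the
        # mechanism or evidence is actually correct.
        issues = []
        if not 0.0 <= self.fidelity <= 1.0:
            issues.append("fidelity must lie in [0, 1]")
        if not self.contrasts_with:
            issues.append("missing pseudo-antonym (contrasts_with)")
        if not self.valid_when:
            issues.append("no scope conditions (valid_when)")
        return issues

    def holds(self, observed: Set[str]) -> bool:
        # Assumption: the concept holds only when all scope conditions are met.
        return set(self.valid_when) <= observed


social_cohesion = LogicBlock(
    concept="Social Cohesion",
    is_a="driver",
    context="Social support systems",
    mechanism="trusted social networks provide emotional and material safety nets",
    evidence="Gallup World Poll social support metrics",
    covers={"explains": ["national_happiness_variance"],
            "aggregates": ["social_support_score"]},
    contrasts_with="Atomized Autonomy",
    fidelity=0.92,
    fidelity_basis="empirical_test",
    valid_when=["strong community ties", "institutional stability"],
)
print(social_cohesion.problems())                        # []
print(social_cohesion.holds({"strong community ties"}))  # False
```

With PyYAML installed, `yaml.safe_load` would turn the YAML block into a dict that could be passed as `LogicBlock(**data)`, provided the keys match the field names.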
For Practitioners
The system prompt
Copy this into any LLM (Claude, ChatGPT, Gemini) to activate the Data Board methodology before analysis begins.
"Global well-being is a structural outcome of the balance between Institutional Trust and Individual Freedom. High GDP is necessary but not sufficient — Atomized Autonomy is the shadow of Individual Freedom that GDP cannot measure."
Cohesion 88 · Coverage 92 · Sharpness 90 · Entropy 45
The non-trivial finding: the same freedom that produces the highest happiness scores in Nordic nations produces the highest loneliness rates in individualistic cultures without strong social infrastructure. The tension between Individual Freedom and Atomized Autonomy is the mechanism. The board makes it visible. A regression finds the correlation and calls it "freedom." The board names what is inside it.
Theoretical Anchors
Where this connects
Pearl, J. & Mackenzie, D. (2018). The Book of Why. Basic Books. — The ladder of causation. The Data Board addresses the prerequisite Pearl assumes: knowing which concepts belong before building the causal model.
Glaser, B. & Strauss, A. (1967). The Discovery of Grounded Theory. Aldine. — Grounded theory's coding practices, later formalised as open and axial coding, are the qualitative precedents. The Data Board operationalises these steps computationally, compressing weeks of work into a session.
Wittgenstein, L. (1922). Tractatus Logico-Philosophicus. — "The limits of my language mean the limits of my world." The Data Board is a formal process for extending the analytical vocabulary, and therefore the analytical world.
Luhn, H.P. (1958). "A Business Intelligence System." IBM Journal of Research and Development. — Intelligence as guiding action toward a desired goal. The Data Board formalises the naming step that makes the goal speakable.
License & Attribution
Open source. Protected rights.
The Data Board methodology is released under the MIT License — free for commercial and non-commercial use, modification, and distribution globally.