The human brainstorms candidate words from domain knowledge and imagination — concepts that may have no relationship to any column or field in the source material. The AI takes that vocabulary and constructs a minimal ontology on the fly: checking each word for semantic proximity and coherence with the others, assessing conceptual coverage, and building the tightest, most grounded conceptual network the evidence supports.
The AI builds this ontology from whatever data it receives. With a structured dataset, centrality is estimated from frequency distributions. With a text corpus, it is estimated from co-occurrence and semantic density. The form of the data changes. The standard for what makes a good ontology — coherent, grounded, covering the meaningful dimensions — does not.
Every tile is a word that survived that process, weighted by its centrality in the evidence. Green means dominant. Yellow means present but not decisive. Red means marginal. A word with a weight — that is what no other process produces.
The Databoard does not replace analysis tools. It operates upstream of them. Before you measure relationships between variables, you need to know whether you have the right variables. That is what the Databoard does.
In 1858, Florence Nightingale pioneered modern data visualization with her polar-area diagrams. Not because she liked graphs, but because words alone couldn't carry data to the people who needed to act on it. She needed a workaround for the limits of language.
170 years later, we face the opposite problem. We have more data than ever — and less shared language to reason about it. Graphs show us patterns. AI systems summarize them. Neither asks the question that matters: are these even the right concepts?
Data visualization has always worked through three fundamental operations. Contrast — this differs from that, expressed through the height of bars and the distance between points. Alignment — these move together, expressed through slope and correlation. Productive tension — this should be one way but is another, the surprise in the pattern that generates the next question.
Every graph expresses one or two of these operations through geometry. The Databoard expresses all three through weighted vocabulary. VS is contrast. AND is alignment. BUT is productive tension — the collision of two true things that together produce a question nobody thought to ask. Graphs can show contrast and alignment. The Databoard foregrounds narrative tension — BUT — as a first-class analytical operation, making it explicit in the vocabulary itself rather than leaving it implicit in the interpretation.
"The limits of my language are the limits of my world." — Wittgenstein, 1922. AI changed the limits of our world. The limits of our language are now the limits of a new one.
The joker words are not analytical tools. They are a human technique for generating vocabulary that works together as a coherent set — words that can compose into narratives, not just individually valid words.
When a human frames a vocabulary proposal using a joker word, they signal to the AI the kind of ontological relationship they are looking for. The AI draws on its implicit domain knowledge — its training corpus — to find words that can stand in those relationships. The human provides the direction. The AI searches the domain.
The vocabulary that results from joker-word prompting is not just a list of valid words. It is a set of words that can compose into narratives — because the joker words shaped the search from the beginning.
The joker words are not part of the board. The board is tiles with color weights — a completed minimal ontology. The joker words are an optional subsequent layer: a storytelling and analysis technique a facilitator may apply to the completed board. They may or may not be used. The board is complete and valid without them.
LLMs do not contain explicit domain ontologies — but they encode statistical associations learned during training. When given a proposed vocabulary and a context brief, the AI uses those encoded associations to evaluate each word: does this concept fit coherently with the others? Does it belong in this domain? Does it add a meaningful dimension or duplicate something already present?
This is not automatic computation of metrics. It is the AI's semantic knowledge — built from its training corpus — applied to the specific vocabulary the human has proposed. The result is a proposed evaluation. Humans confirm or override. The AI does not decide. It recommends.
The color weight — green, yellow, red — reflects the concept's centrality in the evidence provided. With a structured dataset, quantitative signals like frequency distributions can directly inform the weighting. With a text corpus or qualitative material, the AI estimates centrality from its encoded associations with the domain. When quantitative signals are available, they should be used. When they are not, the AI's estimate is a starting point for human judgment — not a final answer.
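When quantitative signals are available, the weighting can be as simple as a frequency share mapped to a color band. A minimal sketch of that mapping, assuming illustrative thresholds (50% and 15%) that are not part of the method itself:

```python
from collections import Counter

def color_weight(values, concept_value, dominant=0.5, present=0.15):
    """Map a concept's frequency share in a column to a tile color.

    Thresholds are illustrative assumptions, not the method's canon:
    share >= 50% of rows -> "green" (dominant), >= 15% -> "yellow"
    (present but not decisive), below that -> "red" (marginal).
    """
    counts = Counter(values)
    share = counts[concept_value] / len(values) if values else 0.0
    if share >= dominant:
        return "green"
    if share >= present:
        return "yellow"
    return "red"

# e.g. a 'class' column from a passenger dataset (made-up proportions)
column = ["3rd"] * 55 + ["1st"] * 25 + ["2nd"] * 20
print(color_weight(column, "3rd"))   # dominant share -> "green"
print(color_weight(column, "2nd"))   # present -> "yellow"
```

With qualitative material, no such computation exists; the AI's estimated share takes its place as a starting point, and the group adjusts it.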
This means the Databoard is not limited to structured datasets. It works for any subject of inquiry: a researcher working from interview transcripts, a policy team reasoning from reports, a data scientist with a structured dataset, or any combination. The input changes. The process does not.
When the AI evaluates "language barrier" for a 1912 passenger ship dataset, it finds that concept consistent with what it knows about emigration patterns, multilingual populations, and crisis communication. It proposes: grounded. The human confirms or challenges.
When it evaluates "equal access," it finds the concept inconsistent with what it knows about class-based deck access on that ship. It proposes: rejected. Again — the human confirms or challenges. The AI never has the final word.
The context brief at Stage B is what activates the right associations. Without it, the AI has no domain anchor. Two or three sentences is enough.
The Databoard method proceeds in five sequential stages. Each stage is necessary. Skipping any stage — particularly Stage C — undermines the integrity of the analysis that follows.
The process begins with a dataset, AI-generated output, or body of discourse in its original, uninterpreted form. No meaning is assigned. No vocabulary is assumed.
Before proposing vocabulary, the group takes note of what the data actually contains: column names, value ranges, what is measurable and what is absent. This prevents proposals that simply relabel existing columns, which is not vocabulary construction but transcription.
The raw data contains six columns. These are the only concepts the dataset formally knows. Everything that actually explains survival — distance from a lifeboat, whether a passenger understood the crew, whether they knew the ship was sinking — does not exist here yet.
Before proposing vocabulary, the group provides a context brief — two or three sentences describing what the data is about and what question they are trying to answer. This activates the LLM's implicit domain knowledge.
Participants then propose words that make each data concept human-readable. Words that do not exist in the dataset are not just permitted — they are the point. Domain knowledge, lived experience, and intuition all enter here.
To generate vocabulary that works together as a coherent set, participants use joker words as thinking prompts. A joker word signals to the AI the kind of ontological relationship being sought — and the AI searches its domain knowledge accordingly.
Context brief: "Titanic passenger survival data from 1912. We want to understand what actually determined survival — beyond the raw class and gender numbers."
Joker-word prompts used: "Women survived more VS men who died alone" → vocabulary needed for location and access. "Language barrier AND unaware" → two words that might work together. "Class predicts survival BUT what does class proxy for?" → forces vocabulary that names the underlying mechanism.
None of these words exist in the dataset. They come from human domain knowledge — historical, contextual, expert. The Databoard explicitly allows documented contextual knowledge to enter the vocabulary even when absent from the dataset variables. That is not a weakness. It is the point.
The AI evaluates each proposed word using its encoded semantic knowledge and the context brief provided. It proposes a verdict for each word. Three questions structure the evaluation:
Does it belong in this domain? Based on its training associations, does the concept fit the domain the context brief describes — or is it imported from a different conceptual world?
Does it cohere with the other proposed words? Does it add a distinct dimension, or does it duplicate something already present? Does it contradict an accepted word?
Does it describe a condition or embed a conclusion? "Below deck" describes a physical location. "Disadvantaged" describes a judgment about that location. The first belongs on a Databoard. The second does not.
Border style = validity. Fill color = frequency weight. Two distinct signals on one tile.
Color and border are two distinct signals. Border style reflects validity — solid means passed, dashed means carries an assumption, strike-through means rejected. Fill color reflects frequency weight in the evidence — green means dominant, yellow means present, red means marginal. A word can be grounded but rare. A word can be frequent but carry an assumption. The tile shows both.
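The two-signal tile can be stated as a small record. A sketch, using the document's Titanic examples; the field names and values are illustrative, not a canonical schema:

```python
from dataclasses import dataclass

@dataclass
class Tile:
    word: str
    border: str  # validity: "solid" (passed), "dashed" (assumption), "strike" (rejected)
    fill: str    # frequency weight in the evidence: "green", "yellow", "red"

# Grounded and frequent: documented in testimony, common in 3rd class.
language_barrier = Tile("language barrier", border="solid", fill="green")
# Frequent but carrying an assumption about how proximity affected behaviour.
with_family = Tile("with family", border="dashed", fill="yellow")
# Rejected: asserts a conclusion rather than describing a condition.
equal_access = Tile("equal access", border="strike", fill="red")
```

Because the two axes are independent, every combination is meaningful: a solid-border red tile is a grounded but rare condition, a dashed green tile a frequent one resting on an assumption.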
The AI runs a two-pass evaluation. First pass: each word individually against the domain ontology. Second pass: the vocabulary set as a whole — does it cover the right conceptual dimensions? Are there gaps the human brainstorm missed? The AI may propose words to fill those gaps, which the group then accepts or rejects.
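The control flow of the two passes can be sketched as below. `ai_judge_word` and `ai_judge_coverage` are placeholder callables standing in for LLM calls; they are assumptions, not a real API:

```python
def evaluate_vocabulary(words, context_brief, ai_judge_word, ai_judge_coverage):
    """Two-pass evaluation sketch. The AI proposes; the group decides."""
    # Pass 1: each word individually, against the domain the brief describes.
    verdicts = {w: ai_judge_word(w, context_brief) for w in words}

    # Pass 2: the surviving set as a whole -- are conceptual dimensions missing?
    accepted = [w for w, v in verdicts.items() if v != "rejected"]
    gap_proposals = ai_judge_coverage(accepted, context_brief)

    # Both outputs are proposals only; humans confirm or override each one.
    return verdicts, gap_proposals
```

The key design point is that pass 2 sees only the words that survived pass 1, so gap proposals respond to the vocabulary as it will actually appear on the board.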
The AI's evaluation is not final. Participants may challenge any verdict. A rejected word is not a failure — it reveals an assumption the group was making without knowing it.
"Equal access" rejected — it asserts a conclusion rather than describing a condition. "With family" yellow — valid but carries an assumption about whether family proximity affected behavior. "Language barrier" green — documented in survivor testimony, consistent with the emigrant composition of 3rd class.
Only evaluated tiles appear on the Databoard. The board is the group's shared, minimal ontology of the data — human-readable, collectively owned, and frequency-weighted. It is complete at this stage. No further steps are required for the board to be valid and usable.
Dataset variables — class, gender, age — appear on the board alongside human vocabulary, weighted the same way. This is the critical move: it allows the group to see which dataset variables are actually differentiating and which are proxies for something the vocabulary has now named. The board is what the group thinks with.
Class tiles are yellow — present but not decisive. Location and communication tiles carry the weight. The board already suggests where the analysis will go before a single split is made.
The board is complete at Stage D. What follows is optional — a storytelling and analysis layer a facilitator may choose to apply. The joker words used during brainstorming in Stage B can now be used again to structure a reading of the completed board.
VS — split the board by outcome or group. What is green on one side and red on the other? That contrast is the first story. AND — which tiles are green on both sides? Those are the variables that persist regardless of outcome. BUT — which two tiles together produce a surprise that neither produces alone? That is where the insight lives.
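The VS and AND readings are mechanical enough to sketch. A minimal example over two board halves, with made-up tiles and weights; BUT is deliberately left out of the computation, because the surprise two tiles produce together is a human reading, not a formula:

```python
# Each board half maps tile word -> fill color (illustrative data).
survived = {"near lifeboat": "green", "understood crew": "green", "class": "yellow"}
perished = {"near lifeboat": "red",   "understood crew": "green", "class": "yellow"}

# VS: green on one side, red on the other -- the first story.
vs = [w for w in survived
      if {survived[w], perished[w]} == {"green", "red"}]

# AND: green on both sides -- persists regardless of outcome.
and_ = [w for w in survived
        if survived[w] == perished[w] == "green"]

print(vs)    # ['near lifeboat']
print(and_)  # ['understood crew']
```

In this sketch, location differentiates the outcomes while communication persists across them; a facilitator would then ask what the two together imply, which is the BUT step.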
Because the vocabulary has been stress-tested against domain ontologies and weighted by data, the stories that emerge are traceable, shared, and defensible. They do not require interpretation — they require reading.
"Once the vocabulary is sound, the analysis that follows is inevitable and objective."
The Databoard method is built on a specific understanding of the relationship between human intelligence and AI. Neither alone is sufficient. The method is designed to make their collaboration explicit, structured, and traceable.
All vocabulary proposals originate with humans. Domain knowledge, context, and lived experience cannot be replaced. The human brings what the data cannot contain.
The AI proposes evaluations — this word is grounded, this one carries an assumption, this one does not belong. Humans confirm or override every verdict. The AI reduces blindspots. It does not replace judgment. This is non-negotiable in the method.
No analysis begins until the conceptual vocabulary is agreed upon. This sequencing is non-negotiable. It is where the method's integrity lives.
The goal is not to eliminate subjectivity but to make it visible and negotiable. Yellow and red tiles do not disqualify — they invite examination before they harden into conclusions.
The board belongs to the group. No individual owns the vocabulary. Every participant can challenge, propose, and deliberate. The method is participatory by design.
Every conclusion is traceable back to the tiles it was built from. Decisions are not just made — they are legible. This is accountability built into the process.
LLMs do not contain explicit domain ontologies — but they encode statistical associations learned during training. The AI applies those associations to evaluate vocabulary coherence and domain fit. It proposes evaluations. Humans decide. Color weight reflects evidence centrality — from frequency data where available, from the AI's domain associations where not.
The Databoard is complete at Stage D — a minimal ontology, frequency-weighted, collectively owned. The joker words are a separate optional layer for storytelling and further analysis. The board does not require them to be valid. VS, AND, and BUT may be applied to structure a narrative from the board — but the board stands on its own.
The Databoard method is released as an open practice. Free to use, adapt, teach, and build upon — with two conditions.
You are free to: share, adapt, teach, and build upon the method for non-commercial purposes.
Under the following conditions: Attribution, credit the Databoard method and its source; NonCommercial, no commercial use without permission.
Full license: creativecommons.org/licenses/by-nc/4.0