CHI '26 · Honorable mention · full-paper review · confidence medium-high

SemTabla: A Human-in-the-Loop Framework for Semantic Enrichment and Validation of Data Tables

Zhuochen Jin, Yingjie Mi, Yehang Zhu, Yichen Yao, Chongyang Yu, Ke Xu

SemTabla is a credible CHI-style systems paper: it takes a real pain point in Table QA, proposes a human-in-the-loop semantic enrichment workflow, and backs it with usability and downstream-task evidence. The novelty is strongest as a framework plus interaction design, not as a standalone algorithmic breakthrough.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: method knowledge · typical · 29/268
Novelty type: framework · typical · 59/268
Abstraction level: system · typical · 61/268
Generalization target: task class · typical · 63/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: moderate · typical · 105/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268
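
To make the k/268 figures above easier to compare, the following minimal sketch converts each one into a percentage. It assumes that each figure counts the reviews in the 268-review set that share the same label, which the page does not state explicitly. The dictionary keys and the `share` helper are illustrative names, not part of the review tooling.

```python
# Counts copied from the Axes Lens lines above. Reading k/268 as
# "k reviews share this label" is an assumption, not stated on the page.
TOTAL_REVIEWS = 268

label_counts = {
    "Knowledge form: method knowledge": 29,
    "Novelty type: framework": 59,
    "Abstraction level: system": 61,
    "Generalization target: task class": 63,
    "Validation mode: mixed methods": 136,
    "Evidence strength: moderate": 105,
    "Claim alignment: strong": 231,
    "Overclaim risk: medium": 210,
}


def share(count, total=TOTAL_REVIEWS):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(count) for label, count in label_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Under that reading, the shares range from roughly 11% (method knowledge) to roughly 86% (strong claim alignment), so the labels on the evidence profile are considerably more common in the set than those on the contribution shape.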

Review Summary

SemTabla reads as a solid CHI systems contribution because it targets a concrete and important failure mode in table question answering: schema metadata alone often do not expose the business semantics needed for reasoning. The paper’s core move is to replace the usual “automate everything” stance with an interactive human-in-the-loop workflow that combines hierarchical semantic extraction, rare-row sampling, and validation/refinement through a visual interface. That is a meaningful design choice because it treats semantic enrichment as an iterative sensemaking task rather than a one-shot prediction problem. The evidence packet supports this framing well: the abstract explicitly names the three-part contribution, and the evaluation spans usability, downstream QA performance, and latency on larger datasets. From a CHI perspective, that makes the paper more compelling as a framework and system contribution than as a pure ML method paper. The main caveat is that the validation is still bounded: the user study is relatively small, the QA evaluation is on the BIRD dev split, and the deployment story remains prototype-like, with local file upload and a scope centered on tables with missing semantics. So the paper’s claims are credible and well aligned with the evidence, but the generalization should be read as task-class level rather than field-wide. In short, this is a good example of interaction-centered data tooling with practical novelty and reasonable validation, but not a sweeping algorithmic advance.

What Changed

Canon before

Schema-only metadata and one-shot inspection are insufficient for recovering table semantics in Table QA settings; prior automated enrichment is limited by data utilization, feature coverage, and interpretability.

Departure from common sense

The paper argues that table semantics cannot be recovered reliably from schema metadata alone and instead need an interactive human-in-the-loop workflow with iterative sampling and validation. That departs from the common assumption that automated enrichment or a single pass over representative rows is enough for semantic understanding.

Actual novelty

SemTabla’s novelty is the combination of a hierarchical semantic enrichment framework with a feature-specific iterative sampling loop that targets critical but rare rows, plus an interface for visualization, validation, and refinement. The contribution is not just automation, but a structured human-in-the-loop process for semantic table understanding.
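
The idea of an iterative sampling loop that surfaces critical but rare rows can be illustrated with a small sketch. This is not the paper's algorithm: it assumes that "rare" simply means a low-frequency value in one chosen column, and the function names, the `status` column, and the toy rows are all hypothetical.

```python
from collections import Counter, defaultdict


def rare_value_sample(rows, column, per_value=1, seen=None):
    """Pick up to `per_value` unseen row indices per distinct value of
    `column`, ordered from the rarest value to the most common one."""
    seen = set() if seen is None else seen
    # Frequencies are computed over all rows, including already-seen ones.
    counts = Counter(row[column] for row in rows)
    by_value = defaultdict(list)
    for idx, row in enumerate(rows):
        if idx not in seen:
            by_value[row[column]].append(idx)
    picked = []
    # Ties in frequency are broken by the value's string form for determinism.
    for value in sorted(by_value, key=lambda v: (counts[v], str(v))):
        picked.extend(by_value[value][:per_value])
    return picked


def iterative_sampling(rows, column, rounds=2, batch=2):
    """Hypothetical review loop: each round returns the `batch` rarest rows
    that no earlier round has shown, standing in for one validation pass."""
    seen = set()
    batches = []
    for _ in range(rounds):
        chosen = rare_value_sample(rows, column, seen=seen)[:batch]
        if not chosen:
            break
        seen.update(chosen)
        batches.append(chosen)
    return batches


# Toy table: "refunded" and "chargeback" each occur once among six orders.
rows = [
    {"order_id": 1, "status": "paid"},
    {"order_id": 2, "status": "paid"},
    {"order_id": 3, "status": "paid"},
    {"order_id": 4, "status": "refunded"},
    {"order_id": 5, "status": "paid"},
    {"order_id": 6, "status": "chargeback"},
]

batches = iterative_sampling(rows, "status")
print(batches)  # [[5, 3], [0]]
```

In this toy run the first batch contains the chargeback and refunded rows, which a uniform sample of two rows from six would frequently miss. In a human-in-the-loop setting, the feedback gathered on each batch would presumably steer the next round, a step this sketch leaves out.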

Evidence

The paper supports its claims with a pilot study, an 18-participant user study, a performance study on large datasets, and a BIRD dev-split Table QA ablation. The evidence is strongest for usability, workflow value, and downstream QA gains, while broader claims about general semantic understanding remain bounded by prototype scope and task-specific evaluation.

“Our key contributions include: (1) a hierarchical framework for extracting semantic attributes; (2) a novel sampling method that identifies critical but rare row instances; and (3) an interactive interface that supports visualization, validation, and refinement of the extracted table semantics”

actual novelty · Abstract/Introduction + Section 4.1 (Semantic Enrichment) + Section 4.1.1 (Sampling strategy) · confidence 0.74

“To overcome these limitations, we propose SemTabla, an interactive system that employs a human-in-the-loop mechanism to extract comprehensive and interpretable semantics from tabular data”

departure from common sense · Abstract/Introduction + Semantic Validation + Sampling strategy · confidence 0.66

“[20] Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer. 2011. Wrangler: Interactive visual specification of data transformation scripts”

limitation · Discussion 6.3 Limitation and future work · confidence 0.84

“Data tables are widely used to record critical information, enabling decision”

validation scope · Evaluation (5.2, 5.3, 5.4) · confidence 0.77

Limits

Method limits

The method depends on iterative human validation and a prototype workflow, so its efficiency and robustness are tied to the quality of user feedback and the chosen sampling strategy. The paper also frames the approach around multi-table data modeling with missing semantics, which narrows the method’s intended scope.

Deployment limits

The current prototype requires users to upload local data table files, which limits scale. The paper also notes that if SQL scripts are available, they could improve comprehension of computation logic and lineage, implying the current deployment does not fully exploit all available provenance.

Boundary conditions

SemTabla is positioned for scenarios with missing or weak table semantics, especially multi-table data modeling. Its utility may be lower for well-documented single-table datasets with clear, stable semantics, and for settings where record-level observation is insufficient without SQL lineage.

Position in field

This sits at the intersection of semantic table understanding, human-in-the-loop data curation, and Table QA support. The paper’s main field contribution is a practical framework for enriching tables with interpretable semantics rather than a new benchmark or purely automated model.

Abstract

Data tables are widely used to record critical information, enabling decision-makers to derive insights through table question answering (Table QA). However, the metadata from table schemas alone often fail to capture the underlying business semantics embedded in the tabular data, leading to reasoning errors. Existing automated approaches to semantic enrichment face challenges in insufficient data utilization, narrow feature coverage, and limited interpretability. To overcome these limitations, we propose SemTabla, an interactive system that employs a human-in-the-loop mechanism to extract comprehensive and interpretable semantics from tabular data. Our key contributions include: (1) a hierarchical framework for extracting semantic attributes; (2) a novel sampling method that identifies critical but rare row instances; and (3) an interactive interface that supports visualization, validation, and refinement of the extracted table semantics. A user study confirmed the system’s usability, and quantitative experiments demonstrate that the extracted semantics significantly enhance the reasoning capabilities of large language models.