CHI '26 · Best paper · full-paper review · confidence high

iTagPDF: Towards Finally Automating PDF Accessibility

Peya Mowar, Aaron Steinfeld, Jeffrey P Bigham

This is a strong systems paper because it reframes PDF accessibility from a purely visual cleanup task into a semantic-preservation problem. The contribution is concrete, well-motivated, and backed by explicit evaluation and candid limitations, though current applicability still depends heavily on source availability and pipeline robustness.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: generative knowledge typical · 35/268
Novelty type: system architecture typical · 35/268
Abstraction level: artifact typical · 19/268
Generalization target: design family typical · 38/268
Validation mode: mixed methods typical · 136/268

Evidence profile

Evidence strength: strong typical · 158/268
Claim alignment: strong typical · 231/268
Overclaim risk: medium typical · 210/268

Review Summary

iTagPDF stands out because it attacks a long-standing, frustrating accessibility problem at the right representational level. Rather than trying to infer everything from rendered PDF pages alone, the paper argues that much of the needed semantic structure already exists upstream in authoring artifacts and is simply discarded during rendering. The system contribution is therefore not just another tagging tool but an architectural shift: combine visual evidence from the PDF with a parse of the source document, and align the two with an LLM so that accessibility metadata can be reconstructed with greater fidelity. This framing is persuasive and well supported by the paper's own description of prior remediation workflows as tedious, fragile, and overly dependent on visual inspection.

The evaluation described in the sections reviewed is also substantial enough to justify taking the artifact seriously. The authors report a manual evaluation on a dataset of ACM research papers with source files, adapt classical metrics to accessibility-specific tagging concerns, and position their results as a baseline for the community. Just as important, the discussion and limitations sections are unusually useful: they narrow the true scope of the contribution to LaTeX-authored research papers, acknowledge rigidity in the heuristic-plus-LLM mapping process, note nondeterminism, and admit dependence on off-the-shelf detectors and OCR. They also state that some tags are still unsupported and that authors must review generated alt text. These caveats keep the work from being read as a universal solution.

My overall assessment is that this is a meaningful best-paper-level contribution: it offers a compelling systems architecture, a credible empirical setup, and a field-shaping reframing of what automated PDF accessibility should optimize for. The main caution is external validity: the strongest claims are best read as applying to source-available scholarly PDFs rather than to all PDFs in the wild. Even so, the paper establishes an important direction for accessibility tooling and publishing workflows.

What Changed

Canon before

The dominant baseline assumption is that automating PDF accessibility remediation remains intractable because tags are generated primarily from the visual rendering of PDFs, which discards semantic authoring structure and requires tedious manual correction.

Departure from common sense

The paper argues that PDF accessibility automation should not be treated as a visual-only remediation problem; instead, semantic structure preserved in source documents can be aligned with rendered PDF content to improve tagging and reading order substantially.
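To make the reading-order point concrete, here is a minimal Python sketch under simplified assumptions. On a two-column page, a visual-only heuristic that sorts blocks top-to-bottom and then left-to-right interleaves the columns, while ordering blocks by where their text appears in the source recovers the authored sequence. `Block`, `visual_order`, and `source_order` are hypothetical names, the toy page is invented, and exact substring matching stands in for the LLM-based alignment the paper describes; this is not iTagPDF's implementation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical text block detected on a rendered PDF page."""
    text: str
    x: float  # left edge of the block, in page units
    y: float  # top edge, measured downward from the top of the page


def visual_order(blocks):
    """Visual-only heuristic: read top-to-bottom, then left-to-right."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def source_order(blocks, source_paragraphs):
    """Order blocks by where their text appears in the source document.

    Assumes a block's text occurs verbatim inside a source paragraph.
    Blocks with no match (e.g. page furniture) are appended in visual order.
    """
    def position(block):
        for i, para in enumerate(source_paragraphs):
            if block.text in para:
                return i
        return None

    matched, unmatched = [], []
    for b in blocks:
        pos = position(b)
        if pos is None:
            unmatched.append(b)
        else:
            # Source position first; geometry only breaks ties.
            matched.append((pos, b.y, b.x, b.text))
    return [m[-1] for m in sorted(matched)] + visual_order(unmatched)


# Toy two-column page: column 1 holds the introduction, column 2 related work.
blocks = [
    Block("Intro para A", x=50, y=100),
    Block("Related work C", x=320, y=100),
    Block("Intro para B", x=50, y=300),
    Block("Related work D", x=320, y=300),
    Block("Page footer", x=300, y=780),
]
source = [
    "Intro para A continues here.",
    "Intro para B continues here.",
    "Related work C continues here.",
    "Related work D continues here.",
]
print(visual_order(blocks))          # interleaves the two columns
print(source_order(blocks, source))  # follows the authored order
```

With the toy page, `visual_order` yields A, C, B, D (columns interleaved), and `source_order` yields A, B, C, D, with the unmatched footer appended last. Real source text contains markup that defeats exact matching, so the sketch illustrates only the ordering idea, not the alignment itself.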

Actual novelty

The main contribution is iTagPDF, a system that combines PDF visual rendering, source-document parsing, and LLM-based alignment to generate accessibility metadata including tags, reading order, and content-specific metadata more accurately than visual-only approaches.
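The three steps can be illustrated with a small sketch of the final tagging stage. This is an assumption-laden illustration, not the authors' pipeline: the paper aligns the two representations with an LLM plus heuristics, whereas this sketch swaps in character similarity from Python's `difflib.SequenceMatcher`, and the `SOURCE_KIND_TO_TAG` table, `tag_blocks`, and the example inputs are hypothetical. The target tags (`H1`, `H2`, `P`, `LI`, `Caption`) are standard PDF structure types.

```python
from difflib import SequenceMatcher

# Hypothetical mapping from source-element kinds (as a LaTeX parser might
# label them) to standard PDF structure types.
SOURCE_KIND_TO_TAG = {
    "section": "H1",
    "subsection": "H2",
    "paragraph": "P",
    "item": "LI",
    "caption": "Caption",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1]; stands in for the LLM."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tag_blocks(visual_blocks, source_elements, default_tag="P"):
    """Align each visual block with its most similar source element and tag it.

    visual_blocks: text of blocks detected on the rendered pages (step i).
    source_elements: (kind, text) pairs parsed from the source (step ii).
    Returns (tag, text) pairs (step iii).
    """
    tagged = []
    for text in visual_blocks:
        kind, _ = max(source_elements, key=lambda el: similarity(text, el[1]))
        tagged.append((SOURCE_KIND_TO_TAG.get(kind, default_tag), text))
    return tagged


visual = [
    "1 INTRODUCTION",
    "Most academic research is disseminated as PDF.",
    "Figure 1: System overview",
]
source = [
    ("section", "Introduction"),
    ("paragraph", "Most academic research is disseminated as PDF."),
    ("caption", "System overview"),
]
for tag, text in tag_blocks(visual, source):
    print(tag, "|", text)
```

The design point is that the tag comes from the kind of the aligned source element rather than from visual cues such as font size, so "1 INTRODUCTION" is tagged as a heading because it aligns with a section element in the source.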

Evidence

The paper grounds its claims in a manual evaluation on ACM research papers with source files, metrics adapted to accessibility tagging, and a comparative discussion of performance. The evidence is strongest for the architectural novelty of combining visual and source representations and for the scope of the evaluation. Limitations are discussed explicitly, especially the dependence on LaTeX sources, the heuristic-plus-LLM alignment, and the reliance on off-the-shelf modules.

“Our method (see Figure 3) comprises of three steps: i) processing the visual rendering (PDF pages), ii) parsing the source document (e.g., LaTeX files), and finally, iii) aligning both representations using a large language model (LLM).”

actual novelty · 3 Combining PDF Visual and Source Representations · confidence 0.97

“We propose a new method that combines the visual and source representations of a PDF document to gather semantic context that can then be used further for tagging a PDF far more accurately than visual-only methods.”

departure from common sense · 3 Combining PDF Visual and Source Representations · confidence 0.96

“Firstly, our work focuses on research paper PDFs authored in LaTeX”

limitation · 7.1 Limitations & Future Work · confidence 0.95

“We performed a manual evaluation of iTagPDF by assembling a dataset of ACM research papers along with their source files. We adapted and redefined classical machine learning (ML) metrics to suit the specific challenges of PDF tagging, providing a more accurate estimate of automated tagging performance. Our system’s results serve as a baseline for the broader research community”

validation scope · 5 Experiments · confidence 0.92
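For reference, the classical per-tag precision, recall, and F1 that the authors say they adapted can be computed as below. This sketch shows only the unadapted baseline; it does not reproduce the paper's redefined metrics, and it assumes gold and predicted blocks are already paired one-to-one, which may not hold when a tagger splits or merges blocks. `per_tag_prf` and the toy labels are hypothetical.

```python
from collections import Counter


def per_tag_prf(gold, predicted):
    """Classical per-tag precision, recall, and F1 over paired block labels.

    gold, predicted: equal-length lists of tag names, one per content block.
    Returns {tag: (precision, recall, f1)}.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must label the same blocks")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p was wrong for this block
            fn[g] += 1  # gold tag g was missed for this block
    scores = {}
    for tag in set(gold) | set(predicted):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = (prec, rec, f1)
    return scores


gold = ["H1", "P", "P", "Caption"]
predicted = ["H1", "P", "H1", "P"]
for tag, (p, r, f) in sorted(per_tag_prf(gold, predicted).items()):
    print(f"{tag}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```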

Limits

Method limits

The current implementation depends on heuristics and an LLM in the loop to map between visual and source representations. It can fail when no reliable mapping exists, as for figures or captions; it is not strictly deterministic; and it is bounded by the quality of the off-the-shelf detection and OCR components.

Deployment limits

The system currently focuses on research paper PDFs authored in LaTeX and assumes access to corresponding source files; extending to Word, InDesign, and broader document domains remains future work, and some outputs such as alt text still require author review.

Boundary conditions

The approach is most applicable when a PDF has an accessible source representation with recoverable semantics and when visual detection plus OCR can anchor content reliably. Performance is weaker when source mappings are missing or ambiguous, especially for figures, captions, and unsupported tag types such as links.
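One way a pipeline might respect these boundary conditions is to accept a source-derived tag only when the alignment is strong, and otherwise keep the visual-only tag and flag the block for review. This is a hypothetical sketch, not a mechanism described in the paper: `align_or_flag`, the 0.6 threshold, and the use of `difflib` similarity in place of LLM alignment are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def align_or_flag(block_text, visual_tag, source_elements, threshold=0.6):
    """Use the source tag only when the best source match is strong enough.

    source_elements: (tag, text) pairs recovered from the source document.
    Returns (tag, needs_review). Below the (hypothetical) threshold, the
    visual-only tag is kept and the block is flagged for manual review.
    """
    best_tag, best_score = None, 0.0
    for tag, text in source_elements:
        score = SequenceMatcher(None, block_text.lower(), text.lower()).ratio()
        if score > best_score:
            best_tag, best_score = tag, score
    if best_score >= threshold:
        return best_tag, False
    return visual_tag, True


source = [
    ("H1", "Introduction"),
    ("P", "Most academic research is disseminated as PDF."),
]
print(align_or_flag("1 INTRODUCTION", "P", source))  # strong match: source tag
print(align_or_flag("Figure 2", "Figure", source))   # no match: flagged
```

In the example, the heading aligns strongly with its source element, while a figure with no matching source text keeps its visual tag and is flagged, mirroring the weakness the review notes for figures.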

Position in field

This paper pushes the field from post-hoc visual remediation toward semantic preservation across authoring and rendering pipelines. Its significance lies less in a new accessibility principle than in demonstrating a practical architecture that reframes automated PDF tagging as a multimodal alignment problem.

Abstract

Most academic research is ultimately disseminated through documents in the PDF format. This format has advantages in flexibility and portability, but presents challenges for accessibility that have stubbornly resisted solutions despite decades of attempts. Tagging PDFs is hard to automate because tags are currently generated visually, not semantically, which makes the output cluttered and manual correction tedious and error-prone. Ironically, this semantic structure already exists during authoring but is discarded during PDF rendering. This raises an obvious question: can we use this lost semantic information to better automate tagging in PDFs? In this paper, we develop iTagPDF, a system that refines generated metadata using the semantics in the source documents of research papers. We demonstrate that the metadata generated by iTagPDF already surpasses what authors currently submit to ACM conferences on many criteria. Our approach represents a concrete step toward finally automating accessibility remediation in research paper PDFs.