CHI '26 · Honorable mention · full-paper review · confidence medium-high

HiFiGaze: Improving Eye Tracking Accuracy Using Screen Content Knowledge

Taejun Kim, Vimal Mollyn, Riku Arakawa, Chris Harrison

HiFiGaze is a credible CHI-style method paper: the novelty is not a new sensor, but a clever re-reading of an existing signal. Its strongest contribution is showing that screen content can help segment eye reflections and improve gaze estimation, with measured gains and a clear account of when the approach breaks down.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score; it is to show what kind of claim the paper makes and whether its evidence pattern is unusual or typical within this set of 268 reviews.

Contribution shape

Knowledge form: technical knowledge · typical · 50/268
Novelty type: method · typical · 21/268
Abstraction level: system · typical · 61/268
Generalization target: task class · typical · 63/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: moderate · typical · 105/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268

Review Summary

HiFiGaze’s contribution is best understood as a methodological reframing of gaze estimation on consumer devices. Instead of treating the eye image as the sole source of information, the paper uses the device’s own knowledge of what is on screen to interpret the reflection visible in the eye. That is a genuinely non-obvious move: the reflection is usually treated as nuisance structure or ignored entirely, whereas here it becomes the carrier of gaze-relevant geometry. The novelty is therefore not a new theoretical account of gaze, but a practical method that combines high-resolution front cameras with screen-content-aware segmentation to recover the screen-relative gaze target.

The evidence is reasonably aligned with that claim. The paper reports a user study and a supplemental camera-location study: the best model reduces mean tracking error by about 18% relative to an appearance-based baseline, and bottom-mounted cameras yield a further 10-20% improvement.

At the same time, the paper is candid about boundary conditions. It notes degradation with darker screen content, partial occlusion by eyelids or eyelashes during downward gaze, the dependence of some variants on robust iris center estimation, and limited coverage of glasses users in the initial dataset. The paper therefore reads as a strong technical contribution with clear empirical support rather than a universal solution; its value lies in showing that screen content is a useful signal under the right device and viewing conditions.

What Changed

Canon before

Prior gaze estimation on consumer devices largely relied on appearance-based eye imagery and camera-only cues; the device’s own display reflection, interpreted through knowledge of the screen content, had not been established as a central signal.

Departure from common sense

The paper’s key move is to use the device’s own screen content as an input to interpret the reflection in the eye, rather than treating the eye image as self-contained. That is a non-obvious inversion of the usual appearance-based framing: the screen is not just the target of gaze, but also a source of structured information for segmenting the reflection that encodes gaze.
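To make the idea concrete, here is a minimal sketch of how knowing the displayed frame could help localize its reflection: a heavily downscaled copy of the frame is slid over a grayscale eye crop and scored with zero-mean normalized cross-correlation (NCC). NCC template matching is a stand-in chosen for illustration, not the paper's segmentation method. The function names, the single fixed template scale, the nested-list image format, and the assumption that the template has already been downscaled and horizontally flipped to match a mirror image are all hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale, row-major (hypothetical format)


def ncc(patch: Image, template: Image) -> float:
    """Zero-mean normalized cross-correlation between two equal-sized images."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    pm, tm = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - pm) * (b - tm) for a, b in zip(p, t))
    den = math.sqrt(sum((a - pm) ** 2 for a in p) * sum((b - tm) ** 2 for b in t))
    # A flat patch or template carries no structure to match, so score it 0.
    return num / den if den > 0 else 0.0


def locate_reflection(eye: Image, screen_thumb: Image) -> Tuple[int, int, float]:
    """Exhaustively slide the reflected-screen template over the eye crop.

    screen_thumb is assumed to be the displayed frame, already downscaled to
    the expected reflection size and mirrored (preprocessing not shown).
    Returns (row, col, score) of the best-matching top-left corner.
    """
    th, tw = len(screen_thumb), len(screen_thumb[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so any real score beats this
    for r in range(len(eye) - th + 1):
        for c in range(len(eye[0]) - tw + 1):
            patch = [row[c:c + tw] for row in eye[r:r + th]]
            score = ncc(patch, screen_thumb)
            if score > best[2]:
                best = (r, c, score)
    return best
```

NCC is a natural fit here because it ignores overall brightness and contrast gain, and a corneal reflection is far dimmer and lower-contrast than the screen itself. A practical system would also search over template scales, since the reflection's size changes with viewing distance.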

Actual novelty

The core novelty is a screen-content-aware gaze estimation method that leverages high-resolution user-facing cameras to capture the screen’s reflection in the eye and then uses knowledge of what is displayed to robustly segment that reflection. The paper positions this as a practical way to recover gaze-relevant structure from otherwise ambiguous eye imagery.
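One classical way to turn a localized screen reflection into a screen coordinate comes from cross-ratio gaze estimation (historically built on glints from infrared LEDs at the screen corners), not from this paper: fit a homography from the four corners of the reflected screen, in eye-image pixels, to the four physical screen corners, then map the iris center through it. The sketch assumes the corner correspondences are already known and omits the corneal-curvature and eye-offset corrections that practical cross-ratio systems add; all names are hypothetical. The paper also describes a pipeline that needs no iris center estimation, so this is only one conceivable decoding strategy.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Direct linear transform from exactly 4 point pairs, with h[8] fixed to 1."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def apply_homography(h: Sequence[float], p: Point) -> Point:
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def gaze_on_screen(reflection_corners: Sequence[Point],
                   screen_corners: Sequence[Point],
                   iris_center: Point) -> Point:
    """Hypothetical decoder: map the iris center through reflection->screen homography."""
    h = fit_homography(reflection_corners, screen_corners)
    return apply_homography(h, iris_center)
```

Because the corner pairing encodes any mirror flip, the same code works for mirrored or unmirrored camera images, provided corners are paired consistently.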

Evidence

The paper grounds its claim in a user study and a supplemental camera-location study. The reported best model reduces mean tracking error by about 18% versus an appearance-based baseline, and the supplemental study reports an additional 10-20% improvement when the camera is placed at the bottom of the device. The evidence supports a concrete method contribution with measured gains, while also showing sensitivity to device configuration and content conditions.
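As a back-of-envelope reading of these numbers, the snippet below assumes that the 10-20% camera-placement gain compounds multiplicatively on top of the ~18% model gain. The summary does not state that both figures share the same reference condition, so the combined range is an illustration, not a reported result.

```python
def compound_reduction(*reductions: float) -> float:
    """Overall fractional error reduction when relative reductions stack multiplicatively."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r  # each step keeps (1 - r) of the remaining error
    return 1.0 - remaining


low = compound_reduction(0.18, 0.10)   # 1 - 0.82 * 0.90
high = compound_reduction(0.18, 0.20)  # 1 - 0.82 * 0.80
print(f"combined reduction: {low:.1%} to {high:.1%}")
# prints: combined reduction: 26.2% to 34.4%
```

Simply adding the percentages (28-38%) would overstate the combined effect, because the second reduction applies to an already-reduced error.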

“…user-facing cameras found in e.g., smartphones, laptops, and desktops — 4K or greater in high-end devices — such that it is now possible to capture the 2D reflection of a device’s screen in the user’s eyes.”

actual novelty · Abstract + Introduction (core insight + contrast to prior screen-glint work) · confidence 0.72

“Crucially, however, the device knows what is being displayed on its own screen — in this work, we show this information allows for robust segmentation of the reflection, the location and size of which encodes the user’s screen-relative gaze target.”

departure from common sense · Abstract/Introduction narrative · confidence 0.76

“This is particularly attractive as it provides the simplest pipeline with virtually no preprocessing (no iris center estimation, no computation of the reflection vector) and can work on any input”

limitation · Limitations & Future Work + Results/Discussion failure modes · confidence 0.62

“Our best performing model reduces mean tracking error by ~18% compared to a baseline appearance-based model”

validation scope · User Study results + Supplemental Study · confidence 0.84

Limits

Method limits

Some variants of the method depend on robust iris center estimation (the simplest pipeline avoids it), and the approach as a whole requires usable screen reflections captured by sufficiently high-resolution user-facing cameras. Performance is reported to degrade with darker screen content and when the reflection is partially occluded by eyelids or eyelashes during downward gaze.

Deployment limits

The approach is tied to consumer devices with high-quality front cameras and to configurations where the screen reflection is visible enough to segment. The reported supplemental benefit from bottom-mounted cameras suggests deployment geometry matters, and the initial dataset excludes glasses users, limiting immediate applicability.

Boundary conditions

The paper notes darker screen content as a failure mode and reports partial occlusion from upper eyelids/eyelashes during downward gaze. It also indicates that screen-to-eye distance and camera placement affect performance, so the method is not uniform across all device setups or viewing conditions.
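A rough optics estimate suggests why screen-to-eye distance and camera resolution matter. Treating the cornea as a convex mirror with the commonly cited radius of curvature of about 7.8 mm (focal length about 3.9 mm), a screen of width W at distance d forms a virtual image roughly W·f/(d+f) wide. The screen width, viewing distance, camera field of view, and the assumption that the camera sits at the screen plane are illustrative choices, not values from the paper.

```python
import math

CORNEA_RADIUS_MM = 7.8  # typical adult anterior corneal radius (approximate)


def reflection_width_mm(screen_width_mm: float, distance_mm: float,
                        radius_mm: float = CORNEA_RADIUS_MM) -> float:
    """Virtual-image width of the screen in a convex corneal mirror (paraxial approx.)."""
    f = radius_mm / 2.0  # magnitude of the convex mirror's focal length
    return screen_width_mm * f / (distance_mm + f)


def reflection_width_px(screen_width_mm: float, distance_mm: float,
                        sensor_px: int, hfov_deg: float) -> float:
    """Approximate reflection width in camera pixels.

    Assumes the camera sits at the screen, so the eye is about distance_mm
    from the lens, and the sensor's horizontal pixels span the full field of view.
    """
    field_width_mm = 2.0 * distance_mm * math.tan(math.radians(hfov_deg) / 2.0)
    return reflection_width_mm(screen_width_mm, distance_mm) * sensor_px / field_width_mm


# Hypothetical 300 mm-wide screen and a 78-degree camera field of view.
for d in (300, 500):
    for px in (1920, 3840):
        print(f"d={d} mm, {px} px sensor: ~{reflection_width_px(300, d, px, 78):.1f} px")
```

Under these assumptions the reflection spans only about 11 px on a 4K sensor at 500 mm and shrinks roughly as 1/d², which is one plausible reason the approach leans on high-resolution cameras and is sensitive to viewing distance.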

Position in field

This work sits at the intersection of gaze estimation and screen-content-aware sensing. It extends appearance-based eye tracking by exploiting a device-owned signal that prior methods typically ignore, and it frames screen reflection as a structured cue rather than a nuisance artifact.

Abstract

We present a new and accurate approach for gaze estimation on consumer computing devices. We take advantage of continued strides in the quality of user-facing cameras found in e.g., smartphones, laptops, and desktops — 4K or greater in high-end devices — such that it is now possible to capture the 2D reflection of a device's screen in the user's eyes. This alone is insufficient for accurate gaze tracking due to the near-infinite variety of screen content. Crucially, however, the device knows what is being displayed on its own screen — in this work, we show this information allows for robust segmentation of the reflection, the location and size of which encodes the user's screen-relative gaze target. We explore several strategies to leverage this useful signal, quantifying performance in a user study. Our best performing model reduces mean tracking error by ~18% compared to a baseline appearance-based model. A supplemental study reveals an additional 10-20% improvement if the gaze-tracking camera is located at the bottom of the device.