HiFiGaze: Improving Eye Tracking Accuracy Using Screen Content Knowledge
HiFiGaze is a credible CHI-style method paper: the novelty is not a new sensor, but a clever re-reading of an existing signal. Its strongest contribution is showing that screen content can help segment eye reflections and improve gaze estimation, with measured gains and a clear account of when the approach breaks down.
Axes Lens
Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.
Contribution shape
- Knowledge form: technical knowledge (typical · 50/268)
- Novelty type: method (typical · 21/268)
- Abstraction level: system (typical · 61/268)
- Generalization target: task class (typical · 63/268)
- Validation mode: mixed methods (typical · 136/268)
Evidence profile
- Evidence strength: moderate (typical · 105/268)
- Claim alignment: strong (typical · 231/268)
- Overclaim risk: medium (typical · 210/268)
Review Summary
HiFiGaze’s contribution is best understood as a methodological reframing of gaze estimation on consumer devices. Instead of treating the eye image as the sole source of information, the paper uses the device’s own knowledge of what is on screen to interpret the reflection visible in the eye. That is a genuinely non-obvious move because the reflection is usually treated as nuisance structure or ignored entirely; here it becomes the carrier of gaze-relevant geometry. The paper’s novelty is therefore not a new theoretical account of gaze, but a practical method that combines high-resolution front cameras with screen-content-aware segmentation to recover the screen-relative gaze target. The evidence is reasonably aligned with that claim: the paper reports a user study and a supplemental camera-location study, with the best model reducing mean tracking error by about 18% over an appearance-based baseline and bottom-mounted cameras adding another 10-20% improvement. At the same time, the paper is candid about boundary conditions. It notes degradation with darker screen content, partial occlusion from eyelids or eyelashes during downward gaze, dependence on robust iris center estimation, and limited coverage of glasses users in the initial dataset. So the paper reads as a strong technical contribution with clear empirical support, but not as a universal solution; its value is in showing that screen content is a useful signal under the right device and viewing conditions.
What Changed
Canon before
Prior gaze estimation on consumer devices relied largely on appearance-based eye imagery and camera-only cues. Using the device's knowledge of its own screen content to interpret the display reflection in the eye was not an established central signal.
Departure from common sense
The paper’s key move is to use the device’s own screen content as an input to interpret the reflection in the eye, rather than treating the eye image as self-contained. That is a non-obvious inversion of the usual appearance-based framing: the screen is not just the target of gaze, but also a source of structured information for segmenting the reflection that encodes gaze.
Actual novelty
The core novelty is a screen-content-aware gaze estimation method that leverages high-resolution user-facing cameras to capture the screen’s reflection in the eye and then uses knowledge of what is displayed to robustly segment that reflection. The paper positions this as a practical way to recover gaze-relevant structure from otherwise ambiguous eye imagery.
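To make the idea of content-aware reflection localization concrete, here is a minimal sketch that searches an eye crop for a known screen image using normalized cross-correlation. This is an illustrative stand-in, not the paper's segmentation pipeline: the function name `locate_reflection`, the synthetic data, and the assumption that the screen image has already been warped to the reflection's expected size and orientation are all hypothetical.

```python
import numpy as np

def locate_reflection(eye_crop, screen_template):
    """Return (row, col, score) of the best normalized cross-correlation match.

    Hypothetical helper: assumes both inputs are 2D grayscale float arrays and
    that the template is already scaled to the reflection's expected size.
    """
    th, tw = screen_template.shape
    eh, ew = eye_crop.shape
    t = screen_template - screen_template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best = (0, 0, -1.0)
    for r in range(eh - th + 1):
        for c in range(ew - tw + 1):
            patch = eye_crop[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue  # flat patch or flat template: correlation undefined
            score = float((p * t).sum() / denom)
            if score > best[2]:
                best = (r, c, score)
    return best

# Synthetic example (made-up data): paste a dimmed copy of the "screen"
# into a low-contrast eye crop and recover its position.
rng = np.random.default_rng(0)
template = rng.random((6, 10))
eye = rng.random((40, 40)) * 0.1
eye[12:18, 17:27] = 0.3 * template + 0.2  # attenuated, offset reflection
row, col, score = locate_reflection(eye, template)
```

Normalized cross-correlation is used here because it is insensitive to uniform dimming and brightness offset, which is the kind of distortion a faint reflection would plausibly show. The recovered position and template size stand in for the reflection location and size that the paper says encode the gaze target.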
Evidence
The paper grounds its claim in a user study and a supplemental camera-location study. The reported best model reduces mean tracking error by about 18% versus an appearance-based baseline, and the supplemental study reports an additional 10-20% improvement when the camera is placed at the bottom of the device. The evidence supports a concrete method contribution with measured gains, while also showing sensitivity to device configuration and content conditions.
“, smartphones, laptops, and desktops — 4K or greater in high-end devices — such that it is now possible to capture the 2D reflection of a device’s screen in the user’s eyes.”
actual novelty · Abstract + Introduction (core insight + contrast to prior screen-glint work) · confidence 0.72
“Crucially, however, the device knows what is being displayed on its own screen — in this work, we show this information allows for robust segmentation of the reflection, the location and size of which encodes the user’s screen-relative gaze target”
departure from common sense · Abstract/Introduction narrative · confidence 0.76
“This is particularly attractive as it provides the simplest pipeline with virtually no preprocessing (no iris center estimation, no computation of the reflection vector) and can work on any input”
limitation · Limitations & Future Work + Results/Discussion failure modes · confidence 0.62
“Our best performing model reduces mean tracking error by ~18% compared to a baseline appearance-based model”
validation scope · User Study results + Supplemental Study · confidence 0.84
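The percentage figures cited under Evidence are relative reductions in mean tracking error. As a small worked example of that arithmetic, the sketch below computes a relative reduction from two error values. The numbers are hypothetical placeholders chosen to land on 18%; they are not measurements from the paper, and the function name `relative_reduction` is an assumption.

```python
def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error

# Hypothetical values in arbitrary units, NOT taken from the paper.
baseline = 2.0
model = 1.64
reduction = relative_reduction(baseline, model)
print(f"{reduction:.0%}")
```

A relative figure like this says nothing about the absolute error level, so it is worth reading alongside the absolute errors the paper reports.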
Limits
Method limits
The method depends on robust iris center estimation and on being able to capture usable screen reflections with sufficiently high-resolution user-facing cameras. Performance is reported to degrade with darker screen content and when the reflection is partially occluded by eyelids or eyelashes during downward gaze.
Deployment limits
The approach is tied to consumer devices with high-quality front cameras and to configurations where the screen reflection is visible enough to segment. The reported supplemental benefit from bottom-mounted cameras suggests deployment geometry matters, and the initial dataset excludes glasses users, limiting immediate applicability.
Boundary conditions
The paper notes darker screen content as a failure mode and reports partial occlusion from upper eyelids/eyelashes during downward gaze. It also indicates that screen-to-eye distance and camera placement affect performance, so the method is not uniform across all device setups or viewing conditions.
Position in field
This work sits at the intersection of gaze estimation and screen-content-aware sensing. It extends appearance-based eye tracking by exploiting a device-owned signal that prior methods typically ignore, and it frames screen reflection as a structured cue rather than a nuisance artifact.