HiconAgent: History Context-aware Policy Optimization for GUI Agents

Xurui Zhou1, Gongwei Chen1, Yuquan Xie1, Zaijing Li1, Kaiwen Zhou2,
Shuai Wang2, Shuo Yang1, Zhuotao Tian1, Rui Shao1✉
1Harbin Institute of Technology, Shenzhen    2Huawei Noah’s Ark Lab
✉ Corresponding author  

Abstract

Graphical User Interface (GUI) agents require effective utilization of historical context to perform sequential navigation tasks. While incorporating past actions and observations can significantly improve decision-making, naively using full history leads to excessive computational overhead and potential distraction from irrelevant information. In this work, we introduce HiconAgent, a GUI agent trained with History Context-aware Policy Optimization (HCPO) for effective and efficient utilization of historical information. HCPO explicitly optimizes history usage in both sampling and policy updates by integrating two complementary components: (1) Dynamic Context Sampling (DCS) presents the agent with variable-length histories during sampling, enabling adaptive use of the most relevant historical context to improve sequential decision quality; (2) Anchor-guided History Compression (AHC) refines the policy update phase via a dual-branch optimization strategy, in which the compressed branch drops history observations while keeping history actions as information flow anchors. The compressed and uncompressed branches are coupled through a history-enhanced alignment loss to enforce consistent history usage, achieving efficiency with minimal performance degradation. Extensive experiments on mainstream GUI navigation benchmarks demonstrate the strong performance of our model. Despite its smaller size, HiconAgent-3B outperforms GUI-R1-7B by +8.46% in grounding and +11.32% in step success rate on GUI-Odyssey, while achieving comparable results on AndroidControl and AITW, with up to 2.47× computational speedup and a 60% reduction in FLOPs.

Introduction


The role of history usage in RL-based GUI agents remains largely underexplored. Most prior works adopt a simplified design in which history observations (past screenshots) are omitted, and only history actions are included as the input context. While this choice reduces memory and computational cost, it discards rich visual cues from past observations that are often essential for resolving ambiguous instructions, grounding visually similar elements, and maintaining temporal consistency across steps. Conversely, naively incorporating the complete history, including both past actions and observations, substantially increases computational overhead due to the quadratic complexity of attention mechanisms and the large number of visual tokens from high-resolution screenshots. This trade-off between decision quality and efficiency motivates methods that retain the most informative parts of the historical context while mitigating redundancy. To this end, we propose History Context-aware Policy Optimization (HCPO), a training framework designed to improve both the effectiveness and efficiency of history usage in GUI agents. As illustrated in the teaser, HCPO improves both the sampling and update phases of existing GUI RL frameworks through two complementary components: Dynamic Context Sampling and Anchor-guided History Compression.

Rethinking History Usage: Limitations of Fixed Context and the Anchoring Role of Actions


Different samples prefer different history lengths. Left: For each sample, we evaluate a set of history lengths $\tau$ and take the $\tau$ that yields the highest mean reward. The preferred $\tau$ differs across samples and action types. Right: Providing more history does not necessarily yield the optimal result, suggesting that effective usage of historical information remains underexplored.
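This per-sample preference can be probed with a simple procedure: for each sample, roll out the agent several times under each candidate history length and keep the length with the highest mean reward. A minimal sketch is given below; the function names and the stubbed reward are illustrative assumptions, not our released evaluation code.

```python
# Minimal sketch of the history-length preference probe.
# `rollout_reward` is a placeholder: in practice it runs the agent on the
# sample with `tau` history steps in its context and returns the step reward.
import random
from statistics import mean

TAUS = [0, 1, 2, 4, 8]        # candidate history lengths
ROLLOUTS_PER_TAU = 4          # rollouts averaged per (sample, tau) pair

def rollout_reward(sample, tau):
    """Placeholder for one agent rollout; returns a scalar reward."""
    return random.random()    # stand-in for the real environment reward

def preferred_tau(sample):
    """Return the history length that maximizes the mean reward for this sample."""
    mean_rewards = {
        tau: mean(rollout_reward(sample, tau) for _ in range(ROLLOUTS_PER_TAU))
        for tau in TAUS
    }
    return max(mean_rewards, key=mean_rewards.get)

if __name__ == "__main__":
    samples = [{"id": i, "action_type": "click"} for i in range(5)]
    for s in samples:
        print(f"sample {s['id']} prefers tau = {preferred_tau(s)}")
```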
Layer-wise token-drop analysis. Left: Schematic of the layer-wise token-drop probe, illustrating the information flow of image-drop and action-drop. Right: Dropping $A_{\mathrm{his}}$ at shallow depths ($k < 12$) causes a much larger decline than dropping $V_{\mathrm{his}}$. Even if rich visual information is retained, later layers cannot directly extract effective cues from $V_{\mathrm{his}}$ without the action anchors. As $k$ increases, the action-drop curve rises toward the image-drop curve, and the joint image-and-action-drop curve converges rapidly.
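The probe itself can be implemented by masking the targeted tokens out of attention from a chosen layer $k$ onward, so that deeper layers can no longer read them. The sketch below is a schematic re-implementation on a toy encoder; the module, token-type ids, and dimensions are assumptions for illustration, not the actual VLM backbone.

```python
# Schematic of the layer-wise token-drop probe: from layer k onward, tokens
# belonging to history images (V_his) and/or history actions (A_his) are
# excluded from attention via a key-padding mask.
import torch
import torch.nn as nn

V_HIS, A_HIS, CURRENT = 0, 1, 2   # token-type ids: history image / history action / current step

class ProbedEncoder(nn.Module):
    def __init__(self, dim=64, n_layers=24, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )

    def forward(self, x, token_types, drop_types=(), drop_from_layer=12):
        # key_padding_mask convention: True = token is ignored by attention
        drop = torch.zeros(token_types.shape, dtype=torch.bool)
        for t in drop_types:
            drop |= token_types == t
        for i, layer in enumerate(self.layers):
            mask = drop if i >= drop_from_layer else None
            x = layer(x, src_key_padding_mask=mask)
        return x

if __name__ == "__main__":
    x = torch.randn(1, 10, 64)
    types = torch.tensor([[V_HIS] * 4 + [A_HIS] * 2 + [CURRENT] * 4])
    enc = ProbedEncoder()
    # action-drop probe: remove A_his tokens from layer k = 6 onward
    out = enc(x, types, drop_types=(A_HIS,), drop_from_layer=6)
    print(out.shape)
```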

Overview Framework of HiconAgent


Overview of our history context-aware optimization framework for building HiconAgent. HCPO improves both the sampling and update phases of policy optimization by incorporating two key components: (1) Dynamic Context Sampling (DCS), which introduces varied history lengths during training to encourage context-effective decision-making, and (2) Anchor-guided History Compression (AHC), which adopts a dual-branch architecture where both branches share sampled responses and group-wise advantages. The compressed branch is trained using policy gradients and aligned with the uncompressed branch via a history-enhanced alignment loss.
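To make the dual-branch update concrete, the sketch below shows one plausible form of the AHC objective: a policy-gradient term on the compressed branch (over the shared sampled responses and group-wise advantages) plus a KL-based history-enhanced alignment term toward the uncompressed branch. The exact loss form, KL direction, and weighting coefficient are assumptions based on the description above, not the released implementation.

```python
# Sketch of the Anchor-guided History Compression (AHC) update.
# Both branches score the same sampled responses; only the compressed branch
# (history observations dropped, history actions kept) receives gradients.
import torch
import torch.nn.functional as F

def ahc_loss(chosen_logp_c, vocab_logp_c, vocab_logp_u, advantages, align_coef=0.1):
    """
    chosen_logp_c: (B, T)    log-probs of the sampled tokens, compressed branch (with grad)
    vocab_logp_c:  (B, T, V) full-vocabulary log-probs, compressed branch (with grad)
    vocab_logp_u:  (B, T, V) full-vocabulary log-probs, uncompressed branch (detached)
    advantages:    (B,)      group-wise advantages shared by both branches
    """
    # REINFORCE-style surrogate on the compressed branch
    pg = -(advantages.unsqueeze(1) * chosen_logp_c).mean()

    # history-enhanced alignment: per-token KL(uncompressed || compressed),
    # pulling the compressed branch toward the full-history distribution
    align = F.kl_div(vocab_logp_c, vocab_logp_u.detach(),
                     log_target=True, reduction="batchmean")
    return pg + align_coef * align

if __name__ == "__main__":
    B, T, V = 4, 16, 32                       # toy sizes
    logits_c = torch.randn(B, T, V, requires_grad=True)
    logits_u = torch.randn(B, T, V)
    vocab_c = torch.log_softmax(logits_c, dim=-1)
    vocab_u = torch.log_softmax(logits_u, dim=-1)
    tokens = torch.randint(0, V, (B, T))      # the shared sampled responses
    chosen_c = vocab_c.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    adv = torch.randn(B)                      # group-wise advantages
    loss = ahc_loss(chosen_c, vocab_c, vocab_u, adv)
    loss.backward()
    print(loss.item())
```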

Experiment

Table 1 Table 2
We present the main experimental results in Table 1 and Table 2 on three representative GUI navigation datasets: AndroidControl-High, AITW and GUI-Odyssey. Table 1 provides a detailed comparison under the same data scale and training settings, highlighting the effect of our history-aware optimization strategy against both supervised fine-tuning and reinforcement fine-tuning baselines. Table 2 further extends the comparison to recent advanced GUI agents of varying model sizes and training data volumes, demonstrating the generalization ability of our approach in out-of-distribution (OOD) scenarios.

Qualitative Results


To better understand how history length affects agent behavior, we provide a case study comparing the base model and our HiconAgent-3B under different history lengths $\tau \in \{0,1,2\}$. As shown in the figure, the base model performs correctly when using shorter contexts ($\tau=0$ or $\tau=1$), but fails when the history is extended to $\tau=2$, where the additional observations introduce distracting or misleading information, causing the model to attend to an incorrect UI element and produce the wrong action. In contrast, our model, trained with Dynamic Context Sampling, still produces the correct action when $\tau=2$. Since DCS exposes the agent to diverse and progressively biased history lengths during optimization, the model learns to effectively utilize extended context. This qualitative evidence supports our quantitative results, demonstrating that naively increasing history is suboptimal, whereas HCPO equips the agent with robustness across variable context windows and enables it to benefit from longer history when necessary.
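For reference, the sketch below illustrates how such variable-length histories can be drawn at rollout time. The schedule that progressively biases $\tau$ toward longer histories is an illustrative assumption; the case study above only requires that different $\tau$ values are presented during sampling.

```python
# Illustrative sketch of Dynamic Context Sampling at rollout time: each
# rollout draws a history length tau, and the context keeps only the last
# tau (observation, action) pairs of the trajectory.
import random

def sample_tau(step, total_steps, max_tau=4):
    """Draw a history length; later in training, longer histories are more likely
    (the linear schedule here is an assumption, not the paper's)."""
    progress = step / max(total_steps, 1)
    weights = [(1 - progress) + progress * (t + 1) for t in range(max_tau + 1)]
    return random.choices(range(max_tau + 1), weights=weights)[0]

def build_context(trajectory, tau):
    """Keep only the last `tau` steps of the trajectory as the history context."""
    history = trajectory[-tau:] if tau > 0 else []
    return {
        "history_observations": [obs for obs, act in history],
        "history_actions": [act for obs, act in history],
    }

if __name__ == "__main__":
    traj = [(f"screen_{i}.png", f"action_{i}") for i in range(6)]
    tau = sample_tau(step=500, total_steps=1000)
    print(tau, build_context(traj, tau))
```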


Conclusion

In this paper, we present HiconAgent, a history-aware GUI agent trained with History Context-aware Policy Optimization. Through extensive empirical investigations, we first revisited how history is utilized in GUI reinforcement learning agents. Our two key studies revealed that different decision steps prefer different history lengths and that historical actions serve as information flow anchors. By pairing DCS and AHC, our model outperforms larger models with fewer FLOPs. These results highlight HiconAgent as a practical path toward lightweight, high-performance GUI agents.