World models are emerging as a transformative paradigm in artificial intelligence, enabling agents to construct internal representations of their environments for predictive reasoning, planning, and decision-making. By learning latent dynamics, world models provide a sample-efficient framework that is especially valuable in data-constrained or safety-critical scenarios. In this paper, we present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning. We compare world models with related concepts such as digital twins, the metaverse, and foundation models, clarifying their unique role as embedded cognitive engines for autonomous agents. We further propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization, particularly in low-altitude wireless networks (LAWNs). Through a weather-aware UAV trajectory planning case study, we demonstrate the effectiveness of our framework in improving learning efficiency and decision quality.
We propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization, particularly in low-altitude wireless networks (LAWNs). Furthermore, we present a case study on weather-aware UAV trajectory planning and demonstrate how the proposed framework leverages a world model to enhance wireless optimization.
Figure 1: The workflow of the proposed framework. The left part shows the model structure of Wireless Dreamer, including a world model, Q-network, and target Q-network. The bottom part presents the continuous interaction process of Wireless Dreamer. The right part illustrates weather-aware trajectory planning in a UAV-assisted scenario in LAWNs.
We consider a UAV-assisted wireless communication scenario in LAWNs, where a single UAV acts as a mobile base station to provide downlink coverage for multiple ground users under dynamic and spatially varying weather conditions, as shown in Figure 1. The UAV traverses a two-dimensional area over a fixed time horizon and dynamically adjusts its position to optimize communication performance while responding to evolving environmental factors such as wind intensity.
The WeatherAwareUAVEnv is a Gym-compatible simulator for UAV-based wireless optimization in dynamic environments. It supports OFDMA channel modeling, energy constraints, and weather dynamics.
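The skeleton below sketches what such an environment could look like. It follows the Gym reset/step interface without the dependency; the grid size, 9-action move set, observation layout, and distance-based placeholder reward are illustrative assumptions, not the actual WeatherAwareUAVEnv implementation (which models OFDMA channels and energy constraints).

```python
import numpy as np

class WeatherAwareUAVEnvSketch:
    """Illustrative skeleton following the Gym reset/step API.
    Grid size, action set, and reward shaping are assumptions."""

    N_ACTIONS = 9  # 8 compass moves + hover

    def __init__(self, grid_size=20, max_steps=100, seed=0):
        self.grid_size = grid_size
        self.max_steps = max_steps
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.uav = np.full(2, self.grid_size / 2.0)       # start at center
        self.storm = self.rng.uniform(0, self.grid_size, size=2)
        return self._obs()

    def step(self, action):
        moves = [np.array([dx, dy]) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        self.uav = np.clip(self.uav + moves[action], 0, self.grid_size - 1)
        # storm center drifts with noise (the "drifting Gaussian hotspot")
        self.storm = np.clip(self.storm + self.rng.normal(0.2, 0.1, 2),
                             0, self.grid_size - 1)
        self.t += 1
        # placeholder reward: penalize flying inside the storm radius; the
        # real simulator would score OFDMA throughput under weather path loss
        reward = -1.0 if np.linalg.norm(self.uav - self.storm) < 3.0 else 0.1
        done = self.t >= self.max_steps
        return self._obs(), reward, done, {}

    def _obs(self):
        # assumed observation: normalized UAV and storm-center positions
        return np.concatenate([self.uav, self.storm]).astype(np.float32) / self.grid_size
```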
We adopt a drifting Gaussian hotspot to simulate dynamic weather disturbances. At each step, the storm center moves with drift and noise, generating a rainfall map:
$$W(x, y) = \alpha \cdot \exp\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right)$$
Weather intensity affects path loss as follows:
$$PL(d) = PL_0 + 10 \cdot n \cdot \log_{10}(d) + \beta \cdot W(x, y)$$
Equivalently, writing the drifting storm center explicitly as $(c_x(t), c_y(t))$, the weather intensity map at time $t$ is:
$$W(x, y, t) = \alpha \cdot \exp\left(-\frac{(x - c_x(t))^2 + (y - c_y(t))^2}{2\sigma^2}\right)$$
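The two formulas above can be computed directly on a grid; the parameter values here ($\alpha$, $\sigma$, $PL_0$, $n$, $\beta$, grid resolution) are illustrative assumptions rather than the simulator's actual settings.

```python
import numpy as np

def weather_map(cx, cy, alpha=1.0, sigma=3.0, grid=20):
    """Gaussian rainfall intensity W(x, y) centered at the storm (cx, cy)."""
    xs, ys = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    return alpha * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def path_loss(d, w, pl0=40.0, n=2.5, beta=5.0):
    """PL(d) = PL0 + 10 * n * log10(d) + beta * W(x, y), in dB."""
    return pl0 + 10.0 * n * np.log10(d) + beta * w

# drift the storm center and recompute the rainfall map at each step
cx, cy = 5.0, 5.0
rng = np.random.default_rng(0)
for t in range(3):
    W = weather_map(cx, cy)
    # path loss for a link of length 100 m through grid cell (5, 5)
    pl = path_loss(100.0, W[5, 5])
    cx += 0.2 + rng.normal(0, 0.1)   # deterministic drift + noise
    cy += 0.2 + rng.normal(0, 0.1)
```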
Q-value update (TD learning):
$$\mathcal{L}_Q = \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\left[\left(r + \gamma \max_{a'}Q_{\text{target}}(s',a') - Q(s,a)\right)^2\right]$$
Epsilon decay:
$$\epsilon_{t+1} = \max(\epsilon_{\min}, \lambda_{\epsilon} \cdot \epsilon_t)$$
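A numerically concrete sketch of the TD loss and epsilon schedule above; numpy arrays stand in for the PyTorch Q-networks, and the batch values are made up for illustration.

```python
import numpy as np

def td_loss(q_sa, r, q_target_next, gamma=0.99):
    """Mean squared TD error: (r + gamma * max_a' Q_target(s', a') - Q(s, a))^2."""
    target = r + gamma * q_target_next.max(axis=1)
    return np.mean((target - q_sa) ** 2)

def decay_epsilon(eps, eps_min=0.05, lam=0.995):
    """Multiplicative epsilon decay, floored at eps_min."""
    return max(eps_min, lam * eps)

# toy batch: 2 transitions, 9 discrete actions
q_sa = np.array([1.0, 0.5])                    # Q(s, a) for the chosen actions
q_next = np.zeros((2, 9)); q_next[:, 3] = 2.0  # target-network values at s'
loss = td_loss(q_sa, r=np.array([1.0, 0.0]), q_target_next=q_next)

eps = 1.0
for _ in range(100):
    eps = decay_epsilon(eps)
```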
State encoding:
$$h_s = \text{MLP}_{\text{encoder}}(s) \in \mathbb{R}^{16}$$
Combined features:
$$z = [h_s; \text{one-hot}(a)] \in \mathbb{R}^{25}$$
(16-dim encoded state + 9-dim one-hot action)
Next state and reward prediction:
$$\hat{s}' = \text{MLP}_{\text{state}}(z)$$
$$\hat{r} = \text{MLP}_{\text{reward}}(z)$$
World model loss:
$$\mathcal{L}_M = \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\left[\|s' - \hat{s}'\|^2 + (r - \hat{r})^2\right]$$
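The shapes and loss above can be traced with a minimal forward pass. Single random linear layers stand in for the learned MLPs, and the 4-dimensional state is an assumption; only the dimensions (16-dim encoding, 25-dim combined feature) and the loss form follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

STATE_DIM, HID, N_ACTIONS = 4, 16, 9   # STATE_DIM is an illustrative assumption

# random single-layer stand-ins for MLP_encoder, MLP_state, MLP_reward
W_enc = rng.normal(size=(STATE_DIM, HID))
W_state = rng.normal(size=(HID + N_ACTIONS, STATE_DIM))
W_rew = rng.normal(size=(HID + N_ACTIONS,))

def world_model(s, a):
    h_s = relu(s @ W_enc)                              # h_s in R^16
    z = np.concatenate([h_s, np.eye(N_ACTIONS)[a]])    # z in R^25 (16 + 9 one-hot)
    return z @ W_state, float(z @ W_rew)               # (s_hat', r_hat)

def model_loss(s, a, r, s_next):
    """L_M = ||s' - s_hat'||^2 + (r - r_hat)^2 for one transition."""
    s_hat, r_hat = world_model(s, a)
    return float(np.sum((s_next - s_hat) ** 2) + (r - r_hat) ** 2)

s = rng.normal(size=STATE_DIM)
loss = model_loss(s, a=2, r=0.5, s_next=rng.normal(size=STATE_DIM))
```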
Planning objective (when enabled): candidate first actions $a_0$ are scored by rolling out $N_{\text{samples}}$ sampled action sequences $\{a_t^{(i)}\}$ with $a_0^{(i)} = a_0$ through the learned model:
$$a^* = \arg\max_{a_0} \sum_{i=1}^{N_{\text{samples}}} \sum_{t=0}^{H_{\text{plan}}} \gamma^t \, \hat{r}\left(s_t^{(i)}, a_t^{(i)}\right)$$
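A random-shooting sketch of this objective, under assumptions: the model is any callable mapping $(s, a)$ to $(\hat{s}', \hat{r})$, and rollouts are grouped by their sampled first action so that each candidate $a_0$ is scored by its average discounted predicted return. The toy model at the end exists only to exercise the planner.

```python
import numpy as np

def plan_first_action(model, s0, n_actions=9, n_samples=64,
                      horizon=5, gamma=0.99, rng=None):
    """Score sampled action sequences through the learned model and
    return the first action of the best-scoring group of rollouts."""
    rng = rng or np.random.default_rng(0)
    returns = np.zeros(n_actions)
    counts = np.zeros(n_actions)
    for _ in range(n_samples):
        s = np.array(s0, dtype=float)
        actions = rng.integers(0, n_actions, size=horizon + 1)
        g = 0.0
        for t, a in enumerate(actions):
            s, r = model(s, int(a))      # imagined transition and reward
            g += gamma ** t * r
        returns[actions[0]] += g         # group rollouts by first action a0
        counts[actions[0]] += 1
    # average discounted return per candidate first action
    scores = np.where(counts > 0, returns / np.maximum(counts, 1), -np.inf)
    return int(np.argmax(scores))

# toy dynamics: action 0 raises the (scalar) state, which the reward tracks
def toy_model(s, a):
    s = s + (1.0 if a == 0 else -0.1)
    return s, float(s[0])

best = plan_first_action(toy_model, s0=[0.0], n_actions=3, n_samples=200)
```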
Our experiments are implemented in Python with PyTorch and conducted on a Linux server running Ubuntu 22.04, equipped with an Intel(R) Xeon(R) Silver 4410Y 12-core processor and an NVIDIA RTX A6000 GPU. We benchmark the proposed method against DQN and a random policy.
Figure 2: Comparison of average episodic rewards
Figure 3: Comparison between real and predicted rewards
The experimental results demonstrate the effectiveness of Wireless Dreamer in improving learning efficiency and decision quality.
@article{zhao2025world,
title={World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks},
author={Zhao, Changyuan and Zhang, Ruichen and Wang, Jiacheng and Zhao, Gaosheng and Niyato, Dusit and Sun, Geng and Mao, Shiwen and Kim, Dong In},
journal={arXiv preprint arXiv:2506.00417},
year={2025}
}