I unified convolution and attention into a single framework

The operational primitives of deep learning, primarily matrix multiplication and convolution, exist
as a fragmented landscape of highly specialized tools. This paper introduces the Generalized Windowed
Operation (GWO), a theoretical framework that unifies these operations by decomposing them into three
orthogonal components: Path, defining operational locality; Shape, defining geometric structure and
underlying symmetry assumptions; and Weight, defining feature importance.
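
One way to picture the decomposition is the minimal NumPy sketch below. It is a toy reading of the three components, not the paper's implementation. The function name `gwo_1d`, its signature, and the mapping of Path to anchor positions, Shape to relative offsets, and Weight to a per-window weighting function are all assumptions made for illustration.

```python
import numpy as np

def gwo_1d(x, path, shape, weight_fn):
    """Toy 1D windowed operation (hypothetical helper, for illustration only).

    path      -- iterable of anchor indices: where the operation is applied
    shape     -- integer offsets relative to each anchor: the window geometry
    weight_fn -- callable mapping a gathered window to per-element weights
    """
    n = len(x)
    out = []
    for anchor in path:
        idx = anchor + shape                      # absolute positions in the window
        valid = (idx >= 0) & (idx < n)            # positions that fall inside x
        window = np.where(valid, x[np.clip(idx, 0, n - 1)], 0.0)  # zero padding
        out.append(np.dot(weight_fn(window), window))
    return np.array(out)

x = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 0.0, -1.0])

# Dense sliding path + contiguous shape + static weights: an ordinary
# zero-padded cross-correlation (what deep learning calls "convolution").
y_conv = gwo_1d(x, range(len(x)), np.array([-1, 0, 1]), lambda w: kernel)

# Same path and weights, different shape: a dilated variant.
y_dilated = gwo_1d(x, range(len(x)), np.array([-2, 0, 2]), lambda w: kernel)

# Same path and shape, data-dependent weights: a softmax over the window values.
def softmax_weight(window):
    e = np.exp(window - window.max())
    return e / e.sum()

y_soft = gwo_1d(x, range(len(x)), np.array([-1, 0, 1]), softmax_weight)
```

In this sketch only one argument changes between the three calls, which is the sense in which the components are treated as independent here. Whether the paper formalizes attention as a data-dependent Weight in this way is an assumption.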

We elevate this framework to a predictive theory grounded in two fundamental principles. First, we
introduce the Principle of Structural Alignment, which posits that optimal generalization is achieved
when the GWO’s (P, S, W) configuration mirrors the data’s intrinsic structure. Second, we show that
this principle is a direct consequence of the Information Bottleneck (IB) principle. To formalize
this, we define an Operational Complexity metric based on Kolmogorov complexity. However, we