The operational primitives of deep learning, primarily matrix multiplication and convolution, exist
as a fragmented landscape of highly specialized tools. This paper introduces the Generalized Windowed
Operation (GWO), a theoretical framework that unifies these operations by decomposing them into three
orthogonal components: Path, defining operational locality; Shape, defining geometric structure and
underlying symmetry assumptions; and Weight, defining feature importance.
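The (Path, Shape, Weight) decomposition can be made concrete with a minimal sketch. The function below is a hypothetical illustration (not the paper's implementation): it treats a windowed operation as an explicit triple, and shows that one particular choice of (P, S, W) recovers an ordinary 1-D convolution.

```python
import numpy as np

def gwo_1d(x, path, shape_mask, weight):
    """Hypothetical sketch of a Generalized Windowed Operation in 1-D.

    path:       for each output index, the input indices forming its window
    shape_mask: binary mask giving the window's geometric structure
    weight:     per-tap importance applied inside the window
    """
    out = np.empty(len(path))
    for i, idx in enumerate(path):
        window = x[idx]                                 # Path: operational locality
        out[i] = np.sum(window * shape_mask * weight)   # Shape and Weight
    return out

# A standard 3-tap convolution emerges from one (P, S, W) configuration:
x = np.arange(8, dtype=float)
P = [np.arange(i, i + 3) for i in range(6)]   # sliding windows, stride 1
S = np.ones(3)                                # dense, contiguous shape
W = np.array([1.0, 0.0, -1.0])                # kernel values (finite difference)
print(gwo_1d(x, P, S, W))                     # each output is x[i] - x[i+2] = -2
```

Changing only `P` (e.g. dilated or global index sets) or only `S` (e.g. a sparse mask) yields other familiar operations without touching the rest of the computation, which is the sense in which the three components are orthogonal.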
We elevate this framework to a predictive theory grounded in two fundamental principles. First, we
introduce the Principle of Structural Alignment, which posits that optimal generalization is achieved
when the GWO’s (P, S, W) configuration mirrors the data’s intrinsic structure. Second, we show that
this principle is a direct consequence of the Information Bottleneck (IB) principle. To formalize
this, we define an Operational Complexity metric based on Kolmogorov complexity.