The operational primitives of deep learning, primarily matrix multiplication and convolution, form
a fragmented landscape of highly specialized tools. This paper introduces the Generalized Windowed
Operation (GWO), a theoretical framework that unifies these operations by decomposing them into three
orthogonal components: Path, defining operational locality; Shape, defining geometric structure and
underlying symmetry assumptions; and Weight, defining feature importance.
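To make the decomposition concrete, the minimal sketch below (our own illustration; gwo_apply and the dictionary-based Path/Shape/Weight encoding are hypothetical names, not the paper's implementation) expresses a 1-D convolution and a dense layer (matrix multiplication) as two settings of the same windowed operation.

```python
import numpy as np

def gwo_apply(x, path, shape, weight):
    """Generic windowed operation: each output is a weighted sum over a window of the input.

    path:   dict with "n_out" (number of outputs) and "map" (output index -> centre location).
    shape:  dict with "window" (centre -> ordered list of input indices).
    weight: dict with "value" ((output index, slot) -> scalar importance).
    """
    y = np.zeros(path["n_out"])
    for i in range(path["n_out"]):
        window = shape["window"](path["map"](i))
        y[i] = sum(weight["value"](i, j) * x[idx] for j, idx in enumerate(window))
    return y

x = np.arange(8, dtype=float)

# Convolution: a local sliding Path, a small fixed Shape, and Weights shared
# across output positions (encoding a translation-symmetry assumption).
kernel = np.array([0.25, 0.5, 0.25])
conv = gwo_apply(
    x,
    path={"n_out": len(x) - len(kernel) + 1, "map": lambda i: i},
    shape={"window": lambda c: list(range(c, c + len(kernel)))},
    weight={"value": lambda i, j: kernel[j]},
)

# Matrix multiplication (a dense layer): a global Path and Shape (every output
# sees every input) with unshared, position-specific Weights.
W = np.random.default_rng(0).normal(size=(4, len(x)))
dense = gwo_apply(
    x,
    path={"n_out": W.shape[0], "map": lambda i: 0},
    shape={"window": lambda c: list(range(len(x)))},
    weight={"value": lambda i, j: W[i, j]},
)

assert np.allclose(conv, np.convolve(x, kernel[::-1], mode="valid"))
assert np.allclose(dense, W @ x)
```

Only the (P, S, W) triple changes between the two cases: convolution commits to a local Path, a small fixed Shape, and shared Weights, while the dense layer adopts a global Shape with unshared Weights.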
We elevate this framework to a predictive theory grounded in two fundamental principles. First, we
introduce the Principle of Structural Alignment, which posits that optimal generalization is achieved
when the GWO’s (P, S, W) configuration mirrors the data’s intrinsic structure. Second, we show that
this principle is a direct consequence of the Information Bottleneck (IB) principle. To formalize
this, we define an Operational Complexity metric based on Kolmogorov complexity. However, we
move beyond the simplistic view that lower complexity is always better. We argue that the nature of
this complexity—whether it contributes to brute-force capacity or to adaptive regularization—is
the true determinant of generalization. Our theory predicts that a GWO whose complexity is spent on
adaptive alignment with the data's structure will achieve a superior generalization bound. Canonical operations
and their modern variants emerge as optimal solutions to the IB objective, and our experiments reveal that
the quality, not just the quantity, of an operation’s complexity governs its performance. The GWO theory
thus provides a grammar for creating neural operations and a principled pathway from data properties
to generalizable architecture design.
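For orientation, the display below recalls the standard Information Bottleneck Lagrangian for a representation Z of input X with target Y; how the Operational Complexity C(P, S, W) enters as a regularizer, and the trade-off weights beta and gamma, are schematic assumptions for illustration rather than the paper's exact objective.

```latex
% Standard IB Lagrangian: compress X into Z while preserving information about Y.
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;Z) \;-\; \beta\, I(Z;Y)

% Schematic complexity-regularized form (an assumed illustration, not the paper's
% exact formulation): a Kolmogorov-style description length C(P,S,W) penalizes
% operational complexity that does not contribute to alignment with the data.
\min_{(P,\,S,\,W)} \; I\!\left(X; Z_{P,S,W}\right) \;-\; \beta\, I\!\left(Z_{P,S,W}; Y\right) \;+\; \gamma\, C(P,S,W)
```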