Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

I unified convolution and attention into a single framework

The operational primitives of deep learning, primarily matrix multiplication and convolution, exist as a fragmented landscape of highly specialized tools. This paper introduces the Generalized Windowed Operation (GWO), a theoretical framework that unifies these operations by decomposing them into three orthogonal components: Path, defining operational locality; Shape, defining geometric structure and underlying symmetry assumptions; and Weight, defining feature importance. We elevate this f

Just Ask for Generalization (2021)

Generalizing to what you want may be easier than optimizing directly for what you want. We might even ask for "consciousness". This blog post outlines a key engineering principle I’ve come to believe strongly in for building general AI systems with deep learning. This principle guides my present-day research tastes and day-to-day design choices in building large-scale, general-purpose ML systems. Discoveries around Neural Scaling Laws, unsupervised pretraining on Internet-scale datasets, and o