Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

The Tradeoffs of SSMs and Transformers

This blog post was adapted from a talk I’ve given a handful of times over the last year. It was meant to be a high-level talk accessible to a fairly broad audience, but hopefully it has some interesting insights, opinions, and intuitions around sequence models for dedicated researchers, too. State Space Models: Just so we’re on the same page, I’ll start by defining what I mean by a state space model. (This section isn’t strictly necessary to get to the main part of this post, though; feel free to…)
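For readers who want the definition made concrete, below is a minimal sketch of a linear state space recurrence (h_t = A h_{t-1} + B x_t, y_t = C h_t) in PyTorch. The names, dimensions, and sequential scan are illustrative placeholders, not the post's actual parameterization; practical SSMs use structured matrices and parallel scans or convolutions.

```python
import torch

def ssm_scan(A, B, C, x):
    """Minimal linear SSM: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.

    A: (d_state, d_state), B: (d_state, d_in), C: (d_out, d_state),
    x: (seq_len, d_in). Returns y: (seq_len, d_out).
    """
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:  # naive sequential scan, for clarity only
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return torch.stack(ys)

# Tiny random system as a smoke test
d_state, d_in, d_out, seq_len = 4, 2, 3, 5
A = 0.1 * torch.randn(d_state, d_state)  # small weights keep the state bounded
B = torch.randn(d_state, d_in)
C = torch.randn(d_out, d_state)
y = ssm_scan(A, B, C, torch.randn(seq_len, d_in))
print(y.shape)  # torch.Size([5, 3])
```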

VLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention

GitHub | Documentation | Paper LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow, even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM uses PagedAttention, our new attention algorithm that effectively manages attention keys and values. Equipped with PagedAttention, vLLM sets a new state of the art in LLM serving…
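For a sense of the user-facing side, here is a minimal serving sketch based on vLLM's documented Python API; the model id and sampling values are arbitrary examples, and PagedAttention's paged KV-cache management happens internally.

```python
from vllm import LLM, SamplingParams

# Any supported Hugging Face model id works here; opt-125m is just small.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["The capital of France is"]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```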

I have reimplemented Stable Diffusion 3.5 from scratch in pure PyTorch

miniDiffusion is a reimplementation of the Stable Diffusion 3.5 model in pure PyTorch with minimal dependencies. It's designed for educational, experimental, and hacking purposes, and is written with the mindset of having the least code necessary to recreate Stable Diffusion 3.5 from scratch: only ~2800 lines of code spanning from the VAE to the DiT to the training and dataset scripts. Files: The main Stable Diffusion model code is located in dit.py, dit_components.py, and attention.py…
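To give a flavor of what a from-scratch reimplementation involves, here is a generic multi-head self-attention block in plain PyTorch. It is an illustrative sketch of the kind of module a file like attention.py implements, not the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Generic multi-head self-attention, the kind a from-scratch DiT needs."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split heads: (B, T, D) -> (B, n_heads, T, head_dim)
        q, k, v = (t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)  # no mask: bidirectional
        out = out.transpose(1, 2).reshape(B, T, D)     # merge heads back
        return self.proj(out)

x = torch.randn(2, 16, 64)
print(SelfAttention(dim=64, n_heads=8)(x).shape)  # torch.Size([2, 16, 64])
```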

DeepDive in everything of Llama3: revealing detailed insights and implementation

[ View in English | Chinese documentation here ] This project is an enhanced version of naklecha/llama3-from-scratch. It comprehensively improves and optimizes the original project, aiming to help readers more easily understand and master the implementation principles and the detailed inference process of the Llama3 model. Thanks to the original author for their contributions :) The core improvements of this project are the following: Structural Optimization…
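As an example of the kind of building block such a walkthrough covers, here is a minimal RMSNorm, the normalization used throughout Llama-family models. This is an illustrative sketch, not the project's exact code.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """RMSNorm as used in Llama-family models: scale each vector by the
    reciprocal of its root-mean-square, then by a learned per-feature
    weight (no mean subtraction, no bias)."""
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

x = torch.randn(1, 4, 8)           # (batch, seq_len, hidden_dim)
weight = torch.ones(8)             # learned scale, initialized to 1
print(rms_norm(x, weight).shape)   # torch.Size([1, 4, 8])
```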