Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: tokens

One Token to rule them all – Obtaining Global Admin in every Entra ID tenant

While preparing for my Black Hat and DEF CON talks in July of this year, I found the most impactful Entra ID vulnerability that I will probably ever find. This vulnerability could have allowed me to compromise every Entra ID tenant in the world (except probably those in national cloud deployments). If you are an Entra ID admin reading this, yes that means complete access to your tenant. The vulnerability consisted of two components: undocumented impersonation tokens, called “Actor tokens”, that

Hackers steal 3,325 secrets in GhostAction GitHub supply chain attack

A new supply chain attack on GitHub, dubbed 'GhostAction,' has compromised 3,325 secrets, including PyPI, npm, DockerHub, GitHub tokens, Cloudflare, and AWS keys. The attack was discovered by GitGuardian researchers, who report that the first signs of compromise on one of the impacted projects, FastUUID, became evident on September 2, 2025. The attack involved leveraging compromised maintainer accounts to perform commits that added a malicious GitHub Actions workflow file that triggers automat

Salesloft says Drift customer data thefts linked to March GitHub account hack

Salesloft said a breach of its GitHub account in March allowed hackers to steal authentication tokens that were later used in a mass-hack targeting several of its big tech customers. Citing an investigation by Google’s incident response unit Mandiant, Salesloft said on its data breach page that the as-yet-unnamed hackers accessed Salesloft’s GitHub account and performed reconnaissance activities from March until June, which allowed them to download “content from multiple repositories, add a gue

Understanding Transformers Using a Minimal Example

The internal mechanisms of Transformer large language models (LLMs), particularly the flow of information through the layers and the operation of the attention mechanism, can be challenging to follow because of the sheer volume of numbers involved; it is hard to form a mental model of them. This article aims to make these workings tangible by visualizing a Transformer's internal state. Using a minimal dataset and a deliberately simplified model, it is possible to follow the model's

Pearl – An Erlang lexer and syntax highlighter in Gleam

Pearl is a lexer and syntax highlighter for Erlang, written in Gleam. The lexer is based on glexer and just, allowing you to convert Erlang source code into tokens. There is also an API for highlighting Erlang code using ANSI colours, HTML, or a custom format. Heavily inspired by contour. gleam add pearl@2 import pearl pub fn main ( ) { let code = " -module(hello). -export([hello_world/0]). hello_world() -> io:fwrite( \" H

Google warns that mass data theft hitting Salesloft AI agent has grown bigger

Google is advising users of the Salesloft Drift AI chat agent to consider all security tokens connected to the platform compromised following the discovery that unknown attackers used some of the credentials to access email from Google Workspace accounts. In response, Google has revoked the tokens that were used in the breaches and disabled integration between the Salesloft Drift agent and all Workspace accounts as it investigates further. The company has also notified all affected account hold

Google warns Salesloft breach impacted some Workspace accounts

Google now reports that the Salesloft Drift breach is larger than initially thought, warning that attackers also used stolen OAuth tokens to access a small number of Google Workspace email accounts in addition to stealing data from Salesforce instances. "Based on new information identified by GTIG, the scope of this compromise is not exclusive to the Salesforce integration with Salesloft Drift and impacts other integrations," warns Google. "We now advise all Salesloft Drift customers to treat

Are OpenAI and Anthropic losing money on inference?

I keep hearing that AI is a cash incinerator, especially around inference. While that seems reasonable on the surface, I've often been wary of these kinds of claims, so I decided to do some digging. I haven't seen anyone really try to deconstruct the costs of running inference at scale, and the economics really interest me. This is napkin math. I don't have any experience running frontier models at scale, but I do know a lot about the costs and economics of running very high-throughput s

Open-Sourced AI Models May Be More Costly in the Long Run, Study Finds

As more businesses adopt AI, picking which model to go with is a major decision. While open-source models may seem cheaper initially, a new study warns that those savings can evaporate fast due to the extra computing power they require. In fact, open-source AI models burn through significantly more computing resources than their closed-source rivals when performing the same tasks, according to a study published Thursday by Nous Research. The researchers tested dozens of AI models, including

What's the strongest AI model you can train on a laptop in five minutes?

What’s the strongest model I can train on my MacBook Pro in five minutes? I’ll give the answer upfront: the best 5-minute model I could train was a ~1.8M-param GPT-style transformer trained on ~20M TinyStories tokens, reaching ~9.6 perplexity on a held-out split. Here’s an example of the output, with the prompt bolded: Once upon a time, there was a little boy named Tim. Tim had a small box that he liked to play with. He would push the box to open. One day, he found a big red ball in his yard.
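A parameter budget like that ~1.8M figure is easy to sanity-check from the model's shape. A rough sketch, where every dimension below is an illustrative guess rather than the author's actual config:

```python
# Rough parameter count for a small GPT-style transformer.
# Every dimension here is an illustrative guess, not the author's config.
def gpt_params(vocab: int, d_model: int, n_layers: int, d_ff: int) -> int:
    embed = vocab * d_model                # token embeddings (often tied with the output head)
    attn = 4 * d_model * d_model           # Q, K, V, and output projections per layer
    mlp = 2 * d_model * d_ff               # up- and down-projections per layer
    return embed + n_layers * (attn + mlp)

# One way to land in the low-millions range: tiny vocab, shallow, narrow.
print(gpt_params(vocab=4096, d_model=128, n_layers=4, d_ff=512))  # → 1310720 (~1.3M)
```

Trading off vocabulary size, width, and depth around values like these is how a model stays small enough to train in minutes on a laptop.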

Claude gets 1M tokens support via API to take on Gemini 2.5 Pro

Claude Sonnet 4 has been upgraded, and it can now remember up to 1 million tokens of context, but only when it's used via API. This could change in the future. This is 5x more than the previous limit. It also means that Claude now supports remembering over 75,000 lines of code, or even hundreds of documents in a single session. Previously, you were required to submit details to Claude in small chunks, but that also meant Claude would forget the context as it hit the limit. With up to a 1 milli

GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM

Here is the thing: the expert layers run amazingly well on CPU (~17 to 25 T/s on a 14900K), and you can force that with the new llama.cpp option --cpu-moe. You can offload just the attention layers to the GPU (requiring about 5 to 8 GB of VRAM) for fast prefill. The GPU then holds only the KV cache for the sequence, attention weights and activations, routing tables, LayerNorms, and other "non-expert" parameters. No giant MLP weights are resident on the GPU, so memory use stays low. This yields an amazingly snappy system for a 120B mod
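The memory arithmetic behind this split is worth sketching: in a mixture-of-experts model, the expert MLPs hold the overwhelming majority of the weights, so parking them in system RAM leaves only a small slice for the GPU. The fraction and quantization below are assumptions for illustration, not GPT-OSS-120B's real tensor layout:

```python
# Why a 120B MoE can run with only a few GB of VRAM: the expert MLPs
# dominate the weight count, so they live in system RAM while the GPU
# holds attention and shared layers. All numbers are rough assumptions,
# not GPT-OSS-120B's real tensor sizes.
total_params_b = 120       # total parameters, in billions
expert_fraction = 0.95     # assumed share of weights in expert MLPs
bytes_per_param = 0.5      # assumed ~4-bit quantization

expert_gb = total_params_b * expert_fraction * bytes_per_param      # stays in system RAM
gpu_gb = total_params_b * (1 - expert_fraction) * bytes_per_param   # attention and shared layers
print(f"system RAM: ~{expert_gb:.0f} GB, GPU: ~{gpu_gb:.0f} GB")
```

With only the non-expert weights plus KV cache resident, a single-digit-GB VRAM figure like the headline's is plausible.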

Apple taught an LLM to predict tokens up to 5x faster in math and coding tasks

A new research paper from Apple details a technique that speeds up large language model responses while preserving output quality. Traditionally, LLMs generate text one token at a time. This is slow because each step depends on all the previous ones to keep the output coherent and accurate. If the model is writing a sentence like "The cat is black", it predicts each token in sequence. After writing "The cat is", it looks at everything so far (plus the

How attention sinks keep language models stable

We discovered why language models catastrophically fail on long conversations: when old tokens are removed to save memory, models produce complete gibberish. We found models dump massive attention onto the first few tokens as "attention sinks"—places to park unused attention since softmax requires weights to sum to 1. Our solution, StreamingLLM, simply keeps these first 4 tokens permanently while sliding the window for everything else, enabling stable processing of 4 million+ tokens instead of j
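The cache policy itself is tiny. A minimal sketch of the StreamingLLM-style rule, with the 4 sink tokens from the article and an illustrative window size:

```python
# StreamingLLM-style cache policy: permanently keep the first `sinks`
# tokens, plus a sliding window over the most recent ones. The window
# size here is illustrative; real deployments use much larger windows.
def kept_positions(seq_len: int, sinks: int = 4, window: int = 8) -> list[int]:
    if seq_len <= sinks + window:
        return list(range(seq_len))              # everything still fits in cache
    recent = range(seq_len - window, seq_len)    # the sliding tail
    return list(range(sinks)) + list(recent)

print(kept_positions(20))  # sinks 0-3 survive; positions 4-11 were evicted
```

Keeping those first positions is what preserves the attention sinks, so softmax always has somewhere to park its leftover weight.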

The Mysterious AI Easter Egg at the Heart of Ari Aster’s ‘Eddington’

Horror wunderkind Ari Aster’s new movie Eddington has divided audiences and inspired plenty of online debate about what exactly the director is trying to say about our collective relationship to technology (hint: it’s probably not good). The story centers on a small town in New Mexico that descends into social-media-driven chaos during the COVID-19 pandemic. The film stars Joaquin Phoenix as local sheriff Joe Cross, who tussles with the town’s mayor, played by Pedro Pascal, while the rest of the

My favorite use-case for AI is writing logs

July 17, 2025. One of my favorite AI dev products today is Full Line Code Completion in PyCharm (bundled with the IDE since late 2023). It’s extremely well thought out, unintrusive, and makes me a more effective developer. Most importantly, it still keeps me mostly in control of my code. I’ve now used it in GoLand as well. I’ve been a happy JetBrains customer for a long time now, and it’s because they ship features like this. I frequently work with c

Trump’s Meme Coin Empire Is About to Explode With Millions of New Tokens

Millions of Trump meme coins could flood the crypto market starting this week, potentially reshaping the future of one of the most controversial cryptocurrencies in circulation. Entities affiliated with President Donald Trump now have the right to sell a large portion of the meme coin named after him, $TRUMP, starting Wednesday July 17. The move could unlock hundreds of millions of dollars in digital tokens, intensifying scrutiny over Trump’s growing involvement in crypto and raising fresh ques

Grok 4

Released last night, Grok 4 is now available via both the API and a paid subscription for end users. Key characteristics: image and text input, text output, and a 256,000-token context length (twice that of Grok 3). It's a reasoning model where you can't see the reasoning tokens or turn off reasoning mode. xAI released results showing Grok 4 beating other models on most of the significant benchmarks. I haven't been able to find their own written version of these (the launch was a livestream video) but

Optimizing a Math Expression Parser in Rust

In a previous post I explored how to optimize file parsing for maximum speed. This time, we’ll look at a different, self-contained problem: writing a math expression parser in Rust and making it as fast and memory-efficient as possible. Let’s say we want to parse simple math expressions with addition, subtraction, and parentheses. For example: 4 + 5 + 2 - 1 => 10, (4 + 5) - (2 + 1) => 6, (1
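For the grammar described (addition, subtraction, parentheses), a recursive-descent evaluator fits in a screenful. A Python sketch of the same problem, not the post's Rust code:

```python
# Recursive-descent evaluator for +, -, and parentheses.
# A Python sketch of the problem the post solves in Rust.
def evaluate(src: str) -> int:
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def expr() -> int:                    # expr := term (('+' | '-') term)*
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]; pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term() -> int:                    # term := number | '(' expr ')'
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume '('
            value = expr()
            pos += 1                      # consume ')'
            return value
        value = int(tokens[pos]); pos += 1
        return value

    return expr()

print(evaluate("4 + 5 + 2 - 1"))      # → 10
print(evaluate("(4 + 5) - (2 + 1)"))  # → 6
```

The Rust version in the post works the same way conceptually; the optimization story is about avoiding allocations in the tokenizer and parse tree, which Python hides from you.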

OpenAI Warns You Not to Buy Its Fake Stock

OpenAI has a message for anyone who thinks they’re about to cash in on the AI boom by buying a new “OpenAI token” on Robinhood: Don’t. But in a chaotic turn, Elon Musk just suggested that even the company’s real equity might be an illusion. The maker of ChatGPT, in a rare public warning posted on X (formerly Twitter), disavowed any involvement with crypto-like financial products claiming to offer a piece of its business. “These ‘OpenAI tokens’ are not OpenAI equity,” the company wrote. “We did

OpenAI disavows online broker Robinhood's sale of 'OpenAI tokens'

'We did not partner with Robinhood, were not involved in this and do not endorse it.' OpenAI has condemned online brokerage firm Robinhood's sale of "OpenAI tokens," saying they will not give consumers stock in the company. "We did not partner with Robinhood, were not involved in this, and do not endorse it," the company said in a post on X, adding that the tokens are not equity and that it did not give approval for any transfer. The statement addresses a recent move by Robinhood to provide Eu

Life of an inference request (vLLM V1): How LLMs are served efficiently at scale

By Junhao Li, Senior Software Engineer. Ubicloud is an open source alternative to AWS. We offer managed cloud services that build on top of PostgreSQL, Kubernetes, vLLM, and others. vLLM is an open-source inference engine that serves large language models. We deploy multiple vLLM instances across GPUs and load open-weight models like Llama 4 into them. We then load balance traffic across vLLM instances, run health
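The "load balance across instances, run health checks" step can be caricatured in a few lines: a round-robin picker that skips instances marked unhealthy. Instance names and health states here are invented for illustration; a real balancer would poll each vLLM instance's health endpoint.

```python
# Toy round-robin load balancer over vLLM instances, skipping unhealthy
# ones. Instance names and health states are invented for illustration.
from itertools import cycle

instances = ["vllm-0", "vllm-1", "vllm-2"]
healthy = {"vllm-0": True, "vllm-1": False, "vllm-2": True}
ring = cycle(instances)

def pick() -> str:
    for _ in range(len(instances)):   # try each instance at most once
        candidate = next(ring)
        if healthy[candidate]:
            return candidate
    raise RuntimeError("no healthy vLLM instance")

picks = [pick() for _ in range(4)]
print(picks)  # → ['vllm-0', 'vllm-2', 'vllm-0', 'vllm-2']
```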

OpenAI charges by the minute, so speed up your audio

Want to make OpenAI transcriptions faster and cheaper? Just speed up your audio. I mean that very literally: run your audio through ffmpeg at 2x or 3x before transcribing it. You’ll spend fewer tokens and less time waiting, with almost no drop in transcription quality. That’s it! Here’s a script combining all of my favorite little toys and tricks to get the job done. You’ll need yt-dlp, ffmpeg, and llm installed. # Extract the audio from the video yt-dlp -f 'bestaudio[ext=m4a]' --extract-audio --au
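The savings are simple to estimate: transcription is billed per minute of audio, so speeding a file up by a factor k divides both the bill and the wait by roughly k. The per-minute rate below is an illustrative placeholder, not OpenAI's actual pricing:

```python
# Transcription cost before and after speeding up the audio.
# The per-minute rate is an illustrative placeholder, not a real price.
RATE_PER_MINUTE = 0.006  # assumed $/audio-minute

def cost(minutes: float, speedup: float = 1.0) -> float:
    # Playing audio k times faster means the API hears 1/k as many minutes.
    return round(minutes / speedup * RATE_PER_MINUTE, 4)

print(cost(60))               # 1 hour at normal speed → 0.36
print(cost(60, speedup=2.0))  # the same hour at 2x → 0.18
```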

MiniMax-M1 is a new open source model with 1 MILLION TOKEN context and new, hyper efficient reinforcement learning

Chinese AI startup MiniMax, perhaps best known in the West for its hit realistic AI video model Hailuo, has released its latest large language model, MiniMax-M1, and in great news for enterprises and developers, it's completely open source under an Apache 2.0 license, meaning businesses can take it and use it for commercial applications a

Beyond GPT architecture: Why Google’s Diffusion approach could reshape LLM deployment

Last month, along with a comprehensive suite of new AI tools and innovations, Google DeepMind unveiled Gemini Diffusion. This experimental research model uses a diffusion-based approach to generate text. Traditionally, large language models (LLMs) like GPT and Gemini itself have relied on autoregression, a step-by-step approach where each

With the launch of o3-pro, let’s talk about what AI “reasoning” actually does

On Tuesday, OpenAI announced that o3-pro, a new version of its most capable simulated reasoning model, is now available to ChatGPT Pro and Team users, replacing o1-pro in the model picker. The company also reduced API pricing for o3-pro by 87 percent compared to o1-pro while cutting o3 prices by 80 percent. While "reasoning" is useful for some analytical tasks, new studies have posed fundamental questions about what the word actually means when applied to these AI systems. We'll take a deeper l