Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: llm

LameHug malware uses AI LLM to craft Windows data-theft commands in real-time

A novel malware family named LameHug is using a large language model (LLM) to generate commands to be executed on compromised Windows systems. LameHug was discovered by Ukraine’s national cyber incident response team (CERT-UA), which attributed the attacks to the Russian state-backed threat group APT28 (a.k.a. Sednit, Sofacy, Pawn Storm, Fancy Bear, STRONTIUM, Tsar Team, Forest Blizzard). The malware is written in Python and relies on the Hugging Face API to interact with the Qwen 2.5-Coder-32B-Instruct…

How to run an LLM on your laptop

For Pistilli, opting for local models as opposed to online chatbots has implications beyond privacy. “Technology means power,” she says. “And so who[ever] owns the technology also owns the power.” States, organizations, and even individuals might be motivated to disrupt the concentration of AI power in the hands of just a few companies by running their own local models. Breaking away from the big AI companies also means having more control over your LLM experience. Online LLMs are constantly…

Gaslight-driven development

Any person who has used a computer in the past ten years knows that doing meaningless tasks is just part of the experience. Millions of people create accounts, confirm emails, dismiss notifications, solve captchas, reject cookies, and accept terms and conditions—not because they particularly want to or even need to. They do it because that’s what the computer told them to do. Like it or not, we are already serving the machines. Well, now there is a new way to serve…

Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems

A new study by researchers at Google DeepMind and University College London reveals how large language models (LLMs) form, maintain, and lose confidence in their answers. The findings show striking similarities between the cognitive biases of LLMs and humans, while also highlighting stark differences. The research reveals that LLMs can be…

Voxtral-Mini-3B-2507 – Open source speech understanding model

Voxtral Mini 1.0 (3B) - 2507. Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding. Learn more about Voxtral in our blog post. Key features: Voxtral builds upon Ministral-3B with powerful audio understanding capabilities. Dedicated transcription mode: Voxtral can operate in a pure speech transcription mode to maximize…

Ask HN: What's Your Useful Local LLM Stack?

What I’m asking HN: What does your actually useful local LLM stack look like? I’m looking for something that provides you with real value — not just a sexy demo.

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun. My daily (remote) LLM stack:

- Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.
- Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it…

McDonald's Idiotic AI Hiring System Just Leaked Personal Data About Millions of Job Applicants

As large language models (LLMs) become ever more integrated into the platforms that define daily life, major flaws in the software's security capabilities are starting to show. McDonald's is among the growing list of companies that have quickly shoehorned LLM chatbots into their hiring systems, consequences be damned. Its Paradox.ai-built chatbot, which McDonald's calls a "virtual recruiting assistant," goes by the name Olivia. Olivia is more than happy to help applicants find jobs near them…

AI therapy bots fuel delusions and give dangerous advice, Stanford study finds

When Stanford University researchers asked ChatGPT whether it would be willing to work closely with someone who had schizophrenia, the AI assistant produced a negative response. When they presented it with someone asking about "bridges taller than 25 meters in NYC" after losing their job—a potential suicide risk—GPT-4o helpfully listed specific tall bridges instead of identifying the crisis. These findings arrive as media outlets report cases of ChatGPT users with mental illnesses developing…

LLM Inference Handbook

LLM Inference in Production is your technical glossary, guidebook, and reference, all in one. It covers everything you need to know about LLM inference, from core concepts and performance metrics (e.g., Time to First Token and Tokens per Second) to optimization techniques (e.g., continuous batching and prefix caching) and operational best practices. Practical guidance for deploying, scaling, and operating LLMs in production. Focus on what truly matters, not edge cases…
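The two metrics named here, Time to First Token and Tokens per Second, can be computed from per-request timestamps; a minimal sketch, with made-up timing values for illustration:

```python
def ttft(request_start: float, first_token_time: float) -> float:
    """Time to First Token: delay between sending the request and
    receiving the first generated token."""
    return first_token_time - request_start

def tokens_per_second(num_tokens: int, first_token_time: float,
                      last_token_time: float) -> float:
    """Decode throughput: tokens generated per second after the first token."""
    elapsed = last_token_time - first_token_time
    return (num_tokens - 1) / elapsed if elapsed > 0 else float("inf")

# Illustrative timestamps (seconds): request sent at t=0.0,
# first token at t=0.25, 50th token at t=1.25.
print(ttft(0.0, 0.25))                    # 0.25
print(tokens_per_second(50, 0.25, 1.25))  # 49.0
```

TTFT is dominated by prefill, while tokens per second measures the decode phase, which is why the two are usually optimized separately.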

LangChain is about to become a unicorn, sources say

LangChain, an AI infrastructure startup providing tools to build and monitor LLM-powered applications, is raising a new round of funding at an approximate $1 billion valuation led by IVP, according to three sources with knowledge of the deal. LangChain began its life in late 2022 as an open-source project founded by Harrison Chase, who was then an engineer at machine learning startup Robust Intelligence. After generating significant developer interest, Chase transformed the project into a startup…

SmolLM3: smol, multilingual, long-context reasoner LLM

Published July 8, 2025. Base model: https://hf.co/HuggingFaceTB/SmolLM3-3B-Base. Instruct and reasoning model: https://hf.co/HuggingFaceTB/SmolLM3-3B. Small language models are becoming increasingly important as users seek capable models that can be deployed efficiently. The community has produced a fascinating range of capable small models, each pushing the boundaries of what's possible at this scale. With SmolLM3, we're excited…

Bosses Are Using AI to Decide Who to Fire

Though most signs suggest artificial intelligence isn't taking anyone's jobs, employers are still using the tech to justify layoffs, outsource work to the global South, and scare workers into submission. But that's not all: a growing number of employers aren't just using AI as an excuse to downsize; they're giving it the final say in who gets axed. That's according to a survey of 1,342 managers by ResumeBuilder.com, which runs a blog dedicated to HR. Of those surveyed, 6 out of 10 admitted…

Optimizing Tool Selection for LLM Workflows with Differentiable Programming

Modern agentic architectures rely heavily on chaining LLM calls. A typical pattern looks like:

1. Use an LLM to decide which tool to invoke.
2. Call the tool (e.g. search, calculator, API).
3. Use another LLM call to interpret the result and generate a final response.

This structure is easy to reason about, simple to prototype, and generalizes well. But it scales poorly. Each LLM call incurs latency, cost, and token overhead. More subtly, it compounds context: every step includes not only the original…
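The typical pattern can be sketched as code. Here `fake_llm` and the tool table are hypothetical stand-ins for real model calls, but the shape of the chain, two LLM calls wrapping one tool call, is the point:

```python
# Hypothetical stand-in for a real LLM call; returns canned decisions.
def fake_llm(prompt: str) -> str:
    if "Which tool" in prompt:
        return "calculator"  # step 1: the model picks a tool
    # step 3: the model phrases the tool result as an answer
    return f"The answer is {prompt.split(':')[-1].strip()}"

TOOLS = {
    "calculator": lambda q: str(eval(q, {"__builtins__": {}})),  # toy only
    "search": lambda q: "no results",
}

def answer(question: str) -> str:
    tool = fake_llm(f"Which tool should handle: {question}?")  # LLM call 1
    result = TOOLS[tool](question)                             # tool call
    return fake_llm(f"Interpret this result: {result}")        # LLM call 2

print(answer("2 + 2"))  # The answer is 4
```

Every `answer()` invocation pays for two model round trips plus the tool, which is exactly the latency and token overhead the article is about.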

vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention

LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM utilizes PagedAttention, our new attention algorithm that effectively manages attention keys and values. vLLM equipped with PagedAttention redefines the state of the art in LLM serving…
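PagedAttention's key move is managing the KV cache in fixed-size blocks, much like virtual-memory pages. A toy allocator, not vLLM's actual implementation, illustrating why memory waste is bounded to one partially filled block per sequence:

```python
BLOCK_SIZE = 16  # tokens per KV block (a typical default)

class PagedKVCache:
    """Toy model of paged KV-cache allocation: each sequence owns a block
    table mapping its logical token positions to physical blocks, and a new
    physical block is grabbed only when the last one fills up."""
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[int, list[int]] = {}
        self.seq_lens: dict[int, int] = {}

    def append_token(self, seq_id: int) -> None:
        n = self.seq_lens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # last block is full (or sequence is new)
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id: int) -> None:
        # Finished sequences return their blocks to the shared pool.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(20):             # a 20-token sequence...
    cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))  # 2  (16 tokens + 4 tokens)
```

Because blocks need not be contiguous, the pool can be packed tightly and shared across many concurrent sequences, which is where the throughput gains come from.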

Tools: Code Is All You Need

If you've been following me on Twitter, you know I'm not a big fan of MCP right now. It's not that I dislike the idea; I just haven't found it to work as advertised. In my view, MCP suffers from two major flaws:

- It isn’t truly composable. Most composition happens through inference.
- It demands too much context. You must supply significant upfront input, and every tool invocation consumes even more context than simply writing and running code.

A quick experiment makes…

Writing Code Was Never the Bottleneck

For years, I’ve felt that writing lines of code was never the bottleneck in software engineering. The actual bottlenecks were, and still are, code reviews, knowledge transfer through mentoring and pairing, testing, debugging, and the human overhead of coordination and communication. All of this is wrapped inside the labyrinth of tickets, planning meetings, and agile rituals. These processes, meant to drive quality, often slow us down more than the act of writing code itself because they require…

LLMs as Compilers

7/2/2025, by Kadhir. So far, I've only used LLMs as an assistant, where I'm doing something, and an LLM helps me along the way. Code autocomplete feels like a great example of how useful it can be when it gets it right. I don't doubt that over time this will improve, but I'm excited to see a more significant transition from this assistant mode to a compiler mode, at least for coding. It will be exciting when we focus solely on the context we fed the LLM, then test the features…

What to build instead of AI agents

Paul: Today, the scene is owned by Hugo, a brilliant mind who advises and teaches teams building LLM-powered systems, including engineers from Netflix, Meta, and the U.S. Air Force. He runs a course on the LLM software development lifecycle, focusing on everything from retrieval and evaluation to agent design, and all the intermediate steps in between. Enough talking, I’ll let him dig into today’s controversial topic: “Stop building AI agents”. P.S. I agree with him. 🤫

Hugo: I've taught…

Slouching Towards Sensemaking

There’s a particular quality to the confusion of our current moment that reminds me of standing in Dolores Park at dusk, watching fog roll in from Twin Peaks while the Mission stays stubbornly sunny. We’re between weather systems, between worlds. The old information order – built on broadcast towers and printing presses, gatekeepers and institutions – is visibly dissolving. The new one hasn’t quite condensed into recognizable forms yet. We’re in the interregnum…

Life of an inference request (vLLM V1): How LLMs are served efficiently at scale

Junhao Li, Senior Software Engineer. Ubicloud is an open source alternative to AWS. We offer managed cloud services that build on top of PostgreSQL, Kubernetes, vLLM, and others. vLLM is an open-source inference engine that serves large language models. We deploy multiple vLLM instances across GPUs and load open-weight models like Llama 4 into them. We then load balance traffic across vLLM instances, run health checks…
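The load-balancing step described here can be sketched as a simple round-robin rotation over instance URLs. This is illustrative only: the instance names are made up, and a production balancer would also weigh load and queue depth:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Rotate inference requests across healthy vLLM instance URLs."""
    def __init__(self, instances: list[str]):
        self.healthy = list(instances)
        self._iter = cycle(self.healthy)

    def next_instance(self) -> str:
        return next(self._iter)

    def mark_unhealthy(self, url: str) -> None:
        # A failed health check calls this; rebuild the rotation without the node.
        self.healthy.remove(url)
        self._iter = cycle(self.healthy)

lb = RoundRobinBalancer(["http://vllm-1:8000", "http://vllm-2:8000"])
print(lb.next_instance())  # http://vllm-1:8000
print(lb.next_instance())  # http://vllm-2:8000
print(lb.next_instance())  # http://vllm-1:8000
```

Each returned URL is where the request's life inside a single vLLM instance (scheduling, prefill, decode) then begins.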

Lossless LLM 3x Throughput Increase by LMCache

Redis for LLMs - Infinite and Ultra-Fast. LMCache is an LLM serving engine extension that reduces TTFT (time to first token) and increases throughput, especially under long-context scenarios. By storing the KV caches of reusable texts across various locations (GPU, CPU DRAM, and local disk), LMCache reuses the KV caches of any repeated text (not necessarily a prefix) in any serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay. By combining LMCache with vLLM, LMCache achieves…
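The core idea, reusing the KV cache of any repeated text rather than only a shared prefix, can be illustrated with a toy store keyed by a hash of the text chunk. This is a sketch of the concept, not LMCache's actual API:

```python
import hashlib

class ToyKVStore:
    """Maps a hash of a text chunk to its (pretend) KV cache, so any
    engine instance that sees the same chunk again can skip prefill."""
    def __init__(self):
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, chunk: str) -> str:
        return hashlib.sha256(chunk.encode()).hexdigest()

    def get_or_compute(self, chunk: str) -> str:
        key = self._key(chunk)
        if key in self.store:
            self.hits += 1    # reuse: no GPU prefill needed
        else:
            self.misses += 1  # first sighting: compute (placeholder value)
            self.store[key] = f"kv({chunk[:10]}...)"
        return self.store[key]

kv = ToyKVStore()
doc = "shared system prompt"
kv.get_or_compute(doc)     # miss: first time this text is seen
kv.get_or_compute(doc)     # hit: reused even in a different request
print(kv.hits, kv.misses)  # 1 1
```

In the real system the stored values are GPU tensors tiered across GPU, CPU DRAM, and disk, but the lookup-before-prefill flow is the same.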

Structured Output with LangChain and Llamafile

This article shows how one can teach Llamafile to handle structured outputs like JSON. If you’re already familiar with LangChain, you’ll know that integrations for popular models like OpenAI’s include their own implementation of with_structured_output. Using it is straightforward: all we need is to derive a new class from Pydantic’s BaseModel. The rest happens transparently. You don’t need to teach the LLM anything. Using Llamafile: this isn’t currently possible with Llamafile, which I’m using…
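Since Llamafile lacks a built-in equivalent of with_structured_output, one workable approach is to put the expected JSON shape in the prompt and validate the reply yourself. A minimal stdlib-only sketch; the schema and the sample reply are made up for illustration:

```python
import json

# Hypothetical shape of the structured reply we want from the model.
SCHEMA = {"name": "string", "age": "integer"}

def build_prompt(question: str) -> str:
    """Embed the expected JSON shape in the prompt, since the model has
    no native structured-output mode to lean on."""
    return (
        f"{question}\n"
        f"Reply ONLY with JSON matching this shape: {json.dumps(SCHEMA)}"
    )

def parse_reply(reply: str) -> dict:
    """Validate that the model's reply is JSON with the expected keys."""
    data = json.loads(reply)
    missing = set(SCHEMA) - set(data)
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# Simulated model reply (a real one would come from the Llamafile server):
print(parse_reply('{"name": "Ada", "age": 36}'))  # {'name': 'Ada', 'age': 36}
```

On a parse failure you would typically retry, feeding the error message back to the model, which is roughly what the library implementations do for you.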

Software 3.0 is powered by LLMs, prompts, and vibe coding - what you need to know

Are large language models (LLMs) our new operating systems? If so, they are changing the definition of what we consider to be software. Several analogies are used to describe the impact of fast-evolving AI technologies, such as utilities, time-sharing systems, and operating systems. Andrej Karpathy, co-founder of OpenAI and former senior director of AI at Tesla, believes that an operating system is the…

Libraries are under-used. LLMs make this problem worse

Libraries are under-used. Why? Briefly:

- Writing code is more fun than reading documentation.
- The Dunning-Kruger effect leads us to underestimate the complexity of the problem solved by the library we're considering.
- Perverse incentives: libraries compete with big internal engineering projects that look good in a promo packet.

LLMs make this problem worse. Why? Less briefly:

- Vibe coding is more fun than reading documentation. Shit, vibe-coding…

MIT brain scans suggest that using GenAI tools reduces cognitive activity

Why it matters: As the use of generative AI becomes increasingly common in education, law, politics, media, and other fields, many worry that reliance on the technology may reduce cognitive independence. A recent study from MIT strongly supports this concern, indicating that the use of digital tools significantly alters brain activity. The newly published paper explains that as participants in an experiment wrote a series of essays, electronic brain monitoring revealed substantially weaker connectivity…

Brain activity much lower when using AI chatbots, MIT boffins find

Using AI chatbots actually reduces activity in the brain versus accomplishing the same tasks unaided, and may lead to poorer fact retention, according to a new preprint study out of MIT. Seeking to understand how the use of LLM chatbots affects the brain, a team led by MIT Media Lab research scientist Dr. Nataliya Kosmyna hooked up a group of Boston-area college students to electroencephalogram (EEG) headsets and gave them 20 minutes to write a short essay. One group was directed to write without…

From LLM to AI Agent: What's the Real Journey Behind AI System Development?

AI agents are a hot topic, but not every AI system needs to be one. While agents promise autonomy and decision-making power, simpler and more cost-effective solutions better serve many real-world use cases. The key lies in choosing the right architecture for the problem at hand. In this post, we'll explore recent developments in Large Language Models (LLMs) and discuss key concepts of AI systems. We've worked with LLMs across projects of varying complexity, from zero-shot prompting to chain-of-thought…
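The two prompting styles mentioned differ only in how the prompt is assembled; a minimal sketch, where the appended reasoning cue is the standard zero-shot chain-of-thought trick:

```python
def zero_shot(question: str) -> str:
    # The model answers directly from the bare question.
    return question

def chain_of_thought(question: str) -> str:
    # Appending a reasoning cue nudges the model to emit intermediate steps
    # before its final answer, which often improves multi-step accuracy.
    return f"{question}\nLet's think step by step."

print(chain_of_thought("A train travels 120 km in 2 hours. What is its speed?"))
```

Moving up the complexity ladder from here (tool use, then agents) mostly means wrapping more machinery around the same underlying prompt-and-respond loop.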