Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more

Filtered by: vlm

You can try Apple’s lightning-fast video captioning model right from your browser

A few months ago, Apple released FastVLM, a Vision Language Model (VLM) that offered near-instant high-resolution image processing. Now, you can take it for a spin, provided you have an Apple Silicon-powered Mac. Here’s how. When we first covered FastVLM, we explained that it leveraged MLX, Apple’s own open ML framework specifically designed for Apple Silicon, to deliver up to 85 times faster video captioning, while being more than 3 times smaller than similar models. Since then, Apple has wor

FastVLM: Efficient Vision Encoding for Vision Language Models

Vision Language Models (VLMs) enable visual understanding alongside textual inputs. They are typically built by passing visual tokens from a pretrained vision encoder to a pretrained Large Language Model (LLM) through a projection layer. By leveraging the rich visual representations of the vision encoder and the world knowledge and reasoning capabilities of the LLM, VLMs can be useful for a wide range of applications, including accessibility assistants, UI navigation, robotics, and gaming. VLM
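
The pipeline described in this excerpt (vision encoder, projection layer, LLM) can be illustrated with a toy sketch. This is not FastVLM's implementation: the single linear projection, the tiny dimensions, the random values, and the names `project` and `build_llm_input` are all assumptions made for illustration. Real projectors are often small MLPs, and where visual tokens sit relative to the text varies by model.

```python
import random

def project(visual_tokens, weight):
    """Map each visual token (length d_vision) to the LLM embedding width (d_llm).

    `weight` is a d_vision x d_llm matrix; a single linear map is an assumption here.
    """
    return [
        [sum(v * w for v, w in zip(token, column)) for column in zip(*weight)]
        for token in visual_tokens
    ]

def build_llm_input(visual_tokens, text_embeddings, weight):
    """Prepend projected visual tokens to the text embeddings (ordering is an assumption)."""
    return project(visual_tokens, weight) + text_embeddings

random.seed(0)
D_VISION, D_LLM = 4, 6  # toy sizes, far smaller than any real model

# Stand-ins for the outputs of a pretrained vision encoder and an LLM embedding table.
visual_tokens = [[random.random() for _ in range(D_VISION)] for _ in range(3)]
text_embeddings = [[random.random() for _ in range(D_LLM)] for _ in range(2)]
weight = [[random.random() for _ in range(D_LLM)] for _ in range(D_VISION)]

sequence = build_llm_input(visual_tokens, text_embeddings, weight)
print(len(sequence), len(sequence[0]))  # 5 6
```

The point of the sketch is only the shape bookkeeping: after projection, visual tokens have the same width as text embeddings, so the LLM can consume both as one sequence. More visual tokens mean a longer sequence for the LLM to process.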

How generative AI could help make construction sites safer

To combat the shortcuts and risk-taking, Lorenzo is working on a tool for the San Francisco–based company DroneDeploy, which sells software that creates daily digital models of work progress from videos and images, known in the trade as “reality capture.” The tool, called Safety AI, analyzes each day’s reality capture imagery and flags conditions that violate Occupational Safety and Health Administration (OSHA) rules, with what he claims is 95% accuracy. That means that for any safety risk the