Apple trained a large language model to efficiently understand long-form video
Apple researchers have developed an adapted version of the SlowFast-LLaVA model that beats larger models at long-form video analysis and understanding. Here’s what that means.

The nerdy bits

Very basically, when an LLM is trained to also understand video, it learns to split videos into frames, apply computer vision to extract visual features, analyze how those features change over time, and align all of that with language so it can describe or reason about the video in the form of text.

One v
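To make that pipeline a bit more concrete, here is a minimal sketch in PyTorch. This is not Apple's SlowFast-LLaVA code: the tiny `VisionEncoder`, `Projector`, and `video_to_llm_tokens` names are stand-ins invented for illustration, replacing a pretrained vision backbone, a learned projection layer, and a full LLM. Temporal reasoning is simplified here; in practice the LLM attends over the frame tokens to track how things change over time.

```python
# Minimal sketch of a video-LLM front end (illustrative, not Apple's code).
# VisionEncoder and Projector are toy stand-ins for a pretrained vision
# backbone and the learned vision-to-language projection layer.
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Toy CNN standing in for a pretrained image backbone."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Patchify each frame into 16x16 patches, one feature vector per patch
        self.conv = nn.Conv2d(3, feat_dim, kernel_size=16, stride=16)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (num_frames, 3, H, W) -> (num_frames, feat_dim)
        feats = self.conv(frames)          # per-patch visual features
        return feats.mean(dim=(2, 3))      # pool patches into one vector per frame

class Projector(nn.Module):
    """Maps visual features into the LLM's token-embedding space."""
    def __init__(self, feat_dim: int = 256, llm_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(frame_feats)      # (num_frames, llm_dim)

def video_to_llm_tokens(video: torch.Tensor, num_frames: int = 8) -> torch.Tensor:
    """Split a video into frames, extract features, and align them with language.

    video: (total_frames, 3, H, W). Returns (num_frames, llm_dim) visual
    'tokens' that would be prepended to the text-token embeddings before
    the language model runs.
    """
    # Step 1: split the video into a manageable set of frames (uniform sampling)
    idx = torch.linspace(0, video.shape[0] - 1, num_frames).long()
    frames = video[idx]
    # Step 2: computer vision extracts per-frame visual features
    feats = VisionEncoder()(frames)
    # Step 3: project the features into the LLM's embedding space
    return Projector()(feats)

if __name__ == "__main__":
    fake_video = torch.randn(120, 3, 224, 224)   # ~4 seconds of 30 fps video
    visual_tokens = video_to_llm_tokens(fake_video)
    print(visual_tokens.shape)                   # torch.Size([8, 512])
```

The frame-sampling step is, roughly speaking, where the "SlowFast" idea lives: one pathway keeps a few frames at high detail while another keeps many frames at lower detail, so the model sees both fine-grained appearance and broad motion without an explosion of tokens.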