From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs
True spatial intelligence for multimodal agents transcends low-level geometric perception, evolving from knowing where things are to understanding what they are for. While existing benchmarks, such as VSI-Bench, effectively evaluate this foundational geometric stage, they fall short of probing the higher-order cognitive abilities essential for grounded intelligence. To bridge this gap, we introduce the Spatial-Functional Intelligence Benchmark (SFI-Bench), a video-based…