New paper out! CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs.