In classical AI, perception relies on learning spatial representations, while planning (temporal reasoning over action sequences) is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both spatial and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure because it relies on spurious features. To address this, we introduce Contrastive Representations for Temporal Reasoning (CRTR), a method that uses a negative sampling scheme to provably remove these spurious features and facilitate temporal reasoning. CRTR achieves strong results on domains with complex temporal structure, such as Sokoban and Rubik’s Cube. In particular, for the Rubik’s Cube, CRTR learns representations that generalize across all initial states and allow solving the puzzle much faster than BestFS, albeit with longer solutions. To our knowledge, this is the first demonstration of efficiently solving arbitrary Cube states using only learned representations, without hand-crafted search heuristics.
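To make the failure mode concrete, standard temporal contrastive learning can be sketched as an InfoNCE objective over (state, future-state) pairs. The snippet below is a minimal NumPy illustration, not CRTR itself: it contrasts negatives drawn from other trajectories (the standard scheme, where the encoder can win by learning trajectory-identifying spurious features) with negatives drawn from other timesteps of the same trajectory (a hypothetical stand-in for a spurious-feature-removing scheme; CRTR's actual sampler is defined in the paper). The encoder here is a random projection, a placeholder for a learned network.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor: the positive competes against a set of
    negatives. Similarities are dot products of the embeddings."""
    logits = np.concatenate([[anchor @ positive],
                             negatives @ anchor]) / temperature
    # Cross-entropy with the positive at index 0 (log-softmax form).
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
# Toy data: 4 trajectories of 10 states each, embedded in 8 dims by a
# stand-in encoder (random values; a learned encoder would replace this).
embed = rng.normal(size=(4, 10, 8))

traj, t, k = 0, 2, 3                      # anchor s_t, positive s_{t+k}
anchor, positive = embed[traj, t], embed[traj, t + k]

# Standard scheme: negatives sampled from *other* trajectories -- the loss
# can be minimized with trajectory-identifying (spurious) features alone.
neg_other = embed[1:, rng.integers(10, size=3), :].reshape(-1, 8)

# In-trajectory scheme (illustrative assumption): negatives from other
# timesteps of the *same* trajectory, so trajectory identity carries no
# signal and temporal features must do the work.
neg_same = embed[traj, [0, 7, 9], :]

print(info_nce(anchor, positive, neg_other))
print(info_nce(anchor, positive, neg_same))
```

Both calls return a non-negative scalar; the point of the contrast is only which features suffice to drive it down during training, not the values on this random data.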