ZJIT removes redundant object loads and stores

Intro

Since the post at the end of last year, ZJIT has grown and changed in some exciting ways. This is the story of how a new, self-contained optimization pass lets ZJIT outperform YJIT on an interesting microbenchmark. It has been 10 months since ZJIT was merged into Ruby, and we’re now beginning to see the design differences between YJIT and ZJIT show up as performance differences. In this post, we will explore one new ZJIT optimization called load-store optimization, which is implemented in ZJIT’s HIR optimizer. Recall that the structure of ZJIT looks roughly like the following.

flowchart LR
    A(["Ruby"])
    A --> B(["YARV"])
    B --> C(["HIR"])
    C --> D(["LIR"])
    D --> E(["Assembly"])

This post will focus on optimization passes in HIR, or “High-level” Intermediate Representation. HIR gives us two capabilities that the other compilation stages lack: an SSA representation and the HIR instruction effect system. Our HIR optimizations typically rely on both.

These are the passes ZJIT runs without load-store optimization, listed in the order they are executed.

    run_pass!(type_specialize);
    run_pass!(inline);
    run_pass!(optimize_getivar);
    run_pass!(optimize_c_calls);
    run_pass!(fold_constants);
    run_pass!(clean_cfg);
    run_pass!(remove_redundant_patch_points);
    run_pass!(eliminate_dead_code);

Here’s where load-store optimization gets added.

      run_pass!(type_specialize);
      run_pass!(inline);
      run_pass!(optimize_getivar);
      run_pass!(optimize_c_calls);
    + run_pass!(optimize_load_store);
      run_pass!(fold_constants);
      run_pass!(clean_cfg);
      run_pass!(remove_redundant_patch_points);
      run_pass!(eliminate_dead_code);
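To make the ordering concrete, here is a minimal sketch of how a fixed pass pipeline like this one could be driven by a small local macro. This is an illustration under assumptions, not ZJIT's actual code: the `Function` struct, its `log` field, and the empty pass bodies are hypothetical stand-ins, and the real `run_pass!` macro may do more than call a method.

```rust
// Hypothetical stand-in for a function under compilation; not ZJIT's real type.
#[derive(Default)]
struct Function {
    // Names of the passes that have run, in order (illustration only).
    log: Vec<&'static str>,
}

impl Function {
    // Empty placeholder passes. A real pass would rewrite the function's IR.
    fn type_specialize(&mut self) {}
    fn inline(&mut self) {}
    fn optimize_getivar(&mut self) {}
    fn optimize_c_calls(&mut self) {}
    fn optimize_load_store(&mut self) {}
    fn fold_constants(&mut self) {}
    fn clean_cfg(&mut self) {}
    fn remove_redundant_patch_points(&mut self) {}
    fn eliminate_dead_code(&mut self) {}

    // Run every pass once, in a fixed order.
    fn optimize(&mut self) {
        // Defined inside the method so that `self` is in scope for the macro body.
        macro_rules! run_pass {
            ($name:ident) => {{
                self.$name();
                self.log.push(stringify!($name));
            }};
        }

        run_pass!(type_specialize);
        run_pass!(inline);
        run_pass!(optimize_getivar);
        run_pass!(optimize_c_calls);
        run_pass!(optimize_load_store);
        run_pass!(fold_constants);
        run_pass!(clean_cfg);
        run_pass!(remove_redundant_patch_points);
        run_pass!(eliminate_dead_code);
    }
}
```

Calling `optimize()` on a `Function::default()` leaves the nine pass names in `log` in the same order as the listing above, with `optimize_load_store` running just before `fold_constants`.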

Overview

Ruby is an object-oriented programming language, so CRuby needs to have some notion of object loads, modifications, and stores. In fact, this is a topic already covered by another Rails at Scale blog post. The shape system provides performance improvements in CRuby (both interpreter and JIT), but there is still plenty of opportunity to improve JIT performance. Sometimes optimizing interpreter opcodes one at a time leaves repeated loads or stores that can be cleaned up with a program analysis optimization pass. Before getting into the weeds about this pass, let’s talk performance.
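To give a feel for what "repeated loads or stores" means, here is a toy sketch of the general technique on a single straight-line block of SSA-style instructions. Everything in it is an assumption made for illustration: the `Insn` enum, its variants, and this `optimize_load_store` function are hypothetical and far simpler than ZJIT's HIR. The sketch forwards known field values to later loads, drops stores that write a value the field already holds, and conservatively forgets what it knows at opaque calls and at stores that might alias.

```rust
use std::collections::HashMap;

// Hypothetical toy IR; values are numbered SSA ids. Not ZJIT's HIR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Insn {
    // dst = obj.field
    Load { dst: u32, obj: u32, field: &'static str },
    // obj.field = val
    Store { obj: u32, field: &'static str, val: u32 },
    // Opaque call that may write any field of any object.
    Call,
    // Consumes a value so the result of a load stays observable.
    Use { val: u32 },
}

// Follow a replacement recorded for a removed load, if any.
fn resolve(alias: &HashMap<u32, u32>, v: u32) -> u32 {
    alias.get(&v).copied().unwrap_or(v)
}

// Remove redundant loads and stores within one straight-line block.
fn optimize_load_store(block: &[Insn]) -> Vec<Insn> {
    // (object id, field name) -> value currently known to be in that field.
    let mut known: HashMap<(u32, &'static str), u32> = HashMap::new();
    // Removed load result -> the value that replaces it.
    let mut alias: HashMap<u32, u32> = HashMap::new();
    let mut out = Vec::new();

    for &insn in block {
        match insn {
            Insn::Load { dst, obj, field } => {
                let obj = resolve(&alias, obj);
                if let Some(&val) = known.get(&(obj, field)) {
                    // Redundant load: reuse the value we already have.
                    alias.insert(dst, val);
                } else {
                    known.insert((obj, field), dst);
                    out.push(Insn::Load { dst, obj, field });
                }
            }
            Insn::Store { obj, field, val } => {
                let obj = resolve(&alias, obj);
                let val = resolve(&alias, val);
                if known.get(&(obj, field)) == Some(&val) {
                    // Redundant store: the field already holds this value.
                    continue;
                }
                // Two different ids may name the same object, so forget
                // this field on every object before recording the new value.
                known.retain(|key, _| key.1 != field);
                known.insert((obj, field), val);
                out.push(Insn::Store { obj, field, val });
            }
            Insn::Call => {
                // The callee may have written anything.
                known.clear();
                out.push(Insn::Call);
            }
            Insn::Use { val } => {
                out.push(Insn::Use { val: resolve(&alias, val) });
            }
        }
    }
    out
}
```

For example, given a block that loads `x` from the same object twice, the second load disappears and its users read the first load's result instead. A load that follows a `Call` is kept, because the sketch cannot tell what the call wrote.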
