
A 40-Line Fix Eliminated a 400x Performance Gap

I have a habit of skimming the OpenJDK commit log every few weeks. Many commits are too complex for me to grasp in the limited time I have reserved for this ... special hobby. But occasionally something catches my eye.

Last week, this commit stopped me mid-scroll:

858d2e434dd 8372584: [Linux]: Replace reading proc to get thread CPU time with clock_gettime

The diffstat was interesting: +96 insertions, -54 deletions. The changeset adds a 55-line JMH benchmark, so at most 41 of those insertions are production code, set against 54 deletions. The production code itself actually shrinks.

Here's what got removed from os_linux.cpp:

    static jlong user_thread_cpu_time(Thread *thread) {
      pid_t  tid = thread->osthread()->thread_id();
      char *s;
      char stat[2048];
      size_t statlen;
      char proc_name[64];
      int count;
      long sys_time, user_time;
      char cdummy;
      int idummy;
      long ldummy;
      FILE *fp;

      os::snprintf_checked(proc_name, 64, "/proc/self/task/%d/stat", tid);
      fp = os::fopen(proc_name, "r");
      if (fp == nullptr) return -1;
      statlen = fread(stat, 1, 2047, fp);
      stat[statlen] = '\0';
      fclose(fp);

      // Skip pid and the command string. Note that we could be dealing with
      // weird command names, e.g. user could decide to rename java launcher
      // to "java 1.4.2 :)", then the stat file would look like
      //                1234 (java 1.4.2 :)) R ... ...
      // We don't really need to know the command string, just find the last
      // occurrence of ")" and then start parsing from there. See bug 4726580.
      s = strrchr(stat, ')');
      if (s == nullptr) return -1;

      // Skip blank chars
      do { s++; } while (s && isspace((unsigned char) *s));

      count = sscanf(s,"%c %d %d %d %d %d %lu %lu %lu %lu %lu %lu %lu",
                     &cdummy, &idummy, &idummy, &idummy, &idummy, &idummy,
                     &ldummy, &ldummy, &ldummy, &ldummy, &ldummy,
                     &user_time, &sys_time);
      if (count != 13) return -1;

      return (jlong)user_time * (1000000000 / os::Posix::clock_tics_per_second());
    }

This was the implementation behind ThreadMXBean.getCurrentThreadUserTime(). To get the current thread's user CPU time, the old code was:

1. Formatting a path to /proc/self/task/<tid>/stat
2. Opening that file
3. Reading into a stack buffer
4. Parsing through a hostile format where the command name can contain parentheses (hence the strrchr for the last ")")
5. Running sscanf to extract utime and stime (fields 14 and 15 of the stat line, per proc(5))
6. Converting clock ticks to nanoseconds
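To make the parsing steps concrete, here is a minimal standalone sketch of the same idea, working on a stat line that is already in memory. The helper names parse_cpu_ticks and ticks_to_nanos are hypothetical and are not part of HotSpot. The tick rate is passed in as a parameter, on the assumption that a real caller would get it from sysconf(_SC_CLK_TCK).

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not HotSpot code): extracts utime and stime, in clock
// ticks, from the text of a /proc/.../stat line. Returns false on parse error.
bool parse_cpu_ticks(const char* stat_line,
                     unsigned long* user_ticks,
                     unsigned long* sys_ticks) {
    // The command name may itself contain ')', so search from the end.
    const char* s = std::strrchr(stat_line, ')');
    if (s == nullptr) return false;

    // Step past the ')' and any blanks that follow it.
    do { s++; } while (*s != '\0' && std::isspace((unsigned char)*s));

    char state;
    int idummy;
    unsigned long ldummy;
    // One state char, five ints and five unsigned longs we ignore,
    // then utime and stime.
    int count = std::sscanf(s, "%c %d %d %d %d %d %lu %lu %lu %lu %lu %lu %lu",
                            &state, &idummy, &idummy, &idummy, &idummy, &idummy,
                            &ldummy, &ldummy, &ldummy, &ldummy, &ldummy,
                            user_ticks, sys_ticks);
    return count == 13;
}

// Converts clock ticks to nanoseconds. ticks_per_second is an input here so
// the sketch does not depend on the machine it runs on.
long long ticks_to_nanos(unsigned long ticks, long ticks_per_second) {
    assert(ticks_per_second > 0);
    return (long long)ticks * (1000000000LL / ticks_per_second);
}
```

Fed a line such as `1234 (java 1.4.2 :)) R 1 1234 1234 0 -1 4194304 100 0 0 0 111 222 ...`, the helper skips the awkward command name and reports 111 user ticks and 222 system ticks. Even in this reduced form, every call pays for string scanning and number parsing, on top of the file open and read that the real code performed.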

For comparison, here's what getCurrentThreadCpuTime() does and has always done:

    jlong os::current_thread_cpu_time() {
      return os::Linux::thread_cpu_time(CLOCK_THREAD_CPUTIME_ID);
    }

    jlong os::Linux::thread_cpu_time(clockid_t clockid) {
      struct timespec tp;
      clock_gettime(clockid, &tp);
      return (jlong)(tp.tv_sec * NANOSECS_PER_SEC + tp.tv_nsec);
    }
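For readers who want to try the clock_gettime path outside the JVM, the following is a small sketch under stated assumptions: a POSIX system that supports CLOCK_THREAD_CPUTIME_ID. The function names thread_cpu_time_nanos and burn_cpu are made up for illustration. Unlike the HotSpot snippet, this version checks the return value and reports -1 on failure.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical helper: total CPU time (user + system) consumed by the calling
// thread, in nanoseconds, or -1 if the clock is unavailable.
long long thread_cpu_time_nanos() {
    struct timespec tp;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &tp) != 0) return -1;
    return (long long)tp.tv_sec * 1000000000LL + tp.tv_nsec;
}

// Spins on the CPU so the thread's CPU clock visibly advances. The volatile
// accumulator keeps the compiler from optimizing the loop away.
long long burn_cpu(long iterations) {
    assert(iterations >= 0);
    volatile long long sink = 0;
    for (long i = 0; i < iterations; i++) {
        sink = sink + i;
    }
    return sink;
}
```

Calling thread_cpu_time_nanos() before and after burn_cpu(50000000) should show the reading growing by roughly the time the loop spent on the CPU. There is no path formatting, no file I/O, and no text parsing anywhere in this path.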
