What’s the quickest giveaway that someone’s an amateur developer?
When they tell you to break out a function into smaller functions, and the first reason they cite is “it’s over X lines of code.”
Yeah, you heard me.
This article is targeted at people who are so easily intimidated by long functions that they fail to see deeper issues with them.
It’s about people who take a linear function with no repetition or need for reusability, break it out into smaller functions, and think they accomplished something.
I’ve written one-liner functions (besides getters and setters). I’ve written one-hundred-liner functions. I’ve written everything in between, and I have no regrets. So let me teach you some things you should’ve been taught a long time ago.
Lesson 1: Stop using aesthetic arguments
I cringe when developers say “this function just doesn’t look clean” to justify refactoring in lieu of actual reasoning, as if code cleanliness is an aesthetic judgment that only a true connoisseur can determine.
Intuition is valuable, but it’s a hint, not the answer. There’s always a deeper reason. You just may not be articulate enough to express it using reasoned arguments. I can help, but…
Lesson 2: “Lines Of Code” is the last thing you should think about
Referring to LoC to determine function cleanliness is like judging a paragraph of a book by how long it is. If the exact same information can be communicated in fewer lines of code, and if all else is equal, then sure, conciseness is valuable.
But all else is rarely equal. While function length can exacerbate cleanliness problems, it’s rarely the root cause.
Lesson 3: Cognitive Complexity
A superior metric to LoC is CC. A good linter will warn you when your cognitive complexity reaches a certain amount. I try to keep my functions under a CC of 15, but it’s up to you and your team.
What matters is that you don’t treat it as gospel.
What matters even more is that you try to simplify your function’s logic BEFORE you even think about decomposing. Because if you do, you may fall below your CC threshold, and you won’t feel the need to decompose at all.
Lesson 4: There are costs to decomposition
Too many developers will downplay the cost of function decomposition and act like it’s only costly below a certain LoC.
That’s noob talk.
Locality is the design principle of grouping related things together. It’s a mental shortcut or heuristic that allows the mind to assume a higher degree of independence from the code block being read, easing the mental burden. Spreading code across your codebase diminishes locality.
Linearity is the idea of having instructions be read one after the other, from top to bottom, just like how code is executed. Having to jump back and forth from function call to function call reduces linearity.
Context Overhead is how much context from the calling function you need to understand a called function, and vice versa. Ideally, this would be zero; the calling function treats the called function as a black box, and the called function can be understood in isolation.
Too often, people will decompose functions by line count, yet the details of each sub-function are HEAVILY intertwined with the overall purpose of the main function, such that there’s no real separation to ease the mental burden.
Functions that modify passed-in variables (like appending to a list), or worse, globals.
Functions that require a LOT of local variables to be passed in.
Functions that have a LOT of return values that aren’t necessarily related to each other.
And worst of all, function signatures that do a bad job of summarizing what the function is supposed to do.
You’ll notice this when the function name uses very simple yet vague language (i.e getDetails , setupInfrastructure , etc), but the body is so much more involved than the name implies.
The authors had a difficult time coming up with a good name because it’s impossible. There doesn’t exist a high-level description for the code in this function that isn’t just a restatement of the implementation.
You’ll have to read these functions anyway to understand what they do, so you haven’t actually reduced complexity, you just moved it.
Coupling is inevitable, but when you introduce a function that other parts of the codebase will call, they are now coupled to this function. Sometimes this coupling is desirable, but in the future, you may find that each caller acquires a slightly different twist to the same function. Now you have to either inline the function to some of the callers, make separate functions for each caller, or modify the function to handle the unique concerns and paths of the callers.
I’ll talk more about this in the next section.
Lesson 5: Reusability and deduplication are king
The very first thing you should think of when considering decomposition is whether this same functionality will be called elsewhere, the goal being to reduce code duplication.
Just make sure that the duplication you’re removing isn’t coincidental. “Coincidental duplication” is two or more pieces of code that just happen to be similar now, but are likely to diverge in the future.
Consider the principle of a Single Source of Truth. In the context of functions, this is when you have business logic that you want to standardize across your modules.
Also consider the Rule of Three. A single duplication is often a coincidence, but another one points to a pattern.
I say all this because what’ll often happen is that people will prematurely create functions because duplication is ugly. But when unique cases arise from the callers, a lot of people find it hard, psychologically speaking, to let go of said functions. Instead, they’ll Frankenstein-stitch on specific cases to account for each caller, leading to multiple unintuitive code paths (*shudder*).
Lesson 6: Testability, Dependency Injection, and Purity
Sometimes, the code you’re extracting is so involved and so tricky that you want to test it on its own, rather than exercising it through your current interface. In that case, you should consider extracting into a public method, whether it’s in the same class or a different one. But this should be a last resort, and the function you’re extracting shouldn’t mess with the internals of your class. It should be as pure as possible.
On a similar note, you may find that specific blocks of code work better as wrappers over external dependencies (DBs, files, third-party APIs, etc), and you want to be able to mock these dependencies in your tests. That’s another good candidate for extraction.
Lesson 7: Maybe the calling function is just doing too much
I’ve been talking about splitting blocks of code into separate functions, but what if the overall function itself needs refactoring?
Maybe it’s trying to do too many things.
Maybe it itself needs to be split into separate functions.
Maybe it takes so many arguments, causing a combinatorial explosion of scenarios that makes it hard to reason about and test, where more smaller functions would be easier.
Zeroing in on cleaning up the function in front of you can blind you to the bigger picture that maybe the function itself is bad.
Lesson 8: Self-containment and naming
Okay, let’s say you’ve dealt with the above scenarios, but your function is still painful to read. At this point, I’m assuming that your temptation to extract is solely because of complexity, and that duplication reduction isn’t a concern. You may have one of three issues:
You have code that complicates the function to the point that it distracts from the function’s overall purpose, such as comprehensive validation logic before the real meat of the function, a tricky algorithm, or involved loop logic that the reader really doesn’t need to see right now. Your function is still a mish-mash of code that’s hard to hold in your head. Your business logic is just that complicated.
For (1), I’m inclined to agree. Sometimes you have logic that, while vital, just distracts you from what the function’s really about at a higher-level. For such scenarios, I would actually advise extracting to a separate function, even if it’s just private.
But consider Lesson 4, especially Context Overhead, to see if it’s worth it, and don’t be a chicken. If it’s just a few simple lines of code, it’s likely not worth extracting. Don’t let the “ugliness” scare you.
(2) I mentioned earlier that code that can be reasoned about in isolation is important, because once you understand it, you can use that understanding when reading the calling function.
However, there are some tricks you can use to better present your code without the costs of extracting to a separate function, if you find the costs to be high.
Just like how books are broken into paragraphs, you can break your code into blocks to hint to the reader that those specific lines of code are more related to each other than they are to others. This is a psychology concept called “chunking”.
Variable naming is also often overlooked. Too many developers use functions as a crutch to reduce complexity, when in reality they just have mediocre variable names. Get off it. Good names prime readers to infer functionality without having to read every single line.
For the messier blocks (i.e the ones filled with local context), you can place a comment at the top of each block to summarize if that helps. For these kinds of code blocks, in-place comments improve comprehension more than extracted functions (remember Lesson 4).
“Nooo! Comments lie!” And you think function names don’t? At least comments lie to your face, whereas functions hide the lies elsewhere.
In other words, if you feel the need to extract to a function, try blocking your code, using comments for the messy parts, and improving your naming first. That fixes most readability issues for those who haven’t yet drank the Clean Code Kool-Aid.
(3) I can’t help you with this. You might need to revisit your requirements or suck it up.
An Example
I was inspired to write this after reading through the first example of refactoring in Martin Fowler’s book, Refactoring: Improving the Design of Existing Code (highly recommend, btw).
The initial version of the code can be found here: Theatrical-Players-Refactoring-Kata/javascript/src/statement.js at main · emilybache/Theatrical-Players-Refactoring-Kata · GitHub
And the refactored code can be found here: GitHub – kanpou0108/refactoring-2nd-martinfowler: Refactoring: Improving the Design of Existing Code by Martin Fowler, Second Edition
To be clear, Fowler took the initial statement.js script, and split it into statement.js & createStatementData.js . The main idea was to separate the String summarization of the statement from the creation of the statement.
It’s a great refactor! And he acknowledges that while splitting these incurs a performance penalty, the readability and maintainability benefits far outweigh that cost.
But there also some weird choices that irked me in that they reek of the fear of “dirty code” I described above.
Starting with my biggest issue: Inheritance.
plays are either comedies or tragedies, and each play type has its own way of calculating an amount (i.e a price given the size of the audience), and volumeCredits (i.e customer loyalty points given the size of the audience).
In my opinion, Fowler doesn’t do a good job of justifying the use of polymorphism this early. There are only two types, and each type only has two methods. And neither are used outside this module.
Also, he uses JavaScript which doesn’t enforce method implementation in subclasses at compile-time like Java interfaces do, so even that benefit isn’t recognized here.
It would’ve been better to simply have a function to handle pricing, and another one for loyalty points, switching on the play type.
Oh wait, he actually did this in an earlier iteration before replacing it with polymorphism.
function amountFor(aPerformance) { let result = 0; switch (aPerformance.play.type) { case "tragedy": result = 40000; if (aPerformance.audience > 30) { result += 1000 * (aPerformance.audience - 30); } break; case "comedy": result = 30000; if (aPerformance.audience > 20) { result += 10000 + 500 * (aPerformance.audience - 20); } result += 300 * aPerformance.audience; break; default: throw new Error('unknown type: ${aPerformance.play.type}') } return result; } function volumeCreditsFor(aPerformance) { let result = 0; result += Math.max(aPerformance.audience - 30, 0); if ("comedy" === aPerformance.play.type) result += Math.floor(aPerformance.audience / 5); return result; }
Extracting this simple functionality to separate classes is so unnecessary because it doesn’t reduce complexity, it just spreads it out. If you feel these functions are doing too much (especially amountFor ), let me try this.
function calculatePrice(performance) { let price = 0; switch (performance.play.type) { case "tragedy": price = 40000; if (performance.audience > 30) { price += 1000 * (performance.audience - 30); } break; case "comedy": price = 30000; if (performance.audience > 20) { price += 10000 + 500 * (performance.audience - 20); } price += 300 * performance.audience; break; default: throw new Error('unknown type: ${performance.play.type}') } return price; }
Wow, it’s suddenly super easy to read! What magic did I just do?
I just did some renaming:
amountFor was an awful name to begin with, so I changed it to calculatePrice .
result was vague, so I changed it to price .
I removed the unnecessary “a” from aPerformance .
Does your mind feel lighter reading this?
How about this?
function calculatePrice(performance) { switch (performance.play.type) { case "tragedy": let price = 40000; if (performance.audience > 30) { price += 1000 * (performance.audience - 30); } return price; case "comedy": let price = 30000; if (performance.audience > 20) { price += 10000 + 500 * (performance.audience - 20); } price += 300 * performance.audience; return price; default: throw new Error('unknown type: ${performance.play.type}') } }
All I did was move the outer-level price variable declaration into each block and return early. Now, your mind can reason about each block independently, without having to hop over the default block to see what happens with price after the switch statement.
I guess I introduced some duplication by declaring price twice, but surely you can see it’s worth it. Let’s touch up volumeCreditsFor , first with some nice renaming:
function calculateCustomerCredits(performance) { let credits = 0; credits += Math.max(performance.audience - 30, 0); if ("comedy" === performance.play.type) credits += Math.floor(performance.audience / 5); return credits; }
Now, I’m gonna do something unexpected:
function calculateCustomerCredits(performance) { switch(performance.play.type) { case "tragedy": return Math.max(performance.audience - 30, 0); case "comedy": return Math.max(performance.audience - 30, 0) + Math.floor(performance.audience / 5); default: throw new Error('unknown type: ${performance.play.type}') } }
I introduced switching logic to the credit calculation function, even though by doing so, I’ve actually increased its complexity. Why?
First, it’s good practice to handle the default scenario, even though it would never be reached in calculateCustomerCredits , since calculatePrice is called first, and would’ve thrown an exception if the play type was unknown.
But the real reason I did this is because I’m not naive. I know that there’s a good chance that polymorphism will be introduced in the future as new play types are added.
Even though I’ve duplicated the base amount calculation in both blocks, this’ll make it trivially easy to move these implementations into subclass methods. Because if you noticed in Fowler’s refactored code, get volumeCredits has a super implementation, whose only purpose is to calculate this base amount.
Remember earlier when I mentioned “coincidental duplication?” This is what happens when you’re too averse to it. You end up with unnecessary coupling.
Now, let’s zoom out a bit and refactor createStatementData . Here’s how Fowler’s version looks:
export default function createStatementData(invoice, plays) { const result = {}; result.customer = invoice.customer; result.performances = invoice.performances.map(enrichPerformance); result.totalAmount = totalAmount(result); result.totalVolumeCredits = totalVolumeCredits(result); return result; function enrichPerformance(aPerformance) { const calculator = createPerformanceCalculator(aPerformance, playFor(aPerformance)); const result = Object.assign({}, aPerformance); result.play = calculator.play; result.amount = calculator.amount; result.volumeCredits = calculator.volumeCredits; return result; } function playFor(aPerformance) { return plays[aPerformance.playID]; } function totalAmount(data) { return data.performances .reduce((total, p) => total + p.amount, 0); } function totalVolumeCredits(data) { return data.performances .reduce((total, p) => total + p.volumeCredits, 0); } } function createPerformanceCalculator(aPerformance, aPlay) { switch(aPlay.type) { case "tragedy": return new TragedyCalculator(aPerformance, aPlay); case "comedy" : return new ComedyCalculator(aPerformance, aPlay); default: throw new Error(`unknown type: ${aPlay.type}`); } } class PerformanceCalculator { constructor(aPerformance, aPlay) { this.performance = aPerformance; this.play = aPlay; } get amount() { throw new Error('subclass responsibility');} get volumeCredits() { return Math.max(this.performance.audience - 30, 0); } } class TragedyCalculator extends PerformanceCalculator { get amount() { let result = 40000; if (this.performance.audience > 30) { result += 1000 * (this.performance.audience - 30); } return result; } } class ComedyCalculator extends PerformanceCalculator { get amount() { let result = 30000; if (this.performance.audience > 20) { result += 10000 + 500 * (this.performance.audience - 20); } result += 300 * this.performance.audience; return result; } get volumeCredits() { return super.volumeCredits + Math.floor(this.performance.audience / 5); } }
First thing I’m gonna do is remove the PerformanceCalculator class hierachy and replace it with the straightforward procedural functions from above.
export default function createStatementData(invoice, plays) { const result = {}; result.customer = invoice.customer; result.performances = invoice.performances.map(enrichPerformance); result.totalAmount = totalAmount(result); result.totalVolumeCredits = totalVolumeCredits(result); return result; function enrichPerformance(aPerformance) { const result = Object.assign({}, aPerformance); result.play = playFor(aPerformance); result.amount = calculatePrice(result); result.volumeCredits = calculateCustomerCredits(result); return result; } function playFor(aPerformance) { return plays[aPerformance.playID]; } function totalAmount(data) { return data.performances .reduce((total, p) => total + p.amount, 0); } function totalVolumeCredits(data) { return data.performances .reduce((total, p) => total + p.volumeCredits, 0); } function calculatePrice(performance) { switch (performance.play.type) { case "tragedy": let price = 40000; if (performance.audience > 30) { price += 1000 * (performance.audience - 30); } return price; case "comedy": let price = 30000; if (performance.audience > 20) { price += 10000 + 500 * (performance.audience - 20); } price += 300 * performance.audience; return price; default: throw new Error('unknown type: ${performance.play.type}') } } function calculateCustomerCredits(performance) { switch(performance.play.type) { case "tragedy": return Math.max(performance.audience - 30, 0); case "comedy": return Math.max(performance.audience - 30, 0) + Math.floor(performance.audience / 5); default: throw new Error('unknown type: ${performance.play.type}') } } }
Now I’m gonna inline and delete the playFor , totalAmount , and totalVolumeCredits functions, because why have one-line functions that are only used in one place? They take focus away from enrichPerformance , calculatePrice , and calculateVolumeCredits , which is where the good stuff is.
export default function createStatementData(invoice, plays) { const result = {}; result.customer = invoice.customer; result.performances = invoice.performances.map(enrichPerformance); result.totalAmount = result.performances .reduce((total, p) => total + p.amount, 0); result.totalVolumeCredits = result.performances .reduce((total, p) => total + p.volumeCredits, 0); return result; function enrichPerformance(aPerformance) { const result = Object.assign({}, aPerformance); result.play = plays[aPerformance.playID]; result.amount = calculatePrice(result); result.volumeCredits = calculateCustomerCredits(result); return result; } function calculatePrice(performance) { switch (performance.play.type) { case "tragedy": let price = 40000; if (performance.audience > 30) { price += 1000 * (performance.audience - 30); } return price; case "comedy": let price = 30000; if (performance.audience > 20) { price += 10000 + 500 * (performance.audience - 20); } price += 300 * performance.audience; return price; default: throw new Error('unknown type: ${performance.play.type}') } } function calculateCustomerCredits(performance) { switch(performance.play.type) { case "tragedy": return Math.max(performance.audience - 30, 0); case "comedy": return Math.max(performance.audience - 30, 0) + Math.floor(performance.audience / 5); default: throw new Error('unknown type: ${performance.play.type}') } } }
Finally, let’s improve the naming:
export default function createStatementData(invoice, plays) { const statement = {}; statement.customer = invoice.customer; statement.performances = invoice.performances.map(addPlayInfo); statement.totalPrice = statement.performances .reduce((total, perf) => total + perf.price, 0); statement.totalCustomerCredits = statement.performances .reduce((total, perf) => total + perf.customerCredits, 0); return statement; function addPlayInfo(performance) { const performanceClone = Object.assign({}, performance); performanceClone.play = plays[performance.playID]; performanceClone.price = calculatePrice(performanceClone); performanceClone.customerCredits = calculateCustomerCredits(performanceClone); return performanceClone; } function calculatePrice(performance) { switch (performance.play.type) { case "tragedy": let price = 40000; if (performance.audience > 30) { price += 1000 * (performance.audience - 30); } return price; case "comedy": let price = 30000; if (performance.audience > 20) { price += 10000 + 500 * (performance.audience - 20); } price += 300 * performance.audience; return price; default: throw new Error('unknown type: ${performance.play.type}') } } function calculateCustomerCredits(performance) { switch(performance.play.type) { case "tragedy": return Math.max(performance.audience - 30, 0); case "comedy": return Math.max(performance.audience - 30, 0) + Math.floor(performance.audience / 5); default: throw new Error('unknown type: ${performance.play.type}') } } }
And we have a lovely looking function with subfunctions that don’t overstate their importance, procedural calculations that aren’t coupled to class hierarchies but can easily be made to if needed, and naming that doesn’t mislead readers into prioritizing structural refactoring over in-place refactoring.
I still kept three subfunctions, though.
addPlayInfo(performance) is quite involved, but it’s also easier to understand in a self-contained fashion. Though I wouldn’t really object if it was also inlined.
calculatePrice and calculateCustomerCredits are heavily numerical, contain branches, and are also easy to reason about independently.
That said, I do want to point to an advantage with the polymorphic approach that my procedural approach lacks. My code checks the play types twice, once in calculatePrice , and once in calculateCustomerCredits . That’s a bit of duplication that could prove annoying with more play types. I can keep that in mind, while still holding off on abstraction until later.
I tried to keep the overall logic of the program the same as Fowler’s to make my point.
It takes true skill to look at a function and accurately assess whether or not it’s carrying its weight, that is, whether its simplification of the problem space is enough to justify its distraction and indirection.
Learn it.
So in summary, you should lean toward extraction when:
The function would be easy to reason about in isolation
The signature does a good job of implying how the function transforms inputs into outputs.
Duplication can be significantly reduced.
You want to standardize certain business logic.
You want to test said functionality on its own.
When it works better as an external dependency.
When you’ve done your due diligence in cleaning up your current function, but it’s still too complicated.
Certain details significantly distract from the overall function, and you want readers to get a high-level view before they worry about these details.
and lean away when:
You’re doing it just to abide by some arbitrary LoC metric.
You can’t think of a straightforward signature for the function, which hints at bad design or context overhead.
You’re unsure of whether your duplication is coincidental or not.
You haven’t tried to improve the readability of your overall function first.
The benefits outweigh the costs from Lesson 4.
Use your judgment, and don’t be bullied by people who prescribe specific line numbers.