Are short methods actually worse?

March 9, 2009 ⋅ 32 Comments »

I ran across an interesting post on programming.reddit called Anecdote Driven Development, or Why I Don’t Do TDD. The article focused on testing, but what I found most interesting was the part about how long a method or function should be:

I recently wrote some code for Class::Sniff which would detect “long methods” and report them as a code smell. […] Ben Tilly asked an embarrassingly obvious question: how do I know that long methods are a code smell?

I threw out the usual justifications, but he wouldn’t let up. He wanted information and he cited the excellent book Code Complete as a counter-argument. I got down my copy of this book and started reading “How Long Should A Routine Be” (page 175, second edition). The author, Steve McConnell, argues that routines should not be longer than 200 lines. Holy crud! That’s waaaaaay too long. If a routine is longer than about 20 or 30 lines, I reckon it’s time to break it up.

Regrettably, McConnell has the cheek to cite six separate studies, all of which found that longer routines were not only not correlated with a greater defect rate, but were also often cheaper to develop and easier to comprehend.

I’d never heard this before, but this is great, because it verifies what I’ve believed for a long time…

That which obscures my code is bad

Last year I wrote a post called If this is Object Calisthenics, I think I’ll stay on the couch where I argued (among other things) that making your methods as short as possible is NOT a good idea. My justification was that it just makes the code more complicated: “That which obscures my code is bad.” But this is even better…actual empirical evidence.
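For anyone who hasn't seen a tool like Class::Sniff, a "long method" check is simple to sketch. The version below is a hypothetical Python illustration, not Class::Sniff's actual implementation (which targets Perl). The 30-line threshold is an assumption borrowed from the "about 20 or 30 lines" rule of thumb in the quote above.

```python
from __future__ import annotations

import ast

# Assumed threshold, taken from the "about 20 or 30 lines" rule of thumb;
# McConnell's suggested ceiling would be 200.
MAX_LINES = 30


def find_long_functions(source: str, max_lines: int = MAX_LINES) -> list[tuple[str, int]]:
    """Return (name, length) for each function longer than max_lines.

    Length runs from the `def` line to the last line of the body, so blank
    lines and comments between statements are counted too.
    Needs Python 3.8+ for `end_lineno`.
    """
    tree = ast.parse(source)
    long_ones = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                long_ones.append((node.name, length))
    return long_ones


# A 2-line function and a 42-line function.
sample = "def short():\n    return 1\n\ndef long():\n" + "    x = 1\n" * 40 + "    return x\n"
print(find_long_functions(sample))  # [('long', 42)]
```

Note that a check like this measures only length; as the rest of this post argues, whether length actually predicts defects is the open question.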

I don’t have a copy of Code Complete, so I did a bit more research to see if I could find the actual studies. I found a good summary here (links added by me):

McConnell cites the findings of several studies of the correlation between the size of routines and the cost and/or fault rate of routines. Some findings which favor longer routines are:

Routine size is inversely correlated with errors, up to 200 lines of code. [Basili and Perricone, 1984]

Larger routines (65 lines of code or more) are cheaper to develop per line of code. [Card, Church, and Agresti, 1986; Card and Glass, 1990]

Routines with fewer than 143 source statements (including comments) had 23% more errors per line of code than larger routines. [Selby and Basili, 1991]

Routines averaging 100 to 150 lines of code need to be changed least. [Lind and Vairavan, 1989]

Hmmm. It looks like the studies are all about 20-25 years old. I wonder whether, and how, the results would apply now. I took a quick look at the papers (the ones I could get my hands on), and the programming languages used were: Fortran [Card ’86], Pascal and Fortran [Lind ’89], and a mix of custom languages (one being PL/1-like) and assembly [Selby ’91].

Does anyone know of any more current results? (Greg?) It would be interesting to see if this can be shown with more modern languages. But intuitively, it makes sense. In her book Software Engineering: Theory and Practice, Joanne Atlee summarizes it nicely:

Card and Glass (1990) point out that the design complexity really involves two aspects: the complexity within each component and the complexity of the relationships among components.

By making your methods shorter, you’re just trading one kind of complexity for another.

Update: In the comments, Stephane Vaucher pointed to a much more recent study (from 2002): The Optimal Class Size for Object-Oriented Software. They point out that the conclusion that shorter methods are more error-prone is misleading, at best:

The observed phenomenon of smaller components having a higher fault density is due to an arithmetic artifact. First, note that the above conclusions were drawn based exclusively on examination of the relationship between fault density versus size […] However, by definition, if we model the relationship between any variable X and 1/X, we will get a negative association as long as the relationship between size and faults is growing at most linearly.

Another way of putting it is: short methods may have more defects per line, but they still have fewer defects overall. There may be a justification for not making methods too short, but these studies do not provide one.
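A toy model makes the arithmetic artifact concrete. The model and its constants are assumptions for illustration, not data from any of the studies: each routine carries a fixed "overhead" of faults plus a constant number of faults per line. Total faults then rise with size, yet faults per line fall, simply because density divides by size.

```python
# Hypothetical fault model; both constants are made up for illustration.
FIXED_FAULTS = 0.5       # faults every routine has regardless of size
FAULTS_PER_LINE = 0.01   # faults that scale linearly with size


def expected_faults(size: int) -> float:
    """Total faults grow linearly with routine size."""
    return FIXED_FAULTS + FAULTS_PER_LINE * size


def fault_density(size: int) -> float:
    """Faults per line: dividing by size builds in a 1/size term."""
    return expected_faults(size) / size


for size in (10, 50, 100, 200):
    print(f"{size:>4} lines: {expected_faults(size):.2f} faults, "
          f"{fault_density(size):.4f} faults/line")
```

With these invented constants, total faults rise from 0.6 to 2.5 across the four sizes while density falls from 0.06 to 0.0125 faults per line: the same "smaller routines are worse per line" pattern the older studies reported, produced entirely by the division.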

The one sure thing is that the more code you write, the more bugs you will have.

32 Comments:

PP - March 9, 2009:

I'm curious what these authors would say about current JavaScript code. Much of it contains long functions within functions within functions... I must say I'm not exactly proud of my programming neatness, and have felt that too much code goes into too-large functions, but they all work smoothly, and now that you mention they're cheap, I have another great selling point for my employers ;-)
Neil - March 10, 2009:

Yeah, I found the same thing in my readings of Glass and McConnell and others... most of the citations they use were from the 80s or maybe early 90s. Either these studies aren't done any more, or there is a knowledge transfer problem. What effect would an OO language like C++ or Java have? Who are they studying? I think Jorge's post about not learning that much from empirical studies is pretty applicable here.
Yandu - March 10, 2009:

How does having one large component reduce the complexity within the component? I agree that you shouldn't artificially shorten your methods when it is not logical to do so, but splitting up one long routine into logical subparts makes sense. I would also argue that the complexity within the component does not really increase, it's just shifted around.
Binil Thomas - March 10, 2009:

By making your methods shorter, you’re just trading one kind of complexity for another.

When people talk of "short methods", they typically mean short private methods in a class. I think those methods make the code easier to read, and hence, easier to maintain. They do not increase the external complexity of the component.
Jax - March 10, 2009:

Length is only a symptom, IMO. It's like focusing on how wide a car should be EVER, when what you're really interested in is how wide the roads are in your country and where you live.

The main reason that long code sucks is that it is usually repetitive, and by the end there are crap loads of things in scope. The two issues are DRY and having crap loads in scope. IF you can create an utterly unique method that is 200+ lines, where no other method needs to reuse any of that logic, then hooray, you have found a use case where it makes sense. I personally have never seen such a use case.

The usual 200+ line method means that only the person who wrote it knows how it works, and passers-by fixing bugs can easily ruin things because it's complicated. I wonder if the studies you looked at state who introduced the errors, and whether those people were the original authors or others picking the code up? I think that information would be very relevant to these stats.
Dennis Gorelik - March 10, 2009:

I think these studies were conducted incorrectly. If a routine is too complex, the developer is more likely to make it shorter (by splitting it into parts). So longer routines usually do simple things, and that's why they have fewer bugs and lower development cost.
A. L. Flanagan - March 10, 2009:

The Golden Rule for me is that a procedure should be testable. If a procedure is so complex that you can't write a comprehensive unit test, it needs to be broken up. Typically I have one procedure that does the control flow, and several procedures that actually "do something".
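A minimal sketch of the split this comment describes: one routine that only sequences the work, and small routines that "do something" and can each be unit tested on their own. The task (averaging "name, score" records) and every function name here are hypothetical, chosen only to illustrate the shape.

```python
from __future__ import annotations


def parse_record(line: str) -> tuple[str, int]:
    """'Do something' step: turn 'name, score' into a (name, score) pair."""
    name, score = line.split(",")
    return name.strip(), int(score)


def is_valid(record: tuple[str, int]) -> bool:
    """'Do something' step: accept only named records scored 0-100."""
    name, score = record
    return bool(name) and 0 <= score <= 100


def average_score(lines: list[str]) -> float:
    """Control flow only: parse, filter, aggregate."""
    records = [parse_record(line) for line in lines if line.strip()]
    scores = [score for name, score in records if is_valid((name, score))]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


print(average_score(["ann, 90", "bob, 70", ", 50", "eve, 120"]))  # 80.0
```

The control-flow routine stays short because it contains no processing logic; its tests only need to check that the steps are wired together, while the edge cases live in the tests for the small helpers.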
Jonathan Tran - March 10, 2009:

By making your methods shorter, you’re just trading one kind of complexity for another.

Agreed. If your methods are shorter and usable in more situations, then you are more likely to have more things depend on them, making relationships more complex.

When I was doing code maintenance, I definitely found long methods easier to read. Extracting a method creates another level of indirection. With long methods, I didn't have to keep looking up definitions of methods I was unfamiliar with, like terminology in a domain I was new to.

If, on the other hand, you just factor out a "private static", as it's called in Java, that isn't called anywhere else, I think it can help reduce complexity by (1) grouping it semantically, (2) clearly defining the code block's inputs and outputs, and (3) giving it a semantic name.

This is why, for a long time, I have wished my programming language/editor allowed me to get the best of both by grouping things into sort of inline-collapsible functions, kind of like a "let" in functional programming, but named and with clearly defined inputs like a function.
Patrick - March 10, 2009:

@Neil: Got a link for the post you're referring to?

@Yandu: Take the extreme example of a single line of code. If you split that into a separate method, it now takes 3 lines to accomplish the same thing: method definition, the original line of code, and the method call. Not only that, but now I have to break out of the flow to go read what the method does. It's like trying to read text that is filled with footnotes -- you have to put the current thought on hold, go read the detail, then come back to the original location. Used sparingly it's okay but there is clearly a point where it becomes a hindrance.

@Binil Thomas: It's not just the external complexity that's important, for the reasons I just mentioned.

@Dennis Gorelik: That's a good thought. Maybe the routine length and its defect rate are both directly related to the complexity of the code. It's hard to say for sure.

@Jonathan Tran: Yeah -- the idea of collapsible inline functions is something I've wished for too. Or a related idea: if your editor could just inline the code for a method at the call site, with appropriate variable substitution. Might be tough, but it could be really cool.
Stephane Vaucher - March 10, 2009:

This kind of study is still performed (often on open-source software), and tends to be of better quality. Studying the relationship between fault density and size is somewhat treacherous, since the normalising factor (of fault density) is size itself. You can check out:

"The optimal class size for object-oriented software"

El Emam, K., Benlarbi, S., Goel, N., Melo, W., Lounis, H., and Rai, S. N. (Institute for Information Technology, National Research Council of Canada, Ottawa, Ont.). IEEE Transactions on Software Engineering, vol. 28, no. 5, May 2002, pp. 494-509.

Abstract: A growing body of literature suggests that there is an optimal size for software components. This means that components that are too small or too big will have a higher defect content (i.e., there is a U-shaped curve relating defect content to size). The U-shaped curve has become known as the "Goldilocks Conjecture." Recently, a cognitive theory has been proposed to explain this phenomenon and it has been expanded to characterize object-oriented software. This conjecture has wide implications for software engineering practice. It suggests 1) that designers should deliberately strive to design classes that are of the optimal size, 2) that program decomposition is harmful, and 3) that there exists a maximum (threshold) class size that should not be exceeded to ensure fewer faults in the software. The purpose of the current paper is to evaluate this conjecture for object-oriented systems. We first demonstrate that the claims of an optimal component/class size (1 above) and of smaller components/classes having a greater defect content (2 above) are due to a mathematical artifact in the analyses performed previously. We then empirically test the threshold effect claims of this conjecture (3 above). To our knowledge, the empirical test of size threshold effects for object-oriented systems has not been performed thus far. We performed an initial study with an industrial C++ system and repeated it twice on another C++ system and on a commercial Java application. Our results provide unambiguous evidence that there is no threshold effect of class size. We obtained the same result for three systems using four different size measures. These findings suggest that there is a simple continuous relationship between class size and faults, and that the optimal class size, smaller classes are better, and threshold effect conjectures have no sound theoretical nor empirical basis.
Rick Minerich - March 10, 2009:

I've been having ongoing conversations with one of my friends about this. I think the issue is nesting. If the function nesting goes beyond three layers deep in the same object, it gets hard to keep track of exactly what is going on. The answer might be to use more objects in these cases.

This does seem to have implications for pure FP, though.
Christian Carey - March 10, 2009:

I always treat function/method length as a guideline that triggers a review for functional decomposition. The longer the method, the higher the chance that it contains code that should be broken out. It cannot be treated as a hard-and-fast rule that no function/method should be longer than X, or Y, or whatever.
Jason - March 11, 2009:
1. First, remove all processing logic from large flow control routines and delegate to routines which handle generic processing of the type required.
2. If it helps simplicity, refactor flow control routines into logical blocks to reduce excessive nesting (beyond 2 or 3 levels is generally smelly).
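Guard clauses (early returns) are one common way to carry out step 2; the comment doesn't name a specific technique, so treat this as one option among several. The shipping rules and rates below are hypothetical.

```python
def shipping_cost_nested(order: dict) -> float:
    """Three levels of nesting before the real work happens."""
    if order.get("items"):
        if order.get("country"):
            if order["country"] == "CA":
                return 5.0
            else:
                return 15.0
        else:
            raise ValueError("missing country")
    else:
        return 0.0


def shipping_cost_flat(order: dict) -> float:
    """Same behaviour; guard clauses keep the nesting to one level."""
    if not order.get("items"):
        return 0.0
    if not order.get("country"):
        raise ValueError("missing country")
    return 5.0 if order["country"] == "CA" else 15.0


print(shipping_cost_flat({"items": [1], "country": "CA"}))  # 5.0
```

Both versions have the same length and behaviour; the flat one simply handles each exceptional case first, so the main rule reads last without any enclosing conditions.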
Neil - March 11, 2009:

Yeah, it was http://catenary.wordpress.com/2009/02/25/experimentation-and-argumentation/
Jonathan Allen - March 11, 2009:

Another way of putting it is: short methods may have more defects per line, but they still have fewer defects overall.

So what? That doesn't address the question.

When asking if four 50-line functions is better than one 200-line function, you can't propose a single 50-line function as a third option. Obviously if 50 lines of code would suffice, we wouldn't be asking how to structure 200 lines of code.
Newtopian - March 11, 2009:

Method length, like object length, method coupling, return count, or any-other-metric-that-when-exceeded-means-you-are-coding-like-an-idiot, is to be taken with a grain of salt. They are all rules of thumb, to be stretched and broken.

Of course shorter methods are generally better than long ones... I'll give you a main that stretches 60k lines deep that does everything and I dare you to tell me this is the best way to do it.

On the other side, sure, we can always break methods down to 10 lines max... but that too would be hell to read.

The ideal code lies somewhere in between, and yes... there will be methods that are a single return, and there will also be methods that span 300 lines, and both will be quite readable and quite correct the way they are.

I'm not saying the Goldilocks metrics are just garbage uttered by statisticians in search of self-enlightenment, just that they are guidelines: things one should know and apply until one gains enough experience to make up one's own damn mind and recognize a stinkin' pile of code for what it is.

Code is not badly written because it is too long, short or not indented properly, however these hints sure help in finding it...
Geth - March 11, 2009:

It amuses me somewhat that we make such a fuss over such trivial things. A 'method' should be as long as it needs to be, as long as it 1) does what it's supposed to; 2) runs as fast as it can realistically run; 3) does not repeat the same code over and over again; and 4) is easy enough to understand that a new developer can fix/change it as required.

Any over analysis is overkill and simply clogs up the whole process. Remember, your code isn't a work of art, it's something to get something else done that will probably never be looked at by anyone.
Theo - March 11, 2009:
- Methods should be as short as possible
- Your methods should be well named and easy to understand
- Classes should have as few methods and members as possible
- Your class structure should be well defined and easy to understand
- Your higher-level classes/methods should be immediately obvious
If you follow those guidelines you will have a good time. If you don't, you'll live in class hell, and no one will understand what your 3000 classes do, how they fit together, or which of your 20+ methods per class they need to use.

That said, it also helps if you're using a language that doesn't require 10 lines just to do something trivial.
Jimmy the Geek - March 11, 2009:

Look for patterns, then create interfaces that can be completely and automatically unit tested.
Andrew - March 11, 2009:

Short routines will contain fewer bugs than large methods, assuming a similar level of complexity.

You don't need an empirical study to tell you this - it's obvious. A human mind, when programming, is like a peephole optimizer: it can only keep so much complexity in it at any one time. Breaking things up into smaller, simpler parts helps munch through the task of writing stable software.

I'm saddened that experienced developers would question their own judgment on account of an empirical study likely conducted by non-experts.
Jesse McNelis - March 11, 2009:

The length of a function is irrelevant. The important factor in writing a function is that it does one specific job. Breaking a function up into smaller parts for no reason other than to make it short just makes it more difficult to read, because when reading it I'll still have to go find those other functions and read them too to actually know what is going on.
Fadzlan - March 11, 2009:

@Dennis Gorelik I disagree to a point. Basically, if the routine is simple enough, cannot be generalized, cannot be reused, can easily be tested and actually quite expressive, I'll leave it at that (lengthy code). No need to fix unbroken things I suppose.

But in my experience, this has been abused for a long time. Maybe I am not that old, but it's very easy for me to come across lengthy code, more than 200 lines, magically calling here and there, full of side effects and hacks accumulated over the years, and really painful to test.

That being said, generally (as in life), there are always two polar opposites for any practice, and most people would likely argue that the best lies somewhere in between.

As for me, I always check why a particular rule was created and what purpose it serves. As long as I agree with the purpose, I will apply the rule where it meets that purpose, and discard it where it doesn't.

In this case, shorter code is more expressive, easier to test and maintain, and has a higher chance of reuse (depending on how you write it, of course). There may be times when shorter code doesn't meet all or part of these criteria. And when that happens, it's a judgment call.
Fadzlan - March 11, 2009:

@Jason

Amen to that. Even when I write shorter code, I still have to be careful not to obfuscate it. That would defeat the purpose, IMO. If things are nested ten levels deep just because you can, then something seems not right to me.

An example would be refactoring a 10-line routine into a call chain 10 levels deep, with a two-line routine in each method/function (one line for the work, another to call deeper into the stack). I don't see how something like this would be good, as some people preach. If I ever came across something like this, I would surely raise an eyebrow.
Patrick - March 11, 2009:

@Neil: Thanks, that's a great post! You're right, it definitely applies here.

@Jonathan Allen:

When asking if four 50-line functions is better than one 200-line function, you can’t propose a single 50-line function as a third option. Obviously if 50 lines of code would suffice, we wouldn’t be asking how to structure 200 lines of code.

You're absolutely right. The problem is that it's really hard to determine that empirically. The studies I cited did something else: they compared module sizes within a program to see which ones were most error-prone. Unfortunately it looks like they made a pretty big mistake in their analysis.

@Everyone else:

Lots of good thoughts, rules of thumb, etc. We've all got our own heuristics and guidelines about when to break a method up into several smaller ones. The main point of this post was that maybe there is an empirical difference between the various choices. We can argue forever about what each of us thinks is the "right" way to do things, but it would be really nice if we could back it up with some actual science. Unfortunately, we don't seem to be able to yet.
Imagist - March 11, 2009:

The basic problem with those studies:

http://xkcd.com/552/
Terry - March 11, 2009:

"the more code you write, the more bugs you will have." So actually this implies that making shorter methods is good, because if an effort is made to implement the same behavior as before with shorter methods, you should end up with less code, and smaller amounts of code to look over in case of error. It's ironic that in your final conclusion and update you go right back to an argument that actually supports the opposite of your entire argument, though. Creating shorter methods that only do one thing is actually a good practice, and these "studies" just seem to be far off the mark.
Patrick - March 11, 2009:

@Terry: Yep, that's why I provided an update. The studies concluded that while there is a correlation between routine length and number of defects, there is a lower limit to how short your routines should be. In the paper linked in the update, they showed why that conclusion was incorrect, and reiterated that the only certainty is: the more code you have, the more defects you will have. Now, what that doesn't address is whether 1000 lines of code is better to be organized as 10 functions of 100 lines each, or 100 functions of 10 lines each. The older studies seemed to provide an answer, but it turns out their methods were flawed, as I pointed out in the update.
Brent - March 12, 2009:

I have found that the more large chunks of code I write, the more bugs I get. Breaking things down into small chunks of code really helps with that problem. It is easier to identify problems in smaller pieces of code. Having methods that do the various parts of what you are trying to achieve really makes your code versatile. And it's all about having versatile code in this OO world we live in. That is what they call divide and conquer.

I totally agree that method names should not be too short, nor should they be too long. I don't think there is a golden rule here, i.e. 17.5 characters long. A name should be long enough for another programmer to know exactly what the method does. Good post.
ncloud - March 13, 2009:

I tend to follow this rule: a routine should do one thing, and do it well. If it takes 5 lines to do it, great. If it takes 50, great. If it makes sense to refactor the routine at some point and break it down into smaller routines, I will do it. Routines should be evaluated in terms of responsibility, and not length necessarily.
Software Quality Digest - 2009-03-13 | No bug left behind - March 13, 2009:

[...] Are short methods actually worse? - “Another way of putting it is: short methods may have more defects per line, but they still have fewer defects overall.” [...]
Chris - March 16, 2009:

I've been out of town ... so I just saw this. I haven't read through everything, so maybe I'm repeating something already said.

I always read this rule to mean that short methods are better when it makes sense to split up a method. I never thought that splitting up a method into 10 sub-methods really proved anything. Just taking every 5 lines of code and breaking it out is going to make things harder to read and it will be just as hard to test.

Usually, the trick with long methods and long classes is that they break the Single Responsibility Principle . . . and I do feel that's worth looking at. And from my experience, this is what the TDD world is talking about.

Breaking code into classes that focus on a single task makes a given piece of code easier to understand and much easier to test in isolation. Even if this process is more error-prone, proper unit testing is easier to put in place for this kind of code, and if it is in place, the components should all work as expected, with no little mistakes.

But my main point is: I have never looked at it as an issue of method length. Long methods are fine, if that's the best way to write them. If they're carrying out a single task and it takes 100 lines of code to do that, then they should be left alone. Subroutines can sometimes make these big functions more readable, but they shouldn't be used just to reduce method size.

With regard to testing, breaking a method into 1 big public method and 5 private subroutines doesn't actually help anything, as you can't test private subroutines directly. You're still feeding inputs into the big public method and forcing it along all the various paths . . . and those paths are arguably harder to see if it's broken into too many subroutines.

Anyway ... I'm going to read some more of the literature. But, I had to blather first ;)

Chris.
Pencho - May 21, 2010:

Interesting discussion! I like the "reading a book with a lot of footnotes" one :)

I try to think about design patterns, apply them at the class level, and structure my code according to them. I always ask myself "will I need this functionality somewhere else?", and if the answer is yes, I put it into a separate method (I call such methods cross-cutting concerns at the class level). I recently wrote a method that generates an RSS feed using XmlWriter (.NET). The method was long (about 200 lines), but it was doing one atomic piece of work, and breaking it down would reduce readability (as opposed to what one colleague of mine suggested) and decrease performance (a counter-argument from another colleague).
