People who work on highly available systems and databases treat redundancy as one way to achieve high availability, and in that setting redundancy helps. But consider the other side: in a highly available database, the data is replicated across different nodes. Any update at one node has to be propagated to all the other nodes, and a failure during replication can leave the nodes with mismatched data. So redundancy is both good and bad. Why am I writing about highly available systems when the title says “good code”? Because when it comes to redundancy in code, there is no goodness in it.
The most common tendency of every programmer is to copy-paste. I remember reading somewhere that programmers have to be lazy, but wherever I read it, the author did not mean that being lazy is to copy-paste. Being lazy means writing code that is concise and clear. One reason we tend to copy-paste is that the design of the existing code does not facilitate reuse. Once the code is copied, any change in one place has to be repeated in every copy. This is one of the most common ways of introducing redundancy.
The other possible reason is the use of the same algorithm in different places, either with a different set of data each time or with a slight change in the order of its steps. The first reaction would be: how can this be redundant when no code is the same? The redundancy here is of the idea or logic. If the algorithm has to change, the change has to be made everywhere it is used.
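To make this kind of redundancy concrete, here is a minimal Python sketch. The function names and tax percentages are hypothetical and exist only for illustration. The first two functions share one idea and differ only in their data, so the idea can be stated once.

```python
# Hypothetical example: the same "add a percentage of tax" idea written twice.
def book_total_cents(price_cents):
    return price_cents + price_cents * 5 // 100


def gadget_total_cents(price_cents):
    return price_cents + price_cents * 18 // 100


# The idea stated once; only the data (the percentage) varies.
TAX_PERCENT = {"book": 5, "gadget": 18}  # made-up rates


def total_cents(price_cents, category):
    return price_cents + price_cents * TAX_PERCENT[category] // 100
```

If the rounding rule ever changes, only total_cents has to change, whereas the first two functions would have to be found and edited one by one.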
How is redundancy going to affect the code?
- The code becomes fragile – a developer might not be aware of all the copies of the code, maybe because they are new to the system, and so might miss fixing a few of them, which leads to broken functionality. Any change has to be made very carefully.
- The code is hard to maintain and extend – with no option of reuse, copy-paste becomes the norm, and every copy adds to the number of lines of code.
- The code becomes hard to read – especially when algorithms or logic are duplicated. The person reading the code has no idea why the same algorithm is written in two different ways in two different places. It creates a grey area in the code.
These were some of the ill-effects of redundancy that I could think of.
Here is an example of redundancy in SQL statements:
IF something IS NOT NULL THEN
    SELECT somecolumn, someothercolumn, onemorecolumn
    FROM sometable
    WHERE somecolumn = something
      AND condition1 AND condition2;
ELSIF someotherthing IS NOT NULL THEN
    SELECT somecolumn, someothercolumn, onemorecolumn
    FROM sometable
    WHERE someothercolumn = someotherthing
      AND condition1 AND condition2 AND condition3;
END IF;
(I did copy-paste the first query to create the second one, and then edited the condition in the WHERE clause.)
Suppose I need to change condition1 or condition2; I would have to change it in two places. This example looks obvious and the change looks easy, but imagine a more complicated query with four or five such IF … ELSIF branches. It would take some time to understand what each query does, only to find out that they are all the same apart from a few changes in the WHERE clause. Let me try to remove the redundancy:
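SELECT somecolumn, someothercolumn, onemorecolumn
FROM sometable
WHERE condition1 AND condition2
  AND ( somecolumn = something
        OR ( something IS NULL
             AND someothercolumn = someotherthing
             AND condition3 ) );
I pulled the common elements together and kept the differing elements in one place, with their use decided by the value of something or someotherthing. A comparison with NULL is never true, so each filter is skipped when its parameter is NULL, and the something IS NULL check keeps the ELSIF behaviour: the second filter applies only when something is not supplied. When both are NULL the query returns no rows, which matches the original, where neither branch runs.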
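One way to gain confidence in a consolidation like this is to run both forms over a small table for every combination of inputs. The sketch below does that with Python's built-in sqlite3 module. The table contents are made up, the flag column is a stand-in for condition3, and somecolumn < 3 is a stand-in for condition1 AND condition2. The consolidated query in the sketch spells out the IF … ELSIF precedence with an explicit IS NULL check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable (somecolumn INTEGER, someothercolumn INTEGER, flag INTEGER)"
)
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 20, 0), (2, 10, 1), (2, 20, 1), (3, 10, 0)],
)

SELECT = "SELECT somecolumn, someothercolumn, flag FROM sometable WHERE "
COMMON = "somecolumn < 3"  # hypothetical stand-in for condition1 AND condition2


def branched(something, someotherthing):
    """Mirror of the IF ... ELSIF version: one query per branch."""
    if something is not None:
        sql = SELECT + COMMON + " AND somecolumn = ?"
        params = (something,)
    elif someotherthing is not None:
        sql = SELECT + COMMON + " AND someothercolumn = ? AND flag = 1"
        params = (someotherthing,)
    else:
        return []  # neither branch runs
    return sorted(conn.execute(sql, params).fetchall())


def consolidated(something, someotherthing):
    """Single query; ':s IS NULL' keeps the ELSIF precedence."""
    sql = SELECT + COMMON + """
        AND (somecolumn = :s
             OR (:s IS NULL AND someothercolumn = :t AND flag = 1))"""
    params = {"s": something, "t": someotherthing}
    return sorted(conn.execute(sql, params).fetchall())


# Compare the two forms for every combination, including NULL (None).
for s in (None, 1, 2, 9):
    for t in (None, 10, 20, 99):
        assert branched(s, t) == consolidated(s, t), (s, t)
```

The loop includes the cases where both values are supplied and where neither is, because those are the combinations in which a single query can most easily drift from the branching version.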
How can we eliminate redundancy?
- Whenever you see something being repeated, refactor that code. The aim should be to improve the quality of the code with each check-in.
- Refactor the common behavior out into a method or a class. Some refactoring moves are explained in the Refactoring book by Martin Fowler. There’s also another book on refactoring databases.
- Use design patterns to refactor the code. This is especially useful when the redundancy is due to a design flaw and not just due to copy-paste.
- And never copy-paste code from a different place.
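As a rough illustration of pulling common behavior into a method, here is a small Python sketch. The user records and function names are hypothetical. The two "before" functions repeat the same filter-and-sort steps and differ only in how they join the result.

```python
# Before: both functions repeat "keep active users, sort by name".
def active_names_csv(users):
    active = sorted((u for u in users if u["active"]), key=lambda u: u["name"])
    return ",".join(u["name"] for u in active)


def active_names_lines(users):
    active = sorted((u for u in users if u["active"]), key=lambda u: u["name"])
    return "\n".join(u["name"] for u in active)


# After: the shared steps live in one helper, and the differing part
# (the separator) is passed in by the caller.
def active_names(users):
    ordered = sorted(users, key=lambda u: u["name"])
    return [u["name"] for u in ordered if u["active"]]


def render_active_names(users, separator):
    return separator.join(active_names(users))
```

A change to what counts as an active user now happens in active_names only, and every output format picks it up.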
Now there can be a few concerns: what if I refactor and end up breaking the functionality? This usually happens when the features are not backed by corresponding tests. If each feature has automated tests, one can easily tell whether a particular refactoring is harmful. That is why writing automated tests is so important.
A related popular principle is: Don’t Repeat Yourself (DRY)
This was a short write-up based on my limited experience and exposure. Please feel free to add your comments or concerns and share your ideas about redundancy and how it limits your productivity.
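A minimal sketch of how a test can guard a refactoring, written in Python with plain assert statements (a runner such as pytest would collect the test function by its name). Both slugify functions are hypothetical: the first stands in for existing code, the second for its refactored replacement.

```python
def slugify_old(title):
    # Hypothetical existing code, written step by step.
    result = title.strip().lower()
    result = result.replace(" ", "-")
    return result


def slugify(title):
    # Hypothetical refactored version of the same behavior.
    return "-".join(title.strip().lower().split(" "))


def test_slugify_matches_old_behavior():
    # The test pins down the existing behavior, including the odd cases,
    # so a refactoring that changes any of them fails immediately.
    for title in ["  Good Code  ", "a  b", "", "Already-Slugged"]:
        assert slugify(title) == slugify_old(title)
    assert slugify("  Good Code  ") == "good-code"


test_slugify_matches_old_behavior()
```

Once the refactored version has passed for a while, the old function can be deleted and the test kept with the expected values written out directly.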
Related material/Recommended Reading:
- Shalloway’s Principle, taken from Essential Skills for the Agile Developer: A Guide to Better Programming and Design
- Design Patterns: Elements of Reusable Object-Oriented Software
- Refactoring: Improving the Design of Existing Code