Report from the battlefield #11 - premature optimization is the root of all evil?

Have you ever heard that "premature optimization is the root of all evil"? Probably yes. It's a well-known phrase by Donald Knuth. However, the full quotation is much less known:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

Why am I writing about this? Because recently I had an occasion to fix an application that was written according to the first part of this quotation. Or even worse: it was written according to the rule "any optimization is the root of all evil". Here are some examples of what not to do and some tips on what to do.

Reuse data

If you already retrieved some data from the DB, use it. It may sound obvious, but apparently it isn't. I found code that worked in the following way:
foreach (var obj in GetDataToProcess())
{
   // Let's do a lot of stuff
   Process(obj.Id);
}

// ...

public void Process(int id)
{
   var obj = GetById(id);
   // ...
}
In this example, someone first reads some data from the DB and then each found object is retrieved again from the DB based on its identifier. In this particular case, the data returned from GetDataToProcess were exactly the same as the data returned from GetById. It's crazy, and to make things worse, the GetById method was implemented in a blasphemous way.
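A minimal sketch of the fix, using hypothetical stand-ins (SomeObject, GetDataToProcess) for the original code: pass the object we already have instead of its identifier, so there is no second round trip to the DB.

```csharp
using System.Collections.Generic;

public class SomeObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Processor
{
    // Stand-in for the original DB call; in the real application this hit the database.
    private static IEnumerable<SomeObject> GetDataToProcess()
    {
        yield return new SomeObject { Id = 1, Name = "first" };
        yield return new SomeObject { Id = 2, Name = "second" };
    }

    public static List<string> Run()
    {
        var results = new List<string>();
        foreach (var obj in GetDataToProcess())
        {
            // Pass the object itself, not obj.Id - Process no longer needs GetById.
            results.Add(Process(obj));
        }
        return results;
    }

    private static string Process(SomeObject obj)
    {
        // Let's do a lot of stuff with obj - no second read from the DB.
        return obj.Name.ToUpper();
    }
}
```

The change is trivial, but it halves the number of queries in the original loop.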

Do not read more data than needed

What do you think about such an approach to reading just one object from the DB?
public IEnumerable<SomeObject> GetAll()
{
   // Read all data from DB and return
}

public SomeObject GetById(int id)
{
   return GetAll().SingleOrDefault(o => o.Id == id);
}
It's more than horrible. In order to read just one object from the DB, we read all of them (in this case there were thousands) and then we filter them in the code.
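To see how much this costs, here is a self-contained sketch (FakeDb and RowsRead are my own hypothetical names, not from the original application) that counts how many rows each approach materializes. In a real system the fixed GetById would push the filter into a WHERE clause or an ORM query; here the DB side is simulated in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

public class SomeObject
{
    public int Id { get; set; }
}

public class FakeDb
{
    private readonly List<SomeObject> _table =
        Enumerable.Range(1, 10000).Select(i => new SomeObject { Id = i }).ToList();

    // How many rows have been streamed to the "client" so far.
    public int RowsRead { get; private set; }

    // The horrible version: streams the whole table, row by row.
    public IEnumerable<SomeObject> GetAll()
    {
        foreach (var row in _table)
        {
            RowsRead++;
            yield return row;
        }
    }

    // The fixed version: filtering happens on the DB side, one row comes back.
    public SomeObject GetById(int id)
    {
        var row = _table.FirstOrDefault(o => o.Id == id);
        if (row != null) RowsRead++;
        return row;
    }
}
```

Calling `GetAll().SingleOrDefault(o => o.Id == 42)` pulls all 10,000 rows (SingleOrDefault must scan to the end to verify the match is unique), while `GetById(42)` touches one.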

Use dictionaries

Let's assume that the GetById method from above is implemented correctly, i.e. it reads just one particular object from the DB. That's much better. But what if we must read thousands of objects in this way? Communication with the DB costs a lot. It's much better to read all the data at once (if we have enough memory). This is one possible solution:
private IDictionary<int, SomeObject> _cache;

public void Begin()
{
   _cache = GetAll().ToDictionary(o => o.Id, o => o);
}

public void End()
{
   _cache = null;
}

public SomeObject GetById(int id)
{
   if (_cache != null)
      return _cache[id];
   // Read object from DB
}
If we know that GetById will be called many times, we should call Begin first to cache the data. Thanks to that, each call to GetById will read data from the cache instead of from the DB.
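A usage sketch of the Begin/End pattern, again with hypothetical stand-ins (the DbCalls counter and the in-memory GetAll are mine, added so the effect is measurable). I also use TryGetValue in the cached branch, since a plain indexer throws if an id is missing from the cache:

```csharp
using System.Collections.Generic;
using System.Linq;

public class SomeObject
{
    public int Id { get; set; }
}

public class Repository
{
    private IDictionary<int, SomeObject> _cache;

    // How many times we "hit the database".
    public int DbCalls { get; private set; }

    // Stand-in for reading the whole table from the DB in one query.
    private IEnumerable<SomeObject> GetAll()
    {
        DbCalls++;
        return Enumerable.Range(1, 1000).Select(i => new SomeObject { Id = i });
    }

    public void Begin() => _cache = GetAll().ToDictionary(o => o.Id, o => o);

    public void End() => _cache = null;

    public SomeObject GetById(int id)
    {
        if (_cache != null)
            return _cache.TryGetValue(id, out var obj) ? obj : null;

        // Stand-in for reading a single object from the DB.
        DbCalls++;
        return new SomeObject { Id = id };
    }
}
```

Between Begin and End, a thousand GetById calls cost a single round trip to the DB; without the cache they would cost a thousand.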

Test your code with real data

None of these bugs is a problem if we work with a small amount of data. But we should never assume that the same will be true in production. So always remember to test your code with real data. A simple rule of thumb may be:

Run your application and wait. If you start becoming irritated or bored, it may mean that something is wrong ;)

Do not use repositories

It's quite a common pattern to hide all communication with the DB in so-called repositories. However, it's controversial to use this pattern together with ORMs. Why? Because ORMs are actually an implementation of the repository pattern, so why hide them behind another layer of repositories? Besides, if ORM code is hidden in a repository, it's easier to overlook a crappy implementation of some of its methods. For example, see the initial implementation of the GetById method.

By using these relatively simple techniques, I was able to speed up the application considerably and cut the processing time from a few hours to a few minutes.

Remember, the fact that premature optimization is the root of all evil doesn't mean that you can write crappy code.

*The picture at the beginning of the post comes from my own resources and shows a toucan in Warsaw Zoo.


.NET Developer Days 2017

In the post .NET Developer Days 2016 - Grand finale I wrote that it hadn't been my last .NET Developer Days conference. Recently, I was asked again to become a media partner of this year's edition, so I agreed without much hesitation. Disclaimer: this also means that this is a sponsored text.

The well-known aphorism says that perfect is the enemy of good. The organizers of the conference must have heard it, because the format of the current edition will be similar to the previous one, i.e.:
  • What: 3 tracks with sessions on different topics and of different difficulty levels.
  • Where: EXPO XXI Exhibition Center – Warsaw, Prądzyńskiego 12/14
  • When: 18th-20th October 2017. 18th October is reserved for full-day training sessions (so called pre-conf) and the actual conference will start on 19th October.
  • Language: 100% English
The agenda has not been published yet and we have to wait until June. However, last time the range of topics was wide and I had no problem finding something interesting from my perspective. I keep my fingers crossed that this year will not be worse. What is very important, the earlier you buy tickets, the less you will pay. If you are interested, do not think too long. The current price for the conference without the pre-conf is 275€. In June it will be 325€ and in August 375€.


The best and the worst thing when doing science

A few months ago, I returned (partially) to the university. I'm working on a project in the field of computer vision for Google. The project is related to the Google Tango technology and is really interesting. However, within these few months there were also moments when I was really fed up. The same happened when I was doing my Ph.D., so I started thinking about what I like the most in doing science and what I don't.


How I removed 50% of the code

Title: Azuleyo tiles somewhere in Portugal, Source: own resources, Authors: Agnieszka and Michał Komorowscy

My last 2 posts were about problems with using Roslyn. Nonetheless, even if I sometimes hate it, I'm still using it, so the time has come to show a practical example of using Roslyn. Recently, I've been working on a task that can be summed up as: take this ugly code and do something with it, i.e. more or less a refactoring task.

Now I'll give you some intuition about what I had to deal with. The code that I had to refactor was generated automatically based on an XML schema. These were actually DTO classes used to communicate with an external service. Here are some statistics:
  • 28.7 thousand lines of code in 23 files.
  • 2200 classes and 920 enums.
  • Many classes and enums seemed to me identical or very similar.


Why I hate Roslyn even more

In my previous post I wrote about my problem with "empty" projects and Roslyn. The symptom was that, in some cases, according to Roslyn my C# projects didn't contain any files. For quite a long time, I wasn't able to find a solution, especially because I couldn't reproduce the problem on my local machine. Fortunately, today I noticed exactly the same problem on another computer.