24/07/2016

Report from the battlefield #5 - Logging can kill performance

Public Domain, https://commons.wikimedia.org/w/index.php?curid=48390
Source: own resources, Authors: Agnieszka and Michał Komorowscy

So far in the Report from the battlefield series I have written about my experiences as an expert in a recruitment company. This time I'll write about a bug that I found in production. It was all about performance. The problem was that in the new version of an application one operation slowed down about 6 times. Initially, I suspected that the amount of data had simply increased considerably or that there were some network problems. Fortunately, I easily reproduced the issue on my dev machine. Reproducing a problem is half the battle. Still, performance problems are usually difficult to analyse, so I was ready for a long investigation.

I started stepping through the code with a debugger just to see what was going on. Everything seemed to be OK until... One of the final operations was to log to a file what had been retrieved from a database. Importantly, the log level was set to Trace, so even a large amount of data shouldn't matter in production. Why? Because in production, precisely for performance reasons, the logger should be configured not to log everything to a file. In other words, it should ignore low-severity messages, usually those with the log level Trace or Debug. However, after I had pressed F10 (Step Over), I had to wait a few seconds until the logging finished. BINGO!

My first thought was that someone had configured the logger incorrectly in production. A typical PEBKAC problem. To verify my hypothesis I changed the configuration of the logger and executed the problematic operation. Unfortunately, the problem occurred again. Another look at the code and I knew what was wrong. Do you already know?

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

The problem was that for a large amount of data the application needed a few seconds just to create/prepare a message for the logger. To make things worse, this message was created regardless of whether the logger later used it or not. During development this may be acceptable, but not in production! There are 2 potential solutions to this problem. The details depend on the logging framework:
  • The first approach is to simply check the logging level before creating/preparing a message, e.g.:
    if(Logger.LogLevel == LogLevel.Trace) 
    {
        /* Prepare and log a message */
    }
    
  • The second approach is to use deferred execution, for example lambdas, e.g.:
    Logger.Trace(() => /* Prepare a message */);
    If a logger supports this syntax, the lambda will be executed if and only if the message is actually needed.
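
Both approaches can be sketched with a tiny, made-up logger. SketchLogger, IsEnabled, PrepareMessage and PrepareCalls are names invented for this illustration; they do not come from the application described above or from any particular logging framework, so treat this only as a rough sketch of the idea:

```csharp
using System;
using System.Linq;

// Hypothetical log levels and logger, invented for this sketch only.
public enum LogLevel { Trace, Debug, Info }

public class SketchLogger
{
    // Messages below this level are ignored.
    public LogLevel MinLevel { get; set; } = LogLevel.Info;

    public bool IsEnabled(LogLevel level) => level >= MinLevel;

    // The lambda is invoked only when Trace messages are actually written.
    public void Trace(Func<string> messageFactory)
    {
        if (!IsEnabled(LogLevel.Trace)) return;
        Console.WriteLine(messageFactory());
    }
}

public static class Program
{
    // Counts how many times the expensive message was really built.
    public static int PrepareCalls;

    // Stands in for the expensive formatting of data read from a database.
    public static string PrepareMessage(int rows)
    {
        PrepareCalls++;
        return string.Join(";", Enumerable.Range(0, rows));
    }

    public static void Main()
    {
        var logger = new SketchLogger { MinLevel = LogLevel.Info };

        // Deferred: PrepareMessage is never called because Trace is disabled.
        logger.Trace(() => PrepareMessage(100000));
        Console.WriteLine("Calls with Trace disabled: " + PrepareCalls);

        // Explicit level check, the first approach from the list above.
        if (logger.IsEnabled(LogLevel.Trace))
        {
            Console.WriteLine(PrepareMessage(100000));
        }

        // Once Trace is enabled, the lambda runs and the message is built.
        logger.MinLevel = LogLevel.Trace;
        logger.Trace(() => PrepareMessage(3));
        Console.WriteLine("Calls with Trace enabled: " + PrepareCalls);
    }
}
```

With Trace disabled the counter stays at zero, so the cost of preparing the message is paid only when the logger really needs it.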

17/07/2016

The longest project

Source: own resources, Authors: Agnieszka and Michał Komorowscy

I haven't blogged for 4 months and it's the longest break I've ever had. Why? Was I sick? Did I have no ideas for what to write about? Did I have no time? Did I have too much work? Fortunately, none of that. The reason is completely different and probably surprising. I finished the longest project of my life.
  • The project that I started in 2009.
  • The project that for all these years was somewhere in my mind.
  • The project that I wanted to abandon over a dozen times.
  • The project that took hundreds or thousands of hours.
  • The project that allowed me to learn a lot.
  • The project that I would have done in a different way if I had had this chance.
  • The project of which I'm extremely proud.
  • The project after which I simply had to rest.
What could it be? The answer is a PhD in Computer Science. On 12 April 2016 I defended my doctoral dissertation, written under the supervision of Professor Janusz Sosnowski, under the title:

Methods of analysis of information systems based on logs of historical debuggers

Even now I remember how relieved and happy I felt then :)

In my work I focused on the problem of storing and analysing data collected by historical / reversible debuggers. I performed a detailed analysis of what could be and what should be improved when it comes to working with them. As a result, I proposed new models for representing execution traces and implemented tools that facilitate working with data recorded by historical debuggers. I also performed experiments showing the advantages of my ideas. It was a really, really huge job.

Now you may want to ask some questions:
  • Was it worth it?
  • Why did you do so?
  • Did you work professionally at the same time?
  • How did you divide your time between PhD studies and your work? Is it possible at all?
  • What did you actually gain?
  • How to start PhD studies?
  • How much could I earn at the university?
  • Would you continue your scientific career?
  • Why didn't you write about the PhD earlier?
  • And many, many more.
I plan a series of posts about doing a PhD in computer science. Many topics will be specific to Poland but many will be general. I want to do that for two reasons. Firstly, it'll be a form of therapy for me :) I simply want/need to write about something that was so important to me for such a long time. Secondly, I think that there are not many blogs/articles about writing a PhD, so it should simply be useful for others.

If you have any specific questions just let me know.

18/03/2016

Two things I learned about HTML and CSS

I've never worked much with CSS. However, from time to time I do something with it, for example to check out new possibilities. Recently, I read about CSS transformations and decided to give them a try. To begin with, I wanted to achieve a very simple effect, i.e. a red square with a blue and a green diagonal line. It sounds simple and it is simple, but there are traps in this exercise. I decided to write this post because it took me a moment to figure out what was wrong. It was also difficult to find a solution on the Internet. Maybe because it is so obvious ;)

My idea was to use 3 div elements. One for a square and 2 for diagonal lines. I also wanted to use transformations in order to rotate divs so that they look like diagonals. My first attempt looks as follows. Do you know what is wrong? There are 2 main problems here.

Scroll down if you want to see a correct solution:
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.


I changed two things, one in HTML and one in CSS:
  • The first problem was that I used div as a self-closing tag. It is not allowed. Browsers treat <div /> as <div >, so the element is never closed. It is quite difficult to spot.
  • The second problem was in the greenLine style. It was not enough to rotate the green div by 45 degrees. First we need to translate it in this way: translate(100px,-141px) rotate(45deg). It might be surprising because in the case of the blue div the rotation was enough. However, we have to remember that the green div is not located at the origin of the coordinate system but just below the blue div. The blue div looks like a thin line but its height is set to 141 pixels.

14/03/2016

Report from the battlefield #4 - Do not waste my and your time

The Report from the battlefield series is based on my experience as a reviewer. The idea is simple. In order to evaluate programming skills, a candidate is asked to write a simple project. To do so he/she needs to invest some amount of time (roughly speaking, a few hours). Taking this into account, I assume that he/she must be interested in finding a new job. Otherwise he/she wouldn't spend his/her private time writing a project which isn't exactly exciting. I'm all the more surprised that some people don't care about the first impression.

Here are some examples showing what I'm talking about:
  • A connection string used by the application referred to some server that of course wasn't available to me.
  • The database used by the application didn't contain any sample data.
  • I had to manually create a database used by the application. There was a script but it didn't work without fixes.
  • The application crashed immediately when started.
  • ...
From my perspective it's a waste of time. It is true that all these problems can be fixed quickly, but they require additional effort from me. Believe me, it is extremely annoying. Instead of doing an actual review, I am forced to fix bugs. What's worst, these bugs could easily be avoided with a little more effort.

Please remember, the first impression is important. A reviewer will appreciate being able to run your application just by pressing F5 in Visual Studio (or in another IDE). You can test it in a straightforward way. Before submitting a project for review, copy it to another machine and try to run it there. It should work without any additional actions.

Currently, if a project cannot be run without problems I don't review it. However, I have a soft heart and I give the candidate one chance to fix them. Do you think that it's a good approach? I have my doubts because an employer probably wouldn't do so.

27/02/2016

Tips & Tricks: How to tell VS to modify variables in the runtime for us?

Today, I'd like to share with you a simple but useful trick. Imagine that you are debugging an application and you find a place with the following very simple code:
            
var flag = ReadConfiguration();
if (flag)
{
   //...
}
else
{
   //...
}
The problem is that the flag variable is set to false but you need to check what would happen if it were set to true. Of course you can easily change the value of this variable in Visual Studio. But what would you do if this kind of code is executed dozens or hundreds of times and every time the flag variable must be set to true? One solution is to modify the configuration, another might be to change the source code. However, all these things require an additional action. It would be much better to tell Visual Studio to do it for us. How? To achieve the desired effect we can use breakpoints and custom actions. I'll show how to do it in Visual Studio 2015.

Firstly, put a breakpoint on the line with the if statement.


Right-click the breakpoint and select Actions... from the context menu. Then enter {flag = true} in the text box. You can even use IntelliSense here. At the end click the Close button.


And that's all. Now, if you run the application under debugger control, the flag variable will be set to true whenever the line with the breakpoint is executed. What's more, this trick also works with other types of variables and you can also execute methods in this way, e.g.:


At the end I want to say 2 things. Custom actions are usually used to write diagnostic messages to the Output window. This trick works because, in order to write a message, Visual Studio has to execute some code and this code can have side effects. Besides, you can also use this trick in older versions of Visual Studio. The only difference is that you need to select the When Hit... option from the context menu.

21/02/2016

My list of online editors

Online editors (testers, debuggers) are awesome if we want to quickly test some code. They are also very useful to check our solution when we want to post an answer on Stack Overflow. Here is my collection of various online editors that I encountered, though personally I use only some of them.

I'm publishing it because it can be helpful for others and because I'd like to have this list easily accessible on the Internet. Of course this list is not complete and there are many other editors. If you know something interesting, let me know and I will add it here.


Online Editor | Language / Technology | Share function | Collaborate function
yUML | UML | Yes |
draw.io | Diagrams | Yes (via Google docs) | Yes
moqups | UI mockups | Yes | Yes
ideone | C#, Java, Haskell, C++, Ada and many others | Yes |
SQL Fiddle | SQL (MySQL, Oracle, PostgreSQL, SQLite, MSSQL) | Yes |
regular expressions 101 | Regular expressions | Yes |
.NET Fiddle | C#, VB.NET, F# | Yes | Yes
C# Pad | C# | |
D3.js | D3.js JavaScript library | |
CodePen | HTML + CSS + JS | Yes |
jsfiddle | HTML + CSS + JS | Yes | Yes
JS Bin | HTML + CSS + JS | Yes | Soon
CSS Deck | HTML + CSS + JS | Yes | Yes
Liveweave | HTML + CSS + JS | Yes | Yes
Plunker | HTML + CSS + JS | Yes | Yes
cpp.sh | C++ | Yes |

By share function I mean the possibility to create a permanent link to our code. The collaborate function allows a group of developers to write code together.

16/02/2016

Interview Questions for Programmers by MK #7

Question #7
You have the following code that uses Entity Framework to retrieve data from the Northwind database. First it finds customers that are from London and then it processes their orders. All data model classes were generated with the code first from database approach. Unfortunately, this code contains a bug that can lead to performance problems. Identify this problem and propose a fix.
using (var ctx = new NorthwindContext())
{
   var londoners = ctx.Customers.Where(e => e.City == "London");
   foreach (var londoner in londoners)
   {
      foreach (var o in londoner.Orders)
      {
         foreach (var d in o.Order_Details)
         {
            //....
         }
      }
   }               
}
Answer #7
This code is a classic example of the N+1 select problem, where too many queries are sent to a database. The first query is sent to the database in order to find customers from London. Then, for each customer, another query is sent to read the orders. Finally, for each order, another query is sent to retrieve the details of that order. Instead, all the data could be retrieved by sending only one query. To do so we need to tell EF that we want to read customers together with their orders and order details. It can be achieved with the Include method.
using (var ctx = new NorthwindContext())
{
   var londoners = ctx.Customers.Include("Orders").Include("Orders.Order_Details").Where(e => e.City == "London");
   ...
}
A quick test shows that originally 53 queries are sent to the database and after the fix only 1.

07/02/2016

Report from the battlefield #3 - IEnumerable vs IQueryable

Some time ago I was reviewing a data access layer that was based on Entity Framework. I found code which immediately attracted my attention. The simplified version is shown below.
public IList<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... }).ToList(); 
}
...
var numberOfProducts = GetAll().Count();
The GetAll method is pretty simple because it just reads products from a database. The result returned from this method is used to count the number of products in the database. Although it is simple, it contains a serious bug. The problem is that it uses the ToList method to return a list of products. As a result, ALL products must be retrieved from the database in order to return them in the form of a list. In other words, there is no deferred execution here.

If we work with a local database and the number of products is small, it shouldn't be a problem. However, this kind of code might lead to performance problems that are difficult to analyse, for example if our application uses a remote database and/or there are thousands of products. The desired behaviour is that products are counted by the database engine. So let's try to make a fix:
public IEnumerable<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... });
}
...
var numberOfProducts = GetAll().Count();
Now it looks much better, doesn't it? GetAll doesn't use ToList and returns the IEnumerable interface. Unfortunately, this solution is far from perfect. In comparison to the first version, the only difference is the moment when all products are retrieved from the database. This time it will happen when the Count method is executed. Why? Before I explain, let's see the correct solution:
public IQueryable<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... });
}
...
var numberOfProducts = GetAll().Count();
This time I used IQueryable instead of IEnumerable. This small change is crucial. Thanks to it, no products are read from the database. Entity Framework "sees" that we only want to count the number of products and an appropriate query is sent to the database. In other words, LINQ To Entities is used.

The situation is completely different when we work with IEnumerable. In order to understand the difference we have to realise one thing. The Count method for IEnumerable is something different from the Count method for IQueryable. With IEnumerable we use LINQ To Objects, and LINQ To Objects operates on objects in memory; it cannot communicate with a database. This is why all products must be read from the database if we want to count them.

Now an inquisitive reader might say that it shouldn't matter whether we have variables of type IEnumerable or of type IQueryable if these variables point to the same objects. After all, C# is an object-oriented language that supports polymorphism etc. Well, that is all true, but only for virtual methods, and Count is not a virtual method. It is an extension method and extension methods don't support polymorphism.
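
To see this static binding in isolation, here is a small sketch that does not need Entity Framework at all. ILocalSource, IRemoteSource, ProductSource and both Count extension methods are hypothetical stand-ins that I made up to play the roles of IEnumerable, IQueryable and the two real Count methods:

```csharp
using System;

// Hypothetical stand-ins for IEnumerable<T> and IQueryable<T>.
public interface ILocalSource { }
public interface IRemoteSource : ILocalSource { }

public class ProductSource : IRemoteSource { }

public static class SourceExtensions
{
    // Plays the role of Enumerable.Count (LINQ To Objects).
    public static string Count(this ILocalSource source) => "counted in memory";

    // Plays the role of Queryable.Count (LINQ To Entities).
    public static string Count(this IRemoteSource source) => "counted by the database";
}

public static class ExtensionDemo
{
    public static void Main()
    {
        var products = new ProductSource();

        // The same object seen through two different static types.
        ILocalSource asLocal = products;
        IRemoteSource asRemote = products;

        // The compiler picks the overload from the declared type of the variable.
        Console.WriteLine(asLocal.Count());  // counted in memory
        Console.WriteLine(asRemote.Count()); // counted by the database
    }
}
```

Both variables point to the same object, yet a different Count is called, because the choice is made at compile time based on the declared type and not at run time based on the actual object.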

05/02/2016

Sandbox Database Manager

My colleague Tomasz Moska published a very nice tool that makes managing development MSSQL sandbox databases very easy. It is called Sandbox Database Manager and you can download it here or from GitHub.

Why is it worth recommending? Imagine a situation like this. A tester found a bug in the application. In order to reproduce it you need a copy of his database from a system test environment. With Sandbox Database Manager you can make a copy of this database and restore it on a selected server with just a few clicks. Another click or two and you have a snapshot created. Thanks to that, you are able to revert the database to its original state at any time. Now let's assume that this database contains hundreds of tables and you don't know all of them. To investigate the problem you want to run the application and see which tables (probably dozens of them) will be updated and how. Sandbox Database Manager also supports this scenario because it allows you to track data changes at the column level.

These are only a few features of Sandbox Database Manager. It can do much more, for example run the same query against many databases or compare data between two databases. I can guarantee that Sandbox Database Manager is a really, really helpful tool because I use it in my day-to-day work. I recommend it without any hesitation. Best of all, you can use it completely for free!