
18/05/2017

Report from the battlefield #11 - premature optimization is the root of all evil?




Have you ever heard that "premature optimization is the root of all evil"? Probably yes. It's a quite well-known phrase by Donald Knuth. However, the full quote is much less known:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

Why am I writing about this? Because recently I had an occasion to fix an application which was written according to the first part of this quote. Or even worse, it was written according to the rule "any optimization is the root of all evil". Here are some examples of what not to do and some tips on what to do.

24/03/2017

Report from the battlefield #10 - fuck-up with AutoMapper




Have you ever heard of or used AutoMapper? What a question, of course you have. And in the very unlikely scenario that you haven't: it's an object-to-object mapper that allows you to map probably everything. In short, no more manual, boring, tedious, error-prone mapping.

However, with great power comes great responsibility. Recently, I had an occasion to fix 2 difficult-to-track bugs related to improper usage of AutoMapper. Both issues were related to a feature of AutoMapper which, in my opinion, is almost useless and should at least be disabled by default. Let's look at the following 2 classes and the testing code:
public class SomeSourceClass
{
   public Guid Id { get; set; }
   public string IdAsString => Id.ToString();
   public string Value { get; set; }
}

public class SomeDestinationClass
{
   public Guid Id { get; set; }
   public string IdAsString => Id.ToString();
   public string Value { get; set; }
}

class Program
{ 
   static void Main()
   {
      Mapper.Initialize(config => config.CreateMap<SomeSourceClass, SomeDestinationClass>());
      
      var src = new SomeSourceClass { Id = Guid.NewGuid(), Value = "Hello" };
      var dest = Mapper.Map<SomeDestinationClass>(src);

      Console.WriteLine($"Id = {dest.Id}");
      Console.WriteLine($"IdAsString = {dest.IdAsString}");
      Console.WriteLine($"Value = {dest.Value}");
   }
}
This works like a charm. If you run this example, you should see output like this:

Id = a2648b9e-60be-4fcc-9968-12a20448daf4
IdAsString = a2648b9e-60be-4fcc-9968-12a20448daf4
Value = Hello

Now, let's introduce interfaces that will be implemented by SomeSourceClass and SomeDestinationClass:
public interface ISomeSourceInterface
{
   Guid Id { get; set; }
   string IdAsString { get; }
   string Value { get; set; }
}

public interface ISomeDestinationInterface
{
   Guid Id { get; set; }
   string IdAsString { get; }
   string Value { get; set; }
}

public class SomeSourceClass: ISomeSourceInterface { /*... */}

public class SomeDestinationClass : ISomeDestinationInterface { /*... */}
We also want to support mappings from ISomeSourceInterface to ISomeDestinationInterface, so we need to configure AutoMapper accordingly. Otherwise, the mapper will throw an exception.
Mapper.Initialize(config =>
   {
      config.CreateMap<SomeSourceClass, SomeDestinationClass>();
      config.CreateMap<ISomeSourceInterface, ISomeDestinationInterface>();
   });

var src = new SomeSourceClass { Id = Guid.NewGuid(), Value = "Hello" };
var dest = Mapper.Map<ISomeDestinationInterface>(src);

Console.WriteLine($"Id = {dest.Id}");
Console.WriteLine($"IdAsString = {dest.IdAsString}");
Console.WriteLine($"Value = {dest.Value}");
If you run this code, it'll seemingly work like a charm. However, there is a BIG PROBLEM here. Let's examine more carefully what was written to the console. The result will look as follows:

Id = a2648b9e-60be-4fcc-9968-12a20448daf4
IdAsString =
Value = Hello

Do you see the problem? The read-only property IdAsString is empty. It seems crazy because the IdAsString property only returns the value of the Id property, which is set. How is that possible?

And here we come to the feature of AutoMapper which, in my opinion, should be disabled by default, i.e. automatic proxy generation. When AutoMapper tries to map ISomeSourceInterface to ISomeDestinationInterface, it doesn't know which implementation of ISomeDestinationInterface should be used. Actually, no implementation may even exist, so it generates one. If we check the type of the dest variable, we'll see something like:

Proxy<ConsoleApplication1.ISomeDestinationInterface_ConsoleApplication1_Version=1.0.0.0_Culture=neutral_PublicKeyToken=null>.
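
A quick way to confirm this for yourself (a small sketch, assuming the mapping example above has just run and dest is still in scope):
Console.WriteLine(dest.GetType().FullName);
// Prints the dynamically generated Proxy<...> type, not SomeDestinationClass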

Initially this feature may look like something extremely useful. But it's evil, at least for the following reasons:
  • As in the example, the mapping succeeds but the result object contains wrong data. This object may then be used to create other objects... This can lead to really difficult-to-detect bugs.
  • If a destination interface defines some methods, a proxy will be generated, but the mapping will fail with a System.TypeLoadException.
  • Even though it shouldn't be needed in well-written code, if you try to cast the result of the mapping to the concrete class, a System.InvalidCastException will be thrown.
The ideal solution would be to disable this feature. However, I don't know how :( The workaround is to explicitly tell AutoMapper not to generate proxies. To do that, we need to use the As method and specify which concrete type should be created instead of a proxy.

The final configuration looks as follows. It's also worth mentioning that in this case we actually don't need to define a mapping from SomeSourceClass to SomeDestinationClass. AutoMapper is clever enough to figure out that these classes implement the interfaces.
Mapper.Initialize(
   config =>
   {
      config.CreateMap<ISomeSourceInterface, ISomeDestinationInterface>().As<SomeDestinationClass>();
   });


AutoMapper's proxy generation feature is evil.


*The picture at the beginning of the post comes from own resources and shows okonomiyaki that we ate in Hiroshima, one of the best foods we've ever eaten.

15/03/2017

Report from the battlefield #9 - async/await + MARS




This post from the Report from the battlefield series will be about my own mistake. It is related to async/await and MARS, i.e. Multiple Active Result Sets. async/await allows us to use asynchronous programming more easily. MARS is a feature of MSSQL that allows us to have more than one pending request open per connection at the same time. For example, it may be useful if we have 2 nested loops, i.e. an external and an internal one. The external loop iterates through one result set and the internal one through another. Ok, but you probably wonder what MARS has to do with async/await.

A few days ago my application started failing due to an InvalidOperationException with an additional message saying that The connection does not support MultipleActiveResultSets. Well, I had used MARS in the past, so I simply enabled it in the connection string by setting the MultipleActiveResultSets attribute to true.

However, later I realized that my application should not require MARS at all, so I started digging into what was wrong. It turned out that the problem was caused by a silly mistake of mine in using async/await. Let's look at the following simplified version of the problematic code. We have a trivial Main method:
static void Main()
{
   Start().GetAwaiter().GetResult();
}
Start is an async method responsible for opening a connection to the database and executing other async methods:
private static async Task Start()
{
   using (var c = new SqlConnection(ConnectionString))
   {
      c.Open();

      await Func1(c);
      await Func2(c);
      await Func3(c);
   }
}
Func1, Func2 and Func3 are responsible for reading and processing data. In our case, for simplicity, they will all do the same thing:
private static async Task Func1(SqlConnection c) => await ReadData(c);
private static async Task Func2(SqlConnection c) => ReadData(c);
private static async Task Func3(SqlConnection c) => await ReadData(c);
And here is the ReadData method. It's also simple: it just reads data from a table:
private static async Task ReadData(SqlConnection c)
{
   var cmd = c.CreateCommand();

   cmd.CommandText = "SELECT * FROM dbo.Fun";

   using (var reader = await cmd.ExecuteReaderAsync())
   {
      while (await reader.ReadAsync())
      {
         // Process data
      }
   }
}
If you run this code, the aforementioned InvalidOperationException will be thrown in the line with ExecuteReaderAsync. The question is: why? Well, in this short code it is rather easy to spot that in the Func2 method await is missing before ReadData. But do you know why it is a problem? If not, don't worry, it's a little bit tricky.

Here is the explanation. Without await, the simplified flow is as follows:
  • ...
  • Start executes Func2.
  • Func2 executes ReadData.
  • ReadData executes ExecuteReaderAsync.
  • ReadData awaits the result of ExecuteReaderAsync.
  • Control returns to the caller, i.e. Func2.
  • However, Func2 does not use await, so it returns an already completed task to the Start method.
  • From the point of view of Start, processing of Func2 is finished, so it executes Func3.
  • Func3 executes ReadData.
  • The previous call to ReadData may still be in progress.
  • It also means that ReadData will call ExecuteReaderAsync while another result set is still being processed.
  • The exception is thrown.
Adding the missing await fixes the problem. Thanks to it, the task returned from Func2 will not be completed until the call to ReadData is over. If so, Start will not execute Func3 prematurely.
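For completeness, the fixed Func2 differs from the original only by the await:
private static async Task Func2(SqlConnection c) => await ReadData(c);
The final, well-known conclusion is: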

Always async/await all the way down.


*The picture at the beginning of the post comes from own resources and shows a laurel forest on La Gomera.

27/02/2017

Report from the battlefield #8 - always remember about the context



Title: Sunrise seen from the top of Mount Fuji, Source: own resources, Authors: Agnieszka and Michał Komorowscy

I decided to change the character of the Report from the battlefield series a little bit. Initially, in this series, I was describing my observations from my work as a reviewer for a recruitment company. Now, I'll also be writing about findings from my day-to-day work. To start, I'll give you a tip on how to log useful information.

I worked with an application that is responsible for monitoring folders. If it detects any new files, they are processed and copied somewhere else. The application logs information like the number of files to be processed, the file that is currently being processed, etc. This information is logged with severity Information or Debug. It happens that a given file cannot be copied, for example because a file with the same name already exists in the destination directory. In that case .NET throws a System.IO.IOException. This exception is caught and logged with severity Error. A simplified version of the log could look like this:

INFO - 10 files have been found
INFO - Processing file started
...
INFO - Processing file ended
INFO - Processing file started
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...
INFO - Processing file ended
...

It's good that a lot of important information is logged. However, there is a major issue with this log: the lack of context! For example, we know that some files have been processed, but we don't know exactly which ones. This log should look as follows:

INFO - 10 files have been found in the directory 'C:\Input'
INFO - Processing file 'C:\Input\a.txt' started
...
INFO - Processing file 'C:\Input\a.txt' ended
INFO - Processing file 'C:\Input\b.txt' started
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...
INFO - Processing file 'C:\Input\b.txt' ended
...

It looks much better. Based on the log, we can figure out which directory was monitored, which files were processed successfully and which were not. However, that's not everything; there is one more subtle problem here. What if messages with severity Information or lower are not logged (for example because of performance reasons) and an error is reported? In that case we'll get the following log:

...
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...

It's better than nothing, but again we don't know which file's processing actually failed. The expected result is:

...
ERROR - An error occurred while processing a file 'C:\Input\b.txt': Cannot create a file when that file already exists.
...

To sum up, always remember the context when logging.
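To make this tip more concrete, below is a minimal sketch of a processing loop that carries the context (the monitored directory and the current file path) into every message, including the error one. It is only an illustration of the idea, not the original application; ProcessFile and the console-based Log helper are hypothetical placeholders for the real processing code and logging framework:
using System;
using System.IO;

class FolderProcessor
{
   static void ProcessDirectory(string inputDir)
   {
      var files = Directory.GetFiles(inputDir);
      Log("INFO", $"{files.Length} files have been found in the directory '{inputDir}'");

      foreach (var file in files)
      {
         Log("INFO", $"Processing file '{file}' started");
         try
         {
            ProcessFile(file); // hypothetical step that processes and copies the file
         }
         catch (IOException ex)
         {
            // The file path is part of the message, so even a lone ERROR line is meaningful.
            Log("ERROR", $"An error occurred while processing a file '{file}': {ex.Message}");
         }
         Log("INFO", $"Processing file '{file}' ended");
      }
   }

   static void ProcessFile(string path) { /* process and copy the file somewhere else */ }

   static void Log(string severity, string message) => Console.WriteLine($"{severity} - {message}");
}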

21/09/2016

Report from the battlefield #6 - Auto-Property Initializers + a non-binary serialization



Source: own resources, Authors: Agnieszka and Michał Komorowscy

I can bet that you've already heard about and used Auto-Property Initializers and that you love them. At least I do ;) Here is a small example. In this case, an Auto-Property Initializer was used to generate unique identifiers for instances of the Entity class. Trivial, isn't it?
public class Entity
{
   public string Guid { get; } = System.Guid.NewGuid().ToString();
   /*...*/
}
What is important is that we are guaranteed that a given initializer will be executed only ONCE per instance of a class. Otherwise it would make no sense! In other words, our expectation is that once we create a new instance of the Entity class, its identifier will not change. This is generally true, but there are some caveats.
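To show the kind of caveat the title hints at, here is a minimal sketch of a non-binary (JSON) round trip. It assumes Newtonsoft.Json with default settings: because Guid is a getter-only property, the deserializer has no setter to write the stored value back, so the initializer runs again for the new instance and the "stable" identifier silently changes:
using System;
using Newtonsoft.Json;

public class Entity
{
   public string Guid { get; } = System.Guid.NewGuid().ToString();
}

class Program
{
   static void Main()
   {
      var original = new Entity();

      var json = JsonConvert.SerializeObject(original);        // the identifier is written out...
      var copy = JsonConvert.DeserializeObject<Entity>(json);  // ...but cannot be written back

      Console.WriteLine(original.Guid);
      Console.WriteLine(copy.Guid); // a different value - the initializer was executed again
   }
}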

24/07/2016

Report from the battlefield #5 - Logging can kill performance


Public Domain, https://commons.wikimedia.org/w/index.php?curid=48390
Source: own resources, Authors: Agnieszka and Michał Komorowscy

So far in the Report from the battlefield series I have written about my experiences as an expert for a recruitment company. This time I'll write about a bug that I found in production. It was all about performance. The problem was that in the new version of an application one operation slowed down about 6 times. Initially, I suspected that the amount of data had simply increased considerably or that there were some network problems. Fortunately, I easily reproduced the issue on my dev machine. Reproducing a problem is half the battle. However, performance problems are usually difficult to analyse, so I was ready for a long investigation.

I started stepping through the code with a debugger just to see what was going on. Everything seemed to be ok until... One of the final operations was to log to a file what had been retrieved from the database. What's important is that the log level of that message was Trace, so even a large amount of data shouldn't matter in production. Why? Because in production, precisely for performance reasons, the logger should be configured not to log everything to a file. In other words, it should ignore messages with log level Trace or Debug. However, after I pressed F10 (Step Over), I had to wait a few seconds until the logging ended. BINGO!

My first thought was that someone had configured the logger in the wrong way in production. A typical PEBKAC problem. To verify my hypothesis, I changed the configuration of the logger and executed the problematic operation. Unfortunately, the problem occurred again. Another look at the code and I knew what was wrong. And do you already know?

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

The problem was that for a large amount of data the application required a few seconds just to create/prepare a message for the logger. To make things worse, this message was created regardless of whether it was later used by the logger or not. During development it may be acceptable, but not in production! There are 2 potential solutions to this problem. The details depend on the logging framework:
  • The first approach is to simply check the logging level before creating/preparing a message e.g.:
    if(Logger.LogLevel == LogLevel.Trace) 
    {
        /* Prepare and log a message */
    }
    
  • The second approach is to use deferred execution, for example lambdas, e.g.:
    Logger.Trace(() => /* Prepare a message */);
    If a logger supports this syntax, the lambda will be executed if and only if it is required (see the sketch below).
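Here is a minimal sketch of the second approach. The Logger class and its API are made up for illustration (real frameworks differ in the details); the point is that the potentially expensive message is built only when the current log level actually requires it:
using System;

enum LogLevel { Trace, Debug, Info, Error }

static class Logger
{
   // In production this would typically be Info or higher, so Trace messages are skipped.
   public static LogLevel Level { get; set; } = LogLevel.Info;

   public static void Trace(Func<string> messageFactory)
   {
      if (Level > LogLevel.Trace)
         return; // the factory is never invoked, so no time is wasted preparing the message

      Console.WriteLine($"TRACE - {messageFactory()}");
   }
}

class Program
{
   static void Main()
   {
      var hugeResultSet = new string[100000];

      // The expensive string.Join runs only when Level is set to Trace.
      Logger.Trace(() => $"Retrieved rows: {string.Join(",", hugeResultSet)}");
   }
}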

14/03/2016

Report from the battlefield #4 - Do not waste my and your time


The Report from the battlefield series is based on my experience as a reviewer. The idea is simple. In order to evaluate programming skills, a candidate is asked to write a simple project. To do so, he/she needs to invest some amount of time (roughly speaking, a few hours). Taking this into account, I assume that he/she must be interested in finding a new job. Otherwise he/she wouldn't spend his/her private time writing a project which isn't exactly exciting. That's why I'm all the more surprised that some people don't care about the first impression.

Here are some examples showing what I'm talking about:
  • A connection string used by the application referred to some server that of course wasn't available to me.
  • The database used by the application didn't contain any sample data.
  • I had to manually create a database used by the application. There was a script but it didn't work without fixes.
  • The application crashed immediately when started.
  • ...
It's a waste of time from my perspective. It is true that all these problems can be fixed quickly, but they require additional effort from me. Believe me, it is extremely annoying. Instead of doing an actual review, I'm forced to fix bugs. Worst of all, these bugs could have been easily avoided with a little more effort.

Please remember, the first impression is important. It'll be appreciated if a reviewer can run your application just by pressing F5 in Visual Studio (or another IDE). You can test this in a straightforward way: before submitting a project for review, copy it to another machine and try to run it there. It should work without any additional actions.

Currently, if a project cannot be run without problems, I don't review it. However, I have a soft heart and I give a candidate one chance to fix the issues. Do you think that's a good approach? I have my doubts, because an employer probably wouldn't do so.

07/02/2016

Report from the battlefield #3 - IEnumerable vs IQueryable


Some time ago I was reviewing a data access layer that was based on Entity Framework. I found code which immediately attracted my attention. A simplified version is shown below.
public IList<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... }).ToList(); 
}
...
var numberOfProducts = GetAll().Count();
The GetAll method is pretty simple because it just reads products from a database. The result returned from this method is used to count the number of products in the database. Although it is simple, it contains a serious bug. The problem is that it uses the ToList method to return a list of products. This means that ALL products must be retrieved from the database in order to return them in the form of a list. In other words, there is no deferred execution here.

If we work with a local database and the number of products is small, it shouldn't be a problem. However, this kind of code might lead to difficult-to-analyse performance problems, for example if our application uses a remote database and/or there are thousands of products. The desired behaviour is that products are counted by the database engine. So let's try to make a fix:
public IEnumerable<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... });
}
...
var numberOfProducts = GetAll().Count();
Now it looks much better, doesn't it? GetAll doesn't use ToList and returns the IEnumerable interface. Unfortunately, this solution is far from perfect. In comparison to the first version, the only difference is the moment when all products are retrieved from the database. This time it will happen when the Count method is executed. Why? Before I explain, let's see the correct solution:
public IQueryable<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... });
}
...
var numberOfProducts = GetAll().Count();
This time I used IQueryable instead of IEnumerable. This small change is crucial. Thanks to it, no products are read from the database. Entity Framework "sees" that we only want to count the number of products and an appropriate query is sent to the database. In other words, LINQ to Entities is used.

The situation is completely different when we work with IEnumerable. In order to understand the difference, we have to realise one thing: the Count method for IEnumerable is something different than the Count method for IQueryable. With IEnumerable we use LINQ to Objects, and LINQ to Objects operates on objects in memory; it cannot communicate with a database. That is why all products must be read from the database if we want to count them.

Now, an inquiring person might say that it shouldn't matter whether we have a variable of type IEnumerable or of type IQueryable if both point to the same object. After all, C# is an object-oriented language that supports polymorphism etc. Well, that is all true, but only for virtual methods, and Count is not a virtual method. It is an extension method, and extension methods are resolved at compile time based on the static type of the variable, so they don't support polymorphism.
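A short sketch to illustrate the difference (it assumes an Entity Framework context ctx with a Products set, as in the examples above; the variable names are mine):
IQueryable<Product> asQueryable = ctx.Products;
IEnumerable<Product> asEnumerable = ctx.Products; // the very same query, only the static type differs

// Resolves to Queryable.Count - translated to SELECT COUNT(*) and executed by the database.
var countedByDatabase = asQueryable.Count();

// Resolves to Enumerable.Count - the query is executed, all products are materialized
// in memory and then counted one by one.
var countedInMemory = asEnumerable.Count();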

29/12/2015

Report from the battlefield #2 - amount of data matters a lot


In the next post from the Report from the battlefield series I'll write about a serious mistake that, according to my experience, is quite common. I'm thinking about the situation when a developer assumes that all data from a database can be processed on the client side. I'll give you 2 examples that I encountered during my reviews.

Case 1

A developer was asked to implement paging functionality. He created a single-page web application. It looked nice and, at first glance, the paging was working correctly. I decided to check how it was implemented under the hood. I examined the web service that was used by the application and I was surprised. Why? I didn't find a web method responsible for returning pages. The next step was to dig into the JavaScript code. Unfortunately, I discovered that the paging was implemented only on the client side, i.e. the application initially downloaded all data from the database (via the web service).
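For comparison, server-side paging usually boils down to a web method similar to the sketch below (the method and its names are hypothetical, not the reviewed code; ctx.Products is assumed to be an EF IQueryable). Only the requested page is read from the database and sent over the wire:
public IList<Product> GetPage(int pageNumber, int pageSize)
{
   return ctx.Products
      .OrderBy(p => p.Id)                // paging requires a stable order
      .Skip((pageNumber - 1) * pageSize) // Skip and Take are translated to SQL,
      .Take(pageSize)                    // so only one page leaves the database
      .ToList();
}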

Case 2

In another project the paging was implemented correctly on the server side, but the developer made a more subtle mistake. The application had a shopping cart function. Of course, it was possible to add and remove products to and from the cart. To do so, the web service used by the application had a GetCart method. This method was responsible for retrieving the current content of the cart from the database.

However, it was strange that this method returned only the identifiers of products. What's more, there was no GetProductDetails web method. It made me curious how the application displayed product details to users, knowing only their identifiers. It turned out that at start-up the application was reading the details of all products from the database. Having all products on the client side, it was easy to find the details of a product based on its identifier.
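A sketch of what such a web method could return instead (the types and names below are made up for illustration): cart items together with the product details needed for display, so the client does not have to preload the whole product catalogue:
public class CartItemDto
{
   public int ProductId { get; set; }
   public string ProductName { get; set; }
   public decimal UnitPrice { get; set; }
   public int Quantity { get; set; }
}

// A hypothetical GetCart: the join with products happens in the database
// and only the items actually present in the cart are sent to the client.
public IList<CartItemDto> GetCart(int cartId)
{
   return ctx.CartItems
      .Where(i => i.CartId == cartId)
      .Select(i => new CartItemDto
      {
         ProductId = i.ProductId,
         ProductName = i.Product.Name,
         UnitPrice = i.Product.Price,
         Quantity = i.Quantity
      })
      .ToList();
}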

Summary

In both cases the applications were fast enough because of the small amount of data. With real-life databases they would not be. I think that we should always be prepared for the worst case, especially when we participate in a recruitment process and want to show ourselves from the best side. An evaluator shouldn't have to guess whether we know something or not.

24/11/2015

Report from the battlefield #1 - EF and DTOs


Some time ago, I started doing code reviews of various projects for a recruitment company. It is an interesting experience and I'm learning a lot from it. I also observed that some mistakes are repeated by different authors. Others are not so common, but are not obvious either. So I came up with the idea to start a new series of posts under the title "Report from the battlefield". In this series I'll describe my observations and findings from my reviews.

Let's start. Recently, I reviewed a project created with AngularJS + ASP.NET Web API + Entity Framework. The code was neither very good nor very bad. However, I noticed that the author decided to use a class generated from the EDMX model as a DTO (Data Transfer Object). The reasoning behind this decision was simple: this class had all the properties required on the client side, so why not use it? Well, there are a few reasons why it is not a good idea.
  • With dedicated DTOs it is less likely that changes on the server side will affect the client side.
  • With dedicated DTOs we can easily control what will be sent to the client side and in what format.
  • With dedicated DTOs the server-side model can be completely different from the client-side model.
  • By exposing EF classes to the client side we effectively expose the database model to the client side!
You may or may not agree with my points. So, I'll give you a practical example of what could happen if we use EF classes as DTOs. Let's assume that there is an EDMX model with 3 types of entities:
  • Customer with Orders navigation property.
  • Orders with Customer and Products navigation properties.
  • Products with Orders navigation property.
Now we want to read only 1 customer from the database, serialize it to JSON and send the result to the client side. What could go wrong? Well, because of the navigation properties, the JSON serializer used by ASP.NET Web API will read from the database and convert to JSON the whole graph of customers, orders and products! To be more specific, I saw a 0.5 MB response which should have been a few kilobytes, and that was for a very small database (it contained a few dozen records across all tables)! I can bet that in the case of a production database the response would be hundreds of megabytes.
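To make the alternative concrete, a dedicated DTO plus an explicit projection could look roughly like the sketch below (the names are hypothetical, not taken from the reviewed project). Only the fields the client needs are read and serialized, and the navigation properties never enter the picture:
public class CustomerDto
{
   public int Id { get; set; }
   public string Name { get; set; }
   public int NumberOfOrders { get; set; }
}

// A hypothetical Web API action: the projection is translated to SQL,
// so neither orders nor products are loaded or serialized.
public CustomerDto GetCustomer(int id)
{
   return ctx.Customers
      .Where(c => c.Id == id)
      .Select(c => new CustomerDto
      {
         Id = c.Id,
         Name = c.Name,
         NumberOfOrders = c.Orders.Count()
      })
      .Single();
}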