Showing posts with label data access. Show all posts
Showing posts with label data access. Show all posts

15/03/2017

Report from the battlefield #9 - async/await + MARS

Home



This post from Report from the battlefield series will be about my own mistake. It is related to async/await and MARS i.e. Multiple Active Result Sets. async/await allows us to use asynchronous programming more easily. MARS is a feature of MSSQL that allows us to have more than one pending request opened per connection at the same time. For example, it may be useful if we have 2 nested loops i.e. internal and external. External loops iterate through one result set and the internal one through another. Ok, but you probably wonder what MARS has in common with async/await.

A few days ago my application started failing due to InvalidOperationException exception with the additional message saying that The connection does not support MultipleActiveResultSets. Well, I used MARS in the past so I simply enabled it in the connection string by setting MultipleActiveResultSets attribute to true.

However, later I realized that my application should not require MARS at all so I started digging into what was wrong. It turned out that the problem was related to my silly mistake in using async/await. Let's look at the following simplified version of the problematic code. We have a trivial Main method:
static void Main()
{
   Start().GetAwaiter().GetResult();
}
Start is an async method responsible for opening a connection to DB and executing other async methods:
private static async Task Start()
{
   using (var c = new SqlConnection(ConnectionString))
   {
      c.Open();

      await Func1(c);
      await Func2(c);
      await Func3(c);
   }
}
Func1, Func2 and Func3 are responsible for reading data and processing them. In our case, for simplification, they all will do the same thing:
private static async Task Func1(SqlConnection c) => await ReadData(c);
private static async Task Func2(SqlConnection c) => ReadData(c);
private static async Task Func3(SqlConnection c) => await ReadData(c);
And here is the ReadData method. It's also simple. It simply reads data from a table:
private static async Task ReadData(SqlConnection c)
{
   var cmd = c.CreateCommand();

   cmd.CommandText = "SELECT * FROM dbo.Fun";

   using (var reader = await cmd.ExecuteReaderAsync())
   {
      while (await reader.ReadAsync())
      {
         // Process data
      }
   }
}
If you run this code, the aforementioned InvalidOperationException exception will be thrown in the line with ExecuteReaderAsync. The question is why? Well, in this short code it is rather easy to spot that in Func2 method await is missing before ReadData. But, do you know why it is a problem? If not, don't worry it's a little bit tricky.

Here is an explanation. Without await the simplified flow is as follows:
  • ...
  • Start executes Func2.
  • Func2 executes ReadData.
  • ReadData executes ExecuteReaderAsync.
  • ReadData awaits for the result from ExecuteReaderAsync.
  • The control returns to caller i.e. Func2.
  • However, Func2 does not use await so it returns completed task to Start method.
  • From the point of view of Start processing of Func2 is finished so it executes Func3.
  • Func3 executes ReadData
  • The previous call to ReadData may be still in progress.
  • It also means that ReadData will call ExecuteReaderAsync when another result set is still being processed.
  • The exception is thrown.
Adding missing await fix the problem. Thanks to that the task returned from Func2 will not be completed until call to ReadData is over. And if so Start will not execute Func3 immediately. The final well known conclusion is:

Always async/await all the way down.


*The picture at the beginning of the post comes from own resources and shows Laurel forest on La Gomera.

16/02/2016

Interview Questions for Programmers by MK #7

Home

Question #7
You have the following code that uses Entity Framework to retrieve data from Northwind database. Firstly it finds customers that are from London and then process their orders. All data model classes were generated with the code first from database approach. Unfortunately, this code contains a bug that can lead to performance problems. Identity this problem and propose a fix.
using (var ctx = new NorthwindContext())
{
   var londoners = ctx.Customers.Where(e => e.City == "London");
   foreach (var londoner in londoners)
   {
      foreach (var o in londoner.Orders)
      {
         foreach (var d in o.OrderDetails)
         {
            //....
         }
      }
   }               
}
Answer #7
This code is a classical example of N+1 select problem where too many queries are sent to a database. The first query will be sent to a database in order to find customers from London. Then for each customer another query will be sent to read orders. Finally, for each order another query will be sent to retrieve details of a given order. Instead, all data could be retrieved by sending only one query. To do so we need to tell EF that we want to read customers together with their orders and orders details. It can be achieved with Include method.
using (var ctx = new NorthwindContext())
{
   londoners = ctx.Customers.Include("Orders").Include("Orders.Order_Details").Where(e => e.City == "London");
   ...
}
A quick test will show that originally 53 queries are sent to a database and after a fix only 1.

07/02/2016

Report from the battlefield #3 - IEnumerable vs IQueryable

Home

Sometime ago I was reviewing the data access layer that was based on Entity Framework. I found a code which immediately attracted my attention. The simplified version is shown below.
public IList<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... }).ToList(); 
}
...
var numberOfProducts = GetAll().Count();
GetAll method is pretty simple because it just reads products from a database. The result returned from this method is used to count number of products in the database. Although it is simple it contains a serious bug. The problem is that it uses ToList method to return a list of products. It causes that ALL products must be retrieved from the database in order to return them in the form of a list. In other words there is no deferred execution here.

If we work with a local database and the number of products is small it shouldn't be a problem. However, this kind of code might lead to difficult to analyse performance problems. For example if our application uses a remote database and/or there are thousands of products. The desired behaviour is that products are counted by a database engine. So let's try to make a fix:
public IEnumerable<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... });
}
...
var numberOfProducts = GetAll().Count();
Now it looks much more better, doesn't it? GetAll doesn't use ToList and returns IEnumerable. interface. Unfortunately this solution is far from being perfect. In comparison to the first version, the only difference is the moment when all products are retrieved from the database. This time it will happen when Count method is executed. Why? Before I'll explain let's see the correct solution:
public IQueryable<Product> GetAll()
{
   return ctx.Products.Select(p => new Product() { ... });
}
...
var numberOfProducts = GetAll().Count();
This time I used IQueryable instead of IEnumerable. This small change is crucial. It causes that no products are read from a database. Entity Framework "sees" that we only wanto to count number of products and an appropriate query is sent to a database. In other words LINQ To Entities is used.

The situation is completely different when we work with IEnumerable. In order to understand a difference we have to realise one thing. Count method for IEnumerable is something different than Count method for IQueryable. With IEnumerable we use LINQ To Objects and LINQ To Objects operates on objects in memory, it cannot communicate with a database. It is why all products must be read from a database if we want to count them.

Now someone inquiring can say that for virtual methods it shouldn't matter if we have variables of type IEnumerable or of type IQueryable if these variables points the same objects. After all C# is an object oriented language that supports polymorphism etc. Well, it is all true but only for virtual methods and Count is not a virtual method. It is an extension method and extension methods don't support polymorphism.

29/12/2015

Report from the battlefield #2 - amount of data matters a lot

Home

In the next post from the Report from the battlefield series I'll wrote about a serious mistake that is quite common according to my experience. I'm thinking about a situation when a developer assumes that all data from a database can be processed on the client side. I'll give you 2 examples that I encountered during my reviews.

Case 1

A developer was asked to implement the paging functionality. He created a single page Web application. It looked nice and the paging was working correctly at first glance. I decided to check how it was implemented under the hood. I examined a web service that was used by the application and I was surprised. Why? I didn't find a web method responsible for returning pages. The next step was to dig into a java script code. Unfortunately, I discovered that the paging was implemented only on the client side i.e. the application initially downloaded all data from a database (via Web Service).

Case 2

In the another project the paging was implemented correctly on the server side but a developer made a more subtle mistake. An application had a shopping cart function. Of course it was possible to add and remove products to and from a cart. To do so a web service used by the application had a method GetCart. This method was responsible for retrieving a current content of a cart from a database.

However, it was strange that this method returned only identifiers of products. What's more there was no GetProductDetails web method. It made me curious how the application displays products details to users only knowing its identifiers. It turned out that at the initiation the application was reading details of all products from a database. Having all products on the client side it was easy to find details of a product based on its identifier.

Summary

In both cases applications were fast enough because of a small amount of data. In the case of real-life databases they will not. I think that we should always be prepared for the worst case. Especially, when we participate in a recruitment and we want to show ourselves from the best side. An evaluator shouldn't guess whether we know something or not.

24/11/2015

Report from the battlefield #1 - EF and DTOs

Home

Some time ago, I started doing code reviews of various projects for the recruitment company. It is an interesting experience and I'm learning a lot by this occasion. I also observed that some mistakes are repeated by different authors. Other are not so common but are not obvious. So I came up with the idea to start a new series of posts under the title "Report from the battlefield". In this series I'll describe my observations and findings from my reviews.

Let's start. Recently, I reviewed a project created with AngularJS + ASP.NET Web API + Entity Framework. The code was neither very good nor very bad. However, I noticed that the author decided to use a class generated from the EDMX model as DTO (Data Transfer Object). The reasoning behind this decision was simple - this class had all properties required on the client side so why not to use it. Well there are a few reasons why it is not a good idea.
  • With dedicated DTOs it is less possible that changes on the server side will affect the client side.
  • With dedicated DTOs we can easily control what will be send to the client side and in what format.
  • With dedicated DTOs the server side model can be completely different from the client side model.
  • By exposing EF classes to the client side we effectively expose the database model to the client side!
You may agree with my points or not. So, I'll give you a practical example what could happen if we use EF classes as DTOs. Let's assume that there is EDMX model with 3 types of entities:
  • Customer with Orders navigation property.
  • Orders with Customer and Products navigation properties.
  • Products with Orders navigation property.
Now we want to read only 1 customer from a database, serialize it to JSON and send the result to the client side. What could go wrong? Well, because of the navigation properties the JSON serializer that is used by ASP.NET Web API will read from the database and convert to JSON the whole graph of customers, orders and products! To be more specific, I saw 0.5 MB response which should have a few kilobytes for a very small database (it contained small dozens of records in all tables)! I can bet that in the case of a production database a response would have hundreds of megabytes.