20/04/2017

How I removed 50% of the code


Title: Azuleyo tiles somewhere in Portugal, Source: own resources, Authors: Agnieszka and Michał Komorowscy


My last 2 posts were about problems with using Roslyn. Nonetheless, even if I sometime hate it, I'm still using it so the time has come to show some practical example of using Roslyn. Recently, I've been working on the task that can be summed up as: Take this ugly code and do something with it. i.e. more or less the refactoring task.

Now I'll give you some intuition of what I have to deal with. The code that I have to refactor was generated automatically based on XML schema. These were actually DTO classes used to communicate with the external service. Here are some statistics:
  • 28.7 thousands lines of code in 23 files.
  • 2200 classes and 920 enums.
  • Many classes and enums seems to me identical or very similar.

10/04/2017

Why I hate Roslyn even more




In my previous post I wrote about my problem with "empty" projects and Roslyn. The symptom was that in some cases according to Roslyn my C# projects didn't contain any files. For quite a long time, I haven't been able to find a solution. Especially because I couldn't reproduce problem on my local machine. Fortunately, today I noticed exactly the same problem on another computer.

29/03/2017

Why I hate Roslyn




The more I work with Roslyn the more I appreciate the possibilities it gives and the more I hate it. And I hate it for the same thing as many other projects I worked with in the past. What is it? Well, I like when a system fails fast, fails loudly and fails in the clear way. Unfortunately, Roslyn can do something completely different what sometimes makes working with it the pain in ass. I'll give you some examples.

Issue 1 - Problem with "empty" projects

Here is the code that shows how I usually process documents/files for a given project. It's pretty easy.
var workspace = MSBuildWorkspace.Create();

var sln = await workspace.OpenSolutionAsync(path);
     
foreach (var projectId in sln.ProjectIds)
{
   var project = sln.GetProject(projectId);

   foreach (var documentId in project.DocumentIds)
   {
      // Process a document
   }
}
It works quite well but only on my machine :) On 2 other machines I'm observing problems. In general I have an example solution with 2 test projects. One is WPF application and the another is WebAPI.

The problem is that on some machines I can only read and analyze WPF application. If I try to do exactly the same thing with WebAPI application, then the project loaded by Roslyn is empty i.e. contains no documents (DocumentIds property is empty)! I've already tried to load this project in a different way but without success.

To be honest currently I'm stuck and I have no idea what is wrong here. Any suggestions?

Issue 2 - the semantic analysis does not work

With Roslyn we can perfrom the syntax analysis and the semantic analysis of the code. The syntax analysis, with a syntaxt tree, allows you to only see a structure of a program. The semantic analysis is more powerfull and allows you to understand more. For example, having a code like that:

SomeClass x;

With the semantic analysis you can check that SomeClass is defined within SomeNamespace and has X members (methods, properties). For example, here we have a code showing how to use the semantic analysis to check what interfaces are implemented by a given class at any level of the inheritance.
var compilation = await project.GetCompilationAsync();

foreach (var documentId in project.DocumentIds)
{
   var document = project.GetDocument(documentId);

   // Get a syntax tree
   var tree = await document.GetSyntaxTreeAsync(); 

   // Get a root of the syntax tree
   var root = await tree.GetRootAsync(); 
   
   // Find a node of the syntaxt tree for a first class in a file/document
   var classNode= root.DescendantNodes().OfType<ClassDeclarationSyntax>().FirstOrDefault(); 

   if(classNode== null) continue;

   // Get a semantic model for the syntax tree
   var semanticModel = compilation.GetSemanticModel(tree); 

   // Use the semantic model to get symbol info for the found class node
   var symbol = semanticModel.GetDeclaredSymbol(classNode); 

   // Check what inerfaces are implemnted by the class at any level
   foreach(var @interface in symbol.AllInterfaces)  
   {
      // ...
   }
}
If you run this code as it is, it again will not throw any exceptions. However, you'll noticed that any found class doesn't implement any interface according to Roslyn. Where is the problem this time?

It's quite obious if you know that. To perform the semantic analysis Roslyn needs to analyse assemblies used by the project. However, it's now enough to compile the project. You have to explicitly register all required assemblies. I do it in the easy way. I simply register all assemblies found in the output folder.
var compilation = await project.GetCompilationAsync();

// Let's register mscorlib
compilation = compilation.AddReferences(MetadataReference.CreateFromFile(typeof(object).Assembly.Location));

if (Directory.Exists("PATH TO OUTPUT DIRECTORY"))
{
   var files = Directory.GetFiles(directory, "*.dll").ToList(); // You can also look for *.exe files

   foreach (var f in files)
      compilation = compilation.AddReferences(MetadataReference.CreateFromFile(f));
}
And again if the semantic analysis can not be performed without that why no exception is thrown?

Issue 3 - Problem with reading projects/solutions

This one I've already described in more details in the post about Roslyn and unit tests. The problem was that:
  • MSBuildWorkspace.OpenSolutionAsync method was returning an empty solution if a particular assembly was missing (not fast, not loud)
  • MSBuildWorkspace.OpenSolutionAsync method was returning the error The language 'C#' is not supported (not in the clear way).
This particular assembly was Microsoft.CodeAnalysis.CSharp.Workspaces.dll so wouldn't it be easier to just throw an xception saying that it is missing. Or at least saying that it was not possible to find assembly responsible for reading C# projects and solutions.


Remember failing fast, loudly and in the clear way does not cost much but can save a lot of time.


*The picture at the beginning of the post comes from own resources and shows cliffs near Cabo da Roca - the westernmost extent of mainland Portugal.

24/03/2017

Report from the battlefield #10 - fuck-up with AutoMapper




Have you ever heard or used AutoMapper? What a question, of course you have. And in the very unlikely scenario that you haven't, it's the object to object mapper that allows you to map probably everything. In short no more manual, boring, tedious, error-prone mapping.

However, the great power comes with great responsibility. In the recent time, I had an occasion to fix 2 difficult to track bugs related to improper usage of AutoMapper. Both issues were related to the feature of AutoMapper which according to me is almost useless and at least should be disabled by default. Let's look at the following 2 classes and testing code:
public class SomeSourceClass
{
   public Guid Id { get; set; }
   public string IdAsString => Id.ToString();
   public string Value { get; set; }
}

public class SomeDestinationClass
{
   public Guid Id { get; set; }
   public string IdAsString => Id.ToString();
   public string Value { get; set; }
}

class Program
{ 
   static void Main()
   {
      Mapper.Initialize(config => config.CreateMap<SomeSourceClass,SomeDestinationClass>>());
      
      var src = new SomeSourceClass { Id = Guid.NewGuid(), Value = "Hello" };
      var dest = Mapper.Map<SomeDestinationClass>(src);

      Console.WriteLine($"Id = {dest.Id}");
      Console.WriteLine($"IdAsString = {dest.IdAsString}");
      Console.WriteLine($"Value = {dest.Value}");
   }
}
This works as a charm. If you run this example, you should see output like that:

Id = a2648b9e-60be-4fcc-9968-12a20448daf4
IdAsString = a2648b9e-60be-4fcc-9968-12a20448daf4
Value = Hello

Now, let's introduce interfaces that will be implemented by SomeSourceClass and SomeDestinationClass:
public interface ISomeSourceInterface
{
   Guid Id { get; set; }
   string IdAsString { get; }
   string Value { get; set; }
}

public interface ISomeDestinationInterface
{
   Guid Id { get; set; }
   string IdAsString { get; }
   string Value { get; set; }
}

public class SomeSourceClass: ISomeSourceInterface { /*... */}

public class SomeDestinationClass : ISomeDestinationInterface { /*... */}
We also want to support mappings from ISomeSourceInterface to ISomeDestinationInterface so we need to configure AutoMapper accordingly. Otherwise the mapper will throw an exception.
Mapper.Initialize(config =>
   {
      config.CreateMap<SomeSourceClass, SomeDestinationClass>();
      config.CreateMap<ISomeSourceInterface, ISomeDestinationInterface>();
   });

var src = new SomeSourceClass { Id = Guid.NewGuid(), Value = "Hello" };
var dest = Mapper.Map<ISomeDestinationInterface>(src);

Console.WriteLine($"Id = {dest.Id}");
Console.WriteLine($"IdAsString = {dest.IdAsString}");
Console.WriteLine($"Value = {dest.Value}");
If you run this code, it'll seemingly work as the charm. However, there is a BIG PROBLEM here. Let's examine more carefully what was written to the console. The result will look as follows:

Id = a2648b9e-60be-4fcc-9968-12a20448daf4
IdAsString =
Value = Hello

Do you see a problem? The readonly property IdAsString is empty. It seems crazy because IdAsString property only returns the value of Id property which is set. How is it possible?

And here we come the feature of AutoMapper which according to be should be disabled by default i.e. automatic proxy generation. When AutoMapper tries to map ISomeSourceInterface to ISomeDestinationInterface it doesn't know which implementation of ISomeDestinationInterface should be used. Well, actually no implementation may even exists, so it generates one. If we check the type of dest property we'll see something like:

Proxy<ConsoleApplication1.ISomeDestinationInterface_ConsoleApplication1_Version=1.0.0.0_Culture=neutral_PublicKeyToken=null>.

Initially this function may look as something extremely useful. But it's the Evil at least because of the following reasons:
  • As in the example, the mapping succeeds but the result object contains wrong data. Then this object may be used to create other objects... This can lead to really difficult to detect bugs.
  • If a destination interface defines some methods, a proxy will be generated, but the mapping will fail due to System.TypeLoadException.
  • It shouldn't be needed in the well written code. However, if you try to cast the result of the mapping to the class, then System.InvalidCastException exception will be thrown.
The ideal solution would be to disable this feature. However, I don't know how :( The workaround is to explicitly tell AutoMapper not to generate proxies. To do that we need to use As method and specify which concrete type should be created instead of a proxy.

The final configuration looks as follows. It's also worth mentioning that in this case we actually don't need to define mapping from SomeSourceClass to SomeDestinationClass. AutoMapper is clever enough to figure out that these classes implements interfaces.
Mapper.Initialize(
   config =>
   {
      config.CreateMap<ISomeSourceInterface, ISomeDestinationInterface>().As<SomeDestinationClass>();
   });


AutoMapper proxy generation feature is the Evil.


*The picture at the beginning of the post comes from own resources and shows Okonomiyaki that we ate in Hiroshima. One of the best food we've ever eaten.

15/03/2017

Report from the battlefield #9 - async/await + MARS




This post from Report from the battlefield series will be about my own mistake. It is related to async/await and MARS i.e. Multiple Active Result Sets. async/await allows us to use asynchronous programming more easily. MARS is a feature of MSSQL that allows us to have more than one pending request opened per connection at the same time. For example, it may be useful if we have 2 nested loops i.e. internal and external. External loops iterate through one result set and the internal one through another. Ok, but you probably wonder what MARS has in common with async/await.

A few days ago my application started failing due to InvalidOperationException exception with the additional message saying that The connection does not support MultipleActiveResultSets. Well, I used MARS in the past so I simply enabled it in the connection string by setting MultipleActiveResultSets attribute to true.

However, later I realized that my application should not require MARS at all so I started digging into what was wrong. It turned out that the problem was related to my silly mistake in using async/await. Let's look at the following simplified version of the problematic code. We have a trivial Main method:
static void Main()
{
   Start().GetAwaiter().GetResult();
}
Start is an async method responsible for opening a connection to DB and executing other async methods:
private static async Task Start()
{
   using (var c = new SqlConnection(ConnectionString))
   {
      c.Open();

      await Func1(c);
      await Func2(c);
      await Func3(c);
   }
}
Func1, Func2 and Func3 are responsible for reading data and processing them. In our case, for simplification, they all will do the same thing:
private static async Task Func1(SqlConnection c) => await ReadData(c);
private static async Task Func2(SqlConnection c) => ReadData(c);
private static async Task Func3(SqlConnection c) => await ReadData(c);
And here is the ReadData method. It's also simple. It simply reads data from a table:
private static async Task ReadData(SqlConnection c)
{
   var cmd = c.CreateCommand();

   cmd.CommandText = "SELECT * FROM dbo.Fun";

   using (var reader = await cmd.ExecuteReaderAsync())
   {
      while (await reader.ReadAsync())
      {
         // Process data
      }
   }
}
If you run this code, the aforementioned InvalidOperationException exception will be thrown in the line with ExecuteReaderAsync. The question is why? Well, in this short code it is rather easy to spot that in Func2 method await is missing before ReadData. But, do you know why it is a problem? If not, don't worry it's a little bit tricky.

Here is an explanation. Without await the simplified flow is as follows:
  • ...
  • Start executes Func2.
  • Func2 executes ReadData.
  • ReadData executes ExecuteReaderAsync.
  • ReadData awaits for the result from ExecuteReaderAsync.
  • The control returns to caller i.e. Func2.
  • However, Func2 does not use await so it returns completed task to Start method.
  • From the point of view of Start processing of Func2 is finished so it executes Func3.
  • Func3 executes ReadData
  • The previous call to ReadData may be still in progress.
  • It also means that ReadData will call ExecuteReaderAsync when another result set is still being processed.
  • The exception is thrown.
Adding missing await fix the problem. Thanks to that the task returned from Func2 will not be completed until call to ReadData is over. And if so Start will not execute Func3 immediately. The final well known conclusion is:

Always async/await all the way down.


*The picture at the beginning of the post comes from own resources and shows Laurel forest on La Gomera.