08/03/2017

Roslyn and unit tests suck

Title: Imperial Gardens in Tokyo, Source: own resources, Authors: Agnieszka and Michał Komorowscy

I'm working on a project where I have an opportunity to use the Roslyn compiler as a service. It's very good :) However, yesterday it took me more than 2 hours to write working unit tests (based on MSTest) for my code! Here are some tips that may save your time.

Let's start with a simple thing. When I ran the unit tests for the first time, the following exception was thrown:

System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime...' or one of its dependencies.

To fix this problem I simply installed the following packages via NuGet:
  • Microsoft.CodeAnalysis.CSharp
  • Microsoft.CodeAnalysis.CSharp.Workspaces
Later it got harder. In my code I use the MSBuildWorkspace.OpenSolutionAsync and MSBuildWorkspace.OpenProjectAsync methods to open, respectively, an entire solution or a single project for further processing.
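
For context, this is more or less how these methods are used. It's only a sketch: the paths and the surrounding class are placeholders.

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.MSBuild;

public static class WorkspaceDemo
{
    public static async Task RunAsync()
    {
        using (var workspace = MSBuildWorkspace.Create())
        {
            // Open the entire solution...
            var solution = await workspace.OpenSolutionAsync(@"C:\MySolution.sln");
            Console.WriteLine(solution.Projects.Count());

            // ...or a single project.
            var project = await workspace.OpenProjectAsync(@"C:\MyProject.csproj");
            Console.WriteLine(project.Name);
        }
    }
}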

The next issue was that the first method, when called from within a unit test, was returning an empty solution, i.e. without any projects, whereas the second one was throwing an exception with the message: The language 'C#' is not supported. What was strange, these problems occurred only in unit tests! To investigate the problem I opened the Exception Settings window in Visual Studio and selected the check box next to Common Language Runtime Exceptions. Then I ran the unit tests one more time and Visual Studio quickly reported the exception in the line with MSBuildWorkspace.OpenProjectAsync:

System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.CodeAnalysis.CSharp.Workspaces...' or one of its dependencies.

It was even more strange because my unit test project was actually referencing Microsoft.CodeAnalysis.CSharp.Workspaces.dll! To double check, I went to the unit tests' working directory. It is a folder called TestResults which by default is located in the solution directory. To my surprise, this dll was missing!

Fortunately, I remembered a similar situation from the past. The problem is that MSTest doesn't copy all assemblies to the output directory by default. As far as I know, it tries to figure out which assemblies are really needed by the code being tested. Here I'm not sure, but Microsoft.CodeAnalysis.CSharp.Workspaces.dll may be cumbersome because it is not directly referenced by other Roslyn assemblies. Instead, it is probably loaded dynamically when needed.

To fix the problem you can use a simple hack, i.e. directly use some code from Microsoft.CodeAnalysis.CSharp.Workspaces.dll in your unit tests in the following way:

[ClassInitialize]
public static void ClassInitialize(TestContext ctx)
{
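   // Referencing a type from Microsoft.CodeAnalysis.CSharp.Workspaces.dll creates
   // a static dependency, so the assembly gets deployed together with the tests.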
   var t = typeof(Microsoft.CodeAnalysis.CSharp.Formatting.LabelPositionOptions);
}

Why did I use LabelPositionOptions? Because the majority of types defined in the aforementioned assembly are internal and this one was the first public type I found :)

27/02/2017

Report from the battlefield #8 - always remember about the context

Title: Sunrise seen from the top of Mount Fuji, Source: own resources, Authors: Agnieszka and Michał Komorowscy

I decided to change the character of the Report from the battlefield series a little bit. Initially, in this series, I was describing my observations from my work as a reviewer for a recruitment company. Now, I'll also be writing about findings from my day-to-day work. To start with, I'll give you a tip on how to log useful information.

I worked with an application that is responsible for monitoring folders. If it detects any new files, they are processed and copied somewhere else. The application logs information like the number of files to be processed, the file that is currently being processed etc. This information is logged with severity Information or Debug. It happens that a given file cannot be copied, for example because a file with the same name already exists in the destination directory. In that case .NET throws System.IO.IOException. This exception is caught and logged with severity Error. A simplified version of the log could look like this:

INFO - 10 files have been found
INFO - Processing file started
...
INFO - Processing file ended
INFO - Processing file started
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...
INFO - Processing file ended
...

It's good that a lot of important information is logged. However, there is a major issue with this log. The lack of context! For example, we know that some files have been processed but we don't know which ones exactly. This log should look as follows (the changes are the added directory and file paths):

INFO - 10 files have been found in the directory 'C:\Input'
INFO - Processing file 'C:\Input\a.txt' started
...
INFO - Processing file 'C:\Input\a.txt' ended
INFO - Processing file 'C:\Input\b.txt' started
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...
INFO - Processing file 'C:\Input\b.txt' ended
...

It looks much better. Based on the log we can figure out which directory was monitored and which files were processed successfully and which were not. However, that's not everything. There is one more subtle problem here. What if messages with severity Information or lower aren't logged (for example because of performance issues) and an error is reported? In this case we'll get the following log:

...
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...

It's better than nothing, but again we don't know which file's processing actually failed. The expected result is:

...
ERROR - An error occurred while processing a file 'C:\Input\b.txt': Cannot create a file when that file already exists.
...
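
In code, the trick boils down to passing the path into every log call, including the one in the catch block. Here is a minimal sketch; the ILogger interface and the FolderProcessor class are made up for illustration, any logging library will do.

using System;
using System.IO;

public interface ILogger
{
    void Info(string message);
    void Error(string message);
}

public class FolderProcessor
{
    private readonly ILogger _log;

    public FolderProcessor(ILogger log) { _log = log; }

    public void ProcessDirectory(string inputDirectory, string outputDirectory)
    {
        var files = Directory.GetFiles(inputDirectory);
        _log.Info($"{files.Length} files have been found in the directory '{inputDirectory}'");

        foreach (var file in files)
        {
            _log.Info($"Processing file '{file}' started");
            try
            {
                File.Copy(file, Path.Combine(outputDirectory, Path.GetFileName(file)));
            }
            catch (IOException ex)
            {
                // The file path is part of the ERROR entry, so the entry stays
                // meaningful even when INFO messages are filtered out.
                _log.Error($"An error occurred while processing a file '{file}': {ex.Message}");
            }
            _log.Info($"Processing file '{file}' ended");
        }
    }
}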

To sum up: always remember the context when logging.

20/02/2017

Interview Questions by MK #8

Title: Sunset seen from the top of Mount Fuji, Source: own resources, Authors: Agnieszka and Michał Komorowscy

This is the first post in the Interview Questions by MK series for a long time. The motivation to write it was a short talk with my colleague. His company really wants to hire new .NET developers. The situation on the market is difficult for employers, so they are also considering juniors without experience. And still they have a problem finding someone. Why?

The requirements are not extremely high. I'd say that they are standard. They don't demand God knows what. The ideal candidate doesn't have to: know all formatting options available in .NET, enumerate classes in the System.DirectoryServices namespace, tell about all new features introduced in every .NET version or any other thing that can be checked in the documentation within seconds. However, they want someone with general knowledge. What do I mean by that?
  • It's good to know how to write a class, properties or a derived class... but it's also good to understand and be able to explain the principles of object oriented programming. For example, could you tell why OOP is better than procedural programming? Or maybe it isn't? Could you justify why encapsulation is actually a good thing?
  • You don't have to know all possible collections available in the .NET API but it's worth knowing some of them and their characteristics. Just to mention the list, the dictionary, the stack or the queue.
  • You don't have to be very good at algorithms but knowing how to search a binary tree is not rocket science (see the sketch after this list).
  • Writing code that compiles without errors is only the first step. You should also know how to write readable and maintainable code. This knowledge comes with experience but at the beginning you should have heard about refactoring and know that a method with 50 parameters is not the best choice...
  • It's not a problem if you have never worked with the continuous integration but you should at least know that there is something like that.
  • As a developer you'll probably not work directly with IT infrastructure but knowing what load balancing or a computer cluster is does not seem very demanding.
  • ...
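
For instance, here is roughly what searching a binary search tree looks like in C#. The TreeNode type is a minimal placeholder made up for this sketch.

public class TreeNode
{
    public int Value;
    public TreeNode Left;
    public TreeNode Right;
}

public static class TreeSearch
{
    // Walks down the tree, going left or right depending on the comparison.
    // Takes O(height) time, i.e. O(log n) for a balanced tree.
    public static bool Contains(TreeNode node, int value)
    {
        while (node != null)
        {
            if (value == node.Value) return true;
            node = value < node.Value ? node.Left : node.Right;
        }
        return false;
    }
}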
I could go on and on with this enumeration. To me these are basic things, but still many candidates don't know them. Sometimes even developers with a few or more years of experience.

If you are one of them and you want to have better chances on the job market, I recommend one simple thing, i.e. reading books, blogs, web sites... whatever you want. Several minutes (better more) every day, regularly, will make a big difference.

You may also say that you don't care because you'll get a job anyway. Well, it's true, at least for now. Nonetheless, it'll be only some job.

13/02/2017

C++ for C# developers - automatic garbage collection

Title: Sushi on the fish market in Tokyo, Source: own resources, Authors: Agnieszka and Michał Komorowscy

In my previous post I wrote that C++ doesn't have automatic garbage collection. However, that's not entirely true. Let's think about it more carefully. To start, look at the following method in C#. It simply creates an instance of the MyObject class.
public void Fun() 
{
   var x = new MyObject();
   //...
}
When will this object be collected by the GC? When it is needed, but not sooner than the execution of the Fun method is finished. Can we do the same thing in C++? Of course yes, here is an example:
void fun() {
   MyObject x;
   //...
}
If you are not familiar with C++ you may say that in this example I only declared the variable x and didn't create any object. Well, in C++ the following 2 commands are equivalent:

MyObject x;
MyObject x = MyObject();

Be careful though: MyObject x(); is not a third way to write the same thing. C++ parses it as the declaration of a function named x that returns MyObject (the so-called most vexing parse).

Personally, I still can't get used to that ;) But let's return to C++ and ask a similar question as for C#. When will the destructor be called for the created instance of MyObject? The answer is easy - when the execution of the fun method is over. Or, in other words, when the object goes out of scope. It's worth noting that this behaviour is called the automatic storage duration. Usually it is implemented using the stack; however, it's not a rule. Now, let's consider this code in C++:
void fun() {
   MyObject* x = new MyObject();
   //...
}
It looks almost the same as in C#. However, this time we're creating an object dynamically with the new keyword. And this kind of object won't be removed from memory automatically, even when the execution of fun is over. This is also known as the dynamic storage duration. How do we release this kind of object?

In the past a C++ developer had to use the delete keyword to do so. I did the same in the code from my post about virtual destructors. However, since C++11 we can use something else, i.e. a form of explicit automatic garbage collection. More precisely, I'm talking about smart pointers. Here is the rewritten version of the Node class:

class Node {
 public:
  static int _count;
  
  Node(int i) : _value(i) { Node::_count++; }
  ~Node() {
    std::cout << "~Node " << _value;
    _count--;
  }
  
  int _value;

  std::unique_ptr<Node> _left = nullptr;
  std::unique_ptr<Node> _right = nullptr;
};

int Node::_count = 0;
In this code I defined the ~Node destructor only for its side effects, i.e. to decrease a counter. I didn't use delete in this code at all. Instead, I wrapped the pointers with std::unique_ptr. It works like this: when a unique_ptr goes out of scope, it deletes the object it owns. Nothing more, nothing less. But thanks to that we don't have to remember about delete. Almost like in C#. Here is the testing code:
int main()
{
    Node* root = new Node(1);
    root->_left = std::unique_ptr<Node>(new Node(2));
    root->_right = std::unique_ptr<Node>(new Node(3));

    std::cout << " Existing nodes: " <<  Node::_count;
    delete root;
    std::cout << " Existing nodes: " <<  Node::_count;
}
I didn't wrap root in a smart pointer because I wanted to explicitly use delete and verify the final number of nodes. Easy, isn't it?

At the end it's worth mentioning that there are also other types of smart pointers. std::shared_ptr should be used when the same pointer can have many owners, whereas std::weak_ptr represents the same concept as the C# WeakReference class. Last but not least, apart from the automatic storage duration and the dynamic storage duration we also have the static storage duration and the thread storage duration. The former is used to store static variables which are released at the end of the program (pretty much the same as in C#) and the latter to store variables that will survive until the end of a thread (in C# we can use the ThreadLocal<T> class for a similar effect). More reading can be found here.
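
To illustrate the weak_ptr analogy, here is a minimal C# snippet with WeakReference. Keep in mind that whether the object is actually reclaimed after GC.Collect is up to the runtime.

using System;

public static class WeakReferenceDemo
{
    public static void Main()
    {
        var data = new byte[1024];
        // Like std::weak_ptr, WeakReference observes an object
        // without keeping it alive.
        var weak = new WeakReference(data);
        Console.WriteLine(weak.IsAlive); // True - 'data' is still a strong reference

        data = null;
        GC.Collect();
        Console.WriteLine(weak.IsAlive); // Most likely False by now
    }
}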

06/02/2017

C++ for C# developers - virtual destructors

Title: Tokyo, Source: own resources, Authors: Agnieszka and Michał Komorowscy

In C# it's simple: we almost never use destructors, a.k.a. finalizers. The only case when they are inevitable is the implementation of the Dispose pattern. In C++ the situation is different because we don't have automatic garbage collection. It means that if we create a new object with the new keyword, we have to destroy it later by using the delete keyword. And if the object being deleted contains pointers to other objects created dynamically, they also need to be deleted. This is where destructors come into play. Here is an example with a class Node which models a binary tree. It's simplified, and that's why all fields are public; don't do it in production! Node::_count is a static field that I'm using to count created objects.
#include <stdexcept>
#include <iostream>

class Node {
 public:
  Node(int i) : _value(i) { Node::_count++; }
  
  ~Node() {
    std::cout << " ~Node " << _value;
    
    if(_left != nullptr) delete _left;
    if(_right != nullptr) delete _right;
    
    _count--;
  }

  static int _count;
 
  int _value;
  
  Node* _left = nullptr;
  Node* _right = nullptr;
};

int Node::_count = 0;
Here is the testing code. If you run it you should see the following result: Existing nodes: 3 ~Node 1 ~Node 2 ~Node 3 Existing nodes: 0. We can see that all nodes have been deleted and that the destructor was executed 3 times.
int main()
{
    Node* root = new Node(1);
    root->_left = new Node(2);
    root->_right = new Node(3);

    std::cout << " Existing nodes: " << Node::_count;
    delete root;
    std::cout << " Existing nodes: " << Node::_count;
}
Now let's derive a new class from Node in the following way:
class DerivedNode : public Node {
 public:
  DerivedNode(int i) : Node(i) {
  }
  ~DerivedNode() {
    std::cout << " ~DerivedNode " << _value;
  }
};
And modify the testing code a little bit in order to use our new class:
int main()
{
    Node* root = new DerivedNode(1);
    root->_left = new DerivedNode(2);
    root->_right = new DerivedNode(3);

    std::cout << " Existing nodes: " << Node::_count;
    delete root;
    std::cout << " Existing nodes: " << Node::_count;
}
The expectation is that the ~DerivedNode destructor should be called together with the base class destructor ~Node. However, if you run the above code you'll see that it's not true, i.e. you'll see the same result as earlier. To explain what's going on, look at the C# code below and answer the following question: why do I see "I'm A" if I created an instance of class B?
public class A
{
   public void Fun() { Console.WriteLine("I'm A"); }
}

public class B: A
{
   public void Fun() { Console.WriteLine("I'm  B"); }
}

A a = new B();
a.Fun();
I hope that it's not a difficult question. The answer is of course that Fun is not a virtual method. In C++ we have the same situation. Now you may say: "Wait a minute, but we're talking about destructors, not methods". Yeah, but destructors are actually a special kind of method. The fix is simple: we just need to use a concept completely unknown in C#, i.e. a virtual destructor.
virtual ~Node() {
  ...
}
This time the test code will give the following result: Existing nodes: 3 ~DerivedNode 1 ~Node 1 ~DerivedNode 2 ~Node 2 ~DerivedNode 3 ~Node 3 Existing nodes: 0.