20/04/2017

How I removed 50% of the code

Home

Title: Azuleyo tiles somewhere in Portugal, Source: own resources, Authors: Agnieszka and Michał Komorowscy


My last 2 posts were about problems with using Roslyn. Nonetheless, even if I sometime hate it, I'm still using it so the time has come to show some practical example of using Roslyn. Recently, I've been working on the task that can be summed up as: Take this ugly code and do something with it. i.e. more or less the refactoring task.

Now I'll give you some intuition of what I have to deal with. The code that I have to refactor was generated automatically based on XML schema. These were actually DTO classes used to communicate with the external service. Here are some statistics:
  • 28.7 thousands lines of code in 23 files.
  • 2200 classes and 920 enums.
  • Many classes and enums seems to me identical or very similar.

10/04/2017

Why I hate Roslyn even more

Home



In my previous post I wrote about my problem with "empty" projects and Roslyn. The symptom was that in some cases according to Roslyn my C# projects didn't contain any files. For quite a long time, I haven't been able to find a solution. Especially because I couldn't reproduce problem on my local machine. Fortunately, today I noticed exactly the same problem on another computer.

29/03/2017

Why I hate Roslyn

Home



The more I work with Roslyn the more I appreciate the possibilities it gives and the more I hate it. And I hate it for the same thing as many other projects I worked with in the past. What is it? Well, I like when a system fails fast, fails loudly and fails in the clear way. Unfortunately, Roslyn can do something completely different what sometimes makes working with it the pain in ass. I'll give you some examples.

Issue 1 - Problem with "empty" projects

Here is the code that shows how I usually process documents/files for a given project. It's pretty easy.
var workspace = MSBuildWorkspace.Create();

var sln = await workspace.OpenSolutionAsync(path);
     
foreach (var projectId in sln.ProjectIds)
{
   var project = sln.GetProject(projectId);

   foreach (var documentId in project.DocumentIds)
   {
      // Process a document
   }
}
It works quite well but only on my machine :) On 2 other machines I'm observing problems. In general I have an example solution with 2 test projects. One is WPF application and the another is WebAPI.

The problem is that on some machines I can only read and analyze WPF application. If I try to do exactly the same thing with WebAPI application, then the project loaded by Roslyn is empty i.e. contains no documents (DocumentIds property is empty)! I've already tried to load this project in a different way but without success.

To be honest currently I'm stuck and I have no idea what is wrong here. Any suggestions?

Issue 2 - the semantic analysis does not work

With Roslyn we can perfrom the syntax analysis and the semantic analysis of the code. The syntax analysis, with a syntaxt tree, allows you to only see a structure of a program. The semantic analysis is more powerfull and allows you to understand more. For example, having a code like that:

SomeClass x;

With the semantic analysis you can check that SomeClass is defined within SomeNamespace and has X members (methods, properties). For example, here we have a code showing how to use the semantic analysis to check what interfaces are implemented by a given class at any level of the inheritance.
var compilation = await project.GetCompilationAsync();

foreach (var documentId in project.DocumentIds)
{
   var document = project.GetDocument(documentId);

   // Get a syntax tree
   var tree = await document.GetSyntaxTreeAsync(); 

   // Get a root of the syntax tree
   var root = await tree.GetRootAsync(); 
   
   // Find a node of the syntaxt tree for a first class in a file/document
   var classNode= root.DescendantNodes().OfType<ClassDeclarationSyntax>().FirstOrDefault(); 

   if(classNode== null) continue;

   // Get a semantic model for the syntax tree
   var semanticModel = compilation.GetSemanticModel(tree); 

   // Use the semantic model to get symbol info for the found class node
   var symbol = semanticModel.GetDeclaredSymbol(classNode); 

   // Check what inerfaces are implemnted by the class at any level
   foreach(var @interface in symbol.AllInterfaces)  
   {
      // ...
   }
}
If you run this code as it is, it again will not throw any exceptions. However, you'll noticed that any found class doesn't implement any interface according to Roslyn. Where is the problem this time?

It's quite obious if you know that. To perform the semantic analysis Roslyn needs to analyse assemblies used by the project. However, it's now enough to compile the project. You have to explicitly register all required assemblies. I do it in the easy way. I simply register all assemblies found in the output folder.
var compilation = await project.GetCompilationAsync();

// Let's register mscorlib
compilation = compilation.AddReferences(MetadataReference.CreateFromFile(typeof(object).Assembly.Location));

if (Directory.Exists("PATH TO OUTPUT DIRECTORY"))
{
   var files = Directory.GetFiles(directory, "*.dll").ToList(); // You can also look for *.exe files

   foreach (var f in files)
      compilation = compilation.AddReferences(MetadataReference.CreateFromFile(f));
}
And again if the semantic analysis can not be performed without that why no exception is thrown?

Issue 3 - Problem with reading projects/solutions

This one I've already described in more details in the post about Roslyn and unit tests. The problem was that:
  • MSBuildWorkspace.OpenSolutionAsync method was returning an empty solution if a particular assembly was missing (not fast, not loud)
  • MSBuildWorkspace.OpenSolutionAsync method was returning the error The language 'C#' is not supported (not in the clear way).
This particular assembly was Microsoft.CodeAnalysis.CSharp.Workspaces.dll so wouldn't it be easier to just throw an xception saying that it is missing. Or at least saying that it was not possible to find assembly responsible for reading C# projects and solutions.


Remember failing fast, loudly and in the clear way does not cost much but can save a lot of time.


*The picture at the beginning of the post comes from own resources and shows cliffs near Cabo da Roca - the westernmost extent of mainland Portugal.

24/03/2017

Report from the battlefield #10 - fuck-up with AutoMapper

Home



Have you ever heard or used AutoMapper? What a question, of course you have. And in the very unlikely scenario that you haven't, it's the object to object mapper that allows you to map probably everything. In short no more manual, boring, tedious, error-prone mapping.

However, the great power comes with great responsibility. In the recent time, I had an occasion to fix 2 difficult to track bugs related to improper usage of AutoMapper. Both issues were related to the feature of AutoMapper which according to me is almost useless and at least should be disabled by default. Let's look at the following 2 classes and testing code:
public class SomeSourceClass
{
   public Guid Id { get; set; }
   public string IdAsString => Id.ToString();
   public string Value { get; set; }
}

public class SomeDestinationClass
{
   public Guid Id { get; set; }
   public string IdAsString => Id.ToString();
   public string Value { get; set; }
}

class Program
{ 
   static void Main()
   {
      Mapper.Initialize(config => config.CreateMap<SomeSourceClass,SomeDestinationClass>>());
      
      var src = new SomeSourceClass { Id = Guid.NewGuid(), Value = "Hello" };
      var dest = Mapper.Map<SomeDestinationClass>(src);

      Console.WriteLine($"Id = {dest.Id}");
      Console.WriteLine($"IdAsString = {dest.IdAsString}");
      Console.WriteLine($"Value = {dest.Value}");
   }
}
This works as a charm. If you run this example, you should see output like that:

Id = a2648b9e-60be-4fcc-9968-12a20448daf4
IdAsString = a2648b9e-60be-4fcc-9968-12a20448daf4
Value = Hello

Now, let's introduce interfaces that will be implemented by SomeSourceClass and SomeDestinationClass:
public interface ISomeSourceInterface
{
   Guid Id { get; set; }
   string IdAsString { get; }
   string Value { get; set; }
}

public interface ISomeDestinationInterface
{
   Guid Id { get; set; }
   string IdAsString { get; }
   string Value { get; set; }
}

public class SomeSourceClass: ISomeSourceInterface { /*... */}

public class SomeDestinationClass : ISomeDestinationInterface { /*... */}
We also want to support mappings from ISomeSourceInterface to ISomeDestinationInterface so we need to configure AutoMapper accordingly. Otherwise the mapper will throw an exception.
Mapper.Initialize(config =>
   {
      config.CreateMap<SomeSourceClass, SomeDestinationClass>();
      config.CreateMap<ISomeSourceInterface, ISomeDestinationInterface>();
   });

var src = new SomeSourceClass { Id = Guid.NewGuid(), Value = "Hello" };
var dest = Mapper.Map<ISomeDestinationInterface>(src);

Console.WriteLine($"Id = {dest.Id}");
Console.WriteLine($"IdAsString = {dest.IdAsString}");
Console.WriteLine($"Value = {dest.Value}");
If you run this code, it'll seemingly work as the charm. However, there is a BIG PROBLEM here. Let's examine more carefully what was written to the console. The result will look as follows:

Id = a2648b9e-60be-4fcc-9968-12a20448daf4
IdAsString =
Value = Hello

Do you see a problem? The readonly property IdAsString is empty. It seems crazy because IdAsString property only returns the value of Id property which is set. How is it possible?

And here we come the feature of AutoMapper which according to be should be disabled by default i.e. automatic proxy generation. When AutoMapper tries to map ISomeSourceInterface to ISomeDestinationInterface it doesn't know which implementation of ISomeDestinationInterface should be used. Well, actually no implementation may even exists, so it generates one. If we check the type of dest property we'll see something like:

Proxy<ConsoleApplication1.ISomeDestinationInterface_ConsoleApplication1_Version=1.0.0.0_Culture=neutral_PublicKeyToken=null>.

Initially this function may look as something extremely useful. But it's the Evil at least because of the following reasons:
  • As in the example, the mapping succeeds but the result object contains wrong data. Then this object may be used to create other objects... This can lead to really difficult to detect bugs.
  • If a destination interface defines some methods, a proxy will be generated, but the mapping will fail due to System.TypeLoadException.
  • It shouldn't be needed in the well written code. However, if you try to cast the result of the mapping to the class, then System.InvalidCastException exception will be thrown.
The ideal solution would be to disable this feature. However, I don't know how :( The workaround is to explicitly tell AutoMapper not to generate proxies. To do that we need to use As method and specify which concrete type should be created instead of a proxy.

The final configuration looks as follows. It's also worth mentioning that in this case we actually don't need to define mapping from SomeSourceClass to SomeDestinationClass. AutoMapper is clever enough to figure out that these classes implements interfaces.
Mapper.Initialize(
   config =>
   {
      config.CreateMap<ISomeSourceInterface, ISomeDestinationInterface>().As<SomeDestinationClass>();
   });


AutoMapper proxy generation feature is the Evil.


*The picture at the beginning of the post comes from own resources and shows Okonomiyaki that we ate in Hiroshima. One of the best food we've ever eaten.

15/03/2017

Report from the battlefield #9 - async/await + MARS

Home



This post from Report from the battlefield series will be about my own mistake. It is related to async/await and MARS i.e. Multiple Active Result Sets. async/await allows us to use asynchronous programming more easily. MARS is a feature of MSSQL that allows us to have more than one pending request opened per connection at the same time. For example, it may be useful if we have 2 nested loops i.e. internal and external. External loops iterate through one result set and the internal one through another. Ok, but you probably wonder what MARS has in common with async/await.

A few days ago my application started failing due to InvalidOperationException exception with the additional message saying that The connection does not support MultipleActiveResultSets. Well, I used MARS in the past so I simply enabled it in the connection string by setting MultipleActiveResultSets attribute to true.

However, later I realized that my application should not require MARS at all so I started digging into what was wrong. It turned out that the problem was related to my silly mistake in using async/await. Let's look at the following simplified version of the problematic code. We have a trivial Main method:
static void Main()
{
   Start().GetAwaiter().GetResult();
}
Start is an async method responsible for opening a connection to DB and executing other async methods:
private static async Task Start()
{
   using (var c = new SqlConnection(ConnectionString))
   {
      c.Open();

      await Func1(c);
      await Func2(c);
      await Func3(c);
   }
}
Func1, Func2 and Func3 are responsible for reading data and processing them. In our case, for simplification, they all will do the same thing:
private static async Task Func1(SqlConnection c) => await ReadData(c);
private static async Task Func2(SqlConnection c) => ReadData(c);
private static async Task Func3(SqlConnection c) => await ReadData(c);
And here is the ReadData method. It's also simple. It simply reads data from a table:
private static async Task ReadData(SqlConnection c)
{
   var cmd = c.CreateCommand();

   cmd.CommandText = "SELECT * FROM dbo.Fun";

   using (var reader = await cmd.ExecuteReaderAsync())
   {
      while (await reader.ReadAsync())
      {
         // Process data
      }
   }
}
If you run this code, the aforementioned InvalidOperationException exception will be thrown in the line with ExecuteReaderAsync. The question is why? Well, in this short code it is rather easy to spot that in Func2 method await is missing before ReadData. But, do you know why it is a problem? If not, don't worry it's a little bit tricky.

Here is an explanation. Without await the simplified flow is as follows:
  • ...
  • Start executes Func2.
  • Func2 executes ReadData.
  • ReadData executes ExecuteReaderAsync.
  • ReadData awaits for the result from ExecuteReaderAsync.
  • The control returns to caller i.e. Func2.
  • However, Func2 does not use await so it returns completed task to Start method.
  • From the point of view of Start processing of Func2 is finished so it executes Func3.
  • Func3 executes ReadData
  • The previous call to ReadData may be still in progress.
  • It also means that ReadData will call ExecuteReaderAsync when another result set is still being processed.
  • The exception is thrown.
Adding missing await fix the problem. Thanks to that the task returned from Func2 will not be completed until call to ReadData is over. And if so Start will not execute Func3 immediately. The final well known conclusion is:

Always async/await all the way down.


*The picture at the beginning of the post comes from own resources and shows Laurel forest on La Gomera.

08/03/2017

Roslyn and unit tests suck

Home


Title: Imperial Gardens in Tokyo, Source: own resources, Authors: Agnieszka and Michał Komorowscy

I'm working on the project where I have an opportunity to use Roslyn compiler as a service. It is very good :) However yesterday it took me more than 2 hours to write working unit tests (based on MSTest) for my code! Here are some tips that may save your time.

Let's start with the simle thing. When I run unit tests for the first time the following exception was thrown:

System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime...' or one of its dependencies.

To fix this problem I simply installed the following packages via Nuget:
  • Microsoft.CodeAnalysis.CSharp
  • Microsoft.CodeAnalysis.CSharp.Workspaces
Later it was harder. In my code I use MSBuildWorkspace.OpenSolutionAsync and MSBuildWorkspace.OpenProjectAsync methods to respectively open the entire solution or a single project for further processing.

The next issue was that the first method called from within a unit test was returning an empty solution i.e. without any projects. Whereas the second one was throwing an exception with the message: The language 'C#' is not supported. What was strange these problems occurred only in unit tests! To investigate a problem I opened Exception settings window in Visual Studio and selected a check box next to Common Language Runtime Exceptions. Then, I run the unit tests one more time and Visual Studio quickly reported the exception in the line with MSBuildWorkspace.OpenProjectAsync:

System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.CodeAnalysis.CSharp.Workspaces...' or one of its dependencies.

It was even more strange because my unit test project was actually referencing Microsoft.CodeAnalysis.CSharp.Workspaces.dll! To double check, I went to the unit tests working directory. It is a folder called TestResults which by default is located in the solution directory. To my surprised this dll was missing!

Fortunately, I reminded myself the similar situation from the past. The problem is that MSTest doesn't copy all assemblies to the output directory by default. As far as I know it tries to figure out which assemblies are really needed by the code being tested. Here, I'm not sure but Microsoft.CodeAnalysis.CSharp.Workspaces.dll may be cumbersome because it is not directly referenced by other Roslyn assemblies. Instead, it is probably loaded dynamically when needed.

To fix a problem you can use the simple hack i.e. use directly any code from Microsoft.CodeAnalysis.CSharp.Workspaces.dll in your unit tests in the following way:

[ClassInitialize]
public static void ClassInitialize(TestContext ctx)
{
   var t = typeof(Microsoft.CodeAnalysis.CSharp.Formatting.LabelPositionOptions);
}

Why did I use LabelPositionOptions? Because majority of types defined in aforementioned assembly is internal and this one was the first public type I found :)

27/02/2017

Report from the battlefield #8 - always remember about the context

Home


Title: Sunrise seen from the top of Mount Fuji, , Source: own resources, Authors: Agnieszka and Michał Komorowscy

I decided to change a little bit a character of Report from the battlefield series. Initially, in this series, I was describing my observations from my work as a reviewer for a recruitment company. Now, I'll be also writing about my findings from my day-to-day work. To start, I'll give you a tip how to log useful information.

I worked with an application that is responsible for monitoring folders. If it detects any new files, they are processed and copied somewhere else. The application logs information like the number of files to be processes, the file that is currently being processed etc. This information are logged with severity Information or Debug. It happens that a given file cannot be copied for example because the file with the same name already exists in the destination directory. In that case .NET throws System.IO.IOException. This exception is caught and logged with the severity Error. The simplified version of the log could look in this way:

INFO - 10 files have been found
INFO - Processing file started
...
INFO - Processing file ended
INFO - Processing file started
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...
INFO - Processing file ended
...

It's good that many important information are logged. However, there is a major issue with this log. The lack of the context! For example we know that some files have been processed but we don't know which exactly. This log should look as follows (I used red color to mark changes):

INFO - 10 files have been found in the directory 'C:\Input'
INFO - Processing file 'C:\Input\a.txt' started
...
INFO - Processing file 'C:\Input\a.txt' ended
INFO - Processing file 'C:\Input\b.txt' started
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...
INFO - Processing file 'C:\Input\b.txt' ended
...

It looks much better. Based on the log we can figure out which directory was monitored, which files have been processed successfully and which not. However, it's not everything. There is one more subtle problem here. What if messages with the severity Information or lower won't be logged (for example because of performance issues) and an error will be reported. In this case we'll get the following log:

...
ERROR - An error occurred while processing a file: Cannot create a file when that file already exists.
...

It's better than nothing but again we don't know processing of which file has actually failed. The expected result is:

...
ERROR - An error occurred while processing a file 'C:\Input\b.txt': Cannot create a file when that file already exists.
...

To sum up, always remember about the context when logging.

20/02/2017

Interview Questions by MK #8

Home


Title: Sunset seen from the top of Mount Fuji, Source: own resources, Authors: Agnieszka and Michał Komorowscy

This is the first post from Interview Questions by MK series for a long time. The motivation to write it was a short talk with my colleague. His company really want to hire new .NET developers. The situation on the market is difficult for employers so they are also considering juniors without experience. And still they have a problem to find someone. Why?

The requirements are not extremely high. I'd say that they are standard. They don't demand God knows what. The ideal candidate doesn't have to: know all formatting options available in .NET, enumerate classes in System.DirectoryServices namespace, tell about all new features introduced in any .NET version or any other thing that can be checked in the documentation within seconds. However, they want someone with general knowledge. What I mean by that?
  • It's good to know how to write a class, properties, derived a class... but it's also good to understand and can explain principals of the object oriented programming. For example could you tell why OOP is better than the procedural programming? Or maybe it isn't? Could you justify why encapsulation is actually a good thing?
  • You don't have to know all possible collections available in .NET API but it's worth knowing some of them and their characteristics. Just to mention the list, the dictionary, the stack or the queue.
  • You don't have to be very good in algorithms but knowing how to search a binary tree is not the rocket science.
  • Writing a code that compiles without errors is only a first step. You should also know how to write a readable and a maintainable code. This knowledge comes from experience but at the beginning you should hear about refactoring, knows that the method with 50 parameters is not the best choice...
  • It's not a problem if you have never worked with the continuous integration but you should at least know that there is something like that.
  • As a developer you'll probably not work directly with IT infrastructure but knowing what is the load balancing or the computer cluster does not seem very demanding.
  • ...
I can continue and continue this enumeration. According to me these are basic things, but still many candidates don't know them. Sometimes even developers with a few or more years of experience.

If you are one of them and you want to have better chances on the job market, I recommend one simple thing i.e. reading books, blogs, web sites... whatever you want. Several minutes (better more) every day, regularly, will make a big difference.

You may also say that you don't care because you'll get a job anyway. Well, it's true at least for now. Nonetheless it'll be only some job.

13/02/2017

C++ for C# developers - automatic garbage collection

Home


Title: Sushi on the fish market in Tokyo, Source: own resources, Authors: Agnieszka and Michał Komorowscy

In my previous post I wrote that C++ doesn't have the automatic garbage collection. However, it's not entirely true. Let's think about it more carefully. To start, look at the following method in C#. It simply creates an instance of MyObject class.
public void Fun() 
{
   var x = new MyObject();
   //...
}
When this object will be collected by GC? When it is needed but not sooner than the execution of Fun method is finished. Can we do the same thing in C++? Of course yes, here is an example:
void fun() {
   MyObject x;
   //...
}
If you are not familiar with C++ you may say that I only declared the variable x in this example and that I didn't create any object. Well, in C++ the following 3 commands are equivalent:

MyObject x;
MyObject x();
MyObject x = MyObject();

Personally, I still cannot used to that ;) But let's return to C++ and ask the similar question as for C#. When a destructor will be called for created instance of MyObject? The answer is easy - when the execution of fun method is over. Or in other words when an object goes out of scope. It's worth nothing that this behaviour is called the automatic storage duration. Usually it is implemented by using stack however it's not the rule. Now, let's consider this code in C++:
void fun() {
   MyObject* x = new MyObject();
   //...
}
It looks almost the same like in C#. However, this time we're creating an object dynamically with new keyword. And this kind of objects won't be removed automatically from the memory, even if the execution of fun is over. This is also known as the dynamic storage duration. How to release this kind of objects?

In the past a C++ developer had to use delete keyword to do so. I did the same in the code from my post about virtual destructors. However, since C++ 11 we can use something else i.e. the explicit automatic garbage collection. More precisely I'm talking about smart pointers. Here is the rewritten version of Node class:

class Node {
 public:
  static int _count;
  
  Node(int i) : _value(i) { Node::_count++; }
  ~Node() {
    std::cout << "~Node " << _value;
    _count--;
  }
  
  int _value;

  std::unique_ptr<Node> _left = nullptr;
  std::unique_ptr<Node> _right = nullptr;
};

int Node::_count = 0;
In this code I defined ~Node destructor only for its side effects i.e. to decrease a counter. I didn't use delete in this code at all. Instead I wrapped pointers with std::unique_ptr. It works in this way that it releases the pointer when it goes out of scope. Nothing more nothing less. But thanks to that we don't have to remember about delete. Almost like in C#. Here is the testing code:
int main()
{
    Node* root = new Node(1);
    root->_left = std::unique_ptr<Node>(new Node(2));
    root->_right = std::unique_ptr<Node>(new Node(3));

    std::cout << " Existing nodes: " <<  Node::_count;
    delete root;
    std::cout << " Existing nodes: " <<  Node::_count;
}
I didn't wrap root in a smart pointer because I wanted to explicitly use delete and verify the final number of nodes. Easy, isn't it?

At the end it's worth mentioning that there are also different types of smart pointers. shared_ptr should be used when the same pointer can have many owners. Whereas std::weak_ptr represents the same concept as C# WeakReference class. Last but not least, except the automatic storage duration and the dynamic storage duration we also have the static storage duration and the thread storage duration. The former is used to store static variables which are release at the end of program (pretty the same as in C#) and the latter to store variables that will survive until the end of a thread (in C# we can use TheradLocalStorage for the similar effect). More reading can be found here.

06/02/2017

C++ for C# developers - virtual destructors

Home


Title: Tokio, Source: own resources, Authors: Agnieszka and Michał Komorowscy

In C# it's simple, we use destructors a.k.a. finalizers almost never. The only one case when they are inevitable is the implementation of Disposable pattern. In C++ the situation is different because we don't have the automatic garbage collection. It means that if we create a new object with new keyword we have to destroy it later by using delete keyword. And if the object being deleted contains pointers to other objects created dynamically, they also need to be deleted. It's where destructors come to game. Here is an example with a class Node which models a binary tree. It's simplified and it is why all fields are public, don't do it in the production! Node::_count is a static field that I'm using to count created objects.
#include <stdexcept>
#include <iostream>

class Node {
 public:
  Node(int i) : _value(i) { Node::_count++; }
  
  ~Node() {
    std::cout << " ~Node " << _value <<;
    
    if(_left != nullptr) delete _left;
    if(_right != nullptr) delete _right;
    
    _count--;
  }

  static int _count;
 
  int _value;
  
  Node* _left = nullptr;
  Node* _right = nullptr;
};

int Node::_count = 0;
Here is a testing code. If you run it you should see the result as follows: Existing nodes: 3 ~Node 1 ~Node 2 ~Node 3 Existing nodes: 0. We can see that all nodes have been deleted and that a destructor was executed 3 times.
int main()
{
    Node* root = new Node(1);
    root->_left = new Node(2);
    root->_right = new Node(3);

    std::cout << " Existing nodes: " << Node::_count;
    delete root;
    std::cout << " Existing nodes: " << Node::_count;
}
Now let's derived a new class from Node in the following way:
class DerivedNode : public Node {
 public:
  DerivedNode(int i) : Node(i) {
  }
  ~DerivedNode() {
    std::cout << " ~DerivedNode " << _value;
  }
};
And modify a testing code a little bit in order to use our new class:
int main()
{
    Node* root = new DerivedNode(1);
    root->_left = new DerivedNode(2);
    root->_right = new DerivedNode(3);

    std::cout << " Existing nodes: " << Node::_count;
    delete root;
    std::cout << " Existing nodes: " << Node::_count;
}
The expectation is that ~DerivedNode destructor should be called together with the base class destructor ~Node. However, if you run the above code you'll notice see that it's not true i.e. you'll see the same result as earlier. To explain what's going look at the C# code below and answer the following question: Why I see "I'm A" if I created an instance of class B
public class A
{
   public void Fun() { Console.WriteLine("I'm A"); }
}

public class B: A
{
   public void Fun() { Console.WriteLine("I'm  B"); }
}

A a = new B();
a.Fun();
I hope that it's not a difficult question. The answer is of course because Fun is not a virtual method. In C++ we have the same situation. Now you may say "Wait a minute, but we're talking about destructors not methods". Ya, but destructors are actually a special kind of methods. The fix is simple we just need to use a concept completely unknown in C# i.e. a virtual destructor.
virtual ~Node() {
  ...
}
This time the test code will give the following result Existing nodes: 3 ~DerivedNode 1 ~Node 1 ~DerivedNode 2 ~Node 2 ~DerivedNode 3 ~Node 3 Existing nodes: 0 .

30/01/2017

C++ for C# developers - var and foreach

Home


Title: A-Bomb dome in Hiroshima, Source: own resources, Authors: Agnieszka and Michał Komorowscy

When I returned to programming in C++ after years of using C# a few things were especially painful. Today I'll wrote about 2 at the top of the list. The first one was a need to explicitly declare types of local variables. For example:
std::vector< std::string > v = someMethod();
std::map< std::string, std::map<std::string, std::string> > m = someMethod2();
It looks terrible and is simply cumbersome. However, as you may noticed I used the past tense. It turned out that it's not needed any more. Glory and honor to C++11!!! Now, I can write something like this.
auto v = someMethod();
auto m = someMethod2();
The second problem was the lack of foreach operator. For example let's write a code that iterates through a map from the example above:
typedef std::map<std::string, std::map<std::string, std::string>>::iterator outer_iterator;
typedef std::map<std::string, std::string>::iterator inner_iterator;
    
for(outer_iterator it1 = m.begin(); it1 != m.end(); it1++) {
   for(inner_iterator it2 = it1->second.begin(); it2 != it1->second.end(); it2++) {
      std::cout<< it1->first << " " << it2->first << " " << it2->second << std::endl;
   }
}
Again it looks terrible and is cumbersome. All this begin(), end(), typedef are horrible. We can fix it a little bit if we use auto keyword:
for(auto it1 = m.begin(); it1 != m.end(); it1++) {
 for(auto it2 = it1->second.begin(); it2 != it1->second.end(); it2++) {
  std::cout<< it1->first << " " << it2->first << " " << it2->second << std::endl;
 }
}
But even the better result we will achieve if we use a new for loop syntax:
for(auto it1 : m) {
   for(auto it2 : it1.second) {
      std::cout<< it1.first << " " << it2.first << " " << it2.second << std::endl;
   }
}
The difference is striking! It's so much readable and easier to write and uderstand.

23/01/2017

C++ for C# developers - code like in Google

Home


Title: Elephant Retirement Camp in the vicinity of Chiang Mai, Source: own resources, Authors: Agnieszka and Michał Komorowscy

In the post Nuget in C++ rulez I wrote that I returned to programming in C++. It is like a new world for me but it's better and better. I'm even reminding myself things that I learned many years ago so it's not bad with me ;) Recently, I've discovered a C++ alternative for .NET StyleCop. StyleCop is a tool that analyses C# code in order to check if it is consistent with given rules and good practices. What is obvious there is a similar thing for C++ I'm talking about a tool called CppLint that was created by Google. It's written in Python and is fairly easy in use. However, please note that CodeLint requires the old Python 2.7. I tried and it won't work with Python 3.5.

When I run CppLint on my code it turned out that my habits from C# don't fit to C++ world according to Google. Here is an example of Hello Word written in C++ but in C# style.
#include <iostream>

namespace sample 
{
    class HelloWorld
    {
        public:
            void Fun()
            {
                std::cout << "Hello World Everyone!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" << std::endl;
            }
    };
}

int main()
{
  sample::HelloWorld hw = sample::HelloWorld();
  hw.Fun();
  
  return 0;
}
If we verify this code, we will get the following errors:
a.cpp:0:  No copyright message found.  You should have a line: "Copyright [year] <Copyright Owner>"  [legal/copyright] [5]
a.cpp:3:  Line ends in whitespace.  Consider deleting these extra spaces.  [whitespace/end_of_line] [4]
a.cpp:4:  { should almost always be at the end of the previous line  [whitespace/braces] [4]
a.cpp:5:  Do not indent within a namespace  [runtime/indentation_namespace] [4]
a.cpp:6:  { should almost always be at the end of the previous line  [whitespace/braces] [4]
a.cpp:7:  public: should be indented +1 space inside class HelloWorld  [whitespace/indent] [3]
a.cpp:9:  { should almost always be at the end of the previous line  [whitespace/braces] [4]
a.cpp:10:  Lines should be <= 80 characters long  [whitespace/line_length] [2]
a.cpp:13:  Namespace should be terminated with "// namespace sample"  [readability/namespace] [5]
a.cpp:16:  { should almost always be at the end of the previous line  [whitespace/braces] [4]
a.cpp:19:  Line ends in whitespace.  Consider deleting these extra spaces.  [whitespace/end_of_line] [4]
a.cpp:21:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
At the beginning of each line we have the line number where an error was detected. The number in square brackets at the end of each line informs you how confident CppLint is about each error i.e. 1 - it may be a false positive, 5 - extremely confident. In order to fix all these problems I did the following things:
  • Added Copyright 2016 Michał Komorowski.
  • Removed whitespaces at the end of lines.
  • Added a new line at the end of file.
  • Added a comment // namespace sample
  • Move curly braces. This one I don't like the most.
  • Break a too long line. It's also a little bit strange to me. 80 doesn't seem to be a lot. However, shorter lines makes working with multiple windows easier (see also this answer).
Then I run a tool again and I got some new errors:
a.cpp:6:  Do not indent within a namespace  [runtime/indentation_namespace] [4]
a.cpp:7:  public: should be indented +1 space inside class HelloWorld  [whitespace/indent] [3]
I also fixed them and the final version of Hello Worlds compliant with Google rules looks as follows: Here is the correct version:
// Copyright 2016 Michal Komorowski

#include <iostream>

namespace sample {
class HelloWorld {
 public:
  void Fun() {
    std::cout
      << "Hello World Everyone!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
      << std::endl;
  }
};
}  // namespace sample

int main() {
  sample::HelloWorld hw = sample::HelloWorld();
  hw.Fun();
  return 0;
}

It's worth adding that CppLint has many configuration options for example you can disable some rules if you don't agree with them or change the maximum allowed length of a line (default is 80). Options can be also read from the configuration file CPPLINT.cfg.

16/01/2017

When Excel is better than machine learning?

Home


Title: Ruins of a castle in southern Poland, Source: own resources, Authors: Agnieszka and Michał Komorowscy

I can bet that some of you think that I'm crazy because I'm saying such blasphemies! Surely everyone knows that Excel is not for real developers ;) If you think so, I'll tell you a short story.

09/01/2017

Yerba Mate to the rescue

Home



I heard about Yerba Mate many years ago. It must have been in a television program run by quite famous Polish journalist, writer, satirist and traveler Wojciech Cejrowski who is also a popularizer of this drink. However, it took some time until I decided to give it a try. Now it is one of basic "tool" in my developer tool box. Why? This post explains it.

03/01/2017

Is it possible to do PhD and work full time?

Home


Source: own resources, Authors: Agnieszka and Michał Komorowscy

I decided to return to my series about Ph.D. studies and write about sharing a time between a job in the industry and scientific work. According to my experience it is quite a common scheme here in Poland (at least if we talk about the computer science). A vast majority of my colleagues had additional job during their Ph.D. studies. In this post I'll try to answer some question on this topic.

A short clarification. By working in the industry I uderstand engineering/technical work that in general doesn't have scientific part. However, I'm aware that there are positions in the industry that required scientific qualification and I'll also write a few words about it.

Why a Ph.D. student may want to work instead of focusing on his/her research?

Well the answer is trivial and it is money. Let's start with the fact that many MSc / BsC students work during their studies. It means that they may have 2, 3 years of experience when they start Ph.D. studies (I assume here that they start Ph.D. just after MSc). With such an experience their salary in the industry could be 2x, 3x, 4x times higher than on the university.

Is it feasible at all to work full time and finish Ph.D. Studies?

Short answer is yes it is possible. I was working full time for almost all my Ph.D. studies and I did it.

What about money from grants and additional projects on the university?

Someone may say that Ph.D. students can also earn additional money by working for their doctoral advisors. It's true but it depends strongly on your advisor. Some of them have grants, projects etc. and will allow you to earn additional money and sometimes this are quite good money. However, not all advisors have such possibilities. Besides you have to remember that grants/projects will end at some point. So you may have good money for X months and then poor money for another Y months.

Another option is to get a grant on your own. There are even dedicated funds for young scientists. However, I couldn't say much about that because I didn't have such a grant. The problem may be that in order to get such a grant you have to have good results. And in order to have good results you should focus on your research. And in order to focus on the research you can't have a full time job. But if you don't have a full time job, you'll have to live for considerable smaller amount of money...

How did a full time job affect your scientific work?

I'm convinced that my Ph.D. thesis would have been better if hadn't worked full time. I have no doubts here. It may sound trivial but the main problem is that scientific work required a lot of innovative thinking, much more than average programming work. And this kind of thinking is difficult after a day of work not to mention about a time for family.

I have one more, not so obvious, observation. I think that because of my full time job in the industry my Ph.D. thesis has engineering inclination. Is it good or bad? I'd say that it depends. We have to remember that Ph.D. is mainly about doing science and engineering part is less important.

I know the case when Ph.D. student didn't defend his thesis because it was too technical! On the another hand if you know the industry your work may be potentially more useful or it may be easier to find practical applications. To sum up it is important to preserve a proper balance between engineering and science with the focus on the latter.

Last but not least. If you work full time during Ph.D. studies you may not have time to take part in additional courses, trainings... that are dedicated to Ph.D. students.

What kind of job will be the best for Ph.D. student if any?

At the beginning of my Ph.D. studies I was working part time. It was a very good idea. I simply had more time for my research. So, if you need to work I strongly recommend you to consider the part time job. The problem is that not every employer will agree for that.

Of course a full time job can sometimes help you in doing Ph.D. i.e.:
  • when it is somehow related to your area of research
  • if the industry is paying you for doing research
  • if the industry is paying you for doing Ph.D.
In this situation you actually don't have 2 jobs but ideally only 1 job, or maybe 1.5 but still it's better than 2. Unfortunately, according to my experience it is difficult to find such a job.

Do you have tips for Ph.D. students who want to work in the industry?

Except what I've already written it'll be good to find a job with flexible working hours. Thanks to that you will be able to go to the university, meet with an advisor etc. without problems. Besides avoid overtime like the plague. It's another think that can kill your scientific work.

I also recommend to have a rule that every week we have to do something related to our Ph.D. It could be reading some articles, doing an experiment, implementing a tool... It's important to do that despite everything. Thanks to that you will constantly see some progress and you will not lose sight of the main objective i.e. Ph.D.

The last advice might be surprising because I did something different ;)

If you seriously think about the scientific career forget about working in the industry. The only exception is if a job in the industry is related to your scientific work.