Showing posts with label performance.

18/05/2017

Report from the battlefield #11 - premature optimization is the root of all evil?




Have you ever heard that "premature optimization is the root of all evil"? Probably yes. It's a well-known phrase of Donald Knuth's. However, the full quote is much less known:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

Why am I writing about this? Because recently I had an occasion to fix an application which was written according to the first part of this quote. Or even worse, it was written according to the rule "any optimization is the root of all evil". Here are some examples of what not to do and some tips on what to do.

24/07/2016

Report from the battlefield #5 - Logging can kill performance



So far in the Report from the battlefield series I've written about my experiences as an expert for a recruitment company. This time I'll write about a bug that I found in production. It was all about performance. The problem was that in the new version of an application one operation slowed down about 6 times. Initially, I suspected that the amount of data had simply increased considerably or that there were some network problems. Fortunately, I easily reproduced the issue on my dev machine. Reproducing a problem is half the battle. However, performance problems are usually difficult to analyse, so I was ready for a long investigation.

I started stepping through the code with a debugger just to see what was going on. Everything seemed to be fine until... One of the final operations was to log into a file what had been retrieved from the database. What's important, the log level of this message was Trace, so even a large amount of data shouldn't matter in production. Why? Because in production, precisely for performance reasons, the logger should be configured not to log everything to a file. In other words, it should ignore messages with the log level Trace or Debug. However, after I pressed F10 (Step Over), I had to wait a few seconds until the logging ended. BINGO!

My first thought was that someone had configured the logger in the wrong way in production. A typical PEBKAC problem. To verify my hypothesis I changed the configuration of the logger and executed the problematic operation. Unfortunately, the problem occurred again. Another look at the code and I knew what was wrong. Do you already know?

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

The problem was that for a large amount of data the application required a few seconds just to create/prepare a message for the logger. To make things worse, this message was created regardless of whether it was later used by the logger or not. During development this may be acceptable, but not in production! There are 2 potential solutions to this problem. The details depend on the logging framework:
  • The first approach is to simply check the logging level before creating/preparing a message, e.g.:
    if (Logger.LogLevel == LogLevel.Trace)
    {
        /* Prepare and log a message */
    }
    
  • The second approach is to use deferred execution, for example lambdas, e.g.:
    Logger.Trace(() => /* Prepare a message */);
    If a logger supports this syntax, the lambda will be executed if and only if it is required. Both approaches are sketched below.
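
Here is a minimal, self-contained sketch of both approaches. The MyLogger class and its API are hypothetical and serve only as an illustration; real frameworks offer analogous level checks (e.g. NLog's IsTraceEnabled or log4net's IsDebugEnabled) and sometimes also lambda-based overloads:
using System;

public enum LogLevel { Trace, Debug, Info, Warn, Error }

// A hypothetical logger used only for illustration.
public class MyLogger
{
    public LogLevel LogLevel { get; set; } = LogLevel.Info;

    public void Trace(string message)
    {
        if (LogLevel <= LogLevel.Trace)
            Console.WriteLine(message);
    }

    // Deferred variant: the message factory runs only when Trace is enabled.
    public void Trace(Func<string> messageFactory)
    {
        if (LogLevel <= LogLevel.Trace)
            Console.WriteLine(messageFactory());
    }
}

public static class LoggingDemo
{
    public static void Main()
    {
        var logger = new MyLogger { LogLevel = LogLevel.Info };

        // Approach 1: guard the expensive preparation with a level check.
        if (logger.LogLevel <= LogLevel.Trace)
            logger.Trace(PrepareExpensiveMessage());

        // Approach 2: defer the preparation. The lambda never executes here
        // because Trace is disabled in this configuration.
        logger.Trace(() => PrepareExpensiveMessage());
    }

    // Stands in for e.g. serializing a large result set to text.
    private static string PrepareExpensiveMessage()
        => string.Join(",", new int[1_000_000]);
}
In both cases the expensive message is prepared only when the Trace level is actually enabled, so the production configuration no longer pays for messages it will never write.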

23/04/2015

How to build predicates dynamically using expression trees


I'm working on an application which finds so-called execution patterns in logs recorded by the IntelliTrace historical debugger. An execution pattern is a sequence of method calls that is widely used in the application and it is a kind of automatically generated documentation. Part of the algorithm is the filtering of found patterns based on criteria like the length of a pattern or the number of different methods in a pattern.

At the beginning I used only 2 criteria, so it was easy to handle all possible combinations of them, i.e. use the first criterion, use the second criterion, use both, and use none. Then I added a 3rd criterion and I thought that for 3 criteria I still didn't need a generic mechanism. However, it shortly turned out that I wanted to handle 5 criteria, which gives 32 possible combinations. This time I did it once and for all.

I decided to use expression trees to dynamically build an expression that verifies any combination of criteria. The code is quite simple. First, we need an enum for all criteria.
[Flags]
public enum Criteria : byte
{
    None = 0,
    CriterionOne = 1,
    CriterionTwo = 2,
    All = CriterionOne | CriterionTwo
}
We also need a class that will represent patterns.
public class Pattern
{
    public int FieldOne { get; set; }
    public int FieldTwo { get; set; }
}
Now we can write the code that will dynamically build the needed expressions. I assumed that every criterion has a corresponding static method that knows how to check whether a given pattern fulfils it or not. The final expression produced by the CreateExpression method will be of the following form: pattern => predicate1(pattern) && predicate2(pattern) && predicate3(pattern)....
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class FilterBuilder
{
    public static Func<Pattern, bool> CreateExpression(Criteria filteringMode)
    {
        var param = Expression.Parameter(typeof(Pattern));

        // One sub-expression (a call to a predicate method) per selected criterion.
        var subExpressions = new List<MethodCallExpression>();

        if ((filteringMode & Criteria.CriterionOne) != 0)
            subExpressions.Add(Expression.Call(typeof(FilterBuilder), nameof(CriterionOnePredicate), null, param));

        if ((filteringMode & Criteria.CriterionTwo) != 0)
            subExpressions.Add(Expression.Call(typeof(FilterBuilder), nameof(CriterionTwoPredicate), null, param));

        //Other criteria...

        // No criteria selected - accept every pattern.
        if (subExpressions.Count == 0)
            return p => true;

        // Combine the predicates with short-circuiting && (AndAlso).
        Expression finalExpression = subExpressions[0];
        for (var i = 1; i < subExpressions.Count; ++i)
            finalExpression = Expression.AndAlso(finalExpression, subExpressions[i]);

        return Expression.Lambda<Func<Pattern, bool>>(finalExpression, param).Compile();
    }

    public static bool CriterionOnePredicate(Pattern p)
    {
        return p.FieldOne > 0;
    }

    public static bool CriterionTwoPredicate(Pattern p)
    {
        return p.FieldTwo < 0;
    }
}
The code can be made even more generic but I'll leave it as an exercise. When I finished this code I started to worry about performance. It is critical for me because my application needs to process a large number of patterns efficiently. I made the following simple test, in which the dynamically generated and the static functions are executed 1 million times.
var iterations = 1000000;

var predicate = FilterBuilder.CreateExpression(Criteria.All);
MeasureIt<Pattern>((p) => predicate(p), new Pattern(), iterations);

predicate = FilterBuilder.CreateExpression(Criteria.CriterionOne);
MeasureIt<Pattern>((p) => predicate(p), new Pattern(), iterations);

MeasureIt<Pattern>((p) =>
{
   FilterBuilder.CriterionOnePredicate(p);
   FilterBuilder.CriterionTwoPredicate(p);
}, new Pattern(), iterations );

MeasureIt<Pattern>((p) => FilterBuilder.CriterionOnePredicate(p), new Pattern(), iterations);
In order to measure the time of the calculations I used the MeasureIt method from my earlier post, and I received the following results (times in milliseconds):
Total time: 54
Total time: 27
Total time: 18
Total time: 12
Dynamically generated predicates are 2-3 times slower than the static ones. However, we are still talking about dozens of milliseconds for 1 million calls. For me it is acceptable.
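
For completeness, here is a sketch of what the MeasureIt helper might look like. Its signature is inferred from the usage above, so treat it as an assumption; the actual implementation is in the earlier post:
using System;
using System.Diagnostics;

public static class Measurements
{
    // Hypothetical reconstruction of MeasureIt, inferred from its usage:
    // run the action on the input the given number of times and print
    // the total elapsed time in milliseconds.
    public static void MeasureIt<T>(Action<T> action, T input, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();

        for (var i = 0; i < iterations; ++i)
            action(input);

        stopwatch.Stop();
        Console.WriteLine("Total time: " + stopwatch.ElapsedMilliseconds);
    }
}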