A quick recap – we’ve written a small application to analyze how C# developers are using LINQ. If you haven’t read the first part and want to learn how we did it, go to the previous blog post: Analyzing GitHub LINQ usage – Introducing LinqAnalyzer.
But before we begin, let’s discuss what exactly happened once we ran LinqAnalyzer.
Lies, damned lies, and statistics
After running LinqAnalyzer for a few hours we got very interesting results.
We’ve also found a few bugs along the way, which we solved; you can see how in the webinar: Debugging Complex Code.
Analyzing many open source projects does have its share of challenges – we’ve discovered that not all projects marked as “C#” parsed well, and that for some of them we could not create semantic models. At the moment we’ve decided to leave those projects out of our analysis, and they were not included in the final results.
We did manage to gather 200 projects, which seems enough to determine how C# developers are using LINQ. Among the projects we’ve sampled you can see some of the leading open source projects in our world, from many disciplines:
- Caliburn.Micro and Prism from the MVVM world
- FakeItEasy, NSubstitute and FluentAssertions from the unit testing world
- SignalR, Nancy, AutoMapper, Newtonsoft.Json, ReactiveUI – and more
I’ve exported the results to an Excel file which you can download and analyze yourself.
Which flavor of LINQ do developers use?
The first question we needed answered was how many C# projects use LINQ. From the projects we’ve checked it seemed that most indeed do:
Out of 200 projects fewer than 10% (19) do not use LINQ, and out of the rest most use both the Fluent and Query APIs.
Upon seeing those results we immediately understood that we need to support debugging of both LINQ flavors…
When we set out to add LINQ debugging capabilities to OzCode we thought that most developers preferred the Fluent, extension-method-based calls. According to the results above we were right, but even more developers preferred to use both – in some cases even mixing one inside the other.
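To make the two flavors concrete, here is a minimal sketch written as a C# 9+ top-level program. The `numbers` array and the queries are made up for illustration and are not taken from any of the analyzed repositories. It shows the same query in Fluent syntax, in Query syntax, and with one mixed inside the other:

```csharp
using System;
using System.Linq;

int[] numbers = { 5, 12, 8, 21, 3 };

// Fluent (method) syntax: extension methods chained with lambdas.
var fluent = numbers.Where(x => x > 4).OrderBy(x => x).Select(x => x * 2).ToList();

// Query syntax: the compiler translates this into the same method calls.
var query = (from n in numbers
             where n > 4
             orderby n
             select n * 2).ToList();

// Mixed: a query expression fed by a fluent call and finished with another one.
var mixed = (from n in numbers.Where(x => x > 4)
             orderby n
             select n * 2).Take(2).ToList();

Console.WriteLine(string.Join(", ", fluent)); // 10, 16, 24, 42
Console.WriteLine(string.Join(", ", query));  // 10, 16, 24, 42
Console.WriteLine(string.Join(", ", mixed));  // 10, 16
```

The mixed form is the one a debugger has to work hardest on, since a single statement contains both syntaxes.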
Lesson learnt – when you have a theory about your users’ needs it’s easier to perform an experiment and make sure you’re on the right track. This method is preferable to the usual method of arguing till you’re blue in the face.
Deep Diving into query usage
It was interesting to check the 9 projects that only used the query syntax and see what made them use it.
| Repository name | Lines of code | LINQ calls | LOC per LINQ call | Operators used |
I expected to see a lot of let operators and multiple from clauses, which is where the SQL-like syntax shines, but was amazed to find out this is not the case.
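For readers who have not used them, the sketch below (a C# 9+ top-level program on made-up data, not code from the sampled projects) shows the kind of query I was expecting to find: a second `from` to flatten nested sequences and a `let` to name an intermediate value. The Fluent equivalent needs `SelectMany` and an anonymous type to carry the extra value along:

```csharp
using System;
using System.Linq;

string[] sentences = { "the quick brown fox", "jumps over the lazy dog" };

// Query syntax: the second `from` flattens the words, `let` names the length.
var queryResult = (from sentence in sentences
                   from word in sentence.Split(' ')
                   let length = word.Length
                   where length > 4
                   select word.ToUpperInvariant()).ToList();

// Fluent equivalent: SelectMany plus an anonymous type to keep `length` around.
var fluentResult = sentences
    .SelectMany(sentence => sentence.Split(' '))
    .Select(word => new { word, length = word.Length })
    .Where(x => x.length > 4)
    .Select(x => x.word.ToUpperInvariant())
    .ToList();

Console.WriteLine(string.Join(", ", queryResult));  // QUICK, BROWN, JUMPS
Console.WriteLine(string.Join(", ", fluentResult)); // QUICK, BROWN, JUMPS
```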
In fact, looking at this table I can see that the bottom two (maybe three) repositories used simple from…where…select queries, which I always found to be more readable using method calls.
It seems that some people prefer the query API and use it extensively. I know I’m the other way around: I prefer to use method calls as opposed to the from x in y form, but I guess it’s mostly a matter of personal taste.
Most used LINQ operators
Now that I had populated the database I was able to run a simple query to count and sort the operators and find out which of the 87,615 LINQ operators we recorded were used the most.
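The actual query ran against our database and is not reproduced here. As a rough sketch of the same count-and-sort idea, here it is expressed in LINQ itself as a C# 9+ top-level program; the `recordedOperators` array is a tiny hypothetical sample, not the real data:

```csharp
using System;
using System.Linq;

// Hypothetical sample of recorded operator names (the real set had 87,615 entries).
string[] recordedOperators =
{
    "Select", "Where", "Select", "from", "ToList", "Where", "Select", "from"
};

// Group by operator name, count each group, and sort by count (ties by name).
var usage = recordedOperators
    .GroupBy(name => name)
    .Select(g => new { Operator = g.Key, Count = g.Count() })
    .OrderByDescending(x => x.Count)
    .ThenBy(x => x.Operator, StringComparer.Ordinal)
    .ToList();

foreach (var entry in usage)
{
    Console.WriteLine($"{entry.Operator}: {entry.Count}");
}
```

Reversing the sort direction gives the least used operators from the same grouping.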
The most used operators were:
I’ve marked the query syntax operators in orange and there are no surprises there – the most used operators are from (which is kind of mandatory) followed by the classics – select and where.
By now we know enough about how developers prefer to use LINQ, so the fact that most of the operators on this list are from the fluent variety does not shock us. Just like in the query syntax, the first 5 places have the basic Select (1st) and Where (3rd), and we also see ToList/ToArray – which feels like a bit of cheating since they’re not necessarily used as part of a “normal” LINQ query.
I find the fact that Single is used more than SingleOrDefault a good sign, since it means that the code using it is not riddled with endless null checks. I’m left wondering why FirstOrDefault comes before First, although they are pretty close.
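The difference between the two families is what happens when nothing matches. A small sketch on made-up data (a C# 9+ top-level program) shows why the `OrDefault` variants tend to drag null checks along with them:

```csharp
using System;
using System.Linq;

string[] names = { "Ada", "Linus", "Grace" };

// Single: exactly one match is expected; zero or several matches throw.
var ada = names.Single(n => n.StartsWith("A", StringComparison.Ordinal));

// SingleOrDefault: no match yields null here, so the caller has to check for it.
var missing = names.SingleOrDefault(n => n.StartsWith("Z", StringComparison.Ordinal));

// FirstOrDefault: same idea, null instead of an exception when nothing matches.
var firstLong = names.FirstOrDefault(n => n.Length > 10);

// First: throws InvalidOperationException when nothing matches.
bool threw = false;
try
{
    names.First(n => n.Length > 10);
}
catch (InvalidOperationException)
{
    threw = true;
}

Console.WriteLine(ada);               // Ada
Console.WriteLine(missing is null);   // True
Console.WriteLine(firstLong is null); // True
Console.WriteLine(threw);             // True
```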
Other than that, I think anyone who has ever used LINQ would find that the results align with their experience.
Least used LINQ operators
Another interesting piece of data we were after is which operators were least frequently used, and we got the following:
Note that we have a minor “feature” – group and by are shown as two different operators, in a way they are (at least implementation-wise).
Other than that we can see that the “order by” operators – descending/ThenByDescending and ascending – are not that common. Another point of interest is that the fluent Join/GroupJoin are among the least used. We’ve also noticed that join (query API) was only used 383 times, which suggests that developers do not use LINQ to join data that much. I guess they prefer to hold the data in the way that is easiest to consume and use Where/Select instead. I get it: in code we can (and IMHO should) use object references instead of trying to normalize data as if it’s saved in a relational database.
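Here is a small sketch of what I mean, as a C# 9+ top-level program. The customers and orders are invented for the example. The first half keeps the data normalized and needs a join; the second half lets each order hold a reference to its customer, so Where/Select is enough:

```csharp
using System;
using System.Linq;

// Normalized shape: orders only know a customer id, so a join is required.
var customers = new[]
{
    new { Id = 1, Name = "Ada" },
    new { Id = 2, Name = "Linus" }
};
var orders = new[]
{
    new { CustomerId = 1, Total = 30m },
    new { CustomerId = 2, Total = 12m },
    new { CustomerId = 1, Total = 8m }
};

var viaJoin = (from o in orders
               join c in customers on o.CustomerId equals c.Id
               where o.Total > 10m
               select c.Name).ToList();

// Object-graph shape: each order references its customer directly.
var ada = new { Name = "Ada" };
var linus = new { Name = "Linus" };
var linkedOrders = new[]
{
    new { Customer = ada, Total = 30m },
    new { Customer = linus, Total = 12m },
    new { Customer = ada, Total = 8m }
};

var viaReference = linkedOrders
    .Where(o => o.Total > 10m)
    .Select(o => o.Customer.Name)
    .ToList();

Console.WriteLine(string.Join(", ", viaJoin));      // Ada, Linus
Console.WriteLine(string.Join(", ", viaReference)); // Ada, Linus
```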
Looking at the list above you’ll notice that TakeWhile and SkipWhile came in among the last 10 operators. On the other hand, Take (625) and Skip (655) are more popular – I guess there are more scenarios in which the simple form is easier to use and/or more readable.
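As a reminder of how the two pairs differ, here is a short sketch on made-up values (a C# 9+ top-level program). Take/Skip work with a fixed count, which is the typical paging scenario, while TakeWhile/SkipWhile stop at the first element that fails the predicate:

```csharp
using System;
using System.Linq;

int[] values = { 1, 2, 3, 10, 4, 5 };

// Count-based: skip two items, then take the next three.
var page = values.Skip(2).Take(3).ToList();

// Predicate-based: take items until the first one that is not below 5.
var leading = values.TakeWhile(v => v < 5).ToList();

// Predicate-based: skip items below 5, then keep everything that follows.
var rest = values.SkipWhile(v => v < 5).ToList();

Console.WriteLine(string.Join(", ", page));    // 3, 10, 4
Console.WriteLine(string.Join(", ", leading)); // 1, 2, 3
Console.WriteLine(string.Join(", ", rest));    // 10, 4, 5
```

Note that `rest` still contains 4 and 5: once the predicate fails, SkipWhile stops testing.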
Here at OzCode we’ve learnt quite a lot from this research and we promise to keep developing LINQ debugging according to the community’s needs.
We’ve learnt that we need to support the less used LINQ query syntax – especially for developers who mix both syntaxes, one inside the other.
We’ve learnt which operators are more popular – and which are the “bread and butter” of C# developers.
All in all, a good day’s work.