Top 100 Most Popular Scala Libraries – Based on 10,000 GitHub Projects

By Tal Weiss —  December 26, 2013 — 7 Comments


As Scala developers working in a language and ecosystem that’s rapidly growing and evolving, we’re faced with a constant dilemma whenever we write new code – go with that hot new Scala framework that everyone’s talking about, or stick with a Java library we know and trust?

When we began building Takipi, we wanted to know which frameworks developers use most today, so we could better optimize it for them. Since a large share of Scala applications are commercial or closed-source, it can be hard to tell how many projects actually put a given library to use.

We decided to take a data-driven approach and analyze what Scala developers are actually using on the world’s largest open project repository – GitHub. With a wide variety of projects ranging from small to very large, GH provides us with an extensive dataset, one which is also highly up to date.

Much like with the results we saw in Java, there were some pretty big surprises. As both Java and Scala run on the JVM, it was interesting to notice similarities between the frameworks used, and also some stark differences. Overall, only 42 libraries appear in both the top 100 Java and Scala lists, reaffirming that Scala isn’t just a different language: it also has its own universe of tools and libraries.

The Approach

To generate our dataset we queried 10,000 Scala projects, with a bias towards the ones most favorited by the community, since favorites are a strong indicator of a project’s relative importance.

We searched for dependencies in sbt and Maven, which the vast majority of Scala projects on GH use for their builds. For sbt we analyzed build.sbt, project/Build.scala and any .scala files that extend them. For Maven projects we scanned the dependencies in the pom.xml file.
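
As a rough illustration of the kind of scan described above, here is a minimal Python sketch. It is not the code used for this analysis: the function names, the regular expression and the once-per-project counting rule are all assumptions made for the example. It also simplifies heavily, ignoring dependencies declared through variables and treating every <dependency> element in a pom.xml the same way.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Matches sbt dependencies such as:  "org.scalatest" %% "scalatest" % "2.0"
# (hypothetical pattern; real builds can declare dependencies in other ways)
SBT_DEP = re.compile(r'"([^"]+)"\s*(%%?)\s*"([^"]+)"\s*%\s*"([^"]+)"')


def sbt_dependencies(build_text):
    """Return (group, artifact) pairs found in the text of an sbt build file."""
    return [(m.group(1), m.group(3)) for m in SBT_DEP.finditer(build_text)]


def maven_dependencies(pom_xml):
    """Return (groupId, artifactId) pairs for every <dependency> in a pom.xml string."""
    root = ET.fromstring(pom_xml)
    deps = []
    for elem in root.iter():
        # Drop the XML namespace prefix, e.g. "{http://maven.apache.org/POM/4.0.0}dependency"
        if elem.tag.split("}")[-1] != "dependency":
            continue
        fields = {child.tag.split("}")[-1]: (child.text or "").strip() for child in elem}
        if "groupId" in fields and "artifactId" in fields:
            deps.append((fields["groupId"], fields["artifactId"]))
    return deps


def count_projects_per_library(projects):
    """Count how many projects use each library; a library counts once per project."""
    counts = Counter()
    for deps in projects:
        counts.update(set(deps))
    return counts
```

Running the two extractors over each project's build files and passing the per-project results to count_projects_per_library gives a "projects using it" figure per library, which is the kind of number quoted in the results.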

We then analyzed and grouped the results into categories. The results were interesting to say the least -

The Results

[Chart: most popular Scala libraries, by library]

[Chart: most popular Scala libraries, by type]
Click here to see the complete top 100 Scala libraries list.

TDD is big in Scala. JUnit, the classic Java testing framework, is the most popular library with 2513 projects using it. ScalaTest comes in at a close second with 2197 entries. TestNG, which is fairly popular in Java (ranked 14th in the Java top 100), isn’t in the top 100 libraries for Scala.

SPECS2, the framework for writing software specifications, is being used by 1331 projects. SPECS V1, which was deprecated in early 2011, still has 312 projects using it.

A new generation of frameworks. Using Scala is not just about the language, but also about a new generation of frameworks. The Play Framework for building web apps is crushing it when it comes to Scala developers, with 18% of the projects using it. The Akka framework is also doing very well with 776 entries (ranked 9th). Lift, another well known framework for building Scala web applications, is only used by 124 projects, which came as something of a surprise to us.

Some frameworks originally built for Java, however, are seeing much greater use in Scala. The lightweight web server Jetty, which was used by 100 projects in Java, is used by roughly 4.5X as many projects in Scala, with 447 entries (17th).

Where’s the Java old guard? This comes in contrast to some of Java’s most venerated libraries and frameworks, which are seeing considerably less use in Scala.

  • Spring for example, which places 15(!) libraries in the top 100 Java libraries, isn’t on the Scala top 100 board.

  • Apache commons is also seeing much reduced usage. commons-io and commons-lang, which are both in the top 10 Java libraries, are at #24 and #39 respectively in the Scala top 100.

  • Google’s Guava libraries, which are at #8 in the Java top 100, are also further down the Scala list, coming in at #24 with fewer than half as many projects using them as in Java.


Logging. Ceki’s SLF4J is leading the pack -

  • SLF4J and logback seem to be the de facto logging solution for Scala and are being used in 16% and 14% of the projects respectively.

  • log4j, which has 891 project entries in Java, sees less usage in Scala with only 332 project entries (3%).

  • commons-logging is behind the pack with 105 project entries – that’s less than a third of the number of projects using it in Java.

SQL. Big surprises on the Scala DB front -

  • H2 is the most common SQL DB with 552 projects using it – that’s more than 4X the usage we saw for it in Java.

  • MySQL comes in with 387 entries, which is actually more than the 255 entries we saw with Java.

  • PostgreSQL is also up there on the board with 332 entries, which is almost 3X the 121 entries it has in Java.

NoSQL sees less traction than in Java. It’s also worth noting that Hadoop, which is seeing a good amount of usage in Java, isn’t on the Scala top 100 board. The only NoSQL DB on the Scala list is MongoDB with 97 entries.

Android. While Scala is very much a server-side language aimed at building scalable server applications, we still saw some presence for Android development with 82 projects using the sbt-android-plugin.

Surprised by some of the results? We know we were with some of them. Take a look at the full list of the top 100 Scala libraries on GitHub below, and let us know what you think in the comments section. We’d love to hear your thoughts and questions.

The full list – Top 100 libraries for Scala

 


Tal Weiss


Tal is the CEO of Takipi. Tal has been designing scalable, real-time Java and C++ applications for the past 15 years. He still enjoys analyzing a good bug though, and instrumenting code. In his free time Tal plays Jazz drums.
  • igorrumiha

    Sorry to nitpick but regarding “Postgre SQL”, the name of the database is PostgreSQL. The abbreviated name, if you want to use it is Postgres. The wikipedia entry http://en.wikipedia.org/wiki/PostgreSQL#Product_name explains more.
    Other than that, interesting article! Thanks for sharing!

    • takipiblog

      Thanks for your comment!
      We’ve updated the post accordingly.

  • Bill Venners

    To get more accurate comparisons of Java and Scala library usage, you should probably ignore the _ part of Scala library artifact Ids, because Java libraries don’t have that. When you do that, ScalaTest comes out ahead of JUnit, for example: https://docs.google.com/spreadsheet/ccc?key=0AqmJIDPGe-_TdFA4TVh6NzdEZjRlVDNYZXpjRHk4enc&usp=sharing

  • Donald McLean

    Breaking Lift into component parts the way that you have produces an inaccurate portrait of Lift usage.

    • Tal Weiss

      Hey Donald, thanks for the input. We usually try to go with displaying the libraries of a specific project using its actual design in terms of module separation. So for example, if you take sbt, you’ll see that it too is broken down into its actual components. Having said that, I’d love to know what in your mind would be a better / more accurate way of portraying the data?

      • Donald McLean

        Some projects have modules, but those modules are incidental – Scala, for example. I would not say that a project was using this Scala module or that Scala module, I would simply say that the project used Scala. Joda, on the other hand, is an example of a project where modules represent pluggable components and the modules *should* be counted separately. Lift, like Scala, is the kind of project where you’re either using Lift, or you aren’t, and which specific Lift modules are used or not used is unimportant next to the larger fact that you are using Lift.

        • Tal Weiss

          You raise a good point. When analyzing the data we saw 3 options – A. providing the data as is, letting the design decisions of the project’s owner guide the results. B. Grouping the entries based on our own reasoning. C. Mixing the two – grouping in some cases and keeping things separated in others. While no approach is perfect (as shown by the Lift example), we went with option A, as we felt it represented the data in the most honest way, with minimum intervention or interpretation from our side. Hope that makes sense.