We Analyzed 30,000 GitHub Projects – Here Are The Top 100 Libraries in Java, JS and Ruby

By Tal Weiss —  November 20, 2013 — 19 Comments

One of the biggest dilemmas developers face every day is which software libraries to use. Go with the hot new framework or the “boring” tried-and-tested one that’s been around for 10 years? One of the main things that make frameworks successful is their communities of users and contributors. While it can be easy to know how many people contribute to a project (especially if it’s open source), it’s pretty hard to know how many are actually using it. We decided to take a data-driven approach to answer these questions.

GitHub hosts more than a million projects today. Projects range from small utilities and test apps all the way to massive infrastructure projects with hundreds of contributors. As such, it provides a fairly diverse and up-to-date dataset to explore, one which is also indicative of the trends in closed-source and enterprise software.

We chose the 3 top languages on GitHub – Java, Ruby and JavaScript. For each one we analyzed 10,000 projects (i.e. GitHub repositories) leaning towards those that have been favorited the most by developers.

We identified the 100 most commonly used components and grouped them into categories (e.g. Testing, DB, UI, etc.). It’s pretty interesting to see how these differ between the languages.
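To make the counting step concrete, here is a minimal sketch of how this kind of tally could work. It is not the code we ran: the sample repositories, the category map and the function names (`count_library_usage`, `top_by_category`, `reach`) are all hypothetical, and it assumes each project's dependencies have already been extracted into a simple list.

```python
from collections import Counter


def count_library_usage(projects):
    """Count how many projects declare each library at least once."""
    usage = Counter()
    for deps in projects.values():
        # set() so a library listed twice in one project is counted once
        usage.update(set(deps))
    return usage


def top_by_category(usage, categories, n=100):
    """Take the n most-used libraries and group them by category."""
    grouped = {}
    for lib, count in usage.most_common(n):
        category = categories.get(lib, "Other")
        grouped.setdefault(category, []).append((lib, count))
    return grouped


def reach(usage, lib, total_projects):
    """Percentage of all analyzed projects that use the given library."""
    return 100.0 * usage[lib] / total_projects


# Hypothetical sample data: repository name -> declared dependencies
projects = {
    "repo-a": ["junit", "guava", "slf4j"],
    "repo-b": ["junit", "spring-core"],
    "repo-c": ["junit", "guava", "guava"],
    "repo-d": ["hadoop-core"],
}

# Hypothetical hand-maintained category map
categories = {
    "junit": "Testing",
    "guava": "Utilities",
    "slf4j": "Logging",
    "spring-core": "Framework",
    "hadoop-core": "Big Data",
}

usage = count_library_usage(projects)
grouped = top_by_category(usage, categories, n=2)
junit_reach = reach(usage, "junit", len(projects))

print(grouped)
print(f"junit reach: {junit_reach:.0f}%")  # 3 of the 4 sample projects, i.e. 75%
```

The "entries" quoted throughout this post correspond to the per-library counts in a tally like this, and "reach" is that count as a share of the projects analyzed for a language.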

Here are some notable findings and the top 10 libraries for each language (you can find the full list at the bottom of this post):

Java

  • It’s Guava season – Google code has gone mainstream. Spring and Apache libraries are so prevalent they’re practically a part of the language, with over 25% of the top 100 libraries split fairly evenly between the two. A bit more surprising is the prevalence of Google-made libraries such as GWT and Guava, which account for 7% of the Java top 100. Seems like there’s one more area in our life which Google has a big part in.
  • BigData – Hadoop is leading the chart. Data processing is a big part of Java, with 16 of the top 100 libraries focusing on database management, compared to 12 in Ruby and 5 in JavaScript (admittedly still a much more client-side language).
  • It’s interesting to see that Hadoop is living up to its promise as the leading big data technology with 168 entries. To put that in perspective, MySQL, one of the most well-known and common SQL DBs, has 225 entries. PostgreSQL, another well-known relational DB, has 121.
  • ElasticSearch, a new technology for searching across large data sets, is also doing quite well on GitHub with over 100 projects using it.
  • Test driven development (TDD) is huge in Java and Ruby (still not in JS) – across all three languages we see testing play a very big role. In Java and Ruby, 40-50% of the projects reviewed use an automated testing framework, the leading ones being JUnit in Java and RSpec in Ruby. In JavaScript the percentage of projects using a testing framework is considerably lower, coming in at 25%.
  • Mocking, a method for simulating real world objects in testing and development, has gained a lot of traction with 10% of the projects in Java and 7% in Ruby applying it. In JavaScript mocking is still almost nonexistent.

 

Ruby

  • SQL still dominates. While NoSQL databases are all the rage these days, relational databases (SQL) still dominate the Ruby world – SQLite, PostgreSQL and MySQL are used in 25% of the projects, while Redis and MongoDB only appear in 3% of the projects.
  • MongoDB is, however, still popular in Ruby with 185 entries, twice as many projects as in Java.
  • In web development we see that while new frameworks have gained traction in the last few years (such as Sinatra with 570 entries), Ruby is still centered around Rails, with over 7,000 projects. For web servers, Thin (with 487 entries) is used by twice as many projects as Unicorn.
  • CoffeeScript, a new language layer on top of JavaScript, seems to be well received by Ruby web developers with over 1,000 projects.
  • Twitter has also made a big impact in Ruby with 3 libraries in the top 100 and 382 projects using them. While that’s pretty big, it’s still not quite as big as Google’s influence on Java.

JavaScript

  • JS is fragmented. The top components’ reach in Java is 30% of projects. For Ruby it’s about 20%. For JS it’s not even 10%. As JavaScript is rapidly evolving to support more types of applications, a lot of new capabilities have not yet been absorbed into the language or standard libraries. As a result we see 50% more frameworks in the JavaScript top 100 than in Ruby and Java, echoing the fact that it’s still early days for the language.
  • Grunt is huge. The Grunt automation framework plays a very big role in JS development (especially for node.js) with 23% of the top 100 libraries plugging into it. Grunt seems to be filling the gap in the build, testing and deployment cycle in JS. In languages such as Java this is handled outside the project by other prominent tools such as Maven or Jenkins.
  • Networking is still a big problem. A large part of JavaScript libraries (7% of the top 100) focus on networking and client/server communication. That’s three times more than in Java and Ruby. This is most likely due to web developers having to deal with a fragmented ecosystem on the browser side, and the relatively early state of the server stack.
  • For server-side web development – the express framework for node.js is leading the chart with 631 entries.
  • Striving toward structure. JavaScript also features the largest number of language extensions with 844 entries. It’s interesting to see that while JavaScript is a very flexible language, developers are looking for ways to mold it into something more structured. Underscore.js, which provides functional programming capabilities similar to those found in more structured languages such as Scala, has 416 entries, making it the 5th most prevalent JS library.

Click here to see the complete top 100 libraries list.

Tal Weiss

Tal is the CEO of Takipi. Tal has been designing scalable, real-time Java and C++ applications for the past 15 years. He still enjoys analyzing a good bug though, and instrumenting code. In his free time Tal plays Jazz drums.
  • Luis Montes

    “Networking is still a big problem.”
    Have to disagree with that. There’s a ton of ways to do networking things in JS, and of course a ton of libraries to do it. Some good, some not so good.
    Anything you can do in other languages with networking, you can likely do in JS in a few different ways.

    Choice is good.

    • http://www.takipi.com/ Tal Weiss

      You’re absolutely right that choice is good.

      The trend that we’ve seen is that in more mature languages such as Java (and to an extent Ruby), over time fundamental building blocks such as networking consolidate either into a set of standard libraries or the framework itself. That hasn’t happened yet in JS (and might not at all), indicating that it’s a problem a lot of folks are dealing with and therefore are working on.

    • http://www.takipi.com/ Tal Weiss

      I agree that choice is absolutely a good thing.

      The trend that we’ve seen is that in more mature languages such as Java (and Ruby to an extent) fundamental building blocks such as networking usually either consolidate into a set of standard libraries or get absorbed into the framework itself. That hasn’t happened in JS (and might not at all), indicating that it’s a problem a lot of people are dealing with now and as a result are actively working on.

      • Luis Montes

        I still think that premise is incorrect.

        The patterns change over time, and JS can easily do things on the bleeding edge.
        Sure, you can grab a JS library to do something like a boring synchronous SOAP call that people have done for years in Java or Ruby.

        The reason why this is different now is that we have things like NodeJS, WebSockets, SPDY protocol, WebRTC, etc. All of which lend themselves to event driven asynchronous programming.

        We can still do the traditional common types of networking, and there’s plenty of JS libraries for that. But we can also very easily do something like Socket.IO with JSON-RPC instead of AJAX/REST. Or have a JS library for HTML5 games that makes a peer to peer connection using RTC.

        Add to this the fact that many JS networking libraries exist because people want to implement new tech and have different preferences on things like module formats, callbacks vs Promises/A+ spec, testing models, etc.

        New things will make it into the language (lots coming in ES6), but the JS community will never settle on a set of enterprisey patterns pushed by companies the way other tech stacks have.

    • Gerald Leenerts

      Choice can be good, but it can be very very bad. In JS terms I think it’s more negative than good. I don’t want 30 different ways to do something, I want a stable 3~5 choices with a good standing rep behind all of them.

      • Luis Montes

        I want 30 stable ways of doing something each with a good standing rep, active maintainers, and 100% test coverage :)

        • Gerald Leenerts

          I can’t disagree with that.

  • Steve Krzysiak

    Interesting, any plans to release the code so others can adapt it for other topics like PHP/Python/arbitrary-tech?

    • takipiblog

      That’s a good idea. We’ll consider releasing it.

  • Rohit

    “Test driven development (TDD) is huge in Java and Ruby (still not in JS)” If I am not mistaken, mocha was second on the JS list. Doesn’t that make TDD “huge” in JS as well?

    • terinjokes

      In raw numbers, it is less than in Java and Ruby.

    • http://wilmoore.com/ Wil Moore III

      Writing unit or other automated testing does not necessarily equate to writing code in a TDD manner; thus, it is hard to generalize.

      That being said, from what I’ve experienced, the Ruby community tends to be less grumpy about doing actual TDD. Other language communities seem to be starting to adopt unit testing, but TDD not as much.

  • buildakicker

    Funny that most of the top Java ones are Logging.

    • AlexandreJasmin

      I’m surprised Joda-Time isn’t also up there.

  • http://aaron.md/ Aaron

    What was the definition of components used? It seems in JS the definition of component doesn’t really fit with the language: Grunt and Underscore.js are much higher relative to jQuery than I’d expect

    • http://ejohn.org/ John Resig

      I had the same question – I suspect they’re only analyzing Node.js projects. In which case it would make sense that jQuery isn’t that prevalent.

    • http://www.takipi.com/ Tal Weiss

      We analyzed projects leaning towards those that have been favorited the most by developers. In the case of JavaScript, those were mainly Server-side projects, so Grunt was ranked higher.

  • Thom

    ‘async’ is not an http framework for node, it’s a ‘util’. In fact, async has nothing whatsoever to do with http. So take 332 from the ‘http’ bucket and add it to the util bucket. This puts a dent in the “networking is an issue” argument for node.

  • terinjokes

    Why is “jasmine-node” classified as “web framework” and “grunt-jasmine” as “build”? “mocha” and “grunt-mocha” are both classified as “testing”.

    Why is “async” counted as an http library? It’s a flow control library (I guess “async” in your chart), and has nothing to do with http.