The Anatomy of a Great Stack Overflow Question (After Analyzing 10,000)

By Iris Shoor — February 3, 2014

 


 

How to phrase a question on Stack Overflow in order to get better answers

Stack Overflow offers a great interface for accessing all of its data and running any possible query in the questions/answers database. We started using this database to better understand the most common questions about debugging (what we’re trying to solve at Takipi). While learning about debugging through thousands of questions, we also noted an additional bonus: we started better understanding what drives the Stack Overflow community.

This article is the result of researching over 10,000 Stack Overflow questions. It summarizes how to phrase and write your question in order to get better and faster answers. When we started running our scripts on the query results, we weren’t sure we’d arrive at strong conclusions. After looking into a few hundred questions, though, we started recognizing distinct patterns that repeated over and over across different programming languages and topics.

You can find a detailed explanation of how we performed the test at the end of the article, but let’s get to the best part first — the results. The main criteria we set for a ‘good’ question were: high quality answers (from users with high reputation, plus votes), a relatively quick answer, and the number of votes and views.

Keep it short

Probably the strongest pattern we saw across all the different queries and scripts we ran was this: the shorter the question, the more likely you are to get a quick and useful answer. The average length of a ‘good’ question was around 1200 characters (~3-4 short paragraphs), while the average length of a standard question was 1800 characters. The strongest effect of the length was on the view count and votes, where the average length of the top questions was around 50% of that of standard questions.

There’s no such thing as too short, either — really short questions (around 200-300 characters) received the best results.

Title length matters less, we discovered. While we saw the effect of the question’s length across the board, it seems like the length of the title has a minor influence on the quality of the question. The average title length of the top questions was about 5% shorter than that of standard titles (47 characters vs. 50).


 

#1 influencer – asker reputation

The asker’s reputation has a huge effect on the number and quality of the answers and on the speed of reply. Users with high reputations seem to answer questions from other high-reputation users more frequently. While short questions are 50-100% more likely to get better and faster answers, a question asked by a user with a very high reputation is three times more likely to get better answers than one asked by a user with a low reputation. You might conclude, of course, that users with high reputations simply ask better questions. That’s definitely part of it, but we saw some very similar questions asked by users with low and high reputations, and the difference in the quality of the answers was clear.

Some examples:

• The average asker reputation of the top 100 Java questions with the best answers (high answerer reputation plus votes) was 4500 points; for standard Java questions it was 1500.

• The average asker reputation of Ruby questions answered within 15 minutes was 2400; for Ruby questions answered after 24 hours it was 1300.

• The average asker reputation of the 100 most viewed Java questions was 3150, compared to 1100 for standard questions.


Should I use code snippets?

Embedding code snippets in the question was one of the few parameters we tested that gave very clear results across different languages, but opposite results across different criteria. Questions which include code snippets get better answers: answers with more votes, from users with higher reputation. For example, 87% of the Python-related questions answered by users with a very high reputation included code snippets, compared with 64% of average Python questions. Ruby-related questions showed similar results: 91% of the top questions included code snippets, compared to an average of 79%.

When it comes to votes, views and time until the first answer, the trend is the opposite: for example, 58% of the most viewed Python questions included code snippets, versus 72% of questions with an average number of views. This is probably related to the question’s length: code snippets lead to longer questions, which get lower results.
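For reference, detecting whether a question embeds a code snippet is simple: Stack Overflow post bodies are stored as HTML, so the sample queries at the end of this post just look for a `<code>` tag. Here is a minimal Python sketch of the same check (the example bodies are made up):

```python
# Minimal sketch: mirrors the SQL check CHARINDEX('<code>', p.Body) > 0
# used in the sample queries to flag questions containing code snippets.
def contains_code_block(body_html: str) -> bool:
    """Return True if a Stack Overflow post body (HTML) embeds a code snippet."""
    return "<code>" in body_html

print(contains_code_block("<p>Why is this slow?</p><pre><code>for x in xs: pass</code></pre>"))  # True
print(contains_code_block("<p>Just a question, no code.</p>"))  # False
```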

Use fewer tags

Not a huge difference, but it seems that across the different criteria we tested, the top results have fewer tags than the average. Questions which got high quality answers, quicker response and more votes had an average of 3 tags (seems to be the same in different languages). Standard questions had around 3.5 – 3.7 tags.

When is the best time to post a question?

From our data it doesn’t seem that the time of day affects the results. Stack Overflow “rush hour” is usually between 11 am and 5 pm UTC. About 50% of the questions are asked in that time frame. Questions asked in these hours were a bit more likely to get an answer faster (by about 5-10%) but the answers weren’t higher in quality. The time the question was posted doesn’t affect other criteria like votes, views or number of answers.
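Grouping questions by the UTC hour they were posted (which the third sample query below does in SQL) can be sketched in a few lines of Python; the timestamps here are made-up stand-ins for exported data:

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical question creation timestamps (UTC), standing in for exported data.
timestamps = [
    datetime(2013, 11, 14, 11, 5, tzinfo=timezone.utc),
    datetime(2013, 11, 14, 11, 42, tzinfo=timezone.utc),
    datetime(2013, 11, 14, 16, 30, tzinfo=timezone.utc),
    datetime(2013, 11, 14, 3, 15, tzinfo=timezone.utc),
]

# Count questions per posting hour (0-23); missing hours count as zero.
questions_per_hour = Counter(ts.hour for ts in timestamps)
print(questions_per_hour[11])  # 2
print(questions_per_hour[3])   # 1
```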

How to title your question – Getting into the art (and science) of wording

Use the name of the language/topic in the title

One of the main scripts we used was built in order to count and analyze the titles. We grouped words according to programming terms (like “string”, “array”, “function”), language/topic (like Ruby, MySQL, C#), negative/positive words (like “can’t”, “worst”, “best”, “fail”) and more. The most distinct conclusion — if you want to get faster and better answers, use the topic you’re asking about in the title. Have a question about something in Python? Just add Python to the title.

Some examples: 36% of Ruby-related questions that were answered within 15 minutes included the word “Ruby” in the title; only 15% of the questions that were answered after 24 hours did. 58% of the Java questions with the highest view count included the word “Java,” compared to 39% of standard Java questions.
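The title analysis boils down to simple word-group matching. Here’s an illustrative Python sketch; the word lists are abbreviated examples, not the full groupings our script used:

```python
# Illustrative sketch of the title classification: does a title mention the
# topic, and does it contain a failure-indicating ("negative") phrase?
# (The phrase list is an abbreviated example, not our full grouping.)
NEGATIVE_PHRASES = ("can't", "cannot", "unable", "fails", "error", "not working")

def mentions_topic(title: str, topic: str) -> bool:
    return topic.lower() in title.lower()

def indicates_failure(title: str) -> bool:
    lowered = title.lower()
    return any(phrase in lowered for phrase in NEGATIVE_PHRASES)

print(mentions_topic("Can't import my own modules in Python", "Python"))  # True
print(indicates_failure("Can't import my own modules in Python"))         # True
print(indicates_failure("Why is char[] preferred over String for passwords?"))  # False
```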

And what about using two languages in the title? For example, “Why are String comparisons (CompareTo) faster in Java than in C#?”. You might expect that using two different languages/topics in the title would reduce the number and quality of the answers, since fewer users are proficient in both. However, these questions performed very well and were much more likely to get good results.

It doesn’t matter if you phrase the title as a question or not

Here’s one statistic that surprised me. Phrasing the title as a question doesn’t affect the speed or quality of the answers.

For example, “Add the same method to multiple classes” vs. “How to add the same method to multiple classes?” will yield the same results. The only difference we noted was that titles phrased as questions are a bit more likely to get a faster answer (by about 10%).

Something doesn’t work for you? You can’t solve a problem? The community is here to help

A title that indicates that something is not working, or asking about an error, will usually get a better and faster response.

For example:

Rspec stub doesn’t work

Why is bundler unable to reach http://rubygems.org?

Can’t import my own modules in Python

It seems that using words that indicate failure (“cannot”, “unable”, “fails”, “error”, “not working”, etc.) leads to better answers. For example, 22% of the top Ruby questions included a negative phrase in the title, while the average for standard questions was 14%.

However, it seems that indicating that something doesn’t work or asking about an error doesn’t attract users with high reputations. These questions got faster answers, more answers and a higher view count, but were answered by users with lower-than-average reputation.

Should I use X or Y?

Comparing different technologies or methodologies is a good recipe for high quality answers. This type of question made up a significant part of every ‘top questions’ list we formed.

Here are a few examples:

Why shouldn’t I use PyPy over CPython if PyPy is 6.3 times faster?

Why is char[] preferred over String for passwords?

Why are String comparisons (CompareTo) faster in Java than in C#?

Let’s make it interesting

Pretty obvious, but many of the top ranked questions point to mysterious behavior or unexpected results.

Here are some examples of top questions:

Why is processing a sorted array faster than an unsorted array? (6960 votes)

Why does changing 0.1f to 0 slow down performance by 10x? (774 votes)

Why does parseInt(1/0, 19) return 18? (632 votes)

How did we perform the test?

We decided to focus on language-oriented questions in order to avoid general questions (like “how do I become a better programmer” or “which job interview question to use”) or humor-related questions that usually get a very high score but don’t represent typical problems developers face. We focused on six programming languages: Java, Ruby, Python, C++, JavaScript and C#. The first three we analyzed were Java, Ruby and Python; the results were very similar across these three, so we decided not to analyze the other three for now.

We decided to focus only on questions asked since 2011.

We ran eight different queries, sorting the questions according to: votes, number of answers, favorites count, time until the first answer, time of day the question was posted, questions answered by users with a high reputation, questions asked by users with a high reputation, and view count. Then we compared the top 300 questions in each section to 300 questions that received an average score on the parameter we focused on. This is obviously not an exact science, and there’s no “scientific recipe” for getting great answers. However, all the patterns mentioned above repeated themselves across the different parameters and languages we examined.
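The comparison itself is straightforward once the questions are exported. Here’s a toy Python sketch of the top-vs-standard comparison for a single criterion (score) and attribute (body length); the numbers are made up and the group sizes are shrunk from 300 to keep the example short:

```python
# Toy sketch of the per-criterion comparison: rank questions by some measure
# (here, score), then compare an attribute (here, body length) between the
# top group and a "standard" group. All numbers are invented for illustration.
questions = [
    {"score": 900, "body_length": 800},
    {"score": 700, "body_length": 1100},
    {"score": 12, "body_length": 1900},
    {"score": 10, "body_length": 1700},
    {"score": 9, "body_length": 2100},
    {"score": 8, "body_length": 1500},
]

def average_body_length(group):
    return sum(q["body_length"] for q in group) / len(group)

ranked = sorted(questions, key=lambda q: q["score"], reverse=True)
top, standard = ranked[:2], ranked[2:]  # we actually used top 300 vs. 300 average-scored
print(average_body_length(top))       # 950.0
print(average_body_length(standard))  # 1800.0
```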

Here are some of the queries we used; have fun with them and let us know if you find other interesting results.

Thanks to Dror Cohen from CodersClan for helping out and to John Woo who justified his 120k+ SO reputation points and helped us write the queries.

Sample Queries

1. Questions by users with high reputation and Python tag

;WITH recordsList
AS
(
SELECT 'http://stackoverflow.com/users/' + CAST(p.OwnerUserId AS VARCHAR(10)) AS OwnerLink,
'http://stackoverflow.com/questions/' + CAST(p.ID AS VARCHAR(10)) AS QuestionLink,
p.Score,
p.ViewCount,
p.FavoriteCount,
p.Title,
LEN(p.Title) AS TitleLength,
LEN(p.Body) AS BodyLength,
p.Tags,
p.CreationDate,
u.Reputation,
ROW_NUMBER() OVER
(PARTITION BY p.OwnerUserId
ORDER BY p.Score DESC) UserAnswerSequence,
CASE WHEN CHARINDEX('<code>', p.Body) > 0 THEN 'True' ELSE 'False' END ContainsCodeBlock
FROM Posts AS p
INNER JOIN Users As u
ON p.OwnerUserId = u.Id
INNER JOIN PostTags AS pt
ON pt.PostId = p.Id
WHERE p.PostTypeId = 1 -- <<== Questions
AND p.CommunityOwnedDate IS NULL -- <<== not WIKI
AND pt.TagId = 16 -- <<== PYTHON
)
SELECT TOP 5000
OwnerLink,
QuestionLink,
Score,
ViewCount,
FavoriteCount,
Title,
TitleLength,
BodyLength,
Tags,
CreationDate,
ContainsCodeBlock
FROM recordsList
WHERE UserAnswerSequence <= 50
ORDER BY Reputation DESC, Score DESC

2. Answers by users with high reputation and Java tag

;WITH recordsList
AS
(
SELECT 'http://stackoverflow.com/users/' + CAST(p.OwnerUserId AS VARCHAR(10)) AS OwnerLink,
'http://stackoverflow.com/questions/' + CAST(p.ID AS VARCHAR(10)) AS QuestionLink,
p.Score,
p.ViewCount,
p.FavoriteCount,
p.Title,
LEN(p.Title) AS TitleLength,
LEN(p.Body) AS BodyLength,
p.Tags,
p.CreationDate,
u.Reputation,
ROW_NUMBER() OVER
(PARTITION BY pa.OwnerUserId
ORDER BY pa.Score DESC, p.Score DESC) UserAnswerSequence,
CASE WHEN pa.CommunityOwnedDate IS NOT NULL THEN 'TRUE' ELSE 'FALSE' END IsAnswerWiki
FROM Posts AS p
INNER JOIN Posts As pa
ON pa.PostTypeId = 2 -- <<== Answer
AND p.Id = pa.ParentID
AND p.OwnerUserId <> pa.OwnerUserId -- <<== answer not from the asker
AND p.AcceptedAnswerId = pa.Id
INNER JOIN Users As u
ON pa.OwnerUserId = u.Id
INNER JOIN PostTags AS pt
ON pt.PostId = p.Id
WHERE p.PostTypeId = 1 -- <<== Questions
AND p.CommunityOwnedDate IS NULL -- <<== not WIKI
AND pt.TagId = 17 -- <<== JAVA
AND p.OwnerUserId IS NOT NULL -- <<== user is not deleted
AND pa.OwnerUserId IS NOT NULL -- <<== user is not deleted
)
SELECT TOP 5000
OwnerLink,
QuestionLink,
Score,
ViewCount,
FavoriteCount,
Title,
TitleLength,
BodyLength,
Tags,
CreationDate,
IsAnswerWiki
FROM recordsList
WHERE UserAnswerSequence <= 50
ORDER BY Reputation DESC, Score DESC

3. Grouping questions by the hour they were published (0-23) and showing the time it took to answer them.

;WITH hourgenerator -- <<== generates 0-23 (24Hour)
AS
(
SELECT 0 AS hourPosted UNION ALL
SELECT hourPosted + 1
FROM hourgenerator
WHERE hourPosted < 23
),
questionsPerHour
AS
(
SELECT CAST(p.CreationDate as DATE) AS [Date],
DATEPART(Hour, p.CreationDate) AS hourPosted,
COUNT(*) NumberOfQuestions,
AVG(DATEDIFF(second, p.CreationDate, pa.CreationDate)) AvgTimeAnswered
FROM Posts AS p
INNER JOIN Posts As pa
ON pa.PostTypeId = 2 -- <<== Answer
AND p.Id = pa.ParentID
AND p.OwnerUserId <> pa.OwnerUserId -- <<== answer not from the asker
AND p.AcceptedAnswerId = pa.Id
WHERE p.PostTypeId = 1 -- <<== Questions
AND p.CommunityOwnedDate IS NULL -- <<== not WIKI
AND DATEDIFF(second, p.CreationDate, pa.CreationDate) >= 0
AND p.OwnerUserId IS NOT NULL -- <<== user is not deleted
AND pa.OwnerUserId IS NOT NULL -- <<== user is not deleted
AND p.CreationDate >= CAST('2013-11-14' AS DATE)
AND p.CreationDate < DATEADD (dd, 1 ,CAST('2013-11-14' AS DATE))
GROUP By CAST(p.CreationDate as DATE), DATEPART(Hour, p.CreationDate)
)
SELECT COALESCE(qph.[Date], CAST('2013-11-14' AS DATE)) AS [Date],
hg.hourPosted,
COALESCE(qph.NumberOfQuestions, 0) AS NumberOfQuestions,
COALESCE(qph.AvgTimeAnswered, 0) AS AvgTimeAnswered
FROM hourgenerator AS hg
LEFT JOIN questionsPerHour AS qph
ON hg.hourPosted = qph.hourPosted
ORDER BY hg.hourPosted

 


Iris Shoor


Iris is a co-founder and VP Product at Takipi. Her main interests are creative products, data-driven marketing and nitpicking small UX details.
  • Grindaizer (http://grindaizer.blogspot.com)

    The “#1 influencer – asker reputation” section is a flawed analysis! You’re confusing correlation and causation, and I personally think it’s the other way around: people get higher reputations because they tend to ask good quality questions!

    • Iris Shoor

      Hi Grindaizer,

      There’s definitely a correlation, I agree that good ‘askers’ are likely to have more reputation. One thing we tested after I published the post was the quality of answers the same user got on different time periods and different reputation. It seems the tipping point is around 3000-4000 points where the same user get better answers. However, maybe they get better in asking throughout the time.