Scalability Lessons we can Learn from Voat

Voat - have your say

Last weekend I found voat.co on the /r/dotnet subreddit, one of the places I frequent for news and happenings related to all things C# / F#.

Voat exists simultaneously as two different things:

  1. The Voat software - an open source, ASP.NET MVC + MSSQL implementation of Reddit AND
  2. voat.co - an instance of the software that is owned and operated by its creators.

I’m a huge fan of Reddit - had an account there for years, and seeing such a kick-ass implementation of it written in C# with SignalR, ASP.NET MVC, Web API, and lots of other cool toys gets me really excited!

Voat could be a huge opportunity to expose lots of curious onlookers to .NET and C#, and it will be an even better opportunity to get C# / ASP.NET developers into open source if the voat.co website successfully establishes a community.

The authors themselves explicitly state in the README of the Voat project on GitHub that they built Voat just for the sake of learning:

This was just a hobby project to help me get a better understanding of C# and ASP.NET MVC and Entity Framework.

So helping broaden the reach of .NET in OSS is the main reason I'm excited about Voat, and last weekend I spent a bunch of time reviewing the Voat source code and looking for potential areas where I could contribute.

Voat’s Gradually Increasing Popularity

The Voat software has been online for about a year, but it’s been seeing a steady increase in activity over the past several months.

Voat’s founders gave me some interesting usage and performance statistics in this thread I started on /v/voatdev, but unfortunately, due to the load the site is currently under, I can’t retrieve them for you at the time of writing.

The point being: Voat.co’s usage has been increasing rapidly over the past few months, apparently as a result of dissatisfied users moving on from Reddit (based on my own limited observations.)

It became clear to me that Voat.co was going to have to start dealing with scaling issues sooner rather than later - I didn’t think it would take much for Reddit to stir the pot and cause a mass exodus onto Voat given the state of the political climate at Reddit lately.

Having been in a situation where sudden onset traffic nearly crushed my business before, I immediately started looking through the Voat source code for anything that would indicate obvious scaling problems or bottlenecks.

Here’s what I found:

  1. Everything uses the built-in CRUD tools from Microsoft: Entity Framework, ASP.NET Identity, and so on. All of those built-in tables are baked into the Voat SQL schemas as well.
  2. No SignalR Backplane: means that this software is currently only designed to run on a single box.
  3. SignalR on every page: page loads are going to get expensive, given that SignalR websocket calls will be used on every tab.

I suspected that these might be sources of problems if scalability became an issue.

EF, ASP.NET Identity, and Strong Coupling

The built-in Microsoft CRUD and ASP.NET Identity tools are designed for rapid application development, and they tightly couple your user identity schema to SQL Server.

The real danger I see in this is that when a write-heavy workload arrives, such as counting votes / comments / posts from thousands of concurrent users, SQL Server is going to implode. And without some abstraction between the two, decoupling the comment / voting store from the SQL store is going to require a major rewrite.

Fact of life: SQL Server and other relational DB stores are not intended for applications with high write / read ratios. They’re heavily optimized for read-heavy workloads.
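To make "some abstraction between the two" concrete, here is a minimal sketch of what that boundary could look like. It's written in Python for brevity rather than C#, and every name in it (`VoteStore`, `InMemoryVoteStore`, `VotingService`) is hypothetical and not taken from the Voat codebase. The idea is only that the application talks to a small interface, so the backend behind it can be swapped later without touching the callers.

```python
from abc import ABC, abstractmethod


class VoteStore(ABC):
    """Hypothetical storage boundary: the application only talks to this."""

    @abstractmethod
    def record_vote(self, post_id: int, delta: int) -> None:
        ...

    @abstractmethod
    def score(self, post_id: int) -> int:
        ...


class InMemoryVoteStore(VoteStore):
    """Stand-in backend for the sketch. A SQL-backed class, or one backed by
    a dedicated counter service, would implement the same two methods."""

    def __init__(self) -> None:
        self._scores = {}

    def record_vote(self, post_id: int, delta: int) -> None:
        self._scores[post_id] = self._scores.get(post_id, 0) + delta

    def score(self, post_id: int) -> int:
        return self._scores.get(post_id, 0)


class VotingService:
    """Application code depends on the interface, never on a concrete store."""

    def __init__(self, store: VoteStore) -> None:
        self._store = store

    def upvote(self, post_id: int) -> None:
        self._store.record_vote(post_id, +1)

    def downvote(self, post_id: int) -> None:
        self._store.record_vote(post_id, -1)

    def score(self, post_id: int) -> int:
        return self._store.score(post_id)
```

Swapping the store then means writing one new class and changing the line that constructs `VotingService`, rather than hunting down every query that touches the votes table.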

SignalR

As for the SignalR issues, I was able to confirm that Voat is running on a single box. Not a big deal given that it’s still a hobby project for the developers, but the fact that it’s not designed to run on anything other than a single box is an obvious problem. Either remove SignalR or support a backplane for it.

But beyond that, SignalR is much more resource-intensive than serving up a single HTTP request. It keeps a connection open for each tab for as long as the tab is active. If you’re serving up a large number of page views, this will get expensive in terms of memory utilization.
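If the backplane idea is unfamiliar, this toy model shows why it matters once you add a second box. It is not SignalR's API; `Backplane` and `WebNode` are hypothetical names, and the "bus" here is an in-process list where a real deployment would use an external message bus. Each node only knows about the clients connected to it, so a broadcast has to travel through something shared to reach everyone.

```python
class Backplane:
    """In-process stand-in for a shared message bus (hypothetical)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        # Fan the message out to every node that subscribed.
        for handler in self._subscribers:
            handler(message)


class WebNode:
    """One web server. It can only push to clients connected to itself."""

    def __init__(self, name, backplane=None):
        self.name = name
        self.inboxes = {}  # client id -> messages delivered to that client
        self._backplane = backplane
        if backplane is not None:
            backplane.subscribe(self._deliver_locally)

    def connect(self, client_id):
        self.inboxes[client_id] = []

    def broadcast(self, message):
        if self._backplane is None:
            # No backplane: only this node's own clients ever see the message.
            self._deliver_locally(message)
        else:
            # With a backplane: every subscribed node delivers to its clients.
            self._backplane.publish(message)

    def _deliver_locally(self, message):
        for inbox in self.inboxes.values():
            inbox.append(message)
```

With two nodes and no backplane, a message broadcast from the first node never reaches a client connected to the second. Pass the same `Backplane` to both and it does.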

“We have bigger problems to worry about than scalability”

I started asking questions about the current performance numbers for Voat.co, the network architecture, and so forth - mostly because .NET scalability is an area I know a lot about and love helping with! But I was more or less told that the voat.co team wasn’t worried about it at the moment. Wouldn’t be an issue.

“Ok, let me know how I can help” is how I left it.

72 Hours Later: Mass Exodus from Reddit Takes Down Voat.co’s Servers

As fate would have it, Reddit’s administrators banned a number of offensive subreddits with a large number of users a few days after I had this conversation with Voat’s developers. Hundreds of thousands of regular users.

I don’t really care about the politics of why that happened, but the point is - Voat.co was hit with an avalanche of traffic that they were not prepared for. Thus, Voat’s homepage now looks like this:

Voat - HTTP 503

Scalability problems are great problems to have, because this means that the demand for your service is beyond your ability to provide it. But, as I’ve learned, they are also immensely scary and frustrating to deal with if you are not prepared.

What can we learn?

Voat.co is missing out on a fantastic opportunity to capture a ton of traffic, new users, and advertising revenue right now. It may have started out as a hobby project for the original developers, but now the platform has really taken on a life of its own.

So what can we learn from this? What can we, as .NET developers, learn from Voat?

  1. Don’t prematurely optimize your code, but plan for scalability - what does this mean exactly? For starters, don’t tightly couple your application to the implementation details of a particular database. I’m extremely skeptical of anyone who says “YAGNI” to this. Here’s my personal foray into the hell known as “realizing you’ve picked the wrong database at exactly the wrong time.” Design your system in a way that the components you launched with can be replaced by the right tool for the job later under different levels of stress.
  2. Don’t use anything you can’t scale beyond one machine - if your system can only run on one machine, you’re screwed. You have no redundancy or resiliency against most types of network failures. Design your system to have more hardware thrown at it from day one.
  3. Use a monitoring / alerting service from day one - Pingdom has a free trial and starts at $15 / month. Don’t be stuck outside golfing the day your server gets engulfed in precious, precious traffic. Make sure your servers can let you know if there’s a problem!
  4. You never know when traffic will strike; plan for it happening any time. Have a contingency plan to scale your service at any given time. Don’t assume that it’s going to happen gradually. At MarkedUp we experienced 600% growth for 3 consecutive days without any warning, and we didn’t have a plan. This happened right around Thanksgiving here in the US, so my holidays that year were pretty stressful to say the least.
  5. Measure your bottlenecks carefully, and if you’re an OSS project - share that data with contributors! - I can’t really do more than just make educated guesses as to where Voat’s bottlenecks are, because I don’t have any data to quantify the problems! So make sure you’re actively monitoring CPU utilization and profiling your code - run traces on your SQL queries and try to tune up slow-running queries and their indexes and so forth. Solving scaling problems requires good data.
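As a rough illustration of the last point, this is about the smallest amount of instrumentation that turns guesses into data. It's a Python sketch with made-up names (`timed`, `slowest`, the operation labels), not a substitute for a real profiler or SQL trace: wrap the operations you suspect, record how long each call takes, and rank them by median latency.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median

# operation name -> list of observed durations in seconds
timings = defaultdict(list)


@contextmanager
def timed(operation):
    """Record the wall-clock duration of the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[operation].append(time.perf_counter() - start)


def slowest(n=3):
    """Return up to n (name, median seconds, sample count) tuples,
    ordered from highest median latency to lowest."""
    ranked = sorted(timings.items(), key=lambda kv: median(kv[1]), reverse=True)
    return [(name, median(samples), len(samples)) for name, samples in ranked[:n]]


# Example: the labels and sleeps are placeholders for real work.
with timed("render_front_page"):
    time.sleep(0.001)
with timed("count_votes"):
    time.sleep(0.02)
```

Calling `slowest()` after some traffic gives you a short list to share with contributors, which is a lot more useful than "the site feels slow."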

I’m excited for what the future has in store for Voat.co and the OSS project behind it!

And if you want to get involved in Voat’s OSS efforts, check out the Voat project on GitHub!


I'm the CTO and founder of Petabridge, where I'm making distributed programming for .NET developers easy by working on Akka.NET, Phobos, and more.