Akka.NET Request for Contributors: Akka.Cluster
The next major milestone for Akka.NET is Akka.NET v1.1, the primary focus of which is a stable, production-ready release of the Akka.Cluster module.
We’ve made some solid progress towards that goal in some of our maintenance releases, but I came to the realization over the past couple of months that the way we’ve been working with our contributors isn’t the most effective way for the project to achieve these milestones.
In this post I’m going to invite you, a .NET developer who’s interested in distributed computing, to work with me directly on the guts of Akka.Cluster and take on a project that’s bigger than you or me. And it’ll be awesome - you’ll learn a lot and make a real difference.
But in the meantime, let’s get some background for context.
What does Akka.Cluster do and why is it important?
Akka.Cluster is the foundation of Akka.NET’s high-availability toolchain; it’s what allows you to build elastic, distributed, fault-tolerant peer-to-peer networks of Akka.NET applications that don’t have any single point of failure or bottleneck.
My Petabridge co-founder Andrew explained it well in Akka.NET: Introduction to Clustered Applications w/ Akka.Cluster (21:32), embedded below:
TL;DR; Akka.Cluster is what makes it feasible for .NET developers to build applications that are capable of being highly available and stateful in a way that’s largely unheard of in the .NET ecosystem. It’s capable of truly amazing stuff - just try out the WebCrawler Akka.Cluster demo we wrote and watch what happens when you spin up multiple Crawler nodes in the middle of a job.
All of the cool stuff depends on Akka.Cluster
Akka.Cluster is pretty cool in and of itself, but the really exciting modules all depend on it. For example:
- Akka.Cluster.Sharding - automatically persist application state to a durable store and maintain in-memory replicas of that state across the cluster; includes the ability to execute partition handoffs and other fun stuff.
- Akka.Cluster.Tools - gives developers the ability to extend the EventBus to work across the entire cluster, rather than just within one ActorSystem; allows you to specify cluster “singletons” that when killed will be recreated elsewhere in the cluster; and also provides the ClusterClient, which enables you to create read-only clients for Akka.Cluster clusters that don’t actually participate in the cluster themselves as members.
- Akka.Cluster.Metrics - extends Akka.Cluster’s built-in Gossip mechanism to include MetricsGossip (information about the CPU and memory utilization of each node in the cluster) and allows you to load-balance work distribution across the cluster based on those metrics. Works great in combination with auto-scaling.
- Akka.Cluster.DData - Distributed Data, an experimental module that makes it really easy to share data across Akka.Cluster nodes using Conflict-free Replicated Data Types (CRDTs).
- Akka.Streams - yeah, you can technically use it without Akka.Cluster, but for 80% of real-world applications this capability is really meant to be used in conjunction with a distributed Akka.NET cluster.
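If you haven’t run into CRDTs before, here’s a rough feel for the idea behind the Distributed Data module. This is a minimal, generic sketch of a grow-only counter in Python, not the Akka.Cluster.DData API: the GCounter class, its methods, and the node names are all made up for illustration. Each node only increments its own slot, and replicas converge by taking the per-node maximum when they merge.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Hypothetical grow-only counter: one slot per node, merged by per-node max."""
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, node_id: str, amount: int = 1) -> None:
        # A node only ever adds to its own slot, so updates never conflict.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of every node's slot.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Taking the max per node makes merge commutative, associative, and idempotent.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter(
            {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        )


# Two replicas (illustrative node names) are updated independently...
a = GCounter()
b = GCounter()
a.increment("node-a", 3)
b.increment("node-b", 2)

# ...and converge to the same state no matter which way the merge runs.
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

Because the merge order doesn’t matter and re-merging the same state changes nothing, replicas can exchange state over an unreliable network and still end up agreeing, which is what makes this style of data type a good fit for a cluster with no single point of failure.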
Want those cool toys? Then let’s ship Akka.Cluster first.
The current state of Akka.Cluster
Akka.Cluster has existed in some form since September, 2014 - so it’s been around for about a year and users are running it under serious production workloads. The library has been mostly code complete for a while.
However, there’s one major problem with Akka.Cluster: me. I’ve been the bottleneck on Akka.Cluster for most of its lifespan.
Historically I’ve created the impression that I have it all under control with Akka.Cluster, but the truth is that it’s a project that’s bigger than any one person. Today, if I have to take time away from Akka.Cluster to work on projects for Petabridge, speak at conferences, or whatever, then all progress on Akka.Cluster comes to a halt. That sucks and it’s my fault.
So let’s create a possibility where lots of people understand how the guts of Akka.Cluster work and are motivated to improve upon it and share the knowledge of how it works with others. That’s what this is really all about.
I need your help
I have to come clean here - I’ve been reluctant to really ask for help on Akka.Cluster for a long time, because:
- I believed that I had the bandwidth to do it all myself (how hard could it be?) and
- I believed everyone else when they told me “I don’t know if I know enough to work on that.”
I don’t believe either of those anymore. First and foremost, Akka.Cluster is a module that’s too important to be the responsibility of just one contributor. If something happened to me then someone else would have to step up to take over that part of the project. Rather than have it come to that, I would rather coach some capable developers beginning this week on how that system works so others can always contribute and lead there.
Redundancy is just as important for people as it is for servers. Plus: I’m not perfect. I fuck stuff up all the time. And that’s ok!
As for the second point: Akka.Cluster is not hard, it’s just different - based on distributed programming concepts that are unfamiliar to most developers, but totally transformational once understood.
So here it is: if you’re interested in distributed systems, .NET, and open source then I would like you to become a contributor with me on Akka.Cluster. I’m still going to be focusing most of my contributions in that area, and I’d like to do it alongside a team of contributors.
What needs to be done
The burden of this module is its high quality assurance requirements: writing multi-node specs; ensuring that the MultiNode TestKit’s behavior is correct; beating racy and inconsistent behavior out of the system; and thoroughly documenting the expected behavior of a cluster under lots of different scenarios.
Here’s what needs to be done to complete our work on Akka.NET v1.1:
- Port the remaining Akka.Cluster multi-node specs to .NET - this is easier than it looks. Start by reading how the Akka.NET MultiNode Testkit (Akka.Remote.TestKit) works.
- Fix the current Akka.Cluster multi-node specs to eliminate racy or inconsistent behavior - you can see a list of issues we’re having with current multi-node specs here.
- Fix any bugs in Akka.Cluster you find in the course of working on a multi-node spec. Most of them are subtle and depend on certain network partitions occurring in different states of the cluster. Finding these bugs, of course, is why we go through the trouble of writing multi-node specs for them.
- Fix the Akka.Remote.TestKit - we have at least one known issue with deadlocks inside the Akka.Remote.TestKit; I’m sure there are others that are yet to be documented. We need to track these down and resolve them.
- Contribute documentation to getakka.net on Akka.Cluster and Akka.Remote - we have a backlog of areas where we’re still working on improving our documentation, but these two modules are some of the most highly requested areas for docs.
- Port Multi-Node Specs for Akka.Remote - in case you didn’t get enough MultiNodeSpec excitement from just Akka.Cluster, we also have an entire battery of equivalent tests we need to port for Akka.Remote too.
How to get involved
The first thing to do if you want to get involved is to fill out this interest form below!
We’re asking people to fill this out so I can do the following to help you help us:
- Schedule regular calls where we can coordinate on problems and work;
- Train you on the internals and distributed programming concepts that power Akka.NET; and
- Teach you how other parts of the Akka.NET development process work, such as our build system and the multi-node test runner.
I’m committed to doing that for you - and, frankly, pretty excited about it. I think this will be something huge.
Once you’ve filled this out, check out the outstanding issues for Akka.NET v1.1 on our waffle (in the “For Next Release” tab) and hop into the Akka.NET Gitter chat and introduce yourself if you haven’t already! Looking forward to working with you!