MongoDB vs. SQL Server 2008: A .NET Developer’s Perspective
One of the first projects I put together this year was Captain Obvious, a nifty little application that runs off of AppHarbor and ASP.NET MVC3. What made Captain Obvious special for me was that it was my first time using something other than a relational database[footnote: typically I’ve only used SQL Server / MySQL in the past] in production – I chose MongoDB because it stands out to me as a lightweight store that’s easier to work with for most CRUD applications. Since then I’ve gone on to build other projects which depend on Mongo.
What I’ve learned since is that MongoDB and SQL Server are tools that aren’t 100% interchangeable and are more situational than dogmatists make them out to be.
My goal in writing this is to help you decide between these two technologies for your ASP.NET / WebMatrix / WCF applications.
Relational Data Models vs. Document Models
The key to using Mongo or SQL Server effectively is understanding how the underlying data model works and how it affects your ability to read and write what you want, when you want. Below are illustrations of the relational (SQL) model versus the document model.
In a relational model all of your information is expressed in the form of tables, all of which contain keys and some of which contain foreign keys to other tables. All of the information you read from and write to the database is expressed as either adding rows to these tables or combining their values based on some keys.
In a document model you have a relatively flat collection of items all identified by one primary key[footnote: You can add indices to other fields in Mongo too, but that’s outside the scope of this article], and instead of defining a relationship between two collections you simply embed one inside the other. So the relationship between the three tables in our relational model is expressed as a single, flat document in this model.
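To make the contrast concrete, here is a minimal C# sketch of the same data in both shapes. Everything in it is an assumption for illustration: the `ThingRow`, `OwnerRow`, `TagRow`, and `ThingDocument` names are hypothetical, and plain in-memory sequences stand in for the tables. The `Assemble` method does by hand what a JOIN does in the relational model, and its return value is what the document model would store directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Relational shape: three flat row types linked by keys (hypothetical names).
public class OwnerRow { public int Id { get; set; } public string Name { get; set; } }
public class ThingRow { public int Id { get; set; } public string Name { get; set; } public int OwnerId { get; set; } } // OwnerId is a foreign key
public class TagRow { public int Id { get; set; } public int ThingId { get; set; } public string Label { get; set; } }   // ThingId is a foreign key

// Document shape: one object with the related data embedded inside it.
public class ThingDocument
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string OwnerName { get; set; }
    public List<string> Tags { get; set; }
}

public static class ModelComparison
{
    // Rebuilds the document view by combining rows on their keys.
    public static ThingDocument Assemble(
        int thingId,
        IEnumerable<ThingRow> things,
        IEnumerable<OwnerRow> owners,
        IEnumerable<TagRow> tags)
    {
        var thing = things.Single(t => t.Id == thingId);
        var owner = owners.Single(o => o.Id == thing.OwnerId);

        return new ThingDocument
        {
            Id = thing.Id,
            Name = thing.Name,
            OwnerName = owner.Name,
            Tags = tags.Where(t => t.ThingId == thing.Id)
                       .Select(t => t.Label)
                       .ToList()
        };
    }
}
```

In the relational shape the application (or the database) pays the cost of combining rows on every read; in the document shape that work is done once, when the document is written.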
It is important to note here that there’s no schema that tells MongoDB what fields should or shouldn’t be expected in each document in the “Things” collection – in the relational universe each table is strongly, declaratively typed and every single row inserted into the table must conform to all of the constraints imposed by the relational database management system (RDBMS). Anything goes in Mongo (although this can cause major problems, as we will see later).
What Do Relational and Document Models Have in Common?
So what do these two data models have in common?
- Both support the notion of primary keys and indexes, and MongoDB can support multiple indexes if needed;
- Both support queries and have models for sorting / limiting results;
- Both support the ability to reference other documents / tables (“wha? in Mongo? Yup.”)
- Both have a strong typing system; and
- Both support aggregation operations like SUM(), COUNT(), etc…
Seems pretty straightforward, right? But what’s up with documents being able to refer to each other?
As it turns out, implementing a database where every possible piece of information of interest to users is embedded inside of its own distinct document comes with some drawbacks, so the creators of MongoDB added support for DbReference and the ability to do cross-collection references as well. A necessary evil in the name of practicality.
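Here is a rough sketch of what a cross-collection reference boils down to. It is not driver code: `DocRef` is a hypothetical stand-in type, and nested in-memory dictionaries play the role of the collections. The point it tries to show is that a reference stores only a collection name and a key, so resolving it is a second lookup, and nothing stops the reference from dangling.

```csharp
using System;
using System.Collections.Generic;

public class Author
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for a cross-collection reference: it holds only the
// target collection's name and the key of the referenced document.
public class DocRef
{
    public string Collection { get; set; }
    public int Id { get; set; }
}

public class Idea
{
    public int Id { get; set; }
    public string Title { get; set; }
    public DocRef AuthorReference { get; set; }
}

public static class ReferenceDemo
{
    // Resolving a reference is a separate lookup against another collection.
    // Returns null when the reference dangles, because no foreign-key
    // constraint guarantees that the target document exists.
    public static Author FetchAuthor(
        Idea idea,
        IDictionary<string, IDictionary<int, Author>> collections)
    {
        var reference = idea.AuthorReference;

        IDictionary<int, Author> target;
        if (!collections.TryGetValue(reference.Collection, out target))
            return null;

        Author author;
        return target.TryGetValue(reference.Id, out author) ? author : null;
    }
}
```

A relational database would reject the dangling case at write time through a foreign-key constraint; in this sketch the caller only finds out when the lookup comes back empty.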
What’s Different between Relational and Document Models?
So what’s really different between the two models?
- Document models don’t have SQL – their query tools are extremely primitive in contrast (but the models are also much simpler and don't require sophisticated queries);
- Fetching document references in the document model has to be done inside of separate queries[footnote: unless I am doing it wrong, which I may well be doing] whereas they can be pulled in by a JOIN within the same query in the relational model;
- Document models don’t have a schema – each document in a collection can have extra fields or fields with different type values, whereas all rows in a relational table must conform to the same set of constraints in order to be inserted;
- In the document model types are associated with data upon assignment, rather than declared in advance;
- Document models have much more primitive tools for performing aggregations (edit: actually, not necessarily true for Mongo - it has built-in MapReduce which is powerful and sophisticated); and
- Queries are defined on a per-collection basis in the document model, whereas a query in the relational model is an abstraction that simply refers to any number of related tables.
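The schema and typing differences in the list above can be sketched in a few lines of C#. This is only an illustration under stated assumptions: `Dictionary<string, object>` stands in for a document, the field names are made up, and no Mongo driver is involved. Each value carries its own runtime type, so two documents in the same collection can disagree about what `Price` is, or whether `Color` exists at all.

```csharp
using System;
using System.Collections.Generic;

public static class SchemalessDemo
{
    // Two documents in the same hypothetical "Things" collection. Nothing
    // forces them to share fields or field types.
    public static List<Dictionary<string, object>> BuildCollection()
    {
        return new List<Dictionary<string, object>>
        {
            new Dictionary<string, object>
            {
                { "_id", 1 }, { "Name", "Widget" }, { "Price", 9.99 }
            },
            new Dictionary<string, object>
            {
                { "_id", 2 }, { "Name", "Gadget" }, { "Price", "call us" }, { "Color", "red" }
            }
        };
    }

    // Reports the runtime type each document carries for a given field,
    // or "(missing)" when the document has no such field.
    public static List<string> FieldTypes(
        IEnumerable<Dictionary<string, object>> docs, string field)
    {
        var result = new List<string>();
        foreach (var doc in docs)
        {
            object value;
            if (doc.TryGetValue(field, out value) && value != null)
                result.Add(value.GetType().Name);
            else
                result.Add("(missing)");
        }
        return result;
    }
}
```

Asking for the types of `Price` across this collection yields `Double` for the first document and `String` for the second, which a relational column would never allow.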
The picture that should be starting to form in your head of MongoDB vs. SQL Server at this point is a flat, simple system on one side and a multi-dimensional, rich system on the other. This is how I look at the two technologies in contrast to each other.
What’s Different about Developing .NET Apps Against Mongo and SQL Server 2008?
So at the end of the day, what's the bottom line for .NET developers who want to use Mongo or SQL Server in a web application? What are the REAL trade-offs?
Mongo IS the OR/M
The core strength of Mongo is in its document-view of data, and naturally this can be extended to a "POCO" view of data. Mongo clients like the NoRM Project in .NET will seem astonishingly similar to experienced Fluent NHibernate users, and this is no accident - your POCO data models are simply serialized to BSON and saved in Mongo 1:1. No mappings required.
I'm going to show you two similar pieces of source code I am using in production - one using NoRM and MongoDB on CaptainObvious and the other using Dapper and SQL Azure on XAPFest:
NoRM:
public IdeaResultSet GetIdeasByAuthor(int authorID, int offset = 0, int count = 10)
{
    using (var db = Mongo.Create(ConnectionString))
    {
        //Sort before paging so that Skip / Take operate on the ordered results
        var ideas = db.GetCollection<Idea>().AsQueryable()
            .Where(x => x.AuthorReference.Id == authorID)
            .OrderByDescending(x => x.DatePosted)
            .Skip(offset)
            .Take(count)
            .ToList();

        var totalIdeas = db.GetCollection<Idea>().AsQueryable()
            .Where(x => x.AuthorReference.Id == authorID)
            .Count();

        //Fetch the referenced authors before we serve up the list again
        foreach (var idea in ideas)
        {
            idea.Author = idea.AuthorReference.Fetch(() => db);
        }

        var resultSet = new IdeaResultSet
        {
            Ideas = ideas,
            PaginationValues = new PaginationTuple
            {
                MaxPages = PageCounterHelper.GetPageCount(totalIdeas, count),
                CurrentPage = PageCounterHelper.GetPageCount(offset, count)
            }
        };

        return resultSet;
    }
}
Dapper:
public IList<XfstApp> GetAppsByUser(string userName)
{
    using (var conn = new SqlConnection(ConnectionString))
    {
        try
        {
            conn.Open();

            var appResults = conn
                .Query(@"SELECT * FROM Apps
                         INNER JOIN AppOwners ON AppOwners.AppName = Apps.AppName
                         WHERE LOWER(AppOwners.UserName) = LOWER(@UserName)
                         ORDER BY Apps.DateCreated ASC",
                       new { UserName = userName });

            //Return whatever we were able to collect
            return appResults.Select(x => x.ToApp()).ToList();
        }
        catch (SqlException ex)
        {
            TraceError(ex);
            return new List<XfstApp>();
        }
        finally
        {
            //Close the connection when we're finished, regardless of what happened
            conn.Close();
        }
    }
}
The amount of source code for these two technologies, and the nature of it, isn't wholly different. What is drastically different is how I was thinking about the data when I was writing this code - I can save an instance of an Idea object to MongoDB on CaptainObvious without ever having created the collection first or defined a schema.
Whenever I want to look up an idea, I just pick one based off of a key value that I specify and I don't worry about any joins or anything (although I do have to load objects from the author collection if I need to display the author's name and contact info.)
In the SQL universe, I have to define my tables in advance and each time I want to extract an object, I have to think of it in terms of combined relationships between my tables and this requires a more thoughtful approach.
Mongo, in other words, lends itself to rapid application development whereas SQL Server has some innate friction built into any schema-based system.
In Mongo, the DBMS Isn't There to Protect Your Data's Integrity
One of the major advantages of a schema-based DBMS is that if a calling application tries to insert a row that doesn't fit the schema, the operation always fails. In Mongo, this isn't true - you can have one record in a collection with extra fields or fields of an odd type, and it can totally screw up the BSON serializer when it tries to process the collection (depending upon how flexible the serializer is).
SQL users take this for granted, but when you have issues in Mongo along these lines they can be really frustrating to solve and difficult to debug.
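As a sketch of the failure mode, and of one way to hunt it down, the snippet below maps loosely typed documents onto a POCO. It makes several assumptions: `Dictionary<string, object>` stands in for a raw document, `Thing` and `StrictMapper` are hypothetical names, and the mapper is a deliberately strict toy rather than any real BSON serializer. This particular mapper ignores extra fields but chokes on missing or wrongly typed ones.

```csharp
using System;
using System.Collections.Generic;

public class Thing
{
    public int Id { get; set; }
    public string Name { get; set; }
    public double Price { get; set; }
}

public static class StrictMapper
{
    // Throws KeyNotFoundException for a missing field and
    // InvalidCastException for a field holding the wrong type.
    public static Thing ToThing(Dictionary<string, object> doc)
    {
        return new Thing
        {
            Id = (int)doc["_id"],
            Name = (string)doc["Name"],
            Price = (double)doc["Price"]
        };
    }

    // Walks the whole collection and returns the positions of documents that
    // cannot be mapped, instead of failing on the first bad one.
    public static List<int> FindBadDocuments(IList<Dictionary<string, object>> docs)
    {
        var bad = new List<int>();
        for (var i = 0; i < docs.Count; i++)
        {
            try
            {
                ToThing(docs[i]);
            }
            catch (Exception ex) when (ex is InvalidCastException || ex is KeyNotFoundException)
            {
                bad.Add(i);
            }
        }
        return bad;
    }
}
```

With a schema, the bad write would have been rejected up front; here a single odd document surfaces only when something tries to read it, which is why a sweep like `FindBadDocuments` can be a useful debugging aid.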
In Mongo, Operations May Not Be Atomic
Mongo's atomicity guarantees are per-document by default, so anything that touches several documents, or any read-modify-write cycle in your application, is not atomic, and all sorts of fun things can happen when you have multiple users changing properties on the same document. You can set an atomic flag to true, but even then you don't get transaction-style guarantees (and writes are flushed from memory to disk in bulk).
If you use Mongo and carry the same ACID assumptions that we learned on SQL Server, you might be in for a nasty surprise :p
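One of those surprises is the classic lost update. The sketch below simulates it deterministically with an in-memory store; `IdeaDoc`, `InMemoryStore`, and `LostUpdateDemo` are hypothetical names, and the store's whole-document `Save` is an assumption about how the application writes, not a claim about any particular driver. Two users load the same document, each changes a different property, and the second save silently wipes out the first.

```csharp
using System;
using System.Collections.Generic;

public class IdeaDoc
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int Votes { get; set; }
}

public class InMemoryStore
{
    private readonly Dictionary<int, IdeaDoc> _docs = new Dictionary<int, IdeaDoc>();

    // Returns a copy, the way a client gets its own deserialized instance.
    public IdeaDoc Load(int id)
    {
        return Copy(_docs[id]);
    }

    // Replaces the whole stored document: last writer wins.
    public void Save(IdeaDoc doc)
    {
        _docs[doc.Id] = Copy(doc);
    }

    private static IdeaDoc Copy(IdeaDoc d)
    {
        return new IdeaDoc { Id = d.Id, Title = d.Title, Votes = d.Votes };
    }
}

public static class LostUpdateDemo
{
    public static IdeaDoc Run()
    {
        var store = new InMemoryStore();
        store.Save(new IdeaDoc { Id = 1, Title = "Original", Votes = 0 });

        var userA = store.Load(1); // user A reads the document
        var userB = store.Load(1); // user B reads the same version

        userA.Title = "Renamed by A";
        userB.Votes = userB.Votes + 1;

        store.Save(userA);
        store.Save(userB); // B's stale copy overwrites A's rename

        return store.Load(1);
    }
}
```

After `LostUpdateDemo.Run()` the stored document has B's vote but still carries the original title, because B saved a copy that never saw A's change. In SQL Server a transaction or a row-version check would typically catch this; with a document store you have to design for it yourself, for example by updating individual fields instead of saving whole documents.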
Conclusion
Overall, the biggest difference between these two technologies is the model and how developers have to think about their data. Mongo is better suited to rapid application development, but in my opinion falls apart in scenarios where ACID-compliant systems are a must, like anything that goes anywhere near a financial transaction.
But, that's just my opinion :p