Frustration Central: Parsing the DateTime Values from the SlideShare REST API
I feel a little bad about posting this, given that Jon Boutelle, the CTO of SlideShare, has already admitted that this portion of the SlideShare 2.0 REST API sucks and that they're going to fix it eventually. Still, I'm in the middle of rewriting my original SlideShare presentation XML deserializer, so I wanted to post my current solution to this problem and ask the .NET community whether a more elegant one exists.
Here's the problem - if you're trying to query any number of presentation objects from SlideShare, you're inevitably going to make a call to get_slideshow or to some method which depends on it. That's an unavoidable fact of life when you're dealing with the SlideShare API. For the most part, the SlideShare API's response format is intuitive and intelligible - here's an example response using actual data:
<?xml version="1.0" encoding="utf-8"?>
<Slideshow>
    <ID>3997766</ID>
    <Title>Startup Metrics 4 Pirates (Montreal, May 2010)</Title>
    <Description>slides from my talk at Startup Camp Montreal 6 (May, 2010)</Description>
    <Status>2</Status>
    <Username>dmc500hats</Username>
    <URL>[removed for formatting reasons]</URL>
    <ThumbnailURL>[removed for formatting reasons]</ThumbnailURL>
    <ThumbnailSmallURL>[removed for formatting reasons]</ThumbnailSmallURL>
    <Embed> [removed for formatting reasons] </Embed>
    <Created>Thu May 06 14:10:46 -0500 2010</Created>
    <Language>en</Language>
    <Format>ppt</Format>
    <Download>1</Download>
    <Tags>
        <Tag Count="1" Owner="1">startup</Tag>
        <Tag Count="1" Owner="1">scmtl</Tag>
        <Tag Count="1" Owner="1">leanstartup</Tag>
        <Tag Count="1" Owner="1">acquisition</Tag>
        <Tag Count="1" Owner="1">activation</Tag>
        <Tag Count="1" Owner="1">poutin</Tag>
        <Tag Count="1" Owner="1">pirate</Tag>
        <Tag Count="1" Owner="1">metrics</Tag>
        <Tag Count="1" Owner="1">referral</Tag>
        <Tag Count="1" Owner="1">aarrr</Tag>
        <Tag Count="1" Owner="1">retention</Tag>
        <Tag Count="1" Owner="1">revenue</Tag>
    </Tags>
    <NumDownloads>18</NumDownloads>
    <NumViews>1002</NumViews>
    <NumComments>0</NumComments>
    <NumFavorites>3</NumFavorites>
    <NumSlides>67</NumSlides>
    <RelatedSlideshows>
        <RelatedSlideshowID rank="6">89026</RelatedSlideshowID>
        <RelatedSlideshowID rank="4">602558</RelatedSlideshowID>
        <RelatedSlideshowID rank="3">629696</RelatedSlideshowID>
        <RelatedSlideshowID rank="5">629833</RelatedSlideshowID>
        <RelatedSlideshowID rank="9">1064559</RelatedSlideshowID>
        <RelatedSlideshowID rank="10">1566287</RelatedSlideshowID>
        <RelatedSlideshowID rank="1">2992302</RelatedSlideshowID>
        <RelatedSlideshowID rank="2">3017886</RelatedSlideshowID>
        <RelatedSlideshowID rank="8">3387416</RelatedSlideshowID>
        <RelatedSlideshowID rank="7">3951684</RelatedSlideshowID>
    </RelatedSlideshows>
    <PrivacyLevel>0</PrivacyLevel>
    <SecretURL>0</SecretURL>
    <AllowEmbed>0</AllowEmbed>
    <ShareWithContacts>0</ShareWithContacts>
</Slideshow>
Intuitive, no? However, let's take a closer look at the <Created> field, which contains the date that this SlideShare presentation was originally uploaded:
<Created>Thu May 06 14:10:46 -0500 2010</Created>
What, pray tell, is this? It appears that SlideShare REST 2.0's date format consists of the following elements in left to right order:
<Created>Day of week name, Month name, Day of month, Hours:Minutes:Seconds, UTC offset, Year</Created>
Unfortunately, this format is not one of the Standard Date and Time Formats in .NET, nor is it a UNIX timestamp, so DateTime.Parse will not handle this value from the SlideShare API out of the box. Thus, I created my own admittedly ugly solution to this problem, which I will reveal below, ugly piece by ugly piece.
Despite how ugly this solution is, it's been tested pretty thoroughly and works (as far as I know). The only thing it does not do, currently, is use the UTC offset in any way, shape or form; that's a modification I'm working on in the new version of my SlideShare presentation deserializer, which uses these functions.
The first issue is that there's no numeric value for the month - you have to map a three-letter code to its corresponding month, i.e. "Mar" = 3, "May" = 5, and so forth. Here's the function I made for that very purpose:
    public static int GetMonth(string slideShareDatetime)
    {
        try
        {
            // Skip past the three-letter code for the day of the week and the space after it.
            string MonthStr = slideShareDatetime.Substring(4, 3).ToLower();
            int returnValue = 1;
            switch (MonthStr)
            {
                case "jan": returnValue = 1; break;
                case "feb": returnValue = 2; break;
                case "mar": returnValue = 3; break;
                case "apr": returnValue = 4; break;
                case "may": returnValue = 5; break;
                case "jun": returnValue = 6; break;
                case "jul": returnValue = 7; break;
                case "aug": returnValue = 8; break;
                case "sep": returnValue = 9; break;
                case "oct": returnValue = 10; break;
                case "nov": returnValue = 11; break;
                case "dec": returnValue = 12; break;
                default:
                    throw new InvalidOperationException("Unable to recognize month " + MonthStr + " in SlideShareDate.");
            }
            return returnValue;
        }
        catch (Exception ex)
        {
            // logger is assumed to be a static logger field defined elsewhere in the class.
            logger.ErrorException("Unable to parse month from SlideShare presentation.", ex);
            throw;
        }
    }
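As a possible alternative to the switch statement, the three-letter abbreviation could be handed to the framework itself. The sketch below assumes the abbreviations are always the English ones shown in the sample response; the class and method names (SlideShareMonthSketch, GetMonthViaCulture) are hypothetical and not part of the original deserializer.

```csharp
using System;
using System.Globalization;

public static class SlideShareMonthSketch
{
    // Hypothetical alternative to GetMonth: let the invariant culture
    // translate the abbreviated month name instead of a hand-written switch.
    public static int GetMonthViaCulture(string slideShareDatetime)
    {
        // Characters 4-6 hold the month, e.g. "May" in "Thu May 06 ...".
        string monthStr = slideShareDatetime.Substring(4, 3);

        // "MMM" is the custom format specifier for an abbreviated month name.
        // ParseExact throws FormatException if the abbreviation is not recognized.
        return DateTime.ParseExact(monthStr, "MMM", CultureInfo.InvariantCulture).Month;
    }
}
```

Calling SlideShareMonthSketch.GetMonthViaCulture with the sample Created value should yield the same month number as the switch-based version, with the lookup table maintained by the framework rather than by hand.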
The next thing we have to do is extract the numeric day of the month, which I do using the function below:
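    public static int GetDay(string slideShareDatetime)
    {
        // Match the first one- or two-digit run that ends at a word boundary;
        // in this format that is the day of the month.
        string strRegexPattern = @"((\d|\d{2})\b){1}";
        Regex r = new Regex(strRegexPattern);
        string matchingDay = r.Match(slideShareDatetime).ToString();
        return Convert.ToInt32(matchingDay);
    }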
I tested several regular expressions numerous times before I settled on this one, which I liked for its simplicity - all it does is look for the first one- or two-digit number that ends at a word boundary in a given string.
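A slightly stricter variant could anchor the pattern to the two three-letter tokens that precede the day, so that it cannot latch onto digits elsewhere in the string. This is only a sketch, assuming the value always starts with the day name followed by the month name; GetDayAnchored is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlideShareDaySketch
{
    // Start of string, day name, month name, then capture one or two digits.
    private static readonly Regex DayPattern =
        new Regex(@"^\w{3}\s+\w{3}\s+(\d{1,2})\b");

    // Hypothetical variant of GetDay that fails loudly when nothing matches.
    public static int GetDayAnchored(string slideShareDatetime)
    {
        Match match = DayPattern.Match(slideShareDatetime);
        if (!match.Success)
            throw new FormatException("No day of month found in: " + slideShareDatetime);

        // Group 1 holds the captured digits, e.g. "06".
        return int.Parse(match.Groups[1].Value);
    }
}
```

The trade-off is that the anchored pattern rejects anything that deviates from the observed layout, whereas the original pattern accepts any string containing a short number.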
Next, we have to extract the year from the string, which I do using the function below:
    public static int GetYear(string slideShareDatetime)
    {
        string YearStr = slideShareDatetime.Substring(slideShareDatetime.Length - 4, 4);
        return Convert.ToInt32(YearStr);
    }
All this function does is extract the last four characters from the string (the year, in this case) and convert them into an integer. I suppose this function will break sometime around the year 10,000, or whenever SlideShare adds some whitespace to the end of their XML response for the Created field, whichever comes first.
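Both failure modes could be sidestepped by trimming the value and taking whatever follows the last space. The following is a minimal sketch under the assumption that the year is always the final space-separated token; GetYearTolerant is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class SlideShareYearSketch
{
    // Hypothetical variant of GetYear that tolerates surrounding whitespace
    // and years that are not exactly four digits long.
    public static int GetYearTolerant(string slideShareDatetime)
    {
        string trimmed = slideShareDatetime.Trim();

        // Everything after the last space is taken to be the year.
        int lastSpace = trimmed.LastIndexOf(' ');
        string yearStr = trimmed.Substring(lastSpace + 1);

        // NumberStyles.None accepts digits only, so stray characters throw.
        return int.Parse(yearStr, NumberStyles.None, CultureInfo.InvariantCulture);
    }
}
```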
Lastly, we have to extract the hours:minutes:seconds values into some sort of intelligible format, so I present you with yet another cringe-worthy function:
    public static Dictionary<string, int> GetTime(string slideShareDatetime)
    {
        try
        {
            Dictionary<string, int> returnValue = null;
            string strRegexPattern = @"((\d{2}:\d{2}:\d{2})\b)";
            Regex r = new Regex(strRegexPattern);
            string matchingTime = r.Match(slideShareDatetime).ToString();
            if (matchingTime == String.Empty)
                throw new InvalidOperationException("Unable to parse time of day from SlideShareDate:" + slideShareDatetime);
            string[] TimeParts = matchingTime.Split(':');
            if (TimeParts.Length < 3)
                throw new InvalidOperationException("Invalid date/time parsed from SlideShareDate:" + slideShareDatetime);
            returnValue = new Dictionary<string, int>();
            returnValue.Add("hours", Convert.ToInt32(TimeParts[0]));
            returnValue.Add("minutes", Convert.ToInt32(TimeParts[1]));
            returnValue.Add("seconds", Convert.ToInt32(TimeParts[2]));
            return returnValue;
        }
        catch (Exception ex)
        {
            // logger is assumed to be a static logger field defined elsewhere in the class.
            logger.ErrorException("Failed to parse time from presentation.", ex);
            throw;
        }
    }
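The string-keyed dictionary could arguably be replaced with a TimeSpan, which already exposes Hours, Minutes and Seconds properties. This sketch keeps the same regular-expression idea as GetTime but hands the matched text to TimeSpan.Parse; GetTimeOfDay is a hypothetical name, and the TimeSpan.Parse overload that takes a format provider requires .NET 4.0 or later.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SlideShareTimeSketch
{
    // Two digits, colon, two digits, colon, two digits, e.g. "14:10:46".
    private static readonly Regex TimePattern =
        new Regex(@"\b\d{2}:\d{2}:\d{2}\b");

    // Hypothetical variant of GetTime that returns a TimeSpan.
    public static TimeSpan GetTimeOfDay(string slideShareDatetime)
    {
        Match match = TimePattern.Match(slideShareDatetime);
        if (!match.Success)
            throw new FormatException("No HH:mm:ss time found in: " + slideShareDatetime);

        // TimeSpan.Parse understands the hh:mm:ss layout directly.
        return TimeSpan.Parse(match.Value, CultureInfo.InvariantCulture);
    }
}
```

A caller would then read time.Hours, time.Minutes and time.Seconds instead of looking values up by key, which removes the risk of a mistyped key such as "hour".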
And then we put it all together in one, big, happy function:
    public static DateTime ParseSlideShareDateTime(string slideShareDatetime)
    {
        int Month = GetMonth(slideShareDatetime);
        int Year = GetYear(slideShareDatetime);
        int Day = GetDay(slideShareDatetime);
        Dictionary<string, int> Time = GetTime(slideShareDatetime);
        DateTime properDate = new DateTime(Year, Month, Day, Time["hours"], Time["minutes"], Time["seconds"]);
        return properDate;
    }
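The UTC offset that ParseSlideShareDateTime ignores could be folded in along the following lines. This is a sketch, not the upcoming version of the deserializer: it assumes the offset always appears as a sign followed by four digits (hours then minutes) with a space on either side, and the names GetUtcOffset and WithOffset are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlideShareOffsetSketch
{
    // A sign and four digits surrounded by whitespace, e.g. " -0500 ".
    private static readonly Regex OffsetPattern =
        new Regex(@"\s([+-])(\d{2})(\d{2})\s");

    // Extracts the UTC offset as a signed TimeSpan.
    public static TimeSpan GetUtcOffset(string slideShareDatetime)
    {
        Match match = OffsetPattern.Match(slideShareDatetime);
        if (!match.Success)
            throw new FormatException("No UTC offset found in: " + slideShareDatetime);

        int hours = int.Parse(match.Groups[2].Value);
        int minutes = int.Parse(match.Groups[3].Value);
        TimeSpan offset = new TimeSpan(hours, minutes, 0);

        return match.Groups[1].Value == "-" ? offset.Negate() : offset;
    }

    // Pairs an already-parsed local time with its offset.
    public static DateTimeOffset WithOffset(DateTime localTime, TimeSpan offset)
    {
        // The DateTimeOffset constructor rejects Utc or Local kinds whose
        // offsets disagree, so the kind is reset to Unspecified first.
        DateTime unspecified = DateTime.SpecifyKind(localTime, DateTimeKind.Unspecified);
        return new DateTimeOffset(unspecified, offset);
    }
}
```

Passing the result of ParseSlideShareDateTime together with GetUtcOffset into WithOffset gives a DateTimeOffset, whose UtcDateTime property can be used when presentations uploaded from different time zones need to be compared.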
Yeah. So - can anybody think of a much more, ahem, "robust" way of parsing the Created field from the SlideShare API? I would love to hear it! :)
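One candidate answer is a custom format string passed to DateTimeOffset.ParseExact, which would replace all of the helper functions and pick up the UTC offset at the same time. It is offered here as a sketch: it assumes the Created value always follows the layout of the sample above with English day and month abbreviations, and it is worth checking against real API responses before relying on it. The names SlideShareDateSketch, CreatedFormat and ParseCreated are hypothetical.

```csharp
using System;
using System.Globalization;

public static class SlideShareDateSketch
{
    // ddd = abbreviated day name, MMM = abbreviated month name,
    // dd = day of month, HH:mm:ss = 24-hour time,
    // zzz = signed UTC offset, yyyy = four-digit year.
    private const string CreatedFormat = "ddd MMM dd HH:mm:ss zzz yyyy";

    // Parses a Created value such as "Thu May 06 14:10:46 -0500 2010".
    public static DateTimeOffset ParseCreated(string created)
    {
        // Trim guards against whitespace around the element text.
        return DateTimeOffset.ParseExact(
            created.Trim(),
            CreatedFormat,
            CultureInfo.InvariantCulture,
            DateTimeStyles.None);
    }
}
```

The returned DateTimeOffset exposes DateTime for the time as written in the response and UtcDateTime for the normalized value. ParseExact throws a FormatException when the input deviates from the format, so a caller that prefers a boolean result could use DateTimeOffset.TryParseExact instead.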