Thursday, 22 May 2014

NSBCon 2014 – All About NServiceBus

I'm excited to say I'll be speaking at NSBCon here in London in June. NServiceBus is the most popular open-source service bus for .NET platforms, and I'll be talking about how we use NServiceBus at Spotlight to deliver online photo and multimedia publishing systems aimed at professional actors, casting directors and production professionals.
I'll be talking about two distinct systems. The first is our proprietary system for managing online photo publication. Spotlight has been publishing photographs in one form or another since 1927, and so our current system is the latest in a long, long series of incremental developments - from acid-etched brass photograph plates, to lithographic bromide machines, to industrial film scanners and ImageMagick. Photography is an absolutely fundamental part of the creative casting process, and we reached a point a few years ago where our web servers were spending 60%-70% of their CPU time rendering photographs. Working within the limitations of our legacy systems, we developed a distributed thumbnailing and caching system that delivers the same results with a fraction of the processing overhead. I'll discuss how we used NServiceBus to work around those limitations, the architectural and operational challenges we faced building and running this system, and some of the lessons we learned when, a year after deploying it into production, we had to migrate this system from onsite hosting to a private cloud.
The second is an online audio/video upload and publishing system. Working with online video offers a unique set of challenges - dozens of incompatible file formats, massive file uploads, unpredictable analysis and encoding jobs. We built and deployed a system that satisfied these requirements whilst offering an intuitive, responsive user experience, and in this talk I'll cover the high-level architectural approach that enabled this. We'll look at how we used NServiceBus to decouple the 'heavy lifting' components of the application from the customer-facing user interface components, and some of the lessons we learned deploying a distributed NServiceBus application in an Amazon EC2 hosting environment.
NSBCon is at SkillsMatter here in London, on the 26th and 27th of June 2014. I'll be speaking alongside a great panel of .NET and distributed systems experts, including Udi Dahan, Ayende, Greg Young, Andreas Ohlund and many others. Follow @NSBCon on Twitter for more updates, and sign up at SkillsMatter if you're interested.

Friday, 31 January 2014

How to read FormsAuthentication.Timeout in .NET 3.5

Forms authentication in .NET has a timeout property, controlled via the web.config section like so:

<system.web>
    <authentication>
        <forms name="foo" loginUrl="~/Account/Login" timeout="30" />
    </authentication>
</system.web>

This setting has existed since .NET 1.0, I believe, but only in .NET 4 did we get a corresponding property on the FormsAuthentication object. To access the Timeout property in .NET 4+, you just call

FormsAuthentication.Timeout

To do the same in .NET 3.5, you need to do this:

// Wrapped in a method here so the return statements have something to return from;
// you'll need System.IO, System.Web and System.Xml.
public static TimeSpan GetFormsAuthenticationTimeout()
{
    var defaultTimeout = TimeSpan.FromMinutes(30);
    var xml = new XmlDocument();
    var webConfigFilePath = Path.Combine(HttpRuntime.AppDomainAppPath, "web.config");
    xml.Load(webConfigFilePath);
    var node = xml.SelectSingleNode("/configuration/system.web/authentication/forms");
    if (node == null || node.Attributes == null) return defaultTimeout;
    var attribute = node.Attributes["timeout"];
    if (attribute == null) return defaultTimeout;
    int minutes;
    if (Int32.TryParse(attribute.Value, out minutes)) return TimeSpan.FromMinutes(minutes);
    return defaultTimeout;
}
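
Drop that method into whatever static helper class suits your project (the class name below is just an example, not part of the framework), and call it wherever you'd have used FormsAuthentication.Timeout on .NET 4:

var timeout = FormsAuthenticationHelper.GetFormsAuthenticationTimeout();
var expires = DateTime.Now.Add(timeout); // e.g. to set a matching cookie expiry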

I'll file this under "Things that are wrong with Forms Authentication, #217".

Friday, 17 January 2014

Friday Puzzle Time! (aka EU Roaming Data Charges)

I’m going to France. I want to be able to use my phone for data when I’m out there, so I’m investigating the cheapest way to do this. My phone is on Orange, who are now owned by EE, so I wander over to http://explore.ee.co.uk/roaming/orange/france and have a look.

(I should mention here that I’m generally really happy with Orange. The reception’s good. Their network doesn’t randomly drop calls or send me text messages 12 hours late, and Orange Wednesdays are awesome.)

Anyway. Here’s their data options:

[screenshot: EE's roaming data bundle options and prices for France]

Before we even get into the chaos of daily limits and terms and conditions, notice how the EU 100MB daily bundle gives you 100MB for £3, whilst the EU Data 500MB bundle gives you 500MB for a hundred and fifty pounds? Same data but the bundle offers “93% saving on standard rates” – which is such a huge discount as to be really quite unsettling. If someone tried to sell you a MacBook for £50, you’d be a bit suspicious, right?

So I tried to work out what the catch might be – using EE’s website, Orange’s EU Mobile Internet Daily Bundles T&Cs, and a phone call to Orange customer services. Here’s what I found.

For each question, here’s what EE’s website says, what the T&Cs say, and what Orange customer services told me on the phone (the advisors were referring to an internal T&C document dated Dec 15th 2013).

What happens if I go over 100MB in a single day?

  • Website: “This bundle reoccurs up to 10 times until midnight of that same day (local time), each time the bundle reoccurs you will be charged an additional bundle fee and recieve additional bundle allowance. If you exceed the reoccuring bundle limit you’ll be charged 45.9p/MB.” (Does this mean I get ten bundles per day? Or one bundle per day, for up to ten days?)
  • T&Cs: “Any data used above the amount in your bundle, will be charged at our standard data roaming rate of 45.9p per MB. So, for example, if you have the £1 per day bundle and use more than your 20MB you will be charged an additional £1 for every additional 20MB you use up to 200MB (as the bundle reoccurs ten times). Any data usage in excess of 200MB will then be charged at our standard data roaming rate of 45.9p per MB.” (That sounds to me like you get ten bundles per day – otherwise it would just say “if you use more than 20MB you’ll pay standard rates for the rest of the day” – no?)
  • Customer services: Bundles only work once per day – so when you reach 100MB, you’ll pay normal roaming charges until midnight, and you’ll get another 100MB bundle the next day.

How many times can I use the bundle in a single day?

  • Website: See above.
  • T&Cs: “A daily bundle will last until midnight (local time of the country you purchased the bundle in) on the day you made the purchase, or until you have used up your data allowance, whichever comes first in time. Once you’ve purchased a bundle, you will be opted in for the bundle to reoccur up to ten times” (Doesn’t say whether that’s ten times per day, ten times per trip, per billing period, or what.)
  • Customer services: Just once – see above. You can’t use two daily bundles in the same day.

How many days does it last for?

  • Website: Really unclear whether it’s ten bundles per day – for as many days as you like – or one bundle per day, for up to ten days. And no indication of whether that ten-day limit is per trip or per billing period or what.
  • T&Cs: See above.
  • Customer services: There’s no limit – if I went to France for fifty days, I’d get fifty bundles. One per day, for the duration of my trip.

How much does it cost once I exceed my bundle limit?

  • Website: “If you exceed the reoccuring bundle limit you’ll be charged 45.9p/MB.”
  • T&Cs: “Any data used above the amount in your bundle, will be charged at our standard data roaming rate of 45.9p per MB.”
  • Customer services: £3.70 per MB – though I was advised that information might be out of date.

What does the “ten bundle” limit actually mean?

  • Website: Unclear.
  • T&Cs: Unclear.
  • Customer services: No idea. Neither of the advisors I spoke to could tell me what the “up to 10 times” limit actually meant.

So, let’s spin those various answers into some possible phone bills, bearing in mind nobody can actually give me a definitive answer. Imagine we’re going to France from Feb 1st to Feb 16th, and we’re going to use 25MB/day most days on e-mail and Facebook and the like, and 250MB on Fridays, ‘cos we’re watching snowboarding videos on YouTube.

  • With 1 EU100 bundle per day, unlimited days – that’d cost us £185.70
  • At 1 EU100 bundle per day, up to 10 days – would cost us £270.98
  • The EU500 plan? Given our 500MB quota runs out halfway through our trip, we’d pay £322.13
  • If the chap on the phone was right about £3.70/MB, we’d be looking at £1,110 in excess data charges for our two nights of YouTube, and a total bill of £1,158.
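
For the morbidly curious, here’s a rough C# sketch of how three of those four figures fall out (the EU500 sum depends on that bundle’s price and on exactly when the 500MB runs out, so I’ve left it out). The dates, usage pattern and 45.9p/MB rate come from above; the rest is back-of-the-envelope.

using System;
using System.Linq;

// 16-day trip (1st-16th Feb), Fridays on the 7th and 14th, £3 per EU100 bundle,
// 45.9p/MB standard roaming rate.
var usage = Enumerable.Range(1, 16)
    .Select(day => (day == 7 || day == 14) ? 250m : 25m)
    .ToArray();

// One EU100 bundle every day, for as many days as you like
var unlimited = usage.Sum(mb => 3m + Math.Max(0m, mb - 100m) * 0.459m);
Console.WriteLine("EU100, unlimited days: £{0:F2}", unlimited);        // £185.70

// One EU100 bundle per day, but only for the first ten days
var tenDays = usage.Select((mb, i) => i < 10
        ? 3m + Math.Max(0m, mb - 100m) * 0.459m   // bundled day
        : mb * 0.459m)                            // everything at the standard rate
    .Sum();
Console.WriteLine("EU100, first ten days only: £{0:F2}", tenDays);     // ≈ £270.98

// One EU100 bundle per day, with excess billed at £3.70/MB as the advisor suggested
var worstCase = usage.Sum(mb => 3m + Math.Max(0m, mb - 100m) * 3.70m);
Console.WriteLine("EU100 with £3.70/MB excess: £{0:F2}", worstCase);   // £1,158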

And on top of all that, whoever wrote their (legally binding?) terms & conditions cannot spell ‘receive’ or ‘recurring’.

No wonder I have a headache.

UPDATE: So when I texted EU100 to 2088 to activate the bundle, it failed… I rang them (again), and apparently there’s already a bundle on there. It’s been on there since February 2013. Will it run out? “No, sir”. Is there any kind of limit on how many days I get? “No, sir.”

So that’s an hour of research and phone calls to work out that nobody knows for sure what the bundle is, but it’s probably safe to assume it *doesn’t* run out because I’ve already got one and it’s been active for nearly a year.

O_o

Tuesday, 17 December 2013

The Joy of GUIDs

Like many developers, my first experience of primary keys was the Autonumber field in MS Access. At the time, this looked like a pretty neat idea. When I outgrew Access and moved on to using SQL Server, I adopted the convention of an identity(1,1) field as a primary key - a pattern that's used extensively both within my own code and throughout the industry.

Recently, though, I've been working on big distributed systems where the identity field really doesn't work too well. The biggest drawback is that if you create a new entity in any of your data stores, you have to wait for the database to assign the ID before you can refer to that entity anywhere else. For RESTful systems where your Create method might just return an HTTP 202 "Accepted for processing" and leave you to work out whether your request eventually succeeded or not, this approach just plain doesn't work. Enter the GUID - globally unique identifier. The idea is that every GUID ever created is completely unique - it uses various system/hardware IDs, the system clock and a degree of randomness to create unique 128-bit values. The lovely thing about working with GUIDs is that when you create your new customer, or order, or invoice, YOU can assign the ID that will identify the object for the rest of its life.
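
To make that concrete, here's a minimal sketch (nothing framework-specific; the URI scheme is just an example) of what client-assigned IDs let you do:

using System;

// The client assigns the ID up front - no round-trip to the database required.
var customerId = Guid.NewGuid();

// We already know the resource's URI before the insert has even been queued...
var location = "/customers/" + customerId;

// ...so a RESTful Create can queue up the work, return HTTP 202 Accepted with that
// Location header, and let the back end catch up in its own time.
Console.WriteLine(location);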

Thing is, working with GUIDs is very, very different to working with ints. For starters, a lot of security practices around integers just don't apply.

A great example is URL-hacking. If you've ever seen an address in your browser's address bar that looks like:

http://www.myshop.com/basket.asp?basketid=2789

Try editing it. Try changing the query string to be basketid=2788 - on many (poorly written!) systems, that'll show you what's in someone else's basket. It doesn't take a lot of work to write a little script that'll GET all the basket IDs from 0-999999 and voila! you've got a folder full of other people's shopping baskets. Even if the developers do something terribly clever - like using identity(27897,17) instead of identity(1,1) - if you can do ten HTTP requests per second, you can still sweep the entire ID range from 0-1,000,000 in a little over a day. Sure, only 1 in 17 of your requests will succeed - but who cares? That's still 58,823 compromised shopping baskets.
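
The back-of-the-envelope sums behind that, for anyone checking:

using System;

// Sweeping a million IDs at ten requests per second, where 1 in 17 hits a real
// identity(27897,17) key:
Console.WriteLine(1000000 / 10.0 / 3600);      // ≈ 27.8 hours to sweep the whole range
Console.WriteLine(Math.Floor(1000000 / 17.0)); // ≈ 58,823 baskets found along the way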

Now, imagine instead of sequential integers, you use a GUID:

http://www.myshop.com/basket.asp?basketid=fa8f6423-e7d6-ff40-7afc-ae874c755c00

Try hacking it. It won't work. There are 3.4×10^38 possible GUIDs, and roughly seven billion people on earth. If every single person on the planet used your website every day, at the end of a year you'd have 10,000,000,000 * 365 shopping baskets in your system. Let's see how long it would take an attacker to steal those baskets by URL-hacking:

Number of possible basket IDs Bp = 3.4x10^38

Number of actual basket IDs Ba = 10,000,000,000 x 365 = 3.65x10^12

Likelihood of a randomly chosen GUID finding a shopping basket = Ba/Bp ≈ 1.1x10^-26

Let's assume the website we're attacking is running on some serious distributed hardware so it'll remain responsive throughout our attack, and you have a massive botnet so you can check baskets until the server starts to struggle under the load. Facebook manages 6 million page-views per minute, or 100,000 requests per second. At 100,000 requests per second, it will take you (on average) 29,537,888,423,233 years to find a single shopping basket by URL-hacking. Given those odds, a lot of security concerns around vulnerabilities like URL-hacking just disappear.
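
Or, sketched out in code (double precision is fine here - it's the orders of magnitude that matter):

using System;

var possibleGuids = Math.Pow(2, 128);                 // Bp ≈ 3.4e38
var actualBaskets = 10000000000.0 * 365;              // Ba ≈ 3.65e12
var pHitPerGuess = actualBaskets / possibleGuids;     // ≈ 1.1e-26

var guessesPerSecond = 100000.0;                      // Facebook-scale traffic
var secondsToFirstHit = 1 / (pHitPerGuess * guessesPerSecond);
Console.WriteLine(secondsToFirstHit / (365.0 * 24 * 3600));  // ≈ 3e13 years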

I've sat through numerous conversations that get onto "what'll happen if we get a GUID collision?"

There's really only one sensible answer to this: You'll find the place in your code where you're accidentally re-using the same GUIDs, and fix the bug. GUIDs don't collide - you're doing it wrong. If you're using Guid.NewGuid() or NEWID(), you won't get collisions. It just won't happen. There isn't even any meaningful analogy with daily human experience that can convey just how unlikely a GUID collision is. Things with probabilities of the order of 10^-26 just don't happen.

Friday, 11 October 2013

Automatic Semantic Versioning with GitHub and TeamCity

You’ve quite possibly come across the idea of semantic versioning. It's a set of rules for versioning releases of software, designed to minimise disruption to people who are relying on your APIs not to change. In a nutshell, every release of your software has a three-part version number.

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards-compatible manner, and
  3. PATCH version when you make backwards-compatible bug fixes.

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

Pretty simple, huh? In other words, 2.3.4 should offer the same feature set as 2.3.1 but with extra bug fixes. 2.4.0 is backwards-compatible with 2.3.2, and with 2.2.*, and with 2.1.*, and when you go for the ‘big rewrite’ and break everything, that’s version 3.0.0 which isn’t compatible with any previous versions.

Our main development pipeline is now based on TeamCity 8 and GitHub – both of which are absolutely wonderful once you get the hang of using them properly – and I wanted to automate the versioning of our builds according to semantic versioning principles. Here’s the workflow I settled on:

  • Every build has a MAJOR, MINOR, PATCH, and BUILD version, and an optional SUFFIX
  • Each part of the version number is managed as a TeamCity build parameter.
  • MAJOR and MINOR are managed manually. TeamCity can’t tell whether your commit is a feature or a bugfix, so for now we’ve just settled for manually changing the relevant parameters when you release a major or minor release.
  • All development is done on feature branches and submitted via pull requests.
  • Any pull request is assumed to be a PATCH release unless a developer changes the major or minor numbers when they accept it.
  • The SUFFIX is used to denote a pre-release package, and should indicate which branch’s features are included in that prerelease
    • e.g. Dylan.Web-1.2.3.0-branch7 should be the code you’d get if you accepted branch7 into master 1.2.3
  • Anything built off master should be releasable, and tagged as such
  • Finally, whenever a production release is built, we’ll tag the GitHub repository with the version of that release – for tagging, we ignore the build number, so the tag will be something like v0.0.0

This lot actually distils to quite a simple guideline: The patch number is the number of pull requests accepted since the last tagged minor release.

Now that sounds like something you could automate… so with Powershell on one screen and the Github API documentation on the other, I hacked together the following script, which is now happily running as a build step on our TeamCity server.

$MajorVersion = %MajorVersion% # The major version of your current build
$MinorVersion = %MinorVersion% # The minor version of your current build
$RepoUrl = "%vcsroot.url%" # The HTTPS path to your repo – https://github.com/MyCompany/MyRepository.git
$Token = "%GitHubAccessToken%" # Your GitHub access token, from https://github.com/settings/applications

try {
    # Parse the supplied Git repo URL to determine the repo name and owner
    $RepoUrl -match "https://github.com/([^/]+)/(.+)\.git$"
    $repoOwner = $matches[1]
    $repoName = $matches[2]
    Write-Host "Reading repo $repoName owned by $repoOwner"
       
    # Find the tag resource matching the baseline of the current major/minor version
    # Remember, we’re not looking for X.Y.Z, we’re looking for X.Y.0 since patch numbers
    # should increment until the next minor release.

    $uri = "https://api.github.com/repos/$repoOwner/$repoName/git/refs/tags/v$MajorVersion.$MinorVersion.0?access_token=$Token"
    Write-Output "Looking for tag v$MajorVersion.$MinorVersion.0 at $uri"
    $tag = Invoke-RestMethod -Uri "$uri"

    # $tag.object.url will now give us a more detailed tag resource, including the commit that was tagged
    $uri = $tag.object.url
    Write-Output "Getting tag info from $uri`?access_token=$Token"
    $tag = Invoke-RestMethod -Uri "$uri`?access_token=$Token"
       
    # $tag.object.url is now a link to the tagged commit
    $uri = $tag.object.url
    $tag = Invoke-RestMethod -Uri "$uri`?access_token=$Token"

    # now we can dig into the commit itself and find out WHEN the baseline release was tagged...
    $since = $tag.committer.date

    # Now we can retrieve all the commits in this repo SINCE that date
    $commitsUri = "https://api.github.com/repos/$repoOwner/$repoName/commits?since=$since&access_token=$Token"
    Write-Host "Retrieving commit log from $commitsUri"
    $commits = Invoke-RestMethod -Uri "$commitsUri"
    $merges = @($commits | Where-Object { $_.commit.message -match "^Merge pull request" })

    # Reversing the merges just means they show up in TeamCity’s build log in chronological order.
    # Which is nice.
    [Array]::Reverse($merges)

    $mergeCount = $merges.Count
    Write-Host "Found $mergeCount merges since last release tag"
    for($i = 0; $i -lt $merges.Count; $i++) {
        $merge_number = $i+1
        $merge_message = $merges[$i].commit.message.split("`r`n",[StringSplitOptions]"RemoveEmptyEntries") -join "`r`n    "
        $merge_sha = $merges[$i].sha
        $merge_web_url = $RepoUrl.Replace(".git", "/commit/$merge_sha")
        "`r`n  Merge #$merge_number`: $merge_web_url"
        "    $merge_message"
    }
    Write-Host "`r`n"
    Write-Output "##teamcity[setParameter name='PatchVersion' value='$mergeCount']"
} catch [System.Exception] {
    Write-Host "Exception trying to determine patch number from Github API"
    Write-Host $_
    Write-Host "Using default patch number 0"
    Write-Output "##teamcity[setParameter name='PatchVersion' value='0']"
}

A couple of fun details to look out for:

  • If we’re doing the first build of a new minor version, there won’t be any ‘baseline’ tagged version yet – which is fine; we just set the patch number to zero and off we go.
  • You can control TeamCity build parameters by outputting specially-formatted messages in your build log – that’s what those lines

    Write-Output "##teamcity[setParameters name='SomeParameter' value='NewValue']"

    are doing. See Build Script Interaction with TeamCity for more info.

Finally, if you'd like the commit/merge hyperlinks in your TeamCity build log to be clickable, check out this awesome tip from Christian Rodemeyer about how to embed clickable URL links in TeamCity build logs.

Happy versioning!

Monday, 19 August 2013

Building a ServiceStack-based OAuth2 resource server using DotNetOpenAuth

Last week, I asked on StackOverflow if anyone had used DotNetOpenAuth to secure a ServiceStack API using OAuth2. Well, it would appear not… so alongside a week of band rehearsals and a big old party last weekend, I’ve put a proof of concept up on GitHub that demonstrates how to use ServiceStack and DotNetOpenAuth together to create a REST API that’s secured using OAuth2. The app I’m working on is aimed specifically at existing customers, who will already have a login account on our site, so I don’t need to use OpenID at all. In fact, I spent a good few days puzzling over this before I realised my mistake. After digging through the various specs and innumerable blog posts, I decided to use the OAuth2 authorization framework – specifically, a feature of that framework called the resource owner password credentials grant.
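
If you’ve not met that grant type before, the idea is simple: the client swaps the resource owner’s username and password for an access token with a single POST to the authorization server’s token endpoint (see RFC 6749, §4.3). Here’s a rough C# sketch of what that request looks like on the wire – the endpoint URL, client credentials and scope are placeholders of mine, not anything from the OAuthStack project:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

using (var http = new HttpClient())
{
    // The client authenticates itself with HTTP Basic auth (placeholder credentials)...
    var clientCredentials = Convert.ToBase64String(Encoding.UTF8.GetBytes("my-client-id:my-client-secret"));
    http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", clientCredentials);

    // ...and swaps the resource owner's username and password for an access token.
    var form = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        { "grant_type", "password" },
        { "username", "customer@example.com" },
        { "password", "their-password" },
        { "scope", "api" }
    });

    var response = await http.PostAsync("https://auth.example.com/oauth/token", form);

    // The response body is JSON containing access_token, token_type, expires_in and so on;
    // the client then sends "Authorization: Bearer <access_token>" on every API call.
    Console.WriteLine(await response.Content.ReadAsStringAsync());
}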

The DotNetOpenAuth project includes a couple of great OAuth2 samples, including an authorization server (the bit that does the customer login), but they’re built primarily around ASP.NET MVC or WCF. I’ve worked with both of these in the past, but recently I’ve switched to using the ServiceStack library to build ReST APIs, and I absolutely love it. It’s easy to use, easy to extend, and wonderfully lightweight, so I wanted to use ServiceStack to build the data API that our app is going to use. I assumed it wouldn’t be too hard to build our own authorisation server, and then use a standard protocol like OAuth to control access to the resources exposed by this server.

There’s a working prototype / proof of concept up on GitHub now, at https://github.com/dylanbeattie/OAuthStack – complete with commentary on how the various integration points worked. OAuth2 is actually conceptually quite simple, but getting the two libraries to work together proved a little fiddly; both ServiceStack and DotNetOpenAuth use their own abstractions over the ASP.NET request/response objects, making me really wish Microsoft hadn’t made the .NET HTTP framework quite so inflexible back in the day… but after a week or so of tinkering, testing, poring over RFCs and bouncing around in the Visual Studio debugger, I’ve got something that seems pretty lightweight, doesn’t reinvent the wheel, and seems to satisfy the requirements.

Suffice to say that if DotNetOpenAuth and ServiceStack weren’t open source this kind of integration would have been practically impossible, so I want to say a huge thank you to the authors and maintainers of both of those projects. Thanks, guys. You rock :)

Monday, 24 June 2013

Linking Subversion Commits to Pivotal Tracker on Windows using Powershell

My team at work now keep our entire lives in PivotalTracker, and rather like it. We recently linked it to our GitHub repository using the built-in PivotalTracker integration that’s provided by GitHub, which worked really nicely. Thing is, we’re still in the process of migrating our codebase to GitHub – there’s lots of older projects with TeamCity jobs still linked to our in-house Subversion repo – and I wanted to add a similar facility for our projects that are still in Subversion.

This is all based on the Source Control Management Post-Commit Hook Integration feature in the Pivotal Tracker API, and getting it working required two scripts.

post-commit.bat – this is the standard batch file used by Subversion on Windows to run code after a successful commit. Subversion runs this automatically, and passes in two command-line arguments – the (local) path of the repository that accepted the commit, and the revision number created by that commit. Ours now looks like this:

@echo off
set ps=%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe
%ps% Set-ExecutionPolicy unrestricted
%ps% -command "D:\svn\data\repositories\projects\hooks\post-commit.ps1" -repopath %1 -revision %2

A couple of things to notice. First, we’re running the powershell Set-ExecutionPolicy as part of the script. This is brute force and there’s probably a nicer way of doing it, but since the Powershell script runs as the user who owns the Subversion server process, and that user doesn’t have interactive logon rights, I couldn’t see a quicker way of getting that script to run reliably. Ah, Powershell security, how we love you.

Second – we’re passing the two command-line arguments into the powershell script as named parameters. Subversion will call the batch file as:

post-commit.bat D:\svn\data\repositories\projects 12345

and we’re translating that into

powershell.exe -command post-commit.ps1 -repopath D:\svn\data\repositories\projects -revision 12345

The bit that actually does the heavy lifting is post-commit.ps1

param([string]$repopath, [string]$revision)   

$svnlook = "C:\Program Files\Subversion\bin\svnlook.exe"

$message = & "$svnlook" log $repopath -r $revision | Out-String

if ($message -match "#\d\d\d\d\d\d\d\d") {
    $reponame = Split-Path $repopath -leaf
    $author = & "$svnlook" author $repopath -r $revision | Out-String
   
    # Grab a reference to System.Web so we can HtmlEncode things.
    Add-Type -AssemblyName System.Web
    $author = [System.Web.HttpUtility]::HtmlEncode($author)
    $message = [System.Web.HttpUtility]::HtmlEncode($message)

    # We run ViewCV on our Subversion server to browse code via the web. Tweak this to suit,
    # or remove the <url></url> element from the XML completely.

    $url = "http://subversion.mycompany.com/viewvc/$reponame`?view=revision&revision=$revision"

    $headers = @{"X-TrackerToken" = "put your Pivotal Tracker API Token Here"}
    # This is Powershell's "here-string" syntax for multiline string literals.
    $body = @"

<source_commit>
    <message>$message</message>
    <author>$author</author>
    <commit_id>$revision</commit_id>
    <url>$url</url>
</source_commit>
"@

    $r = Invoke-WebRequest -Uri http://www.pivotaltracker.com/services/v3/source_commits -ContentType "application/xml" -Method POST -Headers $headers -Body $body

    Write-Host $r
} else {
    # if nothing in the commit message matched the #12345678 syntax, then don’t bother calling the API.
}

To test it, first run the Powershell script directly from your command line. Say you’ve just committed revision 12345 to the “projects” repository, remembering to include [#12345678] in your commit comment where 12345678 is the PivotalTracker story ID.

PS D:\svn\data\repositories\projects\hooks> ./post-commit.ps1 -repopath D:\svn\data\repositories\projects -revision 12345

Your commit comment should appear in the story history within a few seconds.

Now check the batch file harness can run your Powershell properly – see the note below about revisions:

D:\svn\data\repositories\projects\hooks> post-commit.bat D:\svn\data\repositories\projects 12346

If that works, the whole kaboodle should work. A couple of caveats to watch out for:

  1. If your post-commit hook fails, your Subversion client will give you something cryptic about “Commit failed: MERGE … 200 OK” – which is not particularly transparent. Best to get things working on the command line first before testing via Subversion itself.
  2. I couldn’t get the same commit to produce multiple Pivotal tickets – when I was testing it, I had to create a fresh commit for each test run of the integration hook. This is possibly just Pivotal automatically de-duplicating identical stories – but worth knowing for test purposes.