Friday, 31 January 2014

How to read FormsAuthentication.Timeout in .NET 3.5

Forms authentication in .NET has a timeout property, controlled via the web.config section like so:

<system.web>
    <authentication>
        <forms name="foo" loginUrl="~/Account/Login" timeout="30" />
    </authentication>
</system.web>

This setting has existed since .NET 1.0, I believe, but only in .NET 4 did we get a corresponding property on the FormsAuthentication object. To access the Timeout property in .NET 4+, you just call

FormsAuthentication.Timeout

To do the same in .NET 3.5, you need to do this:

var defaultTimeout = TimeSpan.FromMinutes(30);
var xml = new XmlDocument();
var webConfigFilePath = Path.Combine(HttpRuntime.AppDomainAppPath, "web.config");
xml.Load(webConfigFilePath);
var node = xml.SelectSingleNode("/configuration/system.web/authentication/forms");
if (node == null || node.Attributes == null) return (defaultTimeout);
var attribute = node.Attributes["timeout"];
if (attribute == null) return (defaultTimeout);
int minutes;
if (Int32.TryParse(attribute.Value, out minutes)) return(TimeSpan.FromMinutes(minutes));
return(defaultTimeout);
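
If you wrap that snippet in a helper – say, a static GetFormsAuthenticationTimeout() method returning a TimeSpan (the name is mine, pick whatever you like) – you can use it anywhere you'd have used FormsAuthentication.Timeout in .NET 4. For example, here's a rough sketch of keeping a manually-issued forms ticket in line with the configured timeout (the user variable is whatever your logged-in user object happens to be):

// Hypothetical usage: GetFormsAuthenticationTimeout() is the snippet above
// wrapped in a static helper that returns a TimeSpan.
var timeout = GetFormsAuthenticationTimeout();
var ticket = new FormsAuthenticationTicket(user.UserName, false, (int)timeout.TotalMinutes);
var cookie = new HttpCookie(FormsAuthentication.FormsCookieName, FormsAuthentication.Encrypt(ticket));
Response.Cookies.Add(cookie);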

I'll file this under "Things that are wrong with Forms Authentication, #217".

Friday, 17 January 2014

Friday Puzzle Time! (aka EU Roaming Data Charges)

I’m going to France. I want to be able to use my phone for data when I’m out there, so I’m investigating the cheapest way to do this. My phone is on Orange, who are now owned by EE, so I wander over to http://explore.ee.co.uk/roaming/orange/france and have a look.

(I should mention here that I’m generally really happy with Orange. The reception’s good. Their network doesn’t randomly drop calls or send me text messages 12 hours late, and Orange Wednesdays are awesome.)

Anyway. Here’s their data options:

[Screenshot: EE's roaming data options for France]

Before we even get into the chaos of daily limits and terms and conditions, notice how the EU 100MB daily bundle gives you 100MB for £3, whilst the EU Data 500MB bundle gives you 500Mb for a hundred and fifty pounds? Same data but the bundle offers “93% saving on standard rates” – which is such a huge discount as to be really quite unsettling. If someone tried to sell you a MacBook for £50, you’d be a bit suspicious, right?

So I tried to work out what the catch might be – using EE's website, Orange's EU Mobile Internet Daily Bundles T&Cs, and a phone call to Orange customer services. Here's what I found.

(Customer services were referring to an internal T&C document dated Dec 15th 2013.)

What happens if I go over 100MB in a single day?

  • Website: "This bundle reoccurs up to 10 times until midnight of that same day (local time), each time the bundle reoccurs you will be charged an additional bundle fee and recieve additional bundle allowance. If you exceed the reoccuring bundle limit you'll be charged 45.9p/MB." (Does this mean I get ten bundles per day? Or one bundle per day, for up to ten days?)
  • T&Cs: "Any data used above the amount in your bundle, will be charged at our standard data roaming rate of 45.9p per MB. So, for example, if you have the £1 per day bundle and use more than your 20MB you will be charged an additional £1 for every additional 20MB you use up to 200MB (as the bundle reoccurs ten times). Any data usage in excess of 200MB will then be charged at our standard data roaming rate of 45.9p per MB." (That sounds to me like you get ten bundles per day – otherwise it would just say "if you use more than 20MB you'll pay standard rates for the rest of the day" – no?)
  • Customer services: Bundles only work once per day – so when you reach 100MB, you'll pay normal roaming charges until midnight, and you'll get another 100MB bundle the next day.

How many times can I use the bundle in a single day?

  • Website: See above.
  • T&Cs: "A daily bundle will last until midnight (local time of the country you purchased the bundle in) on the day you made the purchase, or until you have used up your data allowance, whichever comes first in time. Once you've purchased a bundle, you will be opted in for the bundle to reoccur up to ten times." (Doesn't say if that's ten times per day, or ten times per trip, or per billing period, or what.)
  • Customer services: Just once – see above. You can't use two daily bundles in the same day.

How many days does it last for?

  • Website: Really unclear whether it's ten bundles per day – for as many days as you like – or whether it's one bundle per day, for up to ten days. And no indication of whether that ten day limit is per trip or per billing period or what.
  • T&Cs: See above.
  • Customer services: There's no limit – if I went to France for fifty days, I'd get fifty bundles. One per day, for the duration of my trip.

How much does it cost once I exceed my bundle limit?

  • Website: "If you exceed the reoccuring bundle limit you'll be charged 45.9p/MB."
  • T&Cs: "Any data used above the amount in your bundle, will be charged at our standard data roaming rate of 45.9p per MB."
  • Customer services: £3.70 per MB. Though I was advised that information might be out of date.

What does the "ten bundle" limit actually mean?

  • Website: Unclear.
  • T&Cs: Unclear.
  • Customer services: No idea. Neither of the advisors I spoke to could tell me what the "up to 10 times" limit actually meant.

So, let's spin those various answers into some possible phone bills, bearing in mind nobody can actually give me a definitive answer. Imagine we're going to France from Feb 1st to Feb 16th, and we're going to use 25MB/day most days on e-mail and Facebook and the like, and 250MB on Fridays, 'cos we're watching snowboarding videos on YouTube. (I've sketched the working for a couple of these just after the list.)

  • With 1 EU100 bundle per day, unlimited days – that’d cost us £185.70
  • At 1 EU100 bundle per day, up to 10 days – would cost us £270.98
  • The EU500 plan? Given our 500MB quota runs out halfway through our trip, we’d pay £322.13
  • If the chap on the phone was right about £3.70/MB, we’d be looking at £1,110 in excess data charges for our two nights of YouTube, and a total bill of £1,158.
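
For the curious, here's roughly how the first two of those figures fall out – a quick C# sketch using the usage pattern above and the prices quoted on the website (the variable names are mine):

// Rough sketch: 16 days in France, 25MB on ordinary days, 250MB on the two Fridays.
// Needs System and System.Linq.
var dailyUsage = new double[16];
for (var day = 0; day < 16; day++) dailyUsage[day] = 25;
dailyUsage[6] = 250;   // Friday 7th February
dailyUsage[13] = 250;  // Friday 14th February

double bundlePrice = 3.00, bundleSize = 100, standardRate = 0.459;

// One EU100 bundle per day, every day, with excess charged at 45.9p/MB
double scenario1 = dailyUsage.Sum(mb => bundlePrice + Math.Max(0, mb - bundleSize) * standardRate);

// Bundles on the first ten days only, then everything at 45.9p/MB
double scenario2 = dailyUsage.Take(10).Sum(mb => bundlePrice + Math.Max(0, mb - bundleSize) * standardRate)
                 + dailyUsage.Skip(10).Sum(mb => mb * standardRate);

Console.WriteLine("Scenario 1: £{0:F2}, scenario 2: £{1:F2}", scenario1, scenario2);
// Prints roughly £185.70 and £270.98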

And on top of all that, whoever wrote their (legally binding?) terms & conditions cannot spell ‘receive’ or ‘recurring’.

No wonder I have a headache.

UPDATE: So when I texted EU100 to 2088 to activate the bundle, it failed… I rang them (again), and apparently there’s already a bundle on there. It’s been on there since February 2013. Will it run out? “No, sir”. Is there any kind of limit on how many days I get? “No, sir.”

So that’s an hour of research and phone calls to work out that nobody knows for sure what the bundle is, but it’s probably safe to assume it *doesn’t* run out because I’ve already got one and it’s been active for nearly a year.

O_o

Tuesday, 17 December 2013

The Joy of GUIDs

Like many developers, my first experience of primary keys was the Autonumber field in MS Access. At the time, this looked like a pretty neat idea. When I outgrew Access and moved on to using SQL Server, I adopted the convention of an identity(1,1) field as a primary key - a pattern that's used extensively both within my own code and throughout the industry.

Recently, though, I've been working on big distributed systems where the identity field really doesn't work too well. The biggest drawback is that if you create a new entity in any of your data stores, you need to wait for the response from the database before you can find it again. For RESTful systems where your Create method might just return an HTTP 202 "Accepted for processing" and leave you to work out whether your request eventually succeeded or not, this approach just plain doesn't work.

Enter the GUID - globally unique identifier. The idea is that every GUID ever created is completely unique - it uses various system/hardware IDs, the system clock and a degree of randomness to create unique 128-bit values. The lovely thing about working with GUIDs is that when you create your new customer, or order, or invoice, YOU can assign the ID that will identify the object for the rest of its life.
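
To make that concrete, here's the sort of thing I mean – a hedged sketch in ASP.NET MVC flavour, where CreateCustomerRequest, CreateCustomer and the bus are all made-up names:

// Sketch of an "accept and return 202" endpoint: the ID is minted up front,
// the work is queued, and the caller is told where the resource will live -
// no waiting on the database. The message types and the bus are hypothetical.
public ActionResult Create(CreateCustomerRequest request)
{
    var customerId = Guid.NewGuid();                      // assigned here, not by an identity column
    bus.Send(new CreateCustomer(customerId, request));    // fire-and-forget
    Response.StatusCode = 202;                            // "Accepted for processing"
    Response.AppendHeader("Location", "/customers/" + customerId);
    return new EmptyResult();
}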

Thing is, working with GUIDs is very, very different to working with ints. For starters, a lot of security practices around integers just don't apply.

A great example is URL-hacking. If you've ever seen an address in your browser's address bar that looks like:

http://www.myshop.com/basket.asp?basketid=2789

Try editing it. Try changing the query string to be basketid=2788 - on many (poorly written!) systems, that'll show you what's in someone else's basket. It doesn't take a lot of work to write a little script that'll GET all the basket IDs from 0-999999 and voila! you've got a folder full of other people's shopping baskets. Even if the developers do something terribly clever - like using identity(27897,17) instead of identity(1,1) - if you can do ten HTTP requests per second, you can still sweep the entire ID range from 0-1,000,000 in just over 24 hours. Sure, only 1 in 17 of your requests will succeed - but who cares? That's still 58,823 compromised shopping baskets.

Now, imagine instead of sequential integers, you use a GUID:

http://www.myshop.com/basket.asp?basketid=fa8f6423-e7d6-ff40-7afc-ae874c755c00

Try hacking it. It won't work. There are 3.4×10^38 possible GUIDs, and roughly seven billion people on earth. If every single person on the planet used your website every day – call it 10,000,000,000 users, to be generous – at the end of a year you'd have 10,000,000,000 * 365 shopping baskets in your system. Let's see how long it would take an attacker to steal those baskets by URL-hacking:

Number of possible basket IDs: Bp = 3.4x10^38

Number of actual basket IDs: Ba = 10,000,000,000 x 365 = 3.65x10^12

Likelihood of a randomly chosen GUID finding a shopping basket = Ba/Bp ≈ 1.1x10^-26

Let's assume the website we're attacking is running on some serious distributed hardware so it'll remain responsive throughout our attack, and you have a massive botnet so you can check baskets until the server starts to struggle under the load. Facebook manages 6 million page-views per minute, or 100,000 requests per second. At 100,000 requests per second, it will take you (on average) around 29,537,888,423,233 years to find a single shopping basket by URL-hacking.  Given those odds, a lot of security concerns around vulnerabilities like URL-hacking just disappear.
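
If you want to sanity-check that figure, the arithmetic is simple enough to sketch in a few lines of C# – the inputs are just the assumptions above, nothing more:

// Back-of-the-envelope check of the numbers above.
double possibleGuids = 3.4e38;        // ~2^128 possible GUID values
double actualBaskets = 1e10 * 365;    // ten billion users a day, for a year
double requestsPerSecond = 100000;    // Facebook-ish traffic

double hitProbability = actualBaskets / possibleGuids;                    // ~1.1e-26
double expectedSeconds = (possibleGuids / actualBaskets) / requestsPerSecond;
double expectedYears = expectedSeconds / (365.0 * 24 * 3600);             // ~3e13 years
Console.WriteLine("P(hit) = {0:E1}, expected wait = {1:E1} years", hitProbability, expectedYears);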

I've sat through numerous conversations that get onto "what'll happen if we get a GUID collision?"

There's really only one sensible answer to this: You'll find the place in your code where you're accidentally re-using the same GUIDs, and fix the bug. GUIDs don't collide - you're doing it wrong. If you're using Guid.NewGuid() or NEWID(), you won't get collisions. It just won't happen. There isn't even any meaningful analogy with daily human experience that can convey just how unlikely a GUID collision is. Things with a probability of 1.1x10^-26 just don't happen.

Friday, 11 October 2013

Automatic Semantic Versioning with GitHub and TeamCity

You’ve quite possibly come across the idea of semantic versioning. It's a set of rules for versioning releases of software, designed to minimise disruption to people who are relying on your APIs not to change. In a nutshell, every release of your software has a three-part version number.

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards-compatible manner, and
  3. PATCH version when you make backwards-compatible bug fixes.

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

Pretty simple, huh? In other words, 2.3.4 should offer the same feature set as 2.3.1 but with extra bug fixes. 2.4.0 is backwards-compatible with 2.3.2, and with 2.2.*, and with 2.1.*, and when you go for the ‘big rewrite’ and break everything, that’s version 3.0.0 which isn’t compatible with any previous versions.

Our main development pipeline is now based on TeamCity 8 and GitHub – both of which are absolutely wonderful once you get the hang of using them properly – and I wanted to automate the versioning of our builds according to semantic versioning principles. Here’s the workflow I settled on:

  • Every build has a MAJOR, MINOR, PATCH, and BUILD version, and an optional SUFFIX
  • Each part of the version number is managed as a TeamCity build parameter.
  • MAJOR and MINOR are managed manually. TeamCity can’t tell whether your commit is a feature or a bugfix, so for now we’ve just settled for manually changing the relevant parameters when you release a major or minor release.
  • All development is done on feature branches and submitted via pull requests.
  • Any pull request is assumed to be a PATCH release unless a developer changes the major or minor numbers when they accept it.
  • The SUFFIX is used to denote a pre-release package, and should indicate which branch’s features are included in that prerelease
    • e.g. Dylan.Web-1.2.3.0-branch7 should be the code you’d get if you accepted branch7 into master 1.2.3
  • Anything built off master should be releasable, and tagged as such
  • Finally, whenever a production release is built, we’ll tag the GitHub repository with the version of that release – for tagging, we ignore the build number, so the tag will be something like v0.0.0

This lot actually distils to quite a simple guideline: The patch number is the number of pull requests accepted since the last tagged minor release.

Now that sounds like something you could automate… so with Powershell on one screen and the Github API documentation on the other, I hacked together the following script, which is now happily running as a build step on our TeamCity server.

$MajorVersion = %MajorVersion% # The major version of your current build
$MinorVersion = %MinorVersion% # The minor version of your current build
$RepoUrl = "%vcsroot.url%" # The HTTPS path to your repo – https://github.com/MyCompany/MyRepository.git
$Token = "%GitHubAccessToken%" # Your GitHub access token, from https://github.com/settings/applications

try {
    # Parse the supplied Git repo URL to determine the repo name and owner
    $RepoUrl -match "https://github.com/([^/]+)/(.+)\.git$"
    $repoOwner = $matches[1]
    $repoName = $matches[2]
    Write-Host "Reading repo $repoName owned by $repoOwner"
       
    # Find the tag resource matching the baseline of the current major/minor version
    # Remember, we’re not looking for X.Y.Z, we’re looking for X.Y.0 since patch numbers
    # should increment until the next minor release.

    $uri = "https://api.github.com/repos/$repoOwner/$repoName/git/refs/tags/v$MajorVersion.$MinorVersion.0?access_token=$Token"
    Write-Output "Looking for tag v$MajorVersion.$MinorVersion.0 at $uri"
    $tag = Invoke-RestMethod -Uri "$uri"

    # $tag.object.url will now give us a more detailed tag resource, including the commit that was tagged
    $uri = $tag.object.url
    Write-Output "Getting tag info from $uri`?access_token=$Token"
    $tag = Invoke-RestMethod -Uri "$uri`?access_token=$Token"
       
    # $tag.object.url is now a link to the tagged commit
    $uri = $tag.object.url
    $tag = Invoke-RestMethod -Uri "$uri`?access_token=$Token"

    # now we can dig into the commit itself and find out WHEN the baseline release was tagged...
    $since = $tag.committer.date

    # Now we can retrieve all the commits in this repo SINCE that date
    $commitsUri = "https://api.github.com/repos/$repoOwner/$repoName/commits?since=$since&access_token=$Token"
    Write-Host "Retrieving commit log from $commitsUri"
    $commits = Invoke-RestMethod -Uri "$commitsUri"
    $merges = @($commits | Where-Object { $_.commit.message -match "^Merge pull request" })

    # Reversing the merges just means they show up in TeamCity’s build log in chronological order.
    # Which is nice.
    [Array]::Reverse($merges)

    $mergeCount = $merges.Count
    Write-Host "Found $mergeCount merges since last release tag"
    for($i = 0; $i -lt $merges.Count; $i++) {
        $merge_number = $i+1
        $merge_message = $merges[$i].commit.message.split("`r`n",[StringSplitOptions]"RemoveEmptyEntries") -join "`r`n    "
        $merge_sha = $merges[$i].sha
        $merge_web_url = $RepoUrl.Replace(".git", "/commit/$merge_sha")
        "`r`n  Merge #$merge_number`: $merge_web_url"
        "    $merge_message"
    }
    Write-Host "`r`n"
    Write-Output "##teamcity[setParameter name='PatchVersion' value='$mergeCount']"
} catch [System.Exception] {
    Write-Host "Exception trying to determine patch number from Github API"
    Write-Host $_
    Write-Host "Using default patch number 0"
    Write-Output "##teamcity[setParameter name='PatchVersion' value='0']"
}
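
One thing the script doesn't show is where those parameters actually end up. In TeamCity, the build number format (under the build configuration's General Settings) can reference them directly, so something along these lines stitches the pieces together – %build.counter% is TeamCity's built-in counter, and exactly how you append the pre-release suffix is up to you:

%MajorVersion%.%MinorVersion%.%PatchVersion%.%build.counter%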

A couple of fun details to look out for:

  • If we’re doing the first build of a new minor version, there won’t be any ‘baseline’ tagged version yet – which is fine; we just set the patch number to zero and off we go.
  • You can control TeamCity build parameters by outputting specially-formatted messages in your build log – that’s what those lines

    Write-Output "##teamcity[setParameter name='SomeParameter' value='NewValue']"

    are doing. See Build Script Interaction with TeamCity for more info.

Finally, if you'd like the commit/merge hyperlinks in your TeamCity build log to be clickable, check out this awesome tip from Christian Rodemeyer about how to embed clickable URL links in TeamCity build logs.

Happy versioning!

Monday, 19 August 2013

Building a ServiceStack-based OAuth2 resource server using DotNetOpenAuth

Last week, I asked on StackOverflow if anyone had used DotNetOpenAuth to secure a ServiceStack API using OAuth2. Well, it would appear not… so alongside a week of band rehearsals and a big old party last weekend, I've put a proof of concept up on GitHub that demonstrates how to use ServiceStack and DotNetOpenAuth together to create a REST API that's secured using OAuth2. The app I'm working on is aimed specifically at existing customers, who will already have a login account on our site, so I don't need to use OpenID at all. In fact, I spent a good few days puzzling over this before I realised my mistake. After digging through the various specs and innumerable blog posts, I decided to use the OAuth2 authorization framework – specifically, a feature of that framework called the resource owner password credentials grant.
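
For context, the resource owner password credentials grant is the flow where the client swaps the user's existing username and password directly for an access token. A minimal client-side sketch – the token endpoint URL and credentials here are made up, but the form fields are the ones defined in RFC 6749 section 4.3:

// Minimal sketch of a resource owner password credentials grant request.
// The endpoint and credentials are placeholders.
using (var client = new WebClient())
{
    var form = new NameValueCollection
    {
        { "grant_type", "password" },
        { "username", "customer@example.com" },
        { "password", "their-existing-password" }
    };
    var responseBytes = client.UploadValues("https://auth.example.com/oauth/token", form);
    var json = Encoding.UTF8.GetString(responseBytes);
    // On success, json contains access_token, token_type and expires_in;
    // the access token then goes in the Authorization header of subsequent API calls.
}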

The DotNetOpenAuth project includes a couple of great OAuth2 samples, including an authorization server (the bit that does the customer login), but they’re built primarily around ASP.NET MVC or WCF. I’ve worked with both of these in the past, but recently I’ve switched to using the ServiceStack library to build ReST APIs, and I absolutely love it. It’s easy to use, easy to extend, and wonderfully lightweight, so I wanted to use ServiceStack to build the data API that our app is going to use. I assumed it wouldn’t be too hard to build our own authorisation server, and then use a standard protocol like OAuth to control access to the resources exposed by this server.

There's a working prototype / proof of concept up on GitHub now, at https://github.com/dylanbeattie/OAuthStack – complete with commentary on how the various integration points worked. OAuth2 is actually conceptually quite simple, but getting the two libraries to work together proved a little fiddly; both ServiceStack and DotNetOpenAuth use their own abstractions over the ASP.NET request/response objects, making me really wish Microsoft hadn't made the .NET HTTP framework quite so inflexible back in the day… but after a week or so of tinkering, testing, poring over RFCs and bouncing around in the Visual Studio debugger, I've got something that seems pretty lightweight, doesn't reinvent the wheel, and seems to satisfy the requirements.

Suffice to say that if DotNetOpenAuth and ServiceStack weren’t open source this kind of integration would have been practically impossible, so I want to say a huge thank you to the authors and maintainers of both of those projects. Thanks, guys. You rock :)

Monday, 24 June 2013

Linking Subversion Commits to Pivotal Tracker on Windows using Powershell

My team at work now keep our entire lives in PivotalTracker, and rather like it. We recently linked it to our GitHub repository using the built-in PivotalTracker integration that’s provided by GitHub, which worked really nicely. Thing is, we’re still in the process of migrating our codebase to GitHub – there’s lots of older projects with TeamCity jobs still linked to our in-house Subversion repo – and I wanted to add a similar facility for our projects that are still in Subversion.

This is all based on the Source Control Management Post-Commit Hook Integration feature in the Pivotal Tracker API, and getting it working required two scripts.

post-commit.bat – this is the standard batch file used by Subversion on Windows to run code after a successful commit. Subversion runs this automatically, and passes in two command-line arguments – the (local) path of the repository that accepted the commit, and the revision number created by that commit. Ours now looks like this:

@echo off
set ps=%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe
%ps% -command "Set-ExecutionPolicy Unrestricted"
%ps% -command "D:\svn\data\repositories\projects\hooks\post-commit.ps1" -repopath %1 -revision %2

A couple of things to notice. First, we’re running the powershell Set-ExecutionPolicy as part of the script. This is brute force and there’s probably a nicer way of doing it, but since the Powershell script runs as the user who owns the Subversion server process, and that user doesn’t have interactive logon rights, I couldn’t see a quicker way of getting that script to run reliably. Ah, Powershell security, how we love you.
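
(For what it's worth, a slightly gentler alternative to changing the machine-wide policy – assuming the server has PowerShell 2.0 or later – is to bypass the policy for that one invocation instead:)

%ps% -ExecutionPolicy Bypass -command "D:\svn\data\repositories\projects\hooks\post-commit.ps1" -repopath %1 -revision %2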

Second – we’re passing the two command-line arguments into the powershell script as named parameters. Subversion will call the batch file as:

post-commit.bat D:\svn\data\repositories\projects 12345

and we’re translating that into

powershell.exe -command post-commit.ps1 -repopath D:\svn\data\repositories\projects -revision 12345

The bit that actually does the heavy lifting is post-commit.ps1

param([string]$repopath, [string]$revision)   

$svnlook = "C:\Program Files\Subversion\bin\svnlook.exe"

$message = & "$svnlook" log $repopath -r $revision | Out-String

if ($message -match "#\d\d\d\d\d\d\d\d") {
    $reponame = Split-Path $repopath -leaf
    $author = & "$svnlook" author $repopath -r $revision | Out-String
   
    # Grab a reference to System.Web so we can HtmlEncode things.
    Add-Type -AssemblyName System.Web
    $author = [System.Web.HttpUtility]::HtmlEncode($author)
    $message = [System.Web.HttpUtility]::HtmlEncode($message)

    # We run ViewVC on our Subversion server to browse code via the web. Tweak this to suit,
    # or remove the <url></url> element from the XML completely.

    $url = "http://subversion.mycompany.com/viewvc/$reponame`?view=revision&revision=$revision"

    $headers = @{"X-TrackerToken" = "put your Pivotal Tracker API Token Here"}
    # This is Powershell's "here-string" syntax for multiline string literals.
    $body = @"

<source_commit>
    <message>$message</message>
    <author>$author</author>
    <commit_id>$revision</commit_id>
    <url>$url</url>
</source_commit>
"@

    $r = Invoke-WebRequest -Uri http://www.pivotaltracker.com/services/v3/source_commits -ContentType "application/xml" -Method POST -Headers $headers -Body $body

    Write-Host $r
} else {
    # if nothing in the commit message matched the #12345678 syntax, then don’t bother calling the API.
}

To test it, first run the Powershell script directly from your command line. Say you’ve just committed revision 12345 to the “projects” repository, remembering to include [#12345678] in your commit comment where 12345678 is the PivotalTracker story ID.

PS D:\svn\data\repositories\projects\hooks> ./post-commit.ps1 -repopath D:\svn\data\repositories\projects -revision 12345

Your commit comment should appear in the story history within a few seconds.

Now check the batch file harness can run your Powershell properly – see the note below about revisions:

D:\svn\data\repositories\projects\hooks> post-commit.bat D:\svn\data\repositories\projects 12346

If that works, the whole kaboodle should work. A couple of caveats to watch out for:

  1. If your post-commit hook fails, your Subversion client will give you something cryptic about “Commit failed: MERGE … 200 OK” – which is not particularly transparent. Best to get things working on the command line first before testing via Subversion itself.
  2. I couldn’t get the same commit to produce multiple Pivotal tickets – when I was testing it, I had to create a fresh commit for each test run of the integration hook. This is possibly just Pivotal automatically de-duplicating identical stories – but worth knowing for test purposes

Saturday, 23 March 2013

Adventures in Querying with RavenDB

For the last couple of months, we've been working on a project that uses RavenDB as the main data store. We went with Raven for several reasons - it looked like a pretty good fit for what we were doing, we were using NServiceBus which now includes Raven as its default persistence store, and we were keen to see what it could do. Generally, it's been really, really positive. However, RavenDB is the first document database I've ever used, and so once in a while the learning curve just leaves me scratching my head in frustration...

So, last week, a request comes in for a straightforward one-off report. We're using Raven to store details about media clips - video and audio files - that are linked to our customers' online CVs, and the product owner wanted a list of customers along with the total duration of the media clips linked to those customers' CVs.

In SQL, this would be

SELECT Customer, SUM(Duration) as TotalDuration FROM MediaClips GROUP BY Customer

The two developers who actually built the system weren't in the office last week, so this request ended up on my desk. "Ah, no problem", thinks me. After all - it's a database; how hard can it be? I'd spent a bit of time building custom indexes to drive an SLA report we built last week, and with quite a lot of Googling, managed to get some really impressive results out of Raven.

Thing is - I'm not building a feature here, I just need to copy & paste a bunch of numbers into a spreadsheet and get on with my life. And I have no idea how you do that in Raven. I spend an hour or so trying to get Ronnie Overby's RavenDB provider for LINQPad to install. This is a third-party library that isn't part of RavenDB, but I like LINQPad, I'm pretty comfortable with LINQ query syntax, and this looked like a good place to start. Half an hour later, after downloading the source, compiling it, creating a certificate, signing it and getting LINQPad to recognise and install the driver, I gave up because I just couldn't work out how to get any data out of the damn thing.

Instead, I throw together a .NET console app and start playing around with indexes, working from "Using Linq to query RavenDB indexes". I get as far as creating a named map/reduce index that's pulling out the data I need - this takes a good hour or two of trial and error. The eureka moment is when I work out that you can define Raven indexes as raw LINQ code - they're not even embedded strings, they're actually strongly-typed, compiled Linq expressions that are passed in to the query definition. I get as far as this:

class Total {
  public string CustomerId { get; set; }
  public int Duration { get; set; }
}

public void BuildIndex() {
  var store = new DocumentStore() { ConnectionStringName = "raven" };
  store.Initialize();
  store.DatabaseCommands.PutIndex("ClipInfos/TotalDurations", new IndexDefinitionBuilder<ClipInfo, Total> {

    // Map each clip to a (customer, duration) pair...
    Map = clips => from clip in clips select new {
      CustomerId = clip.CustomerId,
      Duration = clip.DurationInSeconds
    },

    // ...then reduce by customer, summing up the durations.
    Reduce = results => from result in results group result by result.CustomerId into g select new {
      CustomerId = g.Key,
      Duration = g.Sum(x => x.Duration)
    }
  });
}

Now, Raven encourages "safe" query patterns. Which means it'll only give you 128 records per query by default, and if you run more than 30 queries in a single session it'll fail with an exception telling you you're doing something dangerous. The only way I can find to actually retrieve all the data I need is to circumvent these defaults: increase the batch size from 128 to 1,024, and then repeatedly call .Skip(1024 * batches).Take(1024) to pull records back 1,024 at a time and add them to my result set.

var batches = 0;
// 'raven' here is the same initialised DocumentStore as 'store' above.
using (var session = raven.OpenSession()) {
  session.Advanced.MaxNumberOfRequestsPerSession = 512;
  while (true) {
    var totals = session.Query<Total>("ClipInfos/TotalDurations").Skip(1024 * batches).Take(1024).ToList();
    foreach (var total in totals)  {
      // Do something useful with each customer's total.
    }
    if (totals.Count < 1024) break;
    batches++;
  }
}

At this point, I'm thinking one of two things:

  1. This is fine. It's an ad-hoc requirement, and if we end up doing this a *lot* I should read up on the index replication bundle - a plug-in that, I gather, will export indexed data into a relational DB for easier querying.
  2. This is completely wrong. It should not take three hours and a dedicated console application to run a simple ad-hoc query against a set of Raven data. I am missing something obvious...

As somebody who's been writing SQL for nearly 20 years, it's really frustrating to hit a brick wall like this... and it's been a long day, and I'm getting fed up, and I wander over to Twitter to have a bit of a rant and @ayende pops up:

[Screenshot: tweet from @ayende pointing me at dynamic queries]

Ah. Dynamic Queries. That looks interesting. So I head over to Raven Studio and sure enough, there's a "Dynamic Queries" facility. So I paste in a simple query and hit "Execute":

[Screenshot: Raven Studio dynamic query returning no results]

No results? Huh? But there's tens of thousands of documents in there! At this point I give up and go to the pub.

This morning, I go back to it and notice the little "i" icon next to the Query header, and see this...

[Screenshot: the Raven Studio query help text, explaining that dynamic queries use Lucene syntax]

Ah. I have no idea what Lucene syntax is... but that probably explains why none of my dynamic queries worked.
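
For anyone else hitting the same wall: the dynamic query box wants Lucene syntax – field:value terms, so something like CustomerId:"customers/123" rather than a LINQ or SQL expression (the field name and document ID here are made up). If memory serves, the client-side equivalent via the advanced session API looks something like this:

// Hedged sketch: querying the dynamic index with a raw Lucene clause.
var clips = session.Advanced.LuceneQuery<ClipInfo>()
                   .Where("CustomerId:\"customers/123\"")
                   .ToList();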

Raven is really impressive. We've managed to get it to do some very cool things; it's fast, and most of the time it Just Works.

And there is an inevitable learning curve associated with adopting a new technology - particularly one that represents a paradigm shift in approach. It's kinda like learning Powershell and treating it as another CLR/.NET language... which is really, really frustrating, until you realize it's not "scriptable .NET", it's more like command.com on steroids. The shift from relational to document databases is much the same.

That said - it would be really, really useful if the dynamic query tool in Raven Studio would distinguish between "I understand that query but could find no matching records" and "I have no idea what you are trying to do" - I have no idea how strict the Lucene query language or parser is, but knowing whether your query is syntactically valid or not would be a big help when you're trying to work out why it didn't return anything.
