Indiana Jones and the Last Spreadsheet

We recently introduced our ten-year-old daughter to the Indiana Jones movies. Even the fourth one, but let’s not talk about that. At the very end of the original Raiders of the Lost Ark, there’s a scene in which the Ark of the Covenant is boxed up and placed in a warehouse surrounded by what looks like tens of thousands of identical boxes. The modern equivalent of its desert resting place is not an underground tomb, guarded by snakes and poison darts, but a total immersion in endless, bureaucratic sameness. Government_WarehouseNow, I don’t know if you’ve ever tried to implement a “Spreadsheet Amnesty” but if you have, you’ll know it is exactly like that. Ex. Act. Ly.

For the uninitiated, a spreadsheet amnesty is an essential part of any data migration project. Essentially, the problem you have is that the old system you’re setting out to replace is total rubbish. After all, if it wasn’t, you wouldn’t be there. the staff are clever people with a job to do, and they can’t be hampered by this bad system, so they’ve created all manner of spreadsheets, access databases, and who-knows-what to record all their day-to-day working data in. If you ignore it, you’ll be starting out with an incomplete system, missing key data. So, you have to ask everyone to identify their data sources so you can scoop them all in and use them to plug gaps in the main database. Sometimes, you might come across political impediments; in all likelihood, internal business teams have been waging a war of attrition against these spreadsheets for years, and I’ve heard of cases where, when they get onto the project board there’s been a desire to exert some influence to “punish” the offenders by ruling these contraband data sources as invalid and out of scope. If you go down that route, you are condemning the project to repeat the mistakes of the past. Hence the name “amnesty”. No blame. Everyone is welcome, and so is their data.

But politics aside, when you go out and trawl through the spreadsheets, it can be like searching for a needle in a large stack of identical needles. Or, if you prefer, a sacred relic in a warehouse full of fake sacred relics. The trouble is, there are often hundreds of spreadsheets which often seem to be capturing slightly different view of the same data, and a lot of the time, it’s hard to pick out the ones that have unique data being used to drive a specific business process, not just a mish-mash of items drawn from other places. I thought it would be helpful to list out a few questions that are worth asking when deciding which box to pry open in your search for the ark.

So you visit the team room, and ask them to show you the spreadsheets they’re using day-to-day in addition to recording in the case recording system, which for the sake of argument, I’ll refer to as “Disappointech”. As each one is produced you ask:

Is this a report from Disappointech?

Is the answer is yes, it’s not a data migration source. Whatever is in it must be in the core system already. As an aside though, if this is used as part of business-as-usual, you need to make sure the business analysts who are configuring the software are aware of it, and that the new system will produce an equivalent report.

Is there anything here that’s not in Disappointech?

If the answer is no, the team might just have been compiling or copy/pasting the information into a spreadsheet for ease of use. Again, there’s nothing here that you need to migrate but it’s worth thinking how the new system can save them having to do such a tedious, time-consuming chore. Maybe a report could help?

Is it information you need to have in the new system?

OK, so we’ve established that this is unique information that can’t be found in Disappointech and only exists in the spreadsheet. But does it belong in the new system? If it’s about a social worker’s annual leave, say, or contributions to the coffee fund, or the staff Eurovision sweepstake then the answer is probably no and you can move on. More likely, it’s something borderline: information that’s pretty close to the type of data you need to bring in but not quite in scope, and you might have to get a project decision on which side of the fence it lies on so you don’t get into “scope creep”.

Is the data about a specific person? 

If the data is aggregated – for example, if it shows – such-and-such a percentage of visits done on time in each month, say, but you can’t identify specific dates for a specific service-user in there then this is not likely to be usable. Most social care systems have to associate each item with a specific client record and it isn’t possible to go from a statistic back to the source data that went into producing the statistic.

What’s the quality like?

Having established that there’s something here that the project needs to have, you’ll need to assess whether it’s tidy enough to import. Is each service user’s Disappointech ID present and correct? Is each item on its own row? Do columns with dates in all have real dates or do some of them have things like “To Be Confirmed” or the dreaded “N/K”? Do the team store all the data in this one spreadsheet or do they have multiple copies, say one for each financial year?

If the quality is low, you’ve got a few choices. you could improvise and write some fairly complicated code that tries to work around the problems. That’s a high-risk strategy because it’s quite likely to need a lot of maintenance as new rows get added to the sheet, introducing new problems. A second option is to ask them to transfer the data to a new template which you can set up with lots of validation so that it has to be filled in in a certain way. Lastly, it’s always worth considering whether all this pain is necessary and whether – if the list is fairly short – they could manually transfer the data by keying it in on day 1, perhaps with a bit of help from someone who knows the new software well. It’s surprising how often this is the best solution for everyone.

indiana-jones-boulder-oOK, I think that covers Excel spreadsheet. I was going to write about how to cope with poorly designed Access Databases as well, but instead, here is a visual metaphor to describe what that feels like:

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s