When I did my first ever migration, I used a tool that had been designed for the purpose by another developer in Microsoft Access. It was interesting and certainly simple enough for a beginner to use, and I think it was probably the right tool for then, but it had some major flaws. It was slow and temperamental. Sometimes when working with large text fields, I’d have to take a Long (an outmoded Oracle text field) and put it into a CLOB, both pretty awkward things to work with, but when you’re using an access Note field as your bridge, all bets were off. It would chop off great chunks of text but not tell me there was a problem. “Oh yes, Colin”, it would say, cheerfully, “I’ve loaded all the rows, don’t you worry”. Only later would I be told that when the testers looked, all the notes stopped abruptly at the 4000 character mark. It was a bit picky about what inputs it would use, too. CSV or nothing. In short, it had its limitations. These days, I need something with a bit more oomph, and I think I’ve found it in a fantastic product called Talend Open Studio.
Talend is an open source product built around the eclipse platform. It is made up of several distinct components. I use the data integration tool and the data quality tool, but they come in Cloud, Big Data, MDM and various other cool flavours too, if you need those things. And the best part? The products are all free to use if you are working at the scale of only one developer. The company even runs a very good online support system which is open to the free customers too. So how the heck do they make money? Speaking as someone who once tried to get them to take some of mine, I’m not really sure, because their very lovely, helpful programmers didn’t seem to be all that interested. However, they must be doing it, because I’ve been to their office and it was very nice. The revenue flow comes from a fully integrated platform product which does cost extra and which allows you to move swiftly from data quality to data integration with zero faffing about, collaborate with other developers, and generally scale the whole thing up. I can’t tell you how utterly impressed I am with this attitude to distributing software. Charities and small outfits with relatively modest requirements can get by with a great product that they look after themselves, while there’s an option of going to the next level if you happen to belong to a larger enterprise.
This review is only of the free version of the two products I’ve used. They are all you need to run a data migration. Or rather, they’re all you need to handle the technical aspects. If you want to know what you’ll need for the more “projecty” aspects, get yourself a copy of “Practical Data Migration“.
Talend Open Studio for Data Quality
If you’re lucky enough to have any data quality in your organisation, this tool will find it! ;^)
The backbone of this version of Talend is similar to the Integration tool. You get an Eclipse workspace with folders to store various sets of enquiries into the state of the data in the system. These can be at several levels. You can do an analysis of the entire database to see how many rows are in each table and what sort of general structure it has, or you can get closer to a set of tables, the relationships between the values in two columns say, or at an even finer level look at the patterns of data: Do the postcodes all match the regex for a valid postcode, for example. How many of the customer names are null? What’s the longest value in a text field? What are the outliers in a set of loan amounts? Are any of the dates insanely far in the future/past? Results are usually presented in bar charts but there are other formats too, like the nice correlation shown in the picture. In each case, you can drill down to get a closer look at problem rows. This is a great way to convey to stakeholders what the issues are in the data and what needs to be addressed right at the start. The GUI is easy to use and – with a little fiddling around – you can soon work out how it all works and get some real work done.
Talend Open Studio for Data Integration
When you’re ready to start extracting, this is the next step. The integration tool has connectors. that can hook into most common database types and quite a few I hadn’t heard of. Values extracted from one datbase can be joined with external data, manipulated, mapped using a table from an excel sheet and have functions applied to them before they are reinserted into a table in the target database. Exception lists and logs can be dropped into a CSV or other flat file for a cleanser to look at, and a stored procedure called to kick off any cleansing routines you might be using. The separate steps can be joined together using connectors and orchestrated any way you like. I’ve seen people do this with Visual Studio but I’ve never been in a standardised SQL Server-only environment where that’s been an option, so it’s really good to have the same functionality but with a lot more flexibility (not to mention colour!) in this product. In short, it’s as easy as building lego models but quite a bit more productive.
In general, I’ve found it pretty reliable, stable and trouble-free. It can be a little slow, especially if you’re working outside the LAN (Hint: Remote desktop!). I daresay performance is quite dependent on the resources of the machine you’re on too, of course, but you can set things going in parallel so as to get them all in in whatever window you have available.
What else is there?
There are other software products available of course, and in some cases you might need them. I’ll try and do some more tech reviews as and when I get a chance to get my hands on them. I must say though, from what I’ve seen of the others none is as well-rounded as this one. Pandora strikes me as too limited for proper ETL (although I’ll admit I’ve only seen an older version), and some of the others seem to be geared up to very specific challenges around technical limitations of moving bulk data. None of them seem like they can do a complete, end-to-end process on anything like this kind of diverse set of data sources, but by all means, if anyone knows of anything better, I’d love to hear about it. Not MS Access though. Sorry, been there, done that.