# Saturday, July 25, 2009
A client asked me to help recently with a small mystery. They had a database provided by a customer and they'd been asked to import the contents into the tables used by their own product. One of the tables had a BLOB column and from context they were quite sure it was used to hold scans of documents. There was even a "filename" column and a "filetype" column that suggested very strongly the scans were stored as TIFFs.

It had taken a while to find code to read the blobs, and when they ran it, the resulting file was rejected as not being a valid TIFF. They weren't sure if they were handling the blobs wrongly, if the data was encrypted, or if it was some other image format (they had tried PDF and GIF already.) In a highly enjoyable two hours, here's what I did:
  • Found short (ten lines including initialization and cleanup) code to read one blob in VB6 and save it to disk.
  • Found the TIFF format details, looked in the resulting file with notepad and confirmed it didn't start either II or MM and so wasn't a TIFF.
  • Looked at a few other file formats but wasn't really gaining any knowledge, just ruling things out that you could rule out by renaming and double clicking, then having the file rejected by the app that tried to open it.
  • Discovered Marco Pontello's absolutely cool File Identifier, TrID, and downloaded it
  • Removed the extension from what had been test.tif, pointed TrID at it, and was told 100% it was a zip file. Duh, the file started PK, I might have guessed that one.
  • Renamed it to test.zip, unzipped it by hand -- ooh, it IS a zip! -- and was rewarded with file.txt for my trouble
  • Looked at file.txt in notepad and noticed that it was full of binary-looking gibberish, but it DID start with II
  • Hand renamed file.txt to file.tif and double-clicked it
  • Presto! A scan of a document!
I left my client to write the code that did all the blobs, including unzipping them and renaming (every single blog contained a zip which  contained a TIFF renamed to file.txt and no, I don't know why) from within a quickly written importer application. The big mystery was solved. Thanks, Marco!

Kate

Saturday, July 25, 2009 12:13:01 PM (Eastern Daylight Time, UTC-04:00)  #