How much of a large collection of files is spatial data?

We need to determine how much of the 6.5 TB of used space on a shared network drive is geographic or spatial data. It's a squirrel's nest, built up over three decades by hundreds of people (though by volume mostly in the past three years) across all walks of the business, using a variety of naming and filing standards -- including none and the dreaded "misc".

"Spatial" in this context is anything produced by ArcView 3, ArcInfo, ArcGIS, PCI, ER Mapper, Global Mapper, QGIS, DNR Garmin, Basecamp, MapInfo ...

My plan at the moment is a simple brute-force itemization of file extensions, at least to get us some ballpark estimates.
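
To make the brute-force idea concrete, here is a minimal sketch of such an inventory in Python, using only the standard library. The function names `tally_extensions` and `print_report` are made up for illustration, and the sketch assumes that a plain walk of the whole drive is acceptable and that unreadable files can simply be skipped.

```python
import os
from collections import defaultdict


def tally_extensions(root):
    """Return {extension: [file_count, total_bytes]} for all files under root.

    Extensions are lower-cased; files without an extension are keyed by ''.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            totals[ext][0] += 1
            totals[ext][1] += size
    return dict(totals)


def print_report(totals, top=30):
    """Print the extensions that take the most space, largest first."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    for ext, (count, nbytes) in ranked[:top]:
        print(f"{ext or '(none)':<12} {count:>10} {nbytes / 1e9:>12.2f} GB")


# Hypothetical usage; substitute the real share path:
# print_report(tally_extensions(r"\\server\share"))
```

Sorting by total bytes rather than by file count puts the handful of extensions that dominate the volume at the top, which is probably what a ballpark estimate needs.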

Singular types like shapefiles and ArcInfo grids & coverages are easy as the extensions .shp, .shx, .adf, ... aren't used by anything else.

Shared types like dBase .dbf and images (.tif, .jpg) complicate things, though ones larger than 300 MB are very likely spatial.
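
The two rules above (dedicated extensions count as spatial, shared extensions count only when large) could be sketched roughly as follows. The extension sets contain only the examples named in this question and would need extending; the function name `classify` and the category labels are invented for the sketch; 300 MB is taken here as 300 * 1024 * 1024 bytes. The sidecar check is an extra assumption of mine, not part of the plan above: a .dbf sitting next to a same-named .shp is part of a shapefile, and a .tif or .jpg with a same-named world file (.tfw, .jgw) is georeferenced.

```python
import os

# Only the examples from the question; extend these for real use.
DEDICATED = {".shp", ".shx", ".adf"}
SHARED = {".dbf", ".tif", ".jpg"}
SIZE_THRESHOLD = 300 * 1024 * 1024  # "300 MB", interpreted as MiB (assumption)

# Assumed extra heuristic: companion files that mark a shared type as spatial.
SIDECARS = {".dbf": (".shp",), ".tif": (".tfw",), ".jpg": (".jgw",)}


def classify(path, size, siblings=frozenset()):
    """Classify one file as 'spatial', 'likely spatial', 'uncertain' or 'other'.

    siblings: lower-cased names of the files in the same folder (optional).
    """
    stem, ext = os.path.splitext(os.path.basename(path).lower())
    if ext in DEDICATED:
        return "spatial"
    if ext in SHARED:
        if any(stem + side in siblings for side in SIDECARS.get(ext, ())):
            return "spatial"
        if size > SIZE_THRESHOLD:
            return "likely spatial"
        return "uncertain"
    return "other"
```

Keeping an explicit "uncertain" bucket gives the estimate a lower and an upper bound instead of a single guessed figure.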

Before I hack together a personal and idiosyncratic solution: has anyone solved this already?


