Last time we looked at geotagging and how the images we take with our GPS-enabled smartphones can broadcast more than a pretty picture. The same applies to our Microsoft Office files and Adobe PDFs.
When it comes to metadata, there are two types: file system metadata and metadata embedded in the document itself. The embedded metadata in particular provides a valuable history of the document.
The most famous metadata misstep came in 2003, during the run-up to the Iraq war. Tony Blair’s administration published a dossier on Iraq security and intelligence operations. Colin Powell famously quoted it during his address to the United Nations. But after the British government made the dossier public, it became clear via metadata that much of the information was cribbed from a U.S. researcher. The dossier’s metadata history soon drew more attention than the information itself.
If the British government has metadata problems, then how do humble business and home users handle it? Actually, it isn’t that hard.
Each computer document format has its own set of metadata fields. They can range from “document creator” (usually the computer username or the name the software was registered to) to date, time, computer information and sometimes even an IP address, which sophisticated computer users can trace.
Metadata actually has a host of benefits. Search engines, for example, can pick up keywords from metadata buried in MS Office and PDF files. Computer investigators routinely look at metadata to pinpoint when files were created, modified or printed, and can often spot backdated documents. As with hygienic web browsing and geotags, the key with metadata in office files is to be aware of what you’re sending when you’re sending it.
There’s a very easy way to see metadata in your office files. Simply right-click on the file and select Properties. Under the Details tab you’ll see what metadata is associated with that document. If you’re posting the file online and want to remove that information, it is as easy as clicking “Remove Properties and Personal Information” at the bottom of that tab, which works with MS Office 2007 files.
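For readers comfortable with a little scripting, the same properties can be read directly from the file. The sketch below is only an illustration and rests on a few assumptions: the file is in one of the Office 2007 XML formats (.docx, .xlsx, .pptx), which are ZIP archives, and its core properties sit at the conventional location `docProps/core.xml`. The function name `read_core_properties` and the choice of fields are made up for this example. Older binary .doc files are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Core properties most likely to identify a person or a timeline.
# The labels on the left are our own; the tags on the right are the XML names.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
    "revision": "cp:revision",
}


def read_core_properties(path):
    """Return a dict of the core properties found in an Office 2007-style file."""
    with zipfile.ZipFile(path) as archive:
        try:
            # Assumption: the core properties part is at its usual location.
            xml_bytes = archive.read("docProps/core.xml")
        except KeyError:
            return {}  # the archive has no core properties part

    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling `read_core_properties("report.docx")` (a hypothetical file name) returns a dictionary of whichever of these fields are present, which makes it easy to check a batch of files before posting them online.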
If you want more granular control, use “Document Inspector” in Office 2007, a built-in metadata removal tool that strips Word, Excel, and PowerPoint documents of information such as author name, comments and other hidden data.
This trick works for both Microsoft Office and Adobe PDF files. Or, with Word files, you can save your file as a PDF before sharing online.
This conversion scrubs most metadata and is available with MS Office 2007 as a save option (check the document properties to edit the metadata that is saved), or with Adobe PDF Print Engine 2 and free programs such as PDFCreator.
In MS Office for Mac OS, metadata can be stripped before saving the document by enabling the option “Remove personal information from this file on save” in the Preferences menu, under the Security tab. This applies to MS Word and MS Excel.
For more software-specific information, see:
- Microsoft Office 2007
- Microsoft Office 2003
- Adobe metadata information
- Document Inspector Marquette University article
For the technically minded, the National Security Agency published a report in 2008 on the risks and countermeasures associated with metadata in PDF files, available here.
Life is tough, and technology can make life easier. But life is tougher if you are not an educated, aware technology user.