Accession Number : ADA483465


Title : Automated Metadata Extraction


Descriptive Note : Master's thesis


Corporate Author : NAVAL POSTGRADUATE SCHOOL MONTEREY CA


Personal Author(s) : Migletz, James


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a483465.pdf


Report Date : Jun 2008


Pagination or Media Count : 83


Abstract : Metadata is data that describes data. Metadata has many uses in computer forensics, and the ability to extract it automatically is valuable to forensic investigations. This thesis presents a new technique for batch processing disk images and automatically extracting metadata from files and file contents. The technique is embodied in a program called fiwalk, which has a plug-in architecture that allows new metadata extractors to be readily incorporated. Output from fiwalk can be provided in multiple formats, such as ARFF and text. The plug-ins created for this thesis include one created by Simson Garfinkel for extracting metadata from .jpeg files, two for Microsoft Office documents (one for releases prior to Office 2007 and one for Office 2007), and a default plug-in for extracting metadata from .gif, .pdf, and .mp3 files. The thesis also examines common file formats such as .doc, .docx, .odt, .pdf, .mp3, .mp4, .jpeg, .tiff, and .gif to better understand the metadata available in them.
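
Illustrative note : The abstract describes a plug-in architecture in which extractors are selected per file type and results are written out in formats such as ARFF. The following is a minimal sketch of that general idea, not fiwalk's actual interface. Every name in it (REGISTRY, register, extract, to_arff, gif_extractor) is hypothetical, and the dispatch-by-extension rule is an assumption made for illustration. The only format detail relied on is the standard GIF header layout (a 6-byte signature followed by width and height as little-endian 16-bit integers).

```python
from typing import Callable, Dict, List

# Hypothetical plug-in signature (an assumption, not fiwalk's API):
# a plug-in receives a file name and its bytes and returns metadata fields.
Extractor = Callable[[str, bytes], Dict[str, str]]

# Maps a lowercase file extension to the plug-in that handles it.
REGISTRY: Dict[str, Extractor] = {}


def register(*extensions: str):
    """Decorator that registers a plug-in for one or more extensions."""
    def wrap(func: Extractor) -> Extractor:
        for ext in extensions:
            REGISTRY[ext.lower()] = func
        return func
    return wrap


@register(".gif")
def gif_extractor(name: str, data: bytes) -> Dict[str, str]:
    # GIF header: "GIF87a" or "GIF89a", then width and height,
    # each stored as a little-endian 16-bit integer.
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return {}
    return {
        "version": data[3:6].decode("ascii"),
        "width": str(int.from_bytes(data[6:8], "little")),
        "height": str(int.from_bytes(data[8:10], "little")),
    }


def extract(name: str, data: bytes) -> Dict[str, str]:
    """Run the plug-in registered for the file's extension, if any."""
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    fields = {"filename": name}
    plugin = REGISTRY.get(ext)
    if plugin is not None:
        fields.update(plugin(name, data))
    return fields


def to_arff(records: List[Dict[str, str]], relation: str = "metadata") -> str:
    """Serialize records as ARFF, treating every attribute as a string."""
    attributes = sorted({key for rec in records for key in rec})
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {attr} STRING" for attr in attributes]
    lines += ["", "@DATA"]
    for rec in records:
        row = []
        for attr in attributes:
            if attr in rec:
                value = rec[attr].replace("\\", "\\\\").replace("'", "\\'")
                row.append("'" + value + "'")
            else:
                row.append("?")  # ARFF marker for a missing value
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"
```

In this sketch, supporting a new file type means writing one more decorated function; files with no registered plug-in still produce a record containing only the file name, and their remaining ARFF columns are written as missing values.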


Descriptors : *FEATURE EXTRACTION , *METADATA , *AUTOMATIC , BATCH PROCESSING , INFORMATION RETRIEVAL , THESES , AUTOMATION , COMPUTER FILES


Subject Categories : Computer Programming and Software


Distribution Statement : APPROVED FOR PUBLIC RELEASE