PRESS RELEASE

Word to PDF Bulk Conversion and Embedded Object Extraction

Data Strategies Interchange (DSI) is pleased to announce the successful completion of a Word to PDF bulk conversion and embedded object extraction project for a major life insurance company.

The client had around half a million .WDBN files (Word binary files created with Word 3/4/5 on Macintosh), along with older Microsoft Word files (.DOC) and newer Word files (.DOCX), with a total size of 80 GB and dating from 1993 to 2019. For portability and accessibility, the client decided to convert these files to PDF.

DSI completed this project in a month using the following steps:

  1. Pre-Process: DSI ran a pre-process using its proprietary tool to scan all the documents, identify file types, and categorize the documents into the following three groups: Documents with only Text, Documents with only Embedded Objects, and Documents with Text and Embedded Objects. The scan also discovered file types other than Word files (PDF, TIFF, HTML, RTF, and TXT).
  2. Extraction of Embedded Objects and Conversion to PDF: DSI ran the following process based on each group type:
    • Documents with only Text: Convert Word to PDF
    • Documents with only Embedded Objects: Extract Embedded Objects
    • Documents with Text and Embedded Objects: Convert Word to PDF and Extract Embedded Objects
  3. Quality Check: DSI provided a detailed report covering the number of words in the source documents, a list of extracted embedded objects with their file locations, source and target file sizes, and a list of password-protected or corrupted source files.
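
The routing in steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration of the grouping logic described above, not DSI's proprietary tool: the names `Group`, `ACTIONS`, `categorize`, and `plan` are hypothetical, and the handling of a document with neither text nor embedded objects is an assumption, since the steps above do not cover that case.

```python
from enum import Enum


class Group(Enum):
    """The three document groups named in the pre-process step."""
    TEXT_ONLY = "Documents with only Text"
    EMBEDDED_ONLY = "Documents with only Embedded Objects"
    TEXT_AND_EMBEDDED = "Documents with Text and Embedded Objects"


# Actions per group, mirroring the bullets in step 2.
# The action names are hypothetical labels, not real tool commands.
ACTIONS = {
    Group.TEXT_ONLY: ("convert_to_pdf",),
    Group.EMBEDDED_ONLY: ("extract_embedded_objects",),
    Group.TEXT_AND_EMBEDDED: ("convert_to_pdf", "extract_embedded_objects"),
}


def categorize(has_text: bool, has_embedded: bool) -> Group:
    """Map the two facts found by the scan to one of the three groups."""
    if has_text and has_embedded:
        return Group.TEXT_AND_EMBEDDED
    if has_embedded:
        return Group.EMBEDDED_ONLY
    if has_text:
        return Group.TEXT_ONLY
    # Assumption: an empty document is flagged rather than silently grouped.
    raise ValueError("document has neither text nor embedded objects")


def plan(has_text: bool, has_embedded: bool) -> tuple:
    """Return the ordered actions to run for one document."""
    return ACTIONS[categorize(has_text, has_embedded)]
```

Keeping the group-to-action mapping in a single table means a new group or action can be added without touching the categorization logic.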

A major challenge that DSI handled successfully during the Word to PDF bulk conversion was identifying the various versions of Microsoft Word files and tailoring both the PDF conversion and the extraction of embedded objects to each version.
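
One common way to tell file formats apart regardless of their extension is to inspect the first few bytes of each file. The sketch below is a minimal illustration of that idea and makes no claim about how DSI's tool works; `SIGNATURES`, `sniff`, and `sniff_file` are hypothetical names. It covers only well-known signatures and deliberately leaves the legacy Macintosh Word binaries (and plain-text formats such as HTML and TXT) as "unknown", since those would need additional, format-specific checks.

```python
# Well-known leading byte signatures. A ZIP match only shows that the file
# is a ZIP container; confirming .DOCX would require inspecting its contents.
SIGNATURES = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole2"),  # OLE2 compound file, e.g. Word 97-2003 .DOC
    (b"PK\x03\x04", "zip"),                         # ZIP container, e.g. .DOCX
    (b"%PDF-", "pdf"),
    (b"II*\x00", "tiff"),                           # TIFF, little-endian
    (b"MM\x00*", "tiff"),                           # TIFF, big-endian
    (b"{\\rtf", "rtf"),
]


def sniff(header: bytes) -> str:
    """Return a coarse format label for the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file to match the longest signature (8 bytes)."""
    with open(path, "rb") as handle:
        return sniff(handle.read(8))
```

For example, `sniff(b"%PDF-1.4")` returns `"pdf"`, while a header that matches none of the listed signatures returns `"unknown"` and can be routed to a manual or format-specific check.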

Further information about DSI’s document archive migration services can be found here.
