PDF Cross-file Storage Reduction

 



Difference-based PDF storage optimization that produces a self-contained set of files, with no database required.

Difference-based storage reduction for batch PDF files is a patent-pending technology that takes a distinct approach to data de-duplication in the post-production stage: it identifies and removes duplicated data at the physical level, yet in a non-destructive way. This is particularly challenging because the PDF files have already been generated and the original data source is inaccessible or unusable, so the duplicated data must be identified from the files themselves.


The scheme has the following advantages, which are not easily found in other solutions. Contact us about a customizable solution that suits your system architecture, including online applications.

  • Migratable – A standalone set of files that is not bound to a database.
  • Portable – All files are syntactically well-formed PDF files.
  • Minimal – Files contain only fully compressed stream data.
  • Fast – Based on optimized object hashing and sorting.
  • Non-destructive – Atomic operations at the object level.
  • Reliable – An object import algorithm restores the original data.
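
To give a feel for what "object hashing" can mean in this setting, here is a minimal Python sketch. It is not the patent-pending algorithm itself, which is not described on this page. It assumes each PDF has already been split into a list of raw object payloads (the parsing step is left out), and the file names, payloads, and the function name find_shared_objects are all made up for illustration.

```python
import hashlib
from collections import defaultdict


def find_shared_objects(files):
    """Return {digest: [file names]} for payloads that occur in two or more files.

    `files` maps a file name to a list of raw object payloads (bytes).
    Extracting those payloads from a real PDF is outside this sketch.
    """
    seen = defaultdict(set)
    for name, objects in files.items():
        for payload in objects:
            # Identical bytes give identical digests, so duplicates group together.
            seen[hashlib.sha256(payload).hexdigest()].add(name)
    return {d: sorted(names) for d, names in seen.items() if len(names) > 1}


# Hypothetical batch: two files that share a font and a logo image.
files = {
    "invoice_001.pdf": [b"<font data>", b"<logo image>", b"page text A"],
    "invoice_002.pdf": [b"<font data>", b"<logo image>", b"page text B"],
}
shared = find_shared_objects(files)
for digest_value, names in shared.items():
    print(digest_value[:12], "appears in", names)
```

Hashing whole objects rather than arbitrary byte ranges keeps every comparison at the object level, which is in the spirit of the "Non-destructive" point above.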


To let you try out this new approach, we have made a demo available for download. The demo can process up to 300 files. It assumes that the files are similar in composition and resource usage and are stored in the same folder. Add all files by clicking the “Input” button, then choose a folder for the output by clicking the “Output” button. The resulting *.pdfz files are syntactically well-formed PDF files, but they need to be opened from within this program, which fetches the necessary bytes from a template file and creates a new document in the system temporary folder. Some of the input files are used as templates; files that are encrypted, corrupted, or too small are skipped.
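
The relationship between a template and a reduced file can be pictured with the short Python sketch below. It is only an illustration under stated assumptions, not the demo's actual file format: objects are modelled as plain byte strings, a reduced document is a list of ("ref", digest) or ("raw", bytes) entries, and the names digest, reduce_against_template, and restore are hypothetical.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Content hash used as the lookup key into the template."""
    return hashlib.sha256(payload).hexdigest()


def reduce_against_template(objects, template_index):
    """Replace every object already held by the template with a reference."""
    reduced = []
    for payload in objects:
        key = digest(payload)
        if key in template_index:
            reduced.append(("ref", key))      # bytes live in the template
        else:
            reduced.append(("raw", payload))  # unique to this document
    return reduced


def restore(reduced, template_index):
    """Rebuild the original object list by fetching referenced bytes."""
    return [template_index[value] if kind == "ref" else value
            for kind, value in reduced]


# Hypothetical template and a second document that shares two objects with it.
template_objects = [b"<font data>", b"<logo image>", b"page text A"]
template_index = {digest(p): p for p in template_objects}

document = [b"<font data>", b"<logo image>", b"page text B"]
reduced = reduce_against_template(document, template_index)
rebuilt = restore(reduced, template_index)
print([kind for kind, _ in reduced])
print(rebuilt == document)
```

In this model the reduced file is useless without its template, which is why a reduced file has to be opened through a program that knows where the template is.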

Because this is a demo, the correlation between a PDFz file and its corresponding template is not persisted, so the file can no longer be opened once the program is closed. In a real-life application, this correlation should be stored in a database or a disk file for future look-up.
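
As one possible shape for the "disk file" option, the Python sketch below keeps the correlation in a small JSON file next to the output. The index file name pdfz-index.json, the helper names, and the sample file names are assumptions made for this example; the demo does not ship such a file.

```python
import json
import tempfile
from pathlib import Path


def save_correlations(index_path, correlations):
    """Write the {pdfz file name: template file name} mapping to disk."""
    Path(index_path).write_text(json.dumps(correlations, indent=2),
                                encoding="utf-8")


def template_for(index_path, pdfz_name):
    """Look up the template of a PDFz file; returns None if it is unknown."""
    correlations = json.loads(Path(index_path).read_text(encoding="utf-8"))
    return correlations.get(pdfz_name)


# A temporary folder stands in for the real output folder.
with tempfile.TemporaryDirectory() as folder:
    index_path = Path(folder) / "pdfz-index.json"
    save_correlations(index_path, {"invoice_002.pdfz": "invoice_001.pdf"})
    found = template_for(index_path, "invoice_002.pdfz")
    missing = template_for(index_path, "unknown.pdfz")

print(found, missing)
```

A flat file like this keeps the output folder self-contained and easy to move; a database table with the same two columns would serve the same purpose in a larger system.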

This program updates itself automatically.

About the Author: Cyphia