Wednesday, November 12, 2014

Solid PDF to Word for Mac - Version 2.1

We have released an update to Solid PDF to Word for Mac - version 2.1.


Monday, June 23, 2014

Version 9 - Scanning - WIA and TWAIN support

Question: What scanner support is included with Solid PDF Tools v9?

Answer: We have added WIA support in version 9 because many customers have scanning devices that only support WIA. Now you can choose between TWAIN and WIA. We also open the manufacturer's scanning dialog so you can use all the features that your particular scanner supports.

Scanner Drivers: TWAIN or WIA

TWAIN and WIA are standards for communication between software applications and imaging devices - scanners, in our case.

"The word 'twain' is an archaic form meaning 'two'. It appears in Kipling's 'The Ballad of East and West' - '... and never the twain shall meet ...', reflecting the difficulty, at the time, of connecting scanners and personal computers. It was up-cased to TWAIN to make it more distinctive. This led people to believe it was an acronym, and then to a contest to come up with an expansion. None were selected, but the entry 'Technology Without An Interesting Name' continues to haunt the standard." - from Free On-Line Dictionary of Computing

TWAIN drivers are provided by your scanner's manufacturer (Brother, Canon, Epson, HP, etc.). When you run Solid Converter it discovers what TWAIN drivers you have installed so it can talk to your scanning devices.

WIA (Windows Image Acquisition) is Microsoft's imaging driver model and API. It allows you to scan into any application that supports WIA.

Wednesday, June 18, 2014

Version 9 is available

Solid Documents has just released version 9 of all our Windows software products:

Desktop Software:


Free, fully functional trials are available on our website.

Changes

New features include File | Print, Search and a new scanner interface (supporting TWAIN and WIA drivers). 

We continue to focus on improvements to our industry-leading PDF conversion engine, OCR and overall document reconstruction process.

Monday, January 13, 2014

PDF Files - Converting to Word

Question: When I convert a PDF file to a Word document not all of the characters are correct.  Why does this happen?

Answer: Not all PDFs are made the same way.

PDFs - there are basically two types of PDF files: those scanned from paper and those created on a computer.


Scanned PDFs - pages are images. OCR (optical character recognition) has to look at all the dots of ink and figure out what is what.

Based on patterns, our software tries to determine characters, images, fonts, etc. Some dots are noise and others are diacritics, and we try to work out which is which as best we can. Scan quality and font type both affect the analysis and how well detection succeeds.
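To see why one small mark matters, here is a minimal Python sketch. It is only an illustration of how a diacritic changes a character, not a description of how our OCR engine works internally. The helper name compose is made up for this example.

```python
import unicodedata

def compose(base: str, mark: str) -> str:
    """Combine a base letter and a combining mark into a single character."""
    return unicodedata.normalize("NFC", base + mark)

# "e" followed by U+0301 (combining acute accent): the mark is kept.
with_mark = compose("e", "\u0301")

# The same letter if the mark on the page is discarded as noise.
without_mark = "e"

for ch in (with_mark, without_mark):
    print(ch, "-", unicodedata.name(ch))
```

The two results are different characters, so a stray speck read as an accent, or an accent read as a speck, gives a wrong letter in the Word document.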

Computer generated PDFs - these usually include font information. PDFs are built from mapping/encoding details; they do not directly include characters, words, sentences, paragraphs or headers/footers. We figure all of this out and reconstruct the document with structure. Some PDF creators do not use standard encoding, so again we have to do some detective work to determine what is in the PDF. If the PDF was computer generated and includes the font details, and the computer on which you convert to Microsoft Word has the same font installed, then you should get those fonts in the Word file. For example, we convert computer generated CJK (Chinese, Japanese, Korean) PDFs into correct Word documents.

Fonts - if we have the font information from the PDF, or we can determine the language and font, and the text includes diacritics, we use the correct characters for that font and language. We have done a lot of work to support foreign languages, both in our OCR and in our standard PDF conversion software.

Forcing OCR on computer generated PDFs - turning on text recovery for all characters and pages forces every PDF through our OCR processing, which is usually not the best solution. With non-standard encoded documents it can provide a better conversion, but if your PDF is a standard computer generated one, our default document reconstruction processing is best.