
2 columns into one

Posted: Fri Jan 26, 2024 10:54 pm
by medicina_khv
Hello,

There is a PDF document with two columns.
link https://ncjindalps.com/pdf/BIOLOGY/Text ... 20Hall.pdf

We need to remove the headers and footers and convert the two columns into one.
How can you do this in Enfocus?
Is there a ready-made macro (sequence of actions)?

Sincerely,
Eugene

Re: 2 columns into one

Posted: Mon Jan 29, 2024 11:00 am
by freddyp
Text in a PDF is described as pieces of text, each starting at an XY coordinate and carrying a graphic state (font, color, ...). Starting at an XY coordinate implies that every line is a separate piece of text. Carrying a graphic state implies that whenever something in the graphic state changes, a new text segment is required: a line containing italics, for example, is made up of multiple text segments.

Recreating text blocks is therefore an exercise in reverse engineering and it is not an easy one, especially for longer texts, texts spread over columns interspersed with titles and typographical changes, etc.

PDF does have constructs that allow you to give some structure to larger pieces of text, but this information has to be added when the PDF is created. That is often not the case, and without it you are lost. With it you are probably lost too, because you still need software that interprets it correctly.

With PitStop Server you can extract the text along with its location and graphic state to an XML file, so theoretically you could read that XML into a new document and ignore the location (to a certain extent). Perhaps an InDesign extension like EasyCatalog can get the job done, or some SGML software, but it basically means that you have to start from scratch, and the only thing you can recover from the PDF is the text.
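To make the "read the text and ignore the location to a certain extent" idea concrete, here is a minimal Python sketch. It does not read the PitStop XML: the Segment type, the function name and the sample coordinates are all hypothetical, and it assumes the text segments have already been loaded as (x, y, text) values in PDF points with the origin at the bottom left. It also assumes the gutter sits exactly in the middle of the page and that segments on the same line share exactly the same y value, which a real document will not guarantee.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for one extracted text segment."""
    x: float   # left edge of the segment, in PDF points
    y: float   # baseline; PDF origin is bottom-left, so a larger y is higher up
    text: str


def to_single_column(segments: List[Segment], page_width: float) -> List[str]:
    """Return texts in reading order: left column top-down, then right column."""
    middle = page_width / 2  # assumption: the gutter is at the page centre
    left = [s for s in segments if s.x < middle]
    right = [s for s in segments if s.x >= middle]
    ordered: List[Segment] = []
    for column in (left, right):
        # Top to bottom means descending y; within a line, left to right.
        ordered.extend(sorted(column, key=lambda s: (-s.y, s.x)))
    return [s.text for s in ordered]


page = [
    Segment(320, 700, "right line 1"),
    Segment(50, 700, "left line 1"),
    Segment(50, 686, "left line 2, "),
    Segment(120, 686, "italic part"),   # same line, new graphic state
    Segment(320, 686, "right line 2"),
]
print(to_single_column(page, page_width=612))
```

This only recovers the reading order of the raw text. Titles that span both columns, figures, and anything that depends on the graphic state would still have to be handled separately, so it does not change the conclusion above.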

I will send you a box of chocolates (Belgian of course) if you find a solution that converts a two-column PDF page to a one-column PDF page directly.

Re: 2 columns into one

Posted: Mon Jan 29, 2024 11:24 am
by JimmyHartington
As Freddy mentions, I think this is not possible directly in PitStop Pro.

But you could use PitStop to strip the elements you do not need.
Save the PDF as text or RTF from Acrobat.
After that you will have to lay-out the document again.
And do a lot of cleanup.

I made a test on one page. Acrobat is clever enough to read the columns.
But hyphenation is not handled very well. Also, the heading is not saved to the RTF file.
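Part of the hyphenation cleanup could be scripted on the plain-text export. The following is only a rough sketch, assuming the text file keeps the original line breaks and that a hyphen at the end of a line always marks a split word; the function name is made up for this example. Genuine compounds that happen to break at the hyphen (such as "well-" followed by "known") would be joined wrongly, so the result still needs proofreading.

```python
import re


def join_hyphenated_line_breaks(text: str) -> str:
    """Join words split by a hyphen at the end of a line.

    Assumption: a hyphen directly before a line break is always a
    typographic split, never part of a real compound word.
    """
    # word char, hyphen, optional spaces, line break, optional spaces, word char
    return re.sub(r"(\w)-[ \t]*\r?\n[ \t]*(\w)", r"\1\2", text)


sample = "The mito-\nchondria pro-\nduce energy in a well-known way."
print(join_hyphenated_line_breaks(sample))
```

Hyphens inside a line are left alone; only a hyphen followed by a line break is removed.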
