2 columns into one

medicina_khv
Newbie
Posts: 1
Joined: Fri Jan 26, 2024 4:22 am

2 columns into one

Post by medicina_khv »

Hello,

There is a PDF document with two columns.
link https://ncjindalps.com/pdf/BIOLOGY/Text ... 20Hall.pdf

We need to remove the headers and footers and convert the two columns into one.
How can this be done with the Enfocus tools?
Is there a ready-made macro (sequence of actions)?

Sincerely,
Eugene
freddyp
Advanced member
Posts: 1023
Joined: Thu Feb 09, 2012 3:53 pm

Re: 2 columns into one

Post by freddyp »

Text in a PDF is described as a piece of text that starts at an XY coordinate and carries a graphic state (font, color, ...). Starting at an XY coordinate implies that every line is a separate piece of text. Carrying a graphic state implies that a new text segment is required whenever something in the graphic state changes: the lines containing italics, for example, are each made up of multiple text segments.

Recreating text blocks is therefore an exercise in reverse engineering and it is not an easy one, especially for longer texts, texts spread over columns interspersed with titles and typographical changes, etc.

PDF does have constructs that allow larger pieces of text to be given some structure, but this information has to be added when the PDF is created. That is not often done, and without it you are lost. With it you are probably lost too, because you still need software that interprets it correctly.

With PitStop Server you can extract the text along with its location and graphic state to an XML file, so theoretically you could read that XML into a new document and ignore the location (to a certain extent). Perhaps an InDesign extension like EasyCatalog can get the job done, or some SGML software, but it basically means starting from scratch, with only the text recovered from the PDF.
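To make the "read the XML and ignore the location to a certain extent" idea concrete, here is a minimal Python sketch. It does not parse PitStop's XML (that schema is not shown in this thread); it assumes the segments have already been loaded into a hypothetical `Segment` record with a text, an x position and a y position in PDF coordinates (origin at the bottom left). The function name `to_single_column` and the margin parameters are also made up for illustration. It only recovers a reading order for the text; it does not rebuild a one-column PDF page.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical record for one extracted text segment."""
    text: str
    x: float  # left edge of the segment, in points
    y: float  # baseline, measured from the bottom of the page


def to_single_column(segments, page_width, top_margin_y, bottom_margin_y):
    """Drop header/footer segments and return the remaining text
    in left-column-then-right-column reading order."""
    mid = page_width / 2

    # Anything above top_margin_y or below bottom_margin_y is treated
    # as a header or footer and discarded.
    body = [s for s in segments if bottom_margin_y < s.y < top_margin_y]

    # Assumption: the gutter sits at the horizontal middle of the page.
    left = [s for s in body if s.x < mid]
    right = [s for s in body if s.x >= mid]

    # A higher y is nearer the top of the page, so sort by descending y,
    # then by ascending x for segments that share a line.
    def reading_order(s):
        return (-s.y, s.x)

    ordered = sorted(left, key=reading_order) + sorted(right, key=reading_order)
    return [s.text for s in ordered]
```

Titles that span both columns, uneven gutters and baselines that differ by a fraction of a point would all break this simple split, which is exactly why the reverse engineering gets hard on real pages.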

I will send you a box of chocolates (Belgian of course) if you find a solution that converts a two-column PDF page to a one-column PDF page directly.
JimmyHartington
Advanced member
Posts: 310
Joined: Tue Mar 22, 2011 7:38 am

Re: 2 columns into one

Post by JimmyHartington »

As Freddy mentions, I think this is not possible directly in PitStop Pro.

But you could use PitStop to strip the elements you do not need.
Then save the PDF as text or RTF from Acrobat.
After that you will have to lay out the document again
and do a lot of cleanup.

I ran a test on one page. Acrobat is clever enough to read the columns in order.
But hyphenation is not handled very well, and the heading is not saved into the RTF file.
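Part of the hyphenation cleanup can be scripted if you export as plain text instead of RTF. The following is a rough Python sketch, not something Acrobat or PitStop provides; the function name `join_hyphenated` is made up for illustration. It rejoins words that were split with a hyphen at the end of a line. Be aware that it will also wrongly join genuine compounds that happen to break at the hyphen (for example "well-" / "known" becomes "wellknown"), so the result still needs proofreading.

```python
import re


def join_hyphenated(text):
    """Rejoin words split by an end-of-line hyphen.

    Only matches a lowercase letter, a hyphen, a line break and another
    lowercase letter, so dashes between numbers or before capitals are
    left alone.
    """
    return re.sub(r"([a-z])-\s*\n\s*([a-z])", r"\1\2", text)
```

For example, `join_hyphenated("photo-\nsynthesis")` gives `"photosynthesis"`.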
