Initial clean-up of converted or downloaded pdf, epub and txt documents using LO Writer Find and Replace


Scenario: Your downloaded .txt or converted .pdf file has a paragraph break after each line, but you want a paragraph break only at the actual or "real" end of the paragraph. Here's how you take care of that in three easy steps in LibreOffice Writer.
Perennial Reminder: Most replacements are pretty quick, but depending on the length of the document, some Find and Replace tasks can take a long while, sometimes as long as five or ten minutes. Be patient. Take a break and let your F&R do its thing. Go to your browser and post something on Facebook or Twitter. Go to the kitchen and make a sandwich.
First, you should replace the actual or "real" end of each paragraph with a placeholder; the reason will become obvious later in these instructions. So here, we are replacing each instance of the "real" paragraph end (highlighted) with ".9999". Later, we will replace all instances of 9999. Below is the Find and Replace instruction. Note that the end of this paragraph (and of all other paragraphs in the document) is a period, followed immediately by the paragraph-end mark, thus the Regex is \.$ If there is a space between the period and the paragraph-end mark, change the Regex to \. $ instead. Then click "Replace All."

Step 1.

Second, you need to delete all the remaining paragraph marks (see highlights in instruction below), on your way to making sure that only the "true end" of each paragraph is followed by a paragraph mark. To do that, you Find all paragraph marks and Replace them with a space, as illustrated below. (You can't see the space in the Replace box, but trust me, it's there. Just tap your space bar once in the Replace box when completing this step.)
Warning! Frustration Ahead!... if you don't enter a space in the Replace box. This matters, because without the space you might wind up with hundreds or thousands of error mark-ups where there are two words without a space between them. If you wind up with two spaces between words using these instructions, that's simple enough to fix; better safe than sorry.
Step 2.

Third, let's replace all those "placeholder" 9999 instances we created, and at the same time create paragraph marks only for the true end-of-paragraph sentences. The F&R looks like this (unclick "Current Selection"):
Step 3.
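If you'd rather see the three steps outside of Writer, here is a minimal Python sketch of the same idea. It is not what LibreOffice runs internally, just an illustration. The function name unwrap_paragraphs is made up for this example, and it assumes "9999" never appears in your real text and that every real paragraph ends with a period.

```python
import re

# Same marker the post uses; assumed not to occur anywhere in the real text.
PLACEHOLDER = "9999"


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, keeping breaks only after a closing period."""
    # Step 1: tag each "real" paragraph end (period, optional space, line end).
    marked = re.sub(r"\. ?$", "." + PLACEHOLDER, text, flags=re.MULTILINE)
    # Step 2: replace every line break with a space (the space matters!).
    joined = marked.replace("\n", " ")
    # Step 3: turn each placeholder (plus the space added in step 2)
    # back into a single line break, then drop any placeholder left at the end.
    restored = joined.replace(PLACEHOLDER + " ", "\n")
    return restored.replace(PLACEHOLDER, "")


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps.\nA second\nparagraph here."
    print(unwrap_paragraphs(sample))
```

Change the space in step 2 to an empty string and you get "brownfox", which is exactly the frustration the warning above is about.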
Finally, two quick clean-ups you can/should do. No pics, they are simple. 
First, in Find and Replace, unclick Current Selection and Regular Expressions. Find: two spaces (tap space bar twice). Replace: one space (tap space bar once).
Second, Find: - (hyphen, followed by a single space) and Replace with (blank, i.e., "nothing"). This rejoins words that were hyphenated at the end of a line.
In just a few minutes' time, you have done most of the major cleanup of your converted document (depending on how good the original document was). 
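The same two clean-ups can be sketched in Python, again only as an illustration. The function name tidy is hypothetical. One difference from the Writer steps: the regex here collapses any run of two or more spaces in one go, where a single two-for-one pass could leave some doubles behind. Also note that removing every "- " will touch a hyphen used as a dash between words, so check your results.

```python
import re


def tidy(text: str) -> str:
    """Collapse repeated spaces, then rejoin words split by 'hyphen + space'."""
    # Clean-up 1: any run of two or more spaces becomes a single space.
    text = re.sub(r" {2,}", " ", text)
    # Clean-up 2: delete "hyphen followed by a space", e.g. "conver- sion".
    return text.replace("- ", "")


if __name__ == "__main__":
    print(tidy("A conver- sion  with  gaps."))
```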

Undoubtedly, for longer documents, and depending on the initial quality of the conversion, you will need to make further adjustments. Words will be split and misspelled, footnotes and end notes might be in the wrong place, and some letters or words will be gibberish after conversion for a number of reasons (some people like to scribble notes in books and documents, or spill coffee on them, or any number of other man-made disasters).





