Using Find and Replace in LibreOffice Writer to Clean Up Converted Text Documents on Any Platform
You can use Regular Expressions (Regex) to find and replace empty paragraphs; multiple, contiguous empty spaces; empty spaces at the beginning of a paragraph; etc.
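To make those three cleanups concrete, here is a minimal Python sketch of the same idea, assuming each line of a plain-text string stands for one paragraph. The function name `clean_paragraphs` and the sample text are mine, purely for illustration. In Writer itself you would type patterns along these lines into the Find box with "Regular expressions" checked (for example, `^$` for an empty paragraph); Writer's regex engine is not Python's, so treat this as a rough analogy rather than an exact recipe.

```python
import re


def clean_paragraphs(text: str) -> str:
    """Clean up text in which every line stands for one paragraph.

    Hypothetical helper for illustration only; not part of LibreOffice.
    """
    # 1. Remove spaces at the beginning of each paragraph.
    text = re.sub(r"^ +", "", text, flags=re.MULTILINE)
    # 2. Collapse runs of two or more spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # 3. Remove empty paragraphs (a line break sitting at the start of a line).
    text = re.sub(r"^\n", "", text, flags=re.MULTILINE)
    return text


messy = "  Hello   world\n\n\n   \nSecond  line"
print(clean_paragraphs(messy))
```

The order matters a little: stripping leading spaces first turns a paragraph made only of spaces into a truly empty one, so the last step can remove it.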
I work almost exclusively in a Linux environment, and my work requires converting hundreds of PDF, TXT, and EPUB documents to .docx and .odt formats. My favorite (and only really useful) hack is to use LibreOffice and its "Find & Replace" function, which is far superior to the ones in MS Word and G Suite.
The first step is, of course, to download and install LibreOffice on your Windows, Mac, or Linux device. It's free, and it works nicely alongside your default office suite. (Spoiler alert: I don't have a tablet, which I find pretty much useless for real work, and I don't try to do serious work on a smartphone.) While you are installing, search the LibreOffice site for the Alternative Find and Replace extension and install it too. I don't use it often, but it's a godsend when I need it, especially for Regex replacements.
If your document is already in MS Word format, simply open the .docx file in LO Writer, use Writer's F&R function (including Regex) to clean up the formatting of the converted document, then save it back to Word (.docx). If you begin with a document stored in the cloud, download it, clean it up with LO Writer, save it with its original file extension, and upload it back into the cloud.
If the document to be cleaned up is already in G Suite/Google Docs format (it will need cleaning up only if you uploaded a messy document to Google Drive and converted it there), download it to your desktop with either a .docx or .odt extension, clean it up using the LO Writer F&R (plus Regex), then upload it back to G Suite.
"That sounds like a lot of work," you say.
It's really not that much work, and in some situations it is by far the best solution. Linux users and, especially, G Suite users need some good, relatively quick hacks to fill in for those "can't be done" moments you run into when searching for help on the internet.