Using Find and Replace in LibreOffice Writer to Clean Up Converted Text Documents on Any Platform


If your work involves converting documents, and you then need a lot of Find and Replace to clean up the result, you have probably found yourself needing to use Regular Expressions. Trying to use Regex in MS Word or Google Docs is frustrating unless you are, perhaps, a developer. For the everyday user working at the office or from home (WFH), figuring out Regex in the MS or G Suite environment is a mess, even with one of the Find and Replace G Suite add-ons.
You can use Regular Expressions (Regex) to find and replace empty paragraphs; multiple, contiguous spaces; spaces at the beginning of a paragraph; etc.
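
To make those three cleanups concrete, here is a minimal sketch in Python. It is only an illustration: Python's re module is not the engine LibreOffice uses, the function name clean_text and the sample string are made up for this example, and it treats each line of plain text as one paragraph. In Writer's Find & Replace (with "Regular expressions" ticked), the closest search patterns would be something like ^$ for an empty paragraph, a space followed by {2,} for runs of spaces, and ^ followed by a space and + for spaces at the start of a paragraph.

```python
import re


def clean_text(text: str) -> str:
    """Apply three typical post-conversion cleanups to plain text.

    Assumption: each line of `text` stands for one paragraph.
    """
    # 1. Collapse runs of two or more spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # 2. Remove spaces at the beginning of every paragraph (line).
    text = re.sub(r"^ +", "", text, flags=re.MULTILINE)
    # 3. Remove empty paragraphs by collapsing consecutive newlines.
    text = re.sub(r"\n{2,}", "\n", text)
    # Drop any newline left over at the very start or end.
    return text.strip("\n")


sample = "First  paragraph.\n\n\n   Second   paragraph.\n"
print(clean_text(sample))
```

Running it on the sample leaves two tidy paragraphs, one per line, with single spaces between words. The order matters a little: collapsing the spaces first means a paragraph that contains only spaces becomes truly empty in step 2 and is then removed in step 3.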

As someone who works almost exclusively in a Linux environment, and whose work requires the conversion of hundreds of pdf, txt, and epub documents to .docx and .odt formats, my favorite (and only really useful) hack is to use LibreOffice and its "Find & Replace" function, which is far superior to the ones in MS Word and G Suite.

The first step is, of course, to download and install LibreOffice on your Windows, Mac, or Linux device. (Spoiler alert: I don't have a tablet, which I find pretty much useless for real work, and I don't try to do serious work on a smartphone.) It's free, and it works nicely alongside your default office suite. While you are installing, search the LibreOffice site for the Alternative Find and Replace extension and install it too. I don't use it often, but it's a godsend when I need it, especially for Regex replacements.

If your document is already in MS Word format, simply open the .docx file in LO Writer, use Writer's F&R function (including Regex) to clean up the formatting of the converted document, then save it back to Word format (.docx). If the document is stored in the cloud, download it, clean it up with LO Writer, save it with its original file extension, and upload it back to the cloud.

If the document to be cleaned up is in G Suite/Google Docs format (which will need cleaning up only if you uploaded a messy document to Google Drive and converted it there in the first place), download it to your desktop (with either a .docx or .odt extension), clean it up using the LO Writer F&R (plus Regex), then upload it back to G Suite.

"That sounds like a lot of work," you say.

It's really not that much work, and in some situations it is by far the best solution. Linux users and G Suite users, especially the latter, need some good, relatively quick hacks to fill in for those "can't be done" moments you run into when searching for help on the internet.

Special Note to G Suite/Google Docs Users:

Google Docs Find & Replace has a Regex function, but personally I find it very difficult to figure out. Google uses the RE2 version of Regex. I'm not sure what version LibreOffice uses; all I can tell you is that it's not RE2, and for that I am thankful. Google's Help link for RE2 is not very helpful; there are a handful of programmer-type examples, and even those are not easy to put to use, at least not for non-programmers. Personally, I have fewer than ten Regex-type F&R's I need for my work, and for new ones, or those I don't use often enough to memorize, I hate the frustration of trying to figure out the Regex on the Google Docs forums. The same, in general, applies to MS Word.

Enough, already. Where can I find Regex examples that this blogger uses?

Here's a blog post that we will update as we find new uses for Regex in our Find and Replace tasks.




 

