Common Regex Find & Replace strings used by this Blogger.

As I have stated in other posts, I do a lot of document conversions from pdf, txt and epub to .docx, .odt and Google Docs. Most of the documents are between 5 and 500 pages, occasionally stretching to 700 or 1,000+ pages. I work in a Linux environment, so my most common means of converting documents are Okular, Tesseract (command line), G Suite, and Calibre--in that order, with Okular being the most-often used. In some cases, the documents I am converting have already been scanned into plain text (txt), and I simply download them and convert them to the final format (usually .odt or .docx) so that I can add images, footnotes, comments, etc. I often reconvert the final product back to pdf and/or epub. All that is more fully explained in this post.
Below are some of the more common (and not-so-common) Regex and non-Regex hacks I use. Again, I work strictly in a Linux OS environment, but much of my work is converted to MS Word and Google Docs, so I use LibreOffice's built-in Find & Replace, along with an occasional foray into the Alternate Find & Replace extension, to clean up converted documents.

Some common Find & Replace hacks, using Regular Expressions: (All within LibreOffice Writer)

(non-Regex) Replacing multiple, contiguous spaces with a single space. Many of the documents I download and convert have runs of multiple spaces between words, anywhere from 2 to as many as 30. This is really simple, so you probably cannot screw it up, even if you try.
In the Find box, hit your space bar multiple times; for an extremely long document with varying multiple spaces between words, I usually start with five taps of the space bar in the Find box, then 1 space in the Replacement box -- Replace All. Then I go back to the Find box and delete one space, meaning I now replace four spaces with 1 space. I do the same for three spaces, then two spaces. This will still leave some instances of two spaces, but these get cleaned up later. 
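
For anyone who wants to see why the passes above behave the way they do, here is a minimal Python sketch. It is not part of my LibreOffice workflow; the function names and the sample text are made up for illustration, and it assumes Replace All works like Python's str.replace (one left-to-right pass, no overlapping matches). The second function shows the single-pass regex alternative, a space followed by {2,}, for comparison.

```python
import re


def multipass_collapse(text, start=5):
    """Mimic the manual passes: replace 5, then 4, 3 and 2 spaces with one space."""
    for width in range(start, 1, -1):
        # One "Replace All" per width, each run exactly once.
        text = text.replace(" " * width, " ")
    return text


def regex_collapse(text):
    """Collapse every run of two or more spaces in a single pass."""
    return re.sub(r" {2,}", " ", text)


# Hypothetical sample line with runs of 2, 7 and 3 spaces.
sample = "Chapter  One" + " " * 7 + "begins   here."

print(multipass_collapse(sample))
print(regex_collapse(sample))
```

On a short sample like this, both approaches give the same result. On very long runs of spaces, the fixed passes can leave a double space behind, which is why a later cleanup step is still needed; the regex version does not have that problem. The same pattern (a single space followed by {2,}) should also work in the Find box with Regular expressions ticked, though I would test it on a copy of the document first.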



