Posts

Showing posts with the label Document Conversion

Connect single word in a paragraph to preceding and following paragraphs, using AltSearch in LibreOffice Writer

Let me say first that this is a hack most of you will never need, but for those who do need it, it's a godsend. Background: I convert a LOT of public domain books from their original formats (pdf, epub, txt, etc.) into cleaned-up, easy-to-read and easy-to-use formats, mostly .odt files. One of the sources of those public domain files is Early English Books Online (EEBO). This fix is predicated on EEBO as the source material, but anyone who converts text from one format to another might need it. Notice: The fix is predicated on (1) the use of LibreOffice Writer, and (2) the installation of the LibreOffice add-in "Alternate Search and Replace" (AltSearch). The Issue: Unfortunately, when I search for a book on EEBO, then choose to copy "All Text" and paste it into a new LibreOffice Writer (.odt) document, most likely the word for which I searched in EEBO, in this case "God," is left hanging out in ...
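For readers who want to see the idea outside of Writer, here is a minimal Python sketch of what the title describes: a paragraph that holds only the searched-for word gets joined to the paragraphs before and after it. This is not AltSearch syntax and not necessarily how the post does it; the function name `join_stranded_word` and the sample text are made up for illustration.

```python
def join_stranded_word(paragraphs: list[str], word: str) -> list[str]:
    """Merge any paragraph consisting only of `word` with its neighbours.

    Hypothetical helper for illustration; assumes each list item is one paragraph.
    """
    result: list[str] = []
    i = 0
    while i < len(paragraphs):
        current = paragraphs[i].strip()
        has_previous = bool(result)
        has_following = i + 1 < len(paragraphs)
        if current == word and has_previous and has_following:
            # Pull back the paragraph already emitted and glue all three together.
            previous = result.pop()
            following = paragraphs[i + 1].strip()
            result.append(f"{previous} {word} {following}")
            i += 2  # the following paragraph has been consumed
        else:
            result.append(current)
            i += 1
    return result


sample = ["In the beginning", "God", "created the heaven", "and the earth."]
joined = join_stranded_word(sample, "God")
print(joined)
```

A stranded word at the very start or end of the document is left alone, since it has only one neighbour to attach to.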

Common Regex Find & Replace strings used by this Blogger.

As I have stated in other posts, I do a lot of document conversions from pdf, txt and epub to .docx, .odt and Google Docs. Most of the documents are between 5 and 500 pages, occasionally stretching to 700 or 1,000+ pages. I work in a Linux environment, so my most common means of converting documents are Okular, Tesseract (command line), G Suite, and Calibre, in that order, with Okular being the most often used. In some cases, the documents I am converting have already been scanned into plain text (txt) and I am simply downloading them, then converting to the final format (usually .odt or .docx) so that I can add images, footnotes, comments, etc., often reconverting the final product back to pdf and/or epub formats. All that is more fully explained in this post.   Below are some of the more common (and not-so-common) Regex and non-Regex hacks I use. Again, I work strictly in a Linux OS environment, but much of my work is converted to MS Word and Google Docs, so I use LibreOffice's built-in Fi...

Using Find and Replace in LibreOffice Writer to Clean up Converted Text Documents in any Platform

If your work involves converting documents, and after the conversion you need to do a lot of Find and Replace to clean up the converted document, you might have found yourself needing to use Regular Expressions. Trying to use Regex in MS Word or Google Docs is a frustrating mess unless you are, perhaps, a developer. For the everyday user at work or at home (WFH), figuring out Regex in the MS or G Suite environment is no easier, even with one of the Find and Replace G Suite add-ons.  You can use Regular Expressions (Regex) to find and replace empty paragraphs; multiple, contiguous spaces; spaces at the beginning of a paragraph; etc. As someone who works almost exclusively in a Linux environment, and whose work requires the conversion of hundreds of pdf, txt, and epub documents to .docx and .odt formats, my favorite (and only really useful) hack is to use LibreOffice and its "Find & Replace" function, which is far superior to those in MS Word and G Suite.  The first step i...
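To make the three cleanups named above concrete (empty paragraphs, runs of spaces, leading spaces), here is a rough Python sketch using the standard `re` module. It assumes plain text in which every line is one paragraph, and the patterns are Python's, not necessarily the exact strings you would type into LibreOffice's Find & Replace dialog. The function name `clean_converted_text` is made up for illustration.

```python
import re


def clean_converted_text(text: str) -> str:
    """Apply three common cleanup passes; assumes one paragraph per line."""
    # Collapse runs of two or more spaces into a single space.
    text = re.sub(r" {2,}", " ", text)
    # Strip spaces and tabs at the beginning of each paragraph.
    text = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
    # Drop empty paragraphs, i.e. lines that contain nothing at all.
    text = re.sub(r"^\n", "", text, flags=re.MULTILINE)
    return text


messy = "  First  paragraph.\n\n\n   Second   one.\n"
print(clean_converted_text(messy))
```

The order matters a little: stripping leading whitespace before removing empty paragraphs means a line holding only spaces is also treated as empty.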