Linux Bash Magic - Fixing Issues on Half a Million HTML Pages with One Line of Code

Aug 06, 2024


I work for a company that, among other things, publishes static websites, each of which has several .html pages.

We found a significant issue with a script referenced by all of these pages, and fixing it meant touching over 500k pages.

One option was to redeploy all of these pages, which would have put a huge strain on our infrastructure.

As a fan of the Linux terminal and of Vim, both of which I use daily, I devised a solution that changes only the problematic line in each of those files.

Because this approach edits the .html files in place as plain text, it demands very little from our servers. I wrote a bash script whose core combines find, grep, and sed.

Simply put, I had to replace bad-script.js with good-script.js on all those pages.

The files are spread across many folders and subfolders, so the first step is to find all the .html files:

find . -type f -name "*.html"
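To try this without touching real pages, here is a minimal sketch that builds a small throwaway tree in a temporary directory (the folder and file names are hypothetical) and runs the same find command on it. It shows that find recurses into every subfolder and skips non-.html files, and that piping to wc -l gives a quick count before anything is changed.

```shell
#!/usr/bin/env bash
# Sketch: a throwaway tree with hypothetical file names, to try the commands safely.
set -eu

demo=$(mktemp -d)
cd "$demo"
mkdir -p site-a/blog site-b
printf '<head></head>\n' > site-a/index.html
printf '<head></head>\n' > site-a/blog/post.html
printf '<head></head>\n' > site-b/index.html
printf 'not html\n'      > site-b/notes.txt

# Recurses into every subfolder; the quotes stop the shell from expanding *.html itself.
find . -type f -name "*.html"

# Counting the matches first gives a sense of how many files a change could touch.
find . -type f -name "*.html" | wc -l
```

On this tree the count should be 3, since notes.txt is excluded by -name "*.html".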

Many of the files didn't reference bad-script.js, and I wanted to skip them to avoid unnecessary changes, so I added a grep step to the previous command:

find . -type f -name "*.html" \
    -exec grep -l '<script src="https://bad-script.js" crossorigin="anonymous" defer></script>' {} \;
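If the goal is only to list the affected files, a variant worth considering (not the command used in the article) swaps `{} \;` for `{} +`, which hands many files to each grep call instead of starting one grep per file, and adds -F so the pattern is matched as a fixed string rather than a regular expression. This sketch assumes GNU find and grep and uses a hypothetical throwaway tree.

```shell
#!/usr/bin/env bash
# Sketch: list only the .html files that contain the old tag, on a throwaway tree.
set -eu

demo=$(mktemp -d)
cd "$demo"
mkdir -p a/blog b
bad='<script src="https://bad-script.js" crossorigin="anonymous" defer></script>'
printf '<head>%s</head>\n' "$bad" > a/index.html
printf '<head>%s</head>\n' "$bad" > a/blog/post.html
printf '<head></head>\n'          > b/index.html

# -F: treat the pattern as a fixed string, so "." is not a regex wildcard.
# "{} +": pass many files to each grep call instead of spawning one grep per file.
find . -type f -name "*.html" -exec grep -lF -- "$bad" {} + | sort
```

This should list the two files under a/ and leave out b/index.html. The batched form only works for listing: it cannot act as a per-file filter for a following -exec, because the result of `{} +` does not depend on any single file.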

Now I can list every .html file that contains bad-script.js; the final step is to replace the tag in each of them.

This is where sed comes in handy: it searches for a string and replaces it with another, much like the substitutions I run all day in Vim.
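To see the substitution on its own, the sketch below feeds one hypothetical line of HTML through sed on stdin, without -i, so nothing on disk changes. It uses the same s expression as the article: `|` as the delimiter so the slashes in the URLs need no escaping, and the g flag so every occurrence on a line is replaced.

```shell
#!/usr/bin/env bash
# Sketch: the article's substitution applied to one sample line read from stdin.
set -eu

# Hypothetical input line containing the old tag.
line='<head><script src="https://bad-script.js" crossorigin="anonymous" defer></script></head>'

# "|" delimits the s command, so the "/" characters in the URLs need no escaping.
# The trailing g replaces every occurrence on a line, not just the first.
expr='s|<script src="https://bad-script.js" crossorigin="anonymous" defer></script>|<script src="https://good-script.js" defer></script>|g'

printf '%s\n' "$line" | sed "$expr"
# → <head><script src="https://good-script.js" defer></script></head>
```

Running sed without -i like this is a cheap way to check an expression before letting it rewrite files.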

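And that completes the core of the script I ran. Because find joins its expressions with an implicit AND, and -exec counts as true only when the command exits with status 0, sed runs only on the files where grep found a match: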

find . -type f -name "*.html" \
    -exec grep -l '<script src="https://bad-script.js" crossorigin="anonymous" defer></script>' {} \; \
    -exec sed -i 's|<script src="https://bad-script.js" crossorigin="anonymous" defer></script>|<script src="https://good-script.js" defer></script>|g' {} \;
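A rehearsal on a throwaway tree can catch mistakes before the command touches real files. This sketch runs the same find + grep + sed chain on hypothetical sample files, with the tags held in shell variables and grep given -F, and then checks that no .html file still contains the old tag. It assumes GNU sed (BSD/macOS sed needs `-i ''`) and GNU grep for `--include`.

```shell
#!/usr/bin/env bash
# Sketch: run the find + grep + sed fix on a throwaway tree, then verify it.
set -eu

demo=$(mktemp -d)
cd "$demo"
mkdir -p a/blog b
bad='<script src="https://bad-script.js" crossorigin="anonymous" defer></script>'
good='<script src="https://good-script.js" defer></script>'
printf '<head>%s</head>\n' "$bad" > a/index.html
printf '<head>%s</head>\n' "$bad" > a/blog/post.html
printf '<head></head>\n'          > b/index.html

# grep -l acts as the filter: sed runs only where grep matched, and the printed
# paths double as a log of edited files. Interpolating $bad/$good into the sed
# expression is safe here only because neither contains "|", "&" or "\".
find . -type f -name "*.html" \
    -exec grep -lF -- "$bad" {} \; \
    -exec sed -i "s|$bad|$good|g" {} \;

# Verification: grep exits 1 when nothing matches, hence "|| true" under set -e.
remaining=$(grep -rlF --include='*.html' -- "$bad" . || true)
echo "files still containing the old tag: ${remaining:-none}"
```

With `{} \;`, grep and sed each start once per file, so on half a million pages the chain launches on the order of a million processes; that is still only plain text editing, but it is worth knowing before a run.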

With constant technological change and shiny new frameworks, it's important not to forget how reliable tried-and-true old-school tools like these are.

In summary:

  • find locates all .html files
  • grep filters the ones containing the string I need
  • sed replaces that string with another one
