Linux Bash Magic - Fixing Issues on Half a Million HTML Pages with One Line of Code
I work for a company that, among other things, publishes static websites, each of which has several .html pages.
We encountered a significant issue with a script used on all of these pages, which required a fix for over 500k pages.
One potential solution was to redeploy all these pages, a task that would have placed a huge strain on our infrastructure.
As a fan of Linux terminal commands and Vim, which I use daily, I devised a solution that would only change the line with the problem in all those files.
Because the fix only edits the .html files as plain text, it demands very little from our servers. I wrote a Bash script whose central part combines find, grep, and sed.
Simply put, I had to replace bad-script.js with good-script.js on all those pages.
Since the files are stored in multiple folders and subfolders, I first had to find all the .html files:
find . -type f -name "*.html"
Many of the files didn't use bad-script.js, so I wanted to filter those out to avoid unnecessary changes. I added a grep step to the previous command:
find . -type f -name "*.html" \
-exec grep -l '<script src="https://bad-script.js" crossorigin="anonymous" defer></script>' {} \;
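Before changing anything, it can help to count how many files the filter actually matches. The following is a minimal sketch of such a dry run, not the script I ran: the temporary directory, the sample pages, and the variable names (demo_dir, old_tag, affected) are made up for illustration. It also uses grep -F so the tag is matched as a fixed string, and "{} +" so find passes many files to each grep call.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox so the sketch runs anywhere; in practice, cd into the site root instead.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/site/a/b"

old_tag='<script src="https://bad-script.js" crossorigin="anonymous" defer></script>'

# Two sample pages that contain the bad tag, one that does not.
printf '%s\n' '<html>' "$old_tag" '</html>' > "$demo_dir/site/index.html"
printf '%s\n' '<html>' "$old_tag" '</html>' > "$demo_dir/site/a/b/page.html"
printf '%s\n' '<html>' '<p>no script here</p>' '</html>' > "$demo_dir/site/a/clean.html"

# -l prints only the names of matching files; -F disables regex interpretation,
# so the dots in the URL are literal. "{} +" batches files into as few grep calls as possible.
affected=$(find "$demo_dir/site" -type f -name "*.html" \
  -exec grep -lF "$old_tag" {} + | wc -l | tr -d '[:space:]')

echo "files to change: $affected"
```

Nothing is modified here, so the count can be compared against expectations before the replacement step is added.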
Now I can find all .html files with bad-script.js in them, and for the final step I just need to replace the tag in each one.
Here is where sed comes in handy. It can search for a string and replace it with another, much like the substitutions I use all day in Vim.
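On a single line of input, the substitution can be sketched like this. The variable names (old_tag, new_tag, result) are only for illustration; the s|old|new|g form uses | as the delimiter so the slashes in the URL need no escaping, and it reads much like :%s|old|new|g in Vim.

```shell
#!/usr/bin/env bash
old_tag='<script src="https://bad-script.js" crossorigin="anonymous" defer></script>'
new_tag='<script src="https://good-script.js" defer></script>'

# Feed one line to sed on stdin and capture the rewritten line.
# The pattern is a regular expression, so each "." matches any character, including a literal dot.
result=$(printf '%s\n' "$old_tag" | sed "s|$old_tag|$new_tag|g")

echo "$result"
# prints: <script src="https://good-script.js" defer></script>
```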
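find runs the second -exec only when the first one succeeds, and grep exits successfully only when it finds a match, so sed touches just the matching files. That completes the core part of the script I ran: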
find . -type f -name "*.html" \
-exec grep -l '<script src="https://bad-script.js" crossorigin="anonymous" defer></script>' {} \; \
-exec sed -i 's|<script src="https://bad-script.js" crossorigin="anonymous" defer></script>|<script src="https://good-script.js" defer></script>|g' {} \;
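A slightly more cautious variant of the same idea is sketched below. It is not what I ran in production: the sandbox directory, the sample pages, and the variable names are assumptions for the sake of a runnable example. It swaps grep -l for grep -q so no file names are printed during the run, keeps a .bak copy of every changed file via sed -i.bak, and counts the results afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox with two affected pages and one clean page.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b"

old_tag='<script src="https://bad-script.js" crossorigin="anonymous" defer></script>'
new_tag='<script src="https://good-script.js" defer></script>'

printf '%s\n' '<html>' "$old_tag" '</html>' > "$demo_dir/index.html"
printf '%s\n' '<html>' "$old_tag" '</html>' > "$demo_dir/a/b/page.html"
printf '%s\n' '<html>' '<p>no script here</p>' '</html>' > "$demo_dir/a/clean.html"

# grep -q is silent and exits 0 only on a match, so it acts as a filter for the next -exec.
# sed -i.bak edits in place and leaves the original next to it as <name>.bak.
find "$demo_dir" -type f -name "*.html" \
  -exec grep -qF "$old_tag" {} \; \
  -exec sed -i.bak "s|$old_tag|$new_tag|g" {} \;

# Verify: no page should still contain the old tag, and each changed page has a backup.
remaining=$(find "$demo_dir" -type f -name "*.html" -exec grep -lF "$old_tag" {} + | wc -l | tr -d '[:space:]')
fixed=$(find "$demo_dir" -type f -name "*.html" -exec grep -lF "$new_tag" {} + | wc -l | tr -d '[:space:]')
backups=$(find "$demo_dir" -type f -name "*.html.bak" | wc -l | tr -d '[:space:]')

echo "remaining=$remaining fixed=$fixed backups=$backups"
```

Once the result has been checked, the backups can be removed with a separate find over "*.html.bak"; on half a million pages they would otherwise double the disk usage of the affected files.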
With the constant technological advancements and shiny new frameworks we have, it's important not to forget the reliability of tried-and-true old-school tools like these.
In summary:
find locates all .html files.
grep filters the ones containing the string I need.
sed replaces the string with another one.