Published 2022-07-28.
Last modified 2022-08-03.
Time to read: 3 minutes.
Lawyers like the Microsoft Office software suite, so when I work on a court case as an expert, I endeavor to provide my clients with Word documents that contain the necessary information. I like working in WSL/WSL2 because I can use Windows programs and Ubuntu programs together effectively.
Grab Image, Then Slice
Recently, I used SnagIt, a Windows program, to capture large web pages as single images. Some of these images were quite tall.
The CSS for the web pages made some content invisible on the printed page. Yes, I could have injected CSS using a Chrome plugin like My Style to ensure that all content will be printed, like this:
@media print { * { display: initial; visibility: visible; } }
I decided to use screen grabs, which would guarantee that the contents of my report would exactly match what had been displayed on the screen, without injecting anything into the web pages.
ImageMagick is preinstalled on Ubuntu Desktop. I used ImageMagick to slice the image captures into smaller page-sized images, so they could be inserted into a Word document.
The Computer Worked Hard
Grabbing such large web pages was a lot of work for my desktop computer. The only programs active during the screen grab process were the Google Chrome browser and SnagIt. I found that 10 GB of RAM and 30% of the GPU capacity (an NVIDIA GTX 1660 Super) were used.
The screen grab failed if I did not start scrolling from the top of the web page; while it is possible to scrub up and down smaller web pages in order to grab portions of interest, this fails for large pages.
I also found that scrolling too fast caused the screen grabbing process to fail. Clicking and holding the bottom scroll arrowhead at the bottom right of the screen seemed to result in a smooth and optimal scrolling speed. This meant that grabbing large web pages took a few minutes as the page slowly scrolled downward.
Setting Up the Conversion
The Word documents I usually work with are formatted for North American standards. This means one-inch margins on letter-sized paper (8.5" x 11"), which gives a working area of 6.5" x 9", yielding an aspect ratio of 0.72.
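The arithmetic behind those numbers can be checked in the shell. This is only a minimal sketch: the names PAGE_W, PAGE_H, MARGIN and working_area are made up for illustration, and awk is used for the floating-point division.

```shell
#!/bin/bash
# Letter-sized paper with one-inch margins on all four sides (inches).
# These variable names are illustrative assumptions.
PAGE_W=8.5
PAGE_H=11
MARGIN=1

# Working area = page size minus the margins on both sides.
# Aspect ratio = working width / working height.
working_area() {
  awk -v w="$PAGE_W" -v h="$PAGE_H" -v m="$MARGIN" 'BEGIN {
    ww = w - 2 * m
    wh = h - 2 * m
    printf "%.1f x %.1f inches, aspect ratio %.2f\n", ww, wh, ww / wh
  }'
}

working_area   # prints: 6.5 x 9.0 inches, aspect ratio 0.72
```

Changing MARGIN or the page size shows how the aspect ratio, and therefore the slice height computed later, would change for other page layouts.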
The tall captured images needed to be sliced into rectangles that fit efficiently into Word documents. The computations are as follows.
- Determine the width of a screen grab and save it into W. The ImageMagick identify command does not emit a newline after its output; I have inserted one here for readability:

  $ identify -ping -format '%w' ../IMG2005.png
  1536
  $ export W="$( identify -ping -format '%w' ../IMG2005.png )"

- Determine the height of a screen grab and save it into H:

  $ identify -ping -format '%h' ../IMG2005.png
  $ export H="$( identify -ping -format '%h' ../IMG2005.png )"

- The width can be divided by the aspect ratio to obtain the desired height of each slice, so that the slices fit the working area of the Word documents. I used the bc calculator from Bash to divide W / ASPECT_RATIO. The H2 integer variable contains the computed height of each slice.

  $ export ASPECT_RATIO=0.72
  $ export H2="$( echo "scale=0 ; $W / $ASPECT_RATIO" | bc )"

- Now the image called IMG2005.png can be sliced using ImageMagick’s convert command. The slices are stored in a subdirectory called slices, with file names like IMG2005-0.jpg, IMG2005-1.jpg, etc. (numbering starts at 0 because of -scene 0).

  $ convert IMG2005.png -crop ${W}x${H2} \
      -quality 100% -scene 0 slices/IMG2005-%d.jpg
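The slice height from step 3 can be cross-checked without bc. The following is a minimal sketch rather than the method used above: slice_height is a hypothetical helper, and it assumes the 0.72 aspect ratio can be written as the integer fraction 72/100 so that Bash integer arithmetic is enough.

```shell
#!/bin/bash
# Hypothetical helper: compute the slice height for a given image width.
# The 0.72 aspect ratio is written as 72/100 so that integer arithmetic
# suffices; the result is truncated, like bc with scale=0.
slice_height() {
  local width=$1
  echo $(( width * 100 / 72 ))
}

slice_height 1536   # width reported by identify in step 1; prints 2133
```

For the 1536-pixel-wide capture this gives a slice height of 2133 pixels, which is what the -crop argument receives as ${W}x${H2}.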
Automating the Conversion
I wrote the following bash script, which incorporates the above computations. It slices all the images in a directory and saves the results to a second directory.
#!/bin/bash

function help {
  if [ "$1" ]; then echo "Error: $1"; fi
  echo "
$(basename "$0"): slice all images in the given directory and place them
into a specified directory, which will be created if required.
"
  exit 1
}

# Compute the slice size for one image: full width, page-proportioned height.
function setup {
  export ASPECT_RATIO=0.72
  export W="$( identify -ping -format '%w' "$1" )"
  export H="$( identify -ping -format '%h' "$1" )"
  export H2="$( echo "scale=0 ; $W / $ASPECT_RATIO" | bc )"
}

# Slice one image into ${W}x${H2} pieces and save them in $DIR_OUTPUT.
function convert1 {
  FULLNAME=$(basename -- "$1")
  FILENAME="${FULLNAME%.*}"
  FILETYPE="${FULLNAME##*.}"

  convert "$1" \
    -crop "${W}x${H2}" \
    -quality 100% \
    -scene 0 \
    "$DIR_OUTPUT/$FILENAME-%d.png"
}

if [ -z "$1" ]; then help "No directory path for images to be converted was provided."; fi
export DIR_INPUT="$( realpath "$1" )"

if [ -z "$2" ]; then help "No directory path for the image slices to be saved into was provided."; fi
export DIR_OUTPUT="$( realpath "$2" )"

mkdir -p "$DIR_OUTPUT"

# Select files by MIME type so that only images are processed.
# Quoting and 'IFS= read -r' keep paths with spaces or backslashes intact.
find "$DIR_INPUT" -type f -exec file --mime-type {} \+ |
  awk -F: '{if ($2 ~/image\//) print $1}' |
  while IFS= read -r FILE; do
    setup "$FILE"
    echo "Slicing $FILE into ${W}x${H2} pixels"
    convert1 "$FILE"
  done
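The convert1 function relies on Bash parameter expansion to take a path apart. The sketch below isolates that step so it can be tried on its own; split_name is a hypothetical helper, and the path is only an example.

```shell
#!/bin/bash
# Hypothetical helper: print the base name (without extension) and the
# extension of a path, using the same expansions as convert1.
split_name() {
  local fullname filename filetype
  fullname=$(basename -- "$1")   # strip the directory part
  filename="${fullname%.*}"      # remove the shortest trailing ".ext"
  filetype="${fullname##*.}"     # keep only what follows the last "."
  printf '%s %s\n' "$filename" "$filetype"
}

split_name "/mnt/c/images/IMG1091.png"   # prints: IMG1091 png
```

In the script above only the base name is used to build the output file names; the slices are always written as PNG files regardless of the input extension.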
Overcoming ImageMagick Processing Limits
Some of the web pages that I needed to grab were quite long, which resulted in those images requiring more computational resources than the default ImageMagick configuration allows. This caused errors such as the following to appear:
convert-im6.q16: no images defined `/mnt/c/images/slices/IMG1466-%d.png' @ error/convert.c/ConvertImageCommand/3229.
convert-im6.q16: cache resources exhausted `/mnt/c/images/IMG1091.png' @ error/cache.c/OpenPixelCache/4095.
ImageMagick defines computational resource limits in /etc/ImageMagick-6/policy.xml. The default maximum memory is 256 MiB, the default maximum allowable height is 16,000 pixels (16KP), and the default maximum area is 128M pixels.
These values are defined by the following entries:
<policy domain="resource" name="memory" value="256MiB"/>
<policy domain="resource" name="height" value="16KP"/>
<policy domain="resource" name="area" value="128MP"/>
I changed the maximum memory limit to 2 GiB of RAM, the maximum height limit to 10,000,000 pixels (10MP), and the maximum area limit to 2G pixels with these entries:
<policy domain="resource" name="memory" value="2GiB"/>
<policy domain="resource" name="height" value="10MP"/>
<policy domain="resource" name="area" value="2GP"/>
Alternatively, I could have simply commented out the limits, as shown below.
<!--
<policy domain="resource" name="memory" value="256MiB"/>
<policy domain="resource" name="height" value="16KP"/>
<policy domain="resource" name="area" value="128MP"/>
-->
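After editing the policy file it is worth confirming which resource entries are still active. The following is a rough sketch under stated assumptions: list_resource_limits is a hypothetical helper, it expects the attributes in the order domain, name, value as in the entries above, and it runs against a small sample file rather than the real /etc/ImageMagick-6/policy.xml.

```shell
#!/bin/bash
# Hypothetical helper: print "name value" for each resource policy in a
# policy.xml file. Lines containing a comment opener (<!--) are skipped;
# multi-line comment blocks are not handled by this sketch.
list_resource_limits() {
  sed -n -e '/<!--/d' \
    -e 's/.*<policy domain="resource" name="\([^"]*\)" value="\([^"]*\)".*/\1 \2/p' "$1"
}

# Sample file standing in for /etc/ImageMagick-6/policy.xml.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
<policymap>
  <!-- <policy domain="resource" name="thread" value="4"/> -->
  <policy domain="resource" name="memory" value="2GiB"/>
  <policy domain="resource" name="height" value="10MP"/>
  <policy domain="resource" name="area" value="2GP"/>
</policymap>
EOF

# Prints one "name value" pair per active resource entry.
list_resource_limits "$SAMPLE"
```

On a system with ImageMagick installed, `identify -list resource` reports the limits that ImageMagick actually applies, which is the more authoritative check.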
The largest web page to be sliced was converted to a very tall image, which was 83,703 pixels high. It was sliced into 40 images.
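The slice count follows from the image height and the slice height. Here is a minimal sketch of that arithmetic. It assumes the tall capture was 1536 pixels wide, which is the width reported earlier for IMG2005.png and may not apply to this particular image; the helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers; the aspect ratio 0.72 is written as 72/100 so
# that integer arithmetic can be used.
slice_height() { echo $(( $1 * 100 / 72 )); }

# Number of slices = ceiling(height / slice_height).
slice_count() {
  local height=$1 slice=$2
  echo $(( (height + slice - 1) / slice ))
}

# Height of the final slice (equals the full slice height when the image
# divides evenly).
last_slice_height() {
  local height=$1 slice=$2
  local rem=$(( height % slice ))
  if [ "$rem" -eq 0 ]; then echo "$slice"; else echo "$rem"; fi
}

W=1536    # ASSUMED width; reported in the article only for IMG2005.png
H=83703   # height of the tallest capture
H2=$(slice_height "$W")
echo "slice height: $H2"                                   # 2133
echo "slices: $(slice_count "$H" "$H2")"                   # 40
echo "last slice height: $(last_slice_height "$H" "$H2")"  # 516
```

Under that width assumption the result agrees with the 40 slices reported above: 39 full slices of 2133 pixels plus a final, shorter slice.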
Word Macro
A Word macro is also needed to insert the images into the currently open Word document in alphabetical order. I modified an existing macro for this purpose:
Sub insertImages()
    Dim intResult As Integer
    Dim strPath As String
    Dim strFolderPath As String
    Dim objFSO As Object
    Dim objFolder As Object
    Dim objFile As Object
    Dim i As Integer

    'Ask the user to pick the folder that contains the image slices
    intResult = Application.FileDialog(msoFileDialogFolderPicker).Show
    'Proceed only if the user did not cancel the dialog
    If intResult <> 0 Then
        'Get the path of the selected folder
        strFolderPath = Application.FileDialog(msoFileDialogFolderPicker).SelectedItems(1)
        'Create an instance of the FileSystemObject
        Set objFSO = CreateObject("Scripting.FileSystemObject")
        'Get the folder object
        Set objFolder = objFSO.GetFolder(strFolderPath)
        i = 1
        'Loop through each file in the folder and insert it at the cursor
        For Each objFile In objFolder.Files
            'get file path
            strPath = objFile.Path
            'insert the image
            Selection.InlineShapes.AddPicture FileName:= _
                strPath, LinkToFile:=False, _
                SaveWithDocument:=True
        Next objFile
    End If
End Sub
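Because the slices are numbered with %d, a plain alphabetical ordering places IMG2005-10.png before IMG2005-2.png once there are ten or more slices. Assuming the files reach the macro in alphabetical order, one possible workaround is to zero-pad the slice numbers first. This is a minimal sketch; pad_name is a hypothetical helper and the file names are examples.

```shell
#!/bin/bash
# Hypothetical helper: turn NAME-N.ext into NAME-NN.ext (two digits).
# Assumes N is an unpadded decimal index, as produced by %d.
pad_name() {
  local name=$1
  local stem="${name%.*}"    # e.g. IMG2005-7
  local ext="${name##*.}"    # e.g. png
  local base="${stem%-*}"    # e.g. IMG2005
  local num="${stem##*-}"    # e.g. 7
  printf '%s-%02d.%s\n' "$base" "$num" "$ext"
}

# Unpadded names: byte-wise sorting puts slice 10 before slice 2.
printf '%s\n' IMG2005-2.png IMG2005-10.png | LC_ALL=C sort

# Padded names sort in numeric order.
for f in IMG2005-2.png IMG2005-10.png; do pad_name "$f"; done | LC_ALL=C sort
```

ImageMagick's output file name template also accepts a printf-style width such as %02d, which would produce padded names directly at slicing time.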
Done!
😁Thanks to the above automation, I was able to deliver the Word documents containing the sliced web pages to my client soon after they were requested.