WebShot
   
 
   
   
 
Command Line Usage

The command line version of WebShot is named webshotcmd.exe and is located in the WebShot installation folder. Below are the command line arguments that you can use to configure the screenshot generation process. Some parameters require double-quotes around their values.
Color Coding
The arguments are color coded to indicate what version supports them.

Freeware Edition Personal Edition Server Edition
Argument      Xml                 Description                                               Default Value
/url                              Website url
/in           BatchFile           Text file with urls on each line
/out          ImagePath           Output image file                                         webshot.jpg
/width        ImageWidth          Image width                                               Browser width
/height       ImageHeight         Image height                                              Browser height
/bwidth       BrowserWidth        Browser width                                             Automatically determined
/bheight      BrowserHeight       Browser height                                            Automatically determined
/timeout      Timeout             Maximum time to wait in seconds for process to finish.    Infinite / 0
                                  Last-ditch attempt to kill the process; use the other
                                  wait parameters first. Do not use with batch mode.
/timeoutpg    TimeoutPage         Maximum time to wait for page to load in seconds          Infinite / 0 (Recommended 85)
/timeoutmeta  TimeoutMeta         Maximum time in seconds allowed for meta refresh
/waitdoc      WaitDocument        Time to wait for scripts and controls to load after       0 (Recommended 10)
                                  the document is complete (seconds)
/waitdocfl    WaitDocumentFlash   Time to wait for scripts and controls to load after
                                  the document is complete on a page with Flash content
                                  (seconds). The waitdoc value is used if not specified.
/waitimg      WaitImage           Time to wait right before the image is captured           0 (Recommended 2)
/bmwidth      BrowserWidthMin     Minimum browser width                                     0
/bmheight     BrowserHeightMin    Minimum browser height                                    0
/bxwidth      BrowserWidthMax     Maximum browser width                                     0
/bxheight     BrowserHeightMax    Maximum browser height                                    0
/quality      ImageQuality        Image quality (0-100)                                     100
/type         ImageType           Image encoder (e.g. png, gif, jpg, bmp)                   jpg
/csv          CsvPath             Output results to specified csv file
/username     HttpUsername        Authentication username
/password     HttpPassword        Authentication password
/html         HtmlPath            Output main html source to specified file
/headers      HttpHeaders         Custom http headers separated by ||
/postdata     HttpPostData        Custom post string
/useragent    HttpUserAgent       Custom user agent string
/redirectmax  RedirectMax         Maximum number of redirects to allow                      1
/grayscale    ImageGrayscale      Make output image in grayscale
/wmfilename   WatermarkFilename   Set the watermark image. Only uses 24-bit bitmaps;
                                  blend mode is Multiply.
/wmposition   WatermarkPosition   Set the position of the watermark image                   "0.0x0.0"
/wmopacity    WatermarkOpacity    Set the watermark image opacity                           100
/threads      ThreadMax           Batch mode only, number of threads to use                 1
/clrcache                         Clears ALL Internet Explorer cache. Should only be used
                                  once every 5000-10000 screenshots.
-nosave       ImageSaveToDisk     Sets whether to save images to disk
-noactivex    DisableActiveX      Disables running of ActiveX controls
-noscripts    DisableScripts      Disables running of scripts
-file         Debug               Turns on debug logging to file
-perthread    DebugPerThread      Turns on creation of debug logs per thread
-append       Append              Turns on debug append logging
-verbose      Verbose             Turns on verbose debug logging

By default, the image width and height are taken from the browser width and height, which are determined automatically. On some pages it is difficult to determine the width and height correctly, so it is recommended that you specify a minimum browser width and height.

Timing is everything

Just as a camera has a range of options that let you take the best possible picture, so does WebShot. Some of these options come in the form of wait parameters that specify how long to wait before or after certain browser events occur. Many websites today have scripts or ActiveX controls that load after a page has been downloaded by Internet Explorer. Flash is a good example of an ActiveX control that takes time to load after the page has finished downloading. Content-heavy Flash and ActiveX controls may take longer to load than others. The wait parameters therefore let you wait a specified amount of time for the objects on a page to load.
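
As a rough sketch of how the wait parameters fit together, the Python helper below assembles a webshotcmd.exe argument list using the recommended values from the argument table (85 for /timeoutpg, 10 for /waitdoc, 2 for /waitimg). The function names, the installation path, and the minimum browser size are assumptions made for illustration; only the flags themselves come from this page.

```python
import subprocess

# Assumed install location; adjust to wherever WebShot is installed.
WEBSHOT_EXE = r"C:\Program Files\WebShot\webshotcmd.exe"


def build_capture_args(url, out_path, min_width=1024, min_height=768):
    """Return an argument list for one capture using the recommended waits."""
    return [
        WEBSHOT_EXE,
        "/url", url,
        "/out", out_path,
        "/bmwidth", str(min_width),    # minimum browser width (assumed value)
        "/bmheight", str(min_height),  # minimum browser height (assumed value)
        "/timeoutpg", "85",            # recommended page load timeout
        "/waitdoc", "10",              # recommended wait after document complete
        "/waitimg", "2",               # recommended wait right before capture
    ]


def capture(url, out_path):
    """Run WebShot once and return the process exit code."""
    # Passing a list lets subprocess take care of quoting each value.
    return subprocess.run(build_capture_args(url, out_path)).returncode


args = build_capture_args("http://www.gizmodo.com/", "gizmodo.jpg")
print(subprocess.list2cmdline(args))
```

The sketch only prints the command line; calling capture() would actually launch WebShot, so it is left to the caller.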

Examples

webshotcmd.exe /url "http://www.gizmodo.com/"
Takes a full screenshot capture of Gizmodo's website

webshotcmd.exe /url "http://www.gizmodo.com/" /bwidth 600
Takes a screenshot of Gizmodo's website and clips the image off at 600 pixels wide

webshotcmd.exe /url "http://www.gizmodo.com/" /width 800 /height 600
Takes a full screenshot capture of Gizmodo's website and creates a thumbnail of it with the size 800x600

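webshotcmd.exe /url "http://www.google.com/" /headers "Accept-Language: en||Referer: http://www.google.com/ig"
Takes a screenshot of Google's website, sending custom Accept-Language and Referer headers with the request

webshotcmd.exe /url "http://www.google.com/" /postdata "Username=TestUser&Password=TestPass&Submit=OK"
Takes a screenshot of Google's website, sending the given string as custom post data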
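
When the header and post values are generated by a program, it may help to build the two strings rather than concatenate them by hand. The Python sketch below joins headers with the "||" separator used by /headers and encodes form fields for /postdata. The helper names are hypothetical, and it assumes WebShot expects a URL-encoded post string, as the example on this page suggests.

```python
from urllib.parse import urlencode


def join_headers(headers):
    """Join header name/value pairs into the "||"-separated form used by /headers."""
    return "||".join(f"{name}: {value}" for name, value in headers.items())


def build_post_args(url, fields, headers=None):
    """Return webshotcmd.exe arguments for a form post with optional headers."""
    # Assumption: /postdata takes a URL-encoded key=value&key=value string.
    args = ["webshotcmd.exe", "/url", url, "/postdata", urlencode(fields)]
    if headers:
        args += ["/headers", join_headers(headers)]
    return args


print(join_headers({"Accept-Language": "en", "Referer": "http://www.google.com/ig"}))
print(build_post_args(
    "http://www.google.com/",
    {"Username": "TestUser", "Password": "TestPass", "Submit": "OK"},
))
```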

   
   
 
Xml Configuration

Instead of passing the same parameters on the command line each time you run WebShot, you can set up an xml configuration file that contains your most commonly used parameters. The element names are listed in the Xml column of the argument table.

<WebShot>
	<Debug>FALSE</Debug>
	<ImagePath>\images\</ImagePath>
	<BrowserWidth>1024</BrowserWidth>
	<BrowserHeight>768</BrowserHeight>
	<BatchFile>urls.txt</BatchFile>
	<Verbose>TRUE</Verbose>
</WebShot>

The xml configuration file should be in the same directory as webshotcmd.exe and should be named webshotcmd.xml.
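
If the configuration is produced by a script, something like the following Python sketch could generate it. The element names are taken from the Xml column of the argument table; the build_config helper and the chosen values are illustrative assumptions, not part of WebShot.

```python
import xml.etree.ElementTree as ET


def build_config(settings):
    """Build a <WebShot> element from a dict of element name -> value."""
    root = ET.Element("WebShot")
    for name, value in settings.items():
        child = ET.SubElement(root, name)
        # Booleans are written as TRUE/FALSE, matching the example above.
        child.text = str(value).upper() if isinstance(value, bool) else str(value)
    return root


config = build_config({
    "BrowserWidthMin": 1024,
    "BrowserHeightMin": 768,
    "TimeoutPage": 85,
    "WaitDocument": 10,
    "WaitImage": 2,
    "Verbose": True,
})
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
# To use it, save xml_text as webshotcmd.xml next to webshotcmd.exe.
```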

   
   
 
Comma-separated output
(Requires personal or server edition)

To help you determine whether screenshot generation succeeded or failed, results can be output to a csv file. By default, the csv file is named webshot.csv and is in the same directory as WebShot. You can specify your own filename if needed. Results are always appended to the current csv file in the following format:

Url, Image FileName, Error Message, Browser Width, Browser Height, Image Width, Image Height, Timestamp, Page Title, Meta Keywords, Meta Description


All column values have double quotes around them. The error message field may contain more than one error message. Each csv entry is separated by a \r\n line break sequence. As a final check of whether the screenshot was successful, make sure that the image file exists.
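
A minimal Python sketch of reading these results follows. It assumes the csv file has no header row and that an empty error message field means no error was reported; the sample entry is made up for illustration and is not real WebShot output.

```python
import csv
import io
import os

# Column order as listed above.
COLUMNS = [
    "Url", "Image FileName", "Error Message", "Browser Width", "Browser Height",
    "Image Width", "Image Height", "Timestamp", "Page Title",
    "Meta Keywords", "Meta Description",
]


def read_results(csv_file):
    """Yield one dict per csv entry, keyed by the column names above."""
    # Assumption: the file contains data rows only, with no header row.
    yield from csv.DictReader(csv_file, fieldnames=COLUMNS)


def succeeded(row, check_file=True):
    """Treat a capture as successful if no error was logged and the image exists."""
    if row["Error Message"].strip():
        return False
    return os.path.isfile(row["Image FileName"]) if check_file else True


# Made-up sample entry for illustration only.
sample = ('"http://www.example.com/","example.jpg","","1024","768",'
          '"1024","768","20060130120505","Example","",""\r\n')
for row in read_results(io.StringIO(sample, newline="")):
    print(row["Url"], succeeded(row, check_file=False))
```

For a real file, open it with open("webshot.csv", newline="") so that the csv module handles the \r\n sequences itself.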

   
   
 
MessageBox Automation

Internet Explorer can often pop up message boxes about all sorts of things. When a message box appears, WebShot allows you to choose the best response for the dialog. It then stores your selection so that it knows how to deal with message boxes of that type in the future.



The settings for this automation are stored in webshotauto.xml in the same directory as webshotcmd.exe. UTF-8 is supported because Internet Explorer message boxes can appear in many languages. Below is an example automation configuration file.

<WebShotAutomation>					
	<MessageBox>
		<Text>Stack overflow at line: 0</Text>
		<Caption>Windows Internet Explorer</Caption>
		<Result>Ok</Result>
	</MessageBox>
</WebShotAutomation>

Section    Information
Text       Text to match against. A match occurs if Text is found in the text of the message box. UTF-8 support. Case-insensitive support for ASCII.
Caption    Caption to match against. A match occurs if Caption is found in the title of the message box. Not required. Case-insensitive matching.
Result     Response to the message box. Can be: Ok, Cancel, Yes, No, Retry, Close, Ignore, Abort, Help, Try Again, Continue.
DebugInfo  Information about the message box, which is displayed in the debug log. Not required.
DebugUrl   Information about the url that it occurred at. Only for debugging purposes, not required.
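
To make the matching rules concrete, here is a small Python approximation of how a message box could be compared against the rules in webshotauto.xml. It is a sketch of the behaviour described in the table, not WebShot's actual implementation, and the find_result helper is hypothetical.

```python
import xml.etree.ElementTree as ET

AUTOMATION_XML = """
<WebShotAutomation>
    <MessageBox>
        <Text>Stack overflow at line: 0</Text>
        <Caption>Windows Internet Explorer</Caption>
        <Result>Ok</Result>
    </MessageBox>
</WebShotAutomation>
"""


def find_result(xml_text, box_text, box_caption):
    """Return the configured Result for the first matching rule, else None."""
    for rule in ET.fromstring(xml_text).iter("MessageBox"):
        text = rule.findtext("Text", default="")
        caption = rule.findtext("Caption", default="")
        # Substring match; lower() approximates the case-insensitive rules.
        if text.lower() not in box_text.lower():
            continue
        # Caption is optional, so only compare it when the rule provides one.
        if caption and caption.lower() not in box_caption.lower():
            continue
        return rule.findtext("Result")
    return None


print(find_result(AUTOMATION_XML,
                  "Stack overflow at line: 0",
                  "Windows Internet Explorer"))
```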

WebShot comes with an automation configuration file that contains some of the common message boxes that I've encountered. If you encounter any new ones, feel free to email me your config file.

   
   
 
Output Filename Masking

WebShot supports the following masks, which apply only to the file title portion of the resultant path. The graphical, command line, and dll interfaces all support this masking. The two command line parameters that support it are /out and /html.

Mask  Value
%m    Url Md5 Hash (Default)
%h    Url Hostname
%d    Url Domain name
%e    Url Domain name without Tld
%p    Url Path
%l    Literal Timestamp (20060130120505 / YMDHMS)
%t    Unix Timestamp

The final output path should not exceed 256 characters. The following characters are stripped from the resultant output filename "? \ / : & = %".
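
To predict which file name a mask will produce (for example, to look up a screenshot later), the masks can be approximated in Python as sketched below. This is a guess at the behaviour from the table, with several assumptions: the Md5 hash is taken over the UTF-8 url string, the domain is naively the last two labels of the hostname, and the literal timestamp uses local time. WebShot's real output may differ.

```python
import hashlib
import re
import time
from urllib.parse import urlparse

# Characters stripped from the output file name, as listed above.
STRIPPED = set('?\\/:&=%')


def expand_mask(mask, url, now=None):
    """Expand WebShot-style file name masks for a url (approximation)."""
    now = time.time() if now is None else now
    parts = urlparse(url)
    host = parts.hostname or ""
    labels = host.split(".")
    values = {
        "%m": hashlib.md5(url.encode("utf-8")).hexdigest(),
        "%h": host,
        # Naive guess: the domain is the last two labels of the hostname.
        "%d": ".".join(labels[-2:]),
        "%e": labels[-2] if len(labels) >= 2 else host,
        "%p": parts.path,
        "%l": time.strftime("%Y%m%d%H%M%S", time.localtime(now)),
        "%t": str(int(now)),
    }
    # Replace all masks in one pass so expanded values are not re-scanned.
    expanded = re.sub(r"%[mhdeplt]", lambda m: values[m.group(0)], mask)
    return "".join(ch for ch in expanded if ch not in STRIPPED)


print(expand_mask("%h-%t", "http://www.gizmodo.com/", now=1138622705))
# www.gizmodo.com-1138622705
```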

   
   
 
Watermarking Images

The server edition of WebShot allows you to watermark your output screenshots using the /wmfilename, /wmposition, and /wmopacity command line parameters. All watermark images must be 24-bit bitmaps. The /wmposition parameter takes x and y multiplier values that set the location of the watermark. For instance, to place the watermark at the bottom of the screenshot, use /wmposition "0.0x1.0"; to place it all the way to the right, use "1.0x0.0". Opacity is a percentage from 0 to 100. Below is an example.
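
As a sketch of what the multipliers mean, the Python helpers below parse a /wmposition value and estimate where the watermark's top-left corner would land. The mapping used here (multiplier times the free space left after subtracting the watermark size) is an assumption consistent with "0.0x1.0" meaning bottom and "1.0x0.0" meaning right; WebShot's exact arithmetic is not documented on this page. All function names and the logo.bmp file name are hypothetical.

```python
def parse_position(value):
    """Split a /wmposition value such as "0.0x1.0" into (x, y) multipliers."""
    x_text, y_text = value.lower().split("x")
    return float(x_text), float(y_text)


def watermark_offset(position, shot_size, mark_size):
    """Estimate the top-left pixel of the watermark inside the screenshot."""
    x, y = parse_position(position)
    # Assumed mapping: the multiplier scales the free space that remains
    # after the watermark's own width and height are subtracted.
    return (round(x * (shot_size[0] - mark_size[0])),
            round(y * (shot_size[1] - mark_size[1])))


def watermark_args(filename, position="0.0x0.0", opacity=100):
    """Return the three watermark arguments for webshotcmd.exe."""
    return ["/wmfilename", filename,
            "/wmposition", position,
            "/wmopacity", str(opacity)]


print(watermark_offset("0.0x1.0", (800, 600), (100, 50)))  # (0, 550)
print(watermark_offset("1.0x0.0", (800, 600), (100, 50)))  # (700, 0)
print(watermark_args("logo.bmp", "1.0x1.0", 50))
```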



   
   
 
Implementation Considerations

WebShot has been designed to have a low footprint and high performance.

If you are planning on using batch mode, it is important to know that WebShot uses Internet Explorer to take screenshots of webpages, and Internet Explorer has occasionally been known to leak resources. These leaks accumulate in the WebShot process space. When the WebShot process closes, all the leaked resources are reclaimed by the operating system. As a result, there is a limit to the number of screenshots that you can take per batch in batch mode. You should only be concerned about this if you are using batch mode to take several thousand screenshots PER BATCH.
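
One way to stay under that limit is to split a large url list into smaller batch files and run each one as its own process. The Python sketch below only prepares the batches; the batch size of 1000, the thread count, and the urls_NNNN.txt naming are hypothetical values chosen for illustration.

```python
def chunked(urls, batch_size):
    """Split a list of urls into consecutive batches of at most batch_size."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


def batch_commands(urls, batch_size=1000, threads=4):
    """Return (batch file name, batch file text, argument list) per batch."""
    commands = []
    for index, batch in enumerate(chunked(urls, batch_size)):
        name = f"urls_{index:04d}.txt"
        # /in takes a text file with one url per line.
        args = ["webshotcmd.exe", "/in", name, "/threads", str(threads)]
        commands.append((name, "\n".join(batch) + "\n", args))
    return commands


urls = [f"http://www.example.com/page{n}" for n in range(2500)]
for name, text, args in batch_commands(urls):
    # Write `text` to `name`, then run `args` as its own process so that
    # anything Internet Explorer leaked is reclaimed when the process exits.
    print(name, len(text.splitlines()), " ".join(args))
```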

If you are planning to develop a service that uses the WebShot dll, it is important to make your service recyclable.

The configuration of your screenshot harvesting machine makes a difference. It is important not to install a bunch of unnecessary BHOs (browser helper objects) or add-ons for Internet Explorer. The more add-ons installed, the longer it takes to load the web browser control.

Installing Flash is perfectly okay. However, I would not recommend installing Google Toolbar, as I have seen it hijacked and used to launch popup windows. Also, it is not a good idea to install Adobe Reader, as it is a bloated piece of crap. When it comes across a PDF it cannot read, it will lock up. If you still need to retain PDF reading functionality, do yourself a favor and download Foxit Reader.

If you are browsing with Internet Explorer on the harvesting machine and the window hangs, it may hang all of the web browser controls used by the system. In this case, killing the hung Internet Explorer process will release all of the rest.

It is recommended that you do some simple validation on the urls that you pass to the utility, in the interest of saving time.
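
What counts as "simple validation" is up to you; one possible Python sketch is shown below. The is_capturable helper is hypothetical, and its rules (http or https only, a hostname present, no embedded whitespace) are assumptions rather than requirements stated by WebShot.

```python
from urllib.parse import urlparse


def is_capturable(url):
    """Cheap sanity check before handing a url to WebShot."""
    url = url.strip()
    if not url or any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlparse(url)
    except ValueError:
        # urlparse rejects some malformed urls, e.g. broken IPv6 literals.
        return False
    # Only http(s) urls with a hostname are accepted in this sketch.
    return parts.scheme in ("http", "https") and bool(parts.hostname)


candidates = ["http://www.gizmodo.com/", "gizmodo", "ftp://example.com/", ""]
print([u for u in candidates if is_capturable(u)])
# ['http://www.gizmodo.com/']
```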
   
   
 
Implementation Tips

There are many ways to integrate the command line interface into your application. Below are some implementation tips that have been helpful for others.

Windows XP/2003 32-bit

Windows limits the number of Internet Explorer windows that can be open at the same time: roughly 30-40 or fewer, depending on what else is running on the system. The solution to this problem is to increase the desktop heap size via the registry.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\
Control\Session Manager\SubSystems

Key: Windows
Value: %SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,3072,512 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off MaxRequestThreads=24

http://support.microsoft.com/kb/126962

IIS7 / IIS6?

In the Internet Information Services Manager under Application Pool, the DefaultAppPool needs to be started by the LocalSystem identity. The default NetworkService identity does not have the permissions required to run WebShot.

PHP and IIS6

From the php.net website, relating to the exec function:

"When trying to run an external command-line application in Windows 2000 (Using IIS), I found that it was behaving differently from when I manually ran it from a DOS prompt.

Turned out to be an issue with the process protection. Actually, it wasn't the application itself that was having the problem but one it ran below it! To fix it, open computer management, right-click on Default Web Site, select the Home Directory tab and change Application Protection to 'Low (IIS Process)'. "

You might also want to create a new Application Pool and set the security account for the application pool from Local Service to Local System and use that.

PHP and Internet Explorer Permissions

Internet Explorer stores the security settings for each user separately. If you run WebShot from the command line, it runs as the user you are logged in as, but when you run it from PHP it runs under the SYSTEM user. In such instances, JavaScript may not have the proper permissions to execute when taking a screenshot of a JavaScript-enabled webpage.

Try adding the following registry value to make Internet Explorer use the HKLM security settings instead of the HKCU ones.
HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\
Windows\CurrentVersion\Internet Settings\

(DWORD) "Security_HKLM_only" = 1

http://support.microsoft.com/kb/182569

ColdFusion

You may need to modify the ColdFusion service to run under a more privileged account, such as Network Service or Administrator, in order to get it to work properly under Windows 2003.
   
   
 
Dll Implementation

The server edition of WebShot comes with a dll that you can use to implement screenshot generation into your own applications. With it, it is possible to:

  • Create a Windows NT service that continually polls a database for urls that need to be screenshotted.
  • Create a Windows NT service that acts as an HTTP server, so that when an HTTP request is sent to the service, it responds with an image.
  • Create a COM service that can be used by your scripting language.

Below is an example of how to use the dll that comes with the server edition in C. C# and VB examples are included when you purchase the server edition.

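int32 WebShotHandle;

/* Initialize the dll once, with window and file debug logging enabled */
WebShot_DllInit("webshot.log", DEBUG_FLAGWINDOW | DEBUG_FLAGFILE);

/* Create a WebShot instance and turn on verbose logging for it */
WebShot_Create(&WebShotHandle);
WebShot_SetVerbose(WebShotHandle, TRUE);

/* Take the screenshot; WebShot_Open returns FALSE on failure */
if (WebShot_Open(WebShotHandle, "http://www.websitescreenshots.com/") == FALSE)
	printf("Error: Cannot take screenshot!\n");

/* Release the instance, then shut the dll down */
WebShot_Destroy(&WebShotHandle);
WebShot_DllUninit();

To view the whole interface, you can download the doxygen documentation here.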
   
   
 
Live Implementations

The following websites implement WebShot.
Google Preview | GoYellow.de | ThumbnailsPro | FreeFAV | The Hype Machine | WebShotsPro | Tjoos | CbAnalytics
If you use WebShot on your website and would like to be added to this list, please contact me.