Downloadable Software

Broked

A link checker

Pregnant Pause Home Software Search this site


Download: You may have to shift-click or right-click the link to be able to save the file to your local drive.

But we recommend you read the instructions ...

Capsule description

Broked is a stand-alone program that checks links on a web site. Its primary use is to track down broken links. It can also be used to find particular links.

License

This software is free. But as a non-profit organization, we would appreciate a donation!

Software and documentation (this page) Copyright 2000,2001 by Pregnant Pause.

In the following, the word "product" refers to both the software and the documentation, that is, both the broked.jar file and this text file, jointly or separately.

Why we wrote Broked

There are other link-checkers out there, so we don't claim to be breaking any dramatic new ground with this program. But we needed a couple of odd features that weren't in other programs that we could readily find. Besides, it was an amusing exercise.

Installation

Using Broked

To run the program

Windows: In Windows Explorer, double-click the file "broked.jar".

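In Unix or Windows: Make sure your PATH includes the directory where the JVM is installed. Then get to a command-line prompt, "cd" to the directory containing broked.jar, and type "java -jar broked.jar". (On Windows you can type "javaw" instead of "java" to run without tying up the console window; "javaw" is not available on Unix.)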

Basic Principle

You give Broked one or more URLs. It reads the files at these URLs and searches them for links. It then produces a report listing the links that meet certain conditions: you can list all links or only broken links, and you can further restrict the report to links containing a certain string. You can also tell Broked to read each linked page in turn and check all of its links. In this way you can check all links on a site just by giving one starting page.
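As a rough illustration of the "searches them for links" step, here is a minimal sketch in Python. It is not Broked's own code (Broked is a Java program). The class and function names are made up for this example, and it only looks at href attributes of "a" tags, which may be narrower than what Broked actually examines.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(html_text):
    """Return the link targets found in one page, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


page = ('<p><a href="about.html">About</a> '
        '<a href="http://www.mysite.com/foo/">Foo</a></p>')
print(find_links(page))
```

Each link found this way would then be checked, and optionally read and searched in turn.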

You can run Broked against Web pages stored on your local drive, but normally you want to run it over an Internet connection. You must have an Internet connection of some sort for Broked to work. It may be necessary to start up your Internet connection before running Broked.

Operation

Options

Broked first displays its "options" window. The options are:

Your e-mail: Enter your e-mail address. This is sent along with requests to other sites, so that if there is a problem -- if you are overloading someone's web server or otherwise causing them trouble -- they have some way of contacting you and straightening the situation out.
From URL: Enter one or more URLs to scan for links. If you enter more than one, put one on each line. Each URL should begin with either "http:" (to read a Web page over the Internet) or "file:" (to read a file off your local hard drive). If neither is given, "http:" is assumed.

Examples:

www.mysite.com

http://www.mysite.com/index.html

//www.myisp.com/~mysite/index.html

file:///c:/webpages/somepage.htm

If you specify "http:", you normally follow this with two slashes and the domain name. If you specify "file:" (to read a file off a local disk drive), you normally follow this with three slashes and the path name. You can give two slashes and a domain name if you wish, but I don't know of any valid value other than "localhost", and this is the default anyway.

If you specify "http:", you can give either a domain name alone, a domain name followed by a path to a directory, or a domain name followed by a path to a file. If you do not give a file, the Web server software supplies a default, usually "index.html". If you specify "file:", you must give a path to a file; there is no default file name when using "file:".
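The defaulting rule for the "From URL" box can be sketched as follows. This is only an illustration of the rule as described here, not Broked's actual code. The function name is hypothetical, and the handling of a leading "//" is an assumption based on the examples above.

```python
def normalize_start_url(entry):
    """Apply the 'http: is assumed' rule to one line of the From URL box."""
    entry = entry.strip()
    # Entries that already name a protocol are left alone.
    if entry.startswith(("http:", "file:")):
        return entry
    # "//host/path" only lacks the protocol name (assumed behavior).
    if entry.startswith("//"):
        return "http:" + entry
    # A bare "host/path" lacks both the protocol and the two slashes.
    return "http://" + entry


for line in ["www.mysite.com",
             "http://www.mysite.com/index.html",
             "//www.myisp.com/~mysite/index.html",
             "file:///c:/webpages/somepage.htm"]:
    print(normalize_start_url(line))
```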

Known bug: If you give a directory, you must put a slash after the directory name. If you don't, the program successfully reads the file, and absolute links work fine, but any relative links are interpreted incorrectly. For example, if you want to scan a website beginning at "www.mysite.com/marketing/index.html", it is not necessary to specify the "index.html", because this is supplied by default. But you still need the slash before it, as in "www.mysite.com/marketing/". If you leave off the final slash, the relative links on that page are misinterpreted.
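The effect of the missing slash can be seen with standard URL resolution rules, shown here with Python's urllib. This is a sketch of the general behavior, on the assumption that Broked resolves relative links in a similar way. The file name "prices.html" is made up for the example.

```python
from urllib.parse import urljoin

# A relative link "prices.html" found on the marketing directory's default page.
with_slash = urljoin("http://www.mysite.com/marketing/", "prices.html")
without_slash = urljoin("http://www.mysite.com/marketing", "prices.html")

print(with_slash)     # http://www.mysite.com/marketing/prices.html
print(without_slash)  # http://www.mysite.com/prices.html
```

Without the trailing slash, "marketing" is treated as a file name rather than a directory, so the relative link is resolved one level too high.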

To URL: You may optionally enter a string here. A URL is included in the output report only if its text contains this string; any URL that does not contain it is left out of the report.

For example, suppose you want to search your site for links to files in a directory named "foo". You could enter "foo" in this box. This is purely a text comparison, so it would match against, for example, "www.mysite.com/foo/file1.htm", but it would also match "www.mysite.com/bar/goodfood.htm", because "foo" is found in the middle of "goodfood".
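The "To URL" comparison amounts to a plain substring test, as in this small sketch. The function name is hypothetical, and the sketch is case-sensitive; whether Broked's comparison is case-sensitive is not stated here.

```python
def matches_to_url(url, wanted):
    """Plain text comparison: keep the URL if it contains the wanted string."""
    # An empty string is contained in every URL, so an empty box keeps everything.
    return wanted in url


print(matches_to_url("www.mysite.com/foo/file1.htm", "foo"))     # True
print(matches_to_url("www.mysite.com/bar/goodfood.htm", "foo"))  # True
print(matches_to_url("www.mysite.com/bar/goodfood.htm", "/foo/"))  # False
```

As the last line suggests, entering a longer string such as "/foo/" should narrow the match to the directory name, since the comparison is on the text alone.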

Follow Tree: If this box is not checked, then the only pages read are those that are explicitly listed in the "From URL" box. All links on these pages are verified, and then the program stops. If the Follow Tree box is checked, then after checking if a link is valid, Broked reads the page at that location and checks its links, and then it follows any of those links, etc. However, only pages whose URL begins with the starting URL are checked. If you give the home page of a site as the starting point, then only pages on that site are checked. Otherwise we could end up reading the entire Internet.
Manners: Broked is capable of reading files very quickly. If we let it run as fast as it possibly could, it could put an unacceptable burden on your or somebody else's Internet connections. Thus, a pause is built in after each Web page read. You can set the length of the pause. The "Acceptable" choice is probably adequate for most purposes.
Been there, done that size: When you are following a tree, it is quite possible for you to have more than one link to the same place. For example, page A may link to B and C, and then B also links to C. To avoid following all the links in C twice in such a case, Broked keeps a "been there, done that" table of all the places it's been. This option sets the size of that table. If possible, make it larger than the number of pages that you expect to find on this run. If the table fills, Broked must drop entries, and so it may end up exploring the same pages more than once. (When it drops entries, Broked drops those with the fewest links, in an effort to minimize the amount of repeat work. It also keeps enough information to guarantee that it will never get stuck in a loop. For example, if A links to B, B links to C, and C links back to A, Broked will not get stuck going around and around forever.) If you make the table too big, you might run out of memory and the program will fail. (Hmm, maybe for the next version we'll try to make the program smart enough to check available memory and set the size of the table accordingly.)
Show all links / Show broken links only: If "show all links" is chosen, then the output includes every link Broked finds, good or bad, as it conducts its scan. If "show broken links only" is chosen, then Broked only lists links that resulted in errors of some kind when it tried to chase them.
Chase: After selecting the desired options, click "chase" to begin chasing links. A new window appears to show the results, as described below.
Exit: Click "exit" to quit the program.
About: Click "about" for version and copyright information.
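The interaction of the Follow Tree, Manners, and "been there, done that" options can be sketched as below. This is an illustration under simplifying assumptions, not Broked's implementation: the "site" is a made-up in-memory table instead of real Web pages, the visited table is unbounded (so the drop-entries behavior is not modeled), and all names are hypothetical.

```python
import time

# Hypothetical site: each page URL maps to the links found on that page.
HOME = "http://www.mysite.com/"
PAGES = {
    HOME: [HOME + "b.html", HOME + "c.html"],
    HOME + "b.html": [HOME + "c.html"],
    HOME + "c.html": [HOME, "http://www.other.com/x.html"],
}


def chase(start_url, follow_tree=True, pause_seconds=0.0):
    """Return (from_url, to_url) pairs in the order the links are seen."""
    report = []
    visited = set()      # the "been there, done that" table
    queue = [start_url]
    while queue:
        page = queue.pop(0)
        if page in visited:
            continue     # already read; this is what prevents loops
        visited.add(page)
        for link in PAGES.get(page, []):
            report.append((page, link))
            # Follow only links whose URL begins with the starting URL.
            in_scope = link.startswith(start_url)
            if follow_tree and in_scope and link not in visited:
                queue.append(link)
        time.sleep(pause_seconds)  # the "Manners" pause after each page read
    return report


for from_url, to_url in chase(HOME):
    print(from_url, "->", to_url)
```

In this made-up site, c.html links back to the home page and out to another site. The run still ends: the home page is already in the table, and the other site's page is outside the starting URL, so its link is reported but not followed.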

Output

This window lists the links found. As it runs, it shows a progress message near the bottom of the window (above the buttons). This lists the number of good links (those that didn't generate errors), bad links (those that did generate errors), and "not followed" links (a few special cases, mostly links to things other than HTML pages, like e-mail links). The main purpose of this is to let you know that the program is still alive as it works. If you only asked for broken links to be listed, Broked might go through dozens or hundreds of good links for every broken one it finds, so there could be considerable time between new entries being added to the output.

It displays a list with four columns:

From URL: The URL that contains the link. For the links that you gave in the "From URL" box on the Options page, this is given as "(start)".
To URL: The link being checked.
Status: The status code generated when Broked attempted to read the URL. If this is blank, then Broked did not get a status code. (This is normal when reading with the "file:" protocol.) If it is "---", then Broked was not able to even attempt the read, usually because the URL it was attempting to read was invalid.

You don't have to worry about the numeric codes too much; you can just look at the next column, the status text. But for your information: codes in the 200 range mean it worked; the 300 range means the request was redirected or something else unusual happened, but the results should still be valid; the 400 range means there was something wrong with the request, like not found or security failures; and the 500 range means the server had a problem.

Status text: The text associated with the status code. This is not always consistent, because different Web servers sometimes give different descriptions for errors.
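If you process a saved report yourself, the Status column can be sorted into the broad groups described above with something like the following sketch. The function name and the wording of the returned descriptions are made up for this example.

```python
def describe_status(status):
    """Map a Status column value to the broad groups described above."""
    if status == "":
        return "no status code"       # normal for "file:" URLs
    if status == "---":
        return "read not attempted"   # usually an invalid URL
    code = int(status)
    if 200 <= code <= 299:
        return "worked"
    if 300 <= code <= 399:
        return "redirected or otherwise unusual"
    if 400 <= code <= 499:
        return "problem with the request"
    if 500 <= code <= 599:
        return "server problem"
    return "unrecognized"


for value in ["200", "301", "404", "500", "", "---"]:
    print(repr(value), "->", describe_status(value))
```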

Note: You can change the relative widths of the columns by positioning the mouse pointer on the boundary lines between headings, and dragging one way or the other. You can re-arrange the columns by positioning the mouse in the middle of a heading and dragging the heading.

This window contains three buttons:

Stop: Stop checking links. Click this if something is obviously going wrong, or if the list of links is longer than you expected.

Save: Save the output list to a file. The program pops up a standard "save as" box for you to specify the directory and file name. The file is stored as a "comma separated values" list, or "csv" file. This format can be imported into many spreadsheets.

Known bug: If a URL includes a comma or a quote, the saved file is not formatted correctly. This happens rarely, but certainly should be fixed.

Close: Close this window and return to the options window.
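Regarding the known bug under Save: the usual way to keep commas and quotes from breaking a "csv" file is to quote such fields and double any quote marks inside them. The sketch below shows this with Python's csv module. It is not Broked's code, and the sample rows are made up.

```python
import csv
import io

# Hypothetical report rows: From URL, To URL, Status, Status text.
rows = [
    ("(start)", "http://www.mysite.com/", "200", "OK"),
    ("http://www.mysite.com/",
     'http://www.mysite.com/find?q=a,b&t="x"', "404", "Not Found"),
]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)  # quotes fields containing commas or quotes
writer.writerow(("From URL", "To URL", "Status", "Status text"))
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the awkward URL as a single field.
read_back = list(csv.reader(io.StringIO(text, newline="")))
print(read_back[2][1])
```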

If you wish, you can jump back to the options window with normal windows-manipulation methods, without closing the output window, and run another search. In this case you will have multiple output windows.

Tech Support

Contact us.

Copyright 2000 by Pregnant Pause