| Pregnant Pause Home | Software | Search this site |
But we recommend you read the instructions ...
Software and documentation (this page) Copyright 2000,2001 by Pregnant Pause.
In the following, the word "product" refers to both the software and the documentation, that is, both the broked.jar file and this text file, jointly or separately.
In Unix or Windows: Make sure your PATH includes the directory that you installed the JVM to. Then get to a command-line prompt. "cd" to the directory containing broked.jar. Type "javaw -jar broked.jar".
You can run Broked against Web pages stored on your local drive, but normally you want to run it over an Internet connection. You must have an Internet connection of some sort for Broked to work. It may be necessary to start up your Internet connection before running Broked.
| Your e-mail | Enter your e-mail address. This is sent along with requests to other sites, so that if there is a problem -- if you are overloading someone's web server or otherwise causing them trouble -- they have some way of contacting you and straightening the situation out. |
|---|---|
| From URL | Enter one or more URLs to scan for links. If you enter more than one, put one on each line. Each should begin with either "http:" (to read a Web page over the Internet) or "file:" (to read a file off your local hard drive). If neither is given, "http:" is assumed. Example: "www.mysite.com". If you specify "http:", you normally follow this with two slashes and the domain name. If you specify "file:", you normally follow this with three slashes and the path name. You can give two slashes and a domain name if you wish, but I don't know of any valid value other than "localhost", and this is the default anyway. If you specify "http:", you can give a domain name alone, a domain name followed by a path to a directory, or a domain name followed by a path to a file. If you do not give a file, the Web server software supplies a default, usually "index.html". If you specify "file:", you must give a path to a file; there is no default file name when using "file:". Known bug: If you give a directory, you must put a slash after the directory name. If you don't, the program successfully reads the file, and absolute links work fine, but any relative links are interpreted incorrectly. For example, if you want to scan a website beginning at "www.mysite.com/marketing/index.html", it is not necessary to specify the "index.html", because it is supplied by default, but you still need the slash before it, as in "www.mysite.com/marketing/". If you leave off the final slash, relative links on that page are interpreted incorrectly. |
| To URL | You may optionally enter a string here that must be part of the text of any URL found for that URL to be included in the output report. Any URL that does not include the text given is not included on the output. For example, suppose you want to search your site for links to files in a directory named "foo". You could enter "foo" in this box. This is purely a text comparison, so it would match against, for example, "www.mysite.com/foo/file1.htm", but it would also match "www.mysite.com/bar/goodfood.htm", because "foo" is found in the middle of "goodfood". |
| Follow Tree | If this box is not checked, then the only pages read are those that are explicitly listed in the "From URL" box. All links on these pages are verified, and then the program stops. If the Follow Tree box is checked, then after checking if a link is valid, Broked reads the page at that location and checks its links, and then it follows any of those links, etc. However, only pages whose URL begins with the starting URL are checked. If you give the home page of a site as the starting point, then only pages on that site are checked. Otherwise we could end up reading the entire Internet. |
| Manners | Broked is capable of reading files very quickly. If we let it run as fast as it possibly could, it could put an unacceptable burden on your or somebody else's Internet connections. Thus, a pause is built in after each Web page read. You can set the length of the pause. The "Acceptable" choice is probably adequate for most purposes. |
| Been there, done that size | When you are following a tree, it is quite possible for you to have more than one link to the same place. For example, page A may link to B and C, and then B also links to C. To avoid following all the links in C twice in such a case, Broked keeps a "been there, done that" table of all the places it's been. This option sets the size of that table. If possible, make it larger than the number of pages that you expect to find on this run. If the table fills, Broked must drop entries, and so it may end up exploring the same pages more than once. (When it drops entries, Broked drops those with the fewest links, in an effort to minimize the amount of repeat work. It also keeps enough information to guarantee that it will never get stuck in a loop. For example, if A links to B, B links to C, and C links back to A, Broked will not get stuck going around and around forever.) If you make the table too big, you might run out of memory and the program will fail. (Hmm, maybe for the next version we'll try to make the program smart enough to check available memory and set the size of the table accordingly.) |
| Show all links / Show broken links only | If "show all links" is chosen, then the output includes every link Broked finds, good or bad, as it conducts its scan. If "show broken links only" is chosen, then Broked only lists links that resulted in errors of some kind when it tried to chase them. |
| Chase | After selecting the desired options, click "chase" to begin chasing links. A new window appears to show the results, as described below. |
| Exit | Click "exit" to quit the program. |
| About | Click "about" for version and copyright information. |
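
The URL rules described for "From URL", "To URL" and "Follow Tree" can be illustrated with a short Java sketch. This is not Broked's actual source; the class `LinkRules` and its method names are hypothetical, and the sketch assumes relative links are resolved the way the standard `java.net.URI.resolve` method does it. It shows why a directory given without its final slash sends relative links to the wrong place, why the "To URL" text comparison matches "goodfood" when you enter "foo", and how a starting-URL prefix keeps a tree scan on one site.

```java
import java.net.URI;

// Hypothetical helper, not part of Broked: illustrates the URL rules in the table above.
final class LinkRules {

    // Per the text, a From URL with neither "http:" nor "file:" is assumed to be "http:".
    // Only the two schemes named in the documentation are recognized here.
    static URI normalizeStart(String input) {
        String s = input.trim();
        if (!s.startsWith("http:") && !s.startsWith("file:")) {
            s = "http://" + s;
        }
        return URI.create(s);
    }

    // Resolve a link found on a page against that page's URL.
    static URI resolveLink(URI page, String href) {
        return page.resolve(href);
    }

    // "To URL": a pure text comparison; an empty filter accepts everything.
    static boolean matchesToUrl(String url, String filter) {
        return filter == null || filter.isEmpty() || url.contains(filter);
    }

    // "Follow Tree": only pages whose URL begins with the starting URL are read.
    static boolean inScope(String url, String startUrl) {
        return url.startsWith(startUrl);
    }

    public static void main(String[] args) {
        URI withSlash = normalizeStart("www.mysite.com/marketing/");
        URI noSlash = normalizeStart("www.mysite.com/marketing");

        // With the final slash, "marketing" is treated as a directory.
        System.out.println(resolveLink(withSlash, "prices.html"));
        // prints http://www.mysite.com/marketing/prices.html

        // Without it, "marketing" is treated as a file name and is replaced.
        System.out.println(resolveLink(noSlash, "prices.html"));
        // prints http://www.mysite.com/prices.html

        // The filter "foo" also matches "goodfood".
        System.out.println(matchesToUrl("www.mysite.com/bar/goodfood.htm", "foo")); // prints true

        // A page on another site is outside the tree.
        System.out.println(inScope("http://www.other.com/x.html", withSlash.toString())); // prints false
    }
}
```

If you want the "To URL" box to match only the directory "foo", entering "/foo/" instead of "foo" avoids the "goodfood" match, since the comparison is on the raw text.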
The results window displays a list with four columns:
| From URL | The URL that contains the link. For the links that you gave in the "From URL" box on the Options page, this is given as "(start)". |
|---|---|
| To URL | The link being checked. |
| Status | The status code generated when Broked attempted to read the URL. If this is blank, then Broked did not get a status code. (This is normal when reading with the "file:" protocol.) If it is "---", then Broked was not able to even attempt the read, usually because the URL it was attempting to read was invalid. You don't have to worry about the numeric codes too much; you can just look at the next column, the status text. But for your information: codes in the 200 range mean it worked; the 300 range means something unusual happened but the results should still be valid; the 400 range means there was something wrong with the request, like not found or security failures; and the 500 range means the server had a problem. |
| Status text | The text associated with the status code. This is not always consistent, because different Web servers sometimes give different descriptions for errors. |
Note: You can change the relative widths of the columns by positioning the mouse pointer on the boundary lines between headings, and dragging one way or the other. You can re-arrange the columns by positioning the mouse in the middle of a heading and dragging the heading.
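
If you load a saved report into your own tools, the Status column can be sorted into the groups described above. The following Java sketch is only an illustration under that reading of the column; the class `StatusClass` and the wording of the returned descriptions are hypothetical and are not what Broked itself displays.

```java
// Hypothetical helper, not part of Broked: groups a Status column value
// into the categories described in the table above.
final class StatusClass {

    static String describe(String status) {
        String s = (status == null) ? "" : status.trim();
        if (s.isEmpty()) {
            // Blank: no status code was received (normal for "file:" URLs).
            return "no status code";
        }
        if (s.equals("---")) {
            // The read was never attempted, usually because the URL was invalid.
            return "read not attempted";
        }
        int code;
        try {
            code = Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return "unrecognized";
        }
        switch (code / 100) {
            case 2:  return "worked";
            case 3:  return "unusual, but results should still be valid";
            case 4:  return "problem with the request";
            case 5:  return "server problem";
            default: return "unrecognized";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("200")); // prints worked
        System.out.println(describe("404")); // prints problem with the request
        System.out.println(describe("---")); // prints read not attempted
    }
}
```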
This window contains three buttons:
| Stop | Stop checking links. Click this if something is obviously going wrong, or if the list of links is longer than you expected. |
|---|---|
| Save | Save the output list to a file. The program pops up a standard "save as" box for you to specify the directory and file name. The file is stored as a "comma separated values" list, or "csv" file. This format can be imported into many spreadsheets. Known bug: If a URL includes a comma or a quote, it screws up. This happens rarely, but certainly should be fixed. |
| Close | Close this window and return to the options window. If you wish, you can jump back to the options window with normal windows-manipulation methods, without closing the output window, and run another search. In this case you will have multiple output windows. |
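
For reference, the usual way to keep a comma or quote inside a CSV field is to wrap the field in double quotes and double any quote inside it. The Java sketch below shows that convention; it is not Broked's code and not a description of how the bug will be fixed, and the class `CsvField` and its methods are hypothetical names used for illustration.

```java
// Hypothetical helper, not part of Broked: writes one CSV row using the
// common quoting convention (quote fields containing a comma, quote, or
// line break, and double any embedded quote).
final class CsvField {

    static String quote(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    static String row(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The four columns of the results list, with a comma inside the To URL.
        System.out.println(row("(start)", "http://www.mysite.com/a,b.htm", "200", "OK"));
        // prints (start),"http://www.mysite.com/a,b.htm",200,OK
    }
}
```

Spreadsheets that import CSV generally read a quoted field like this back as a single cell.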
Copyright 2000 by Pregnant Pause