Free Perl Tutorials

HTML::TreeBuilder::Scanning Tutorial - look_down, as_text, as_HTML, parse_file, delete

This is an HTML::TreeBuilder::Scanning Tutorial, mainly for myself, because I am still learning too! The good thing is I'll use normal everyday language to describe what I am doing, being a NOOB myself!

First I am going to describe some of the methods we'll be using from the HTML::TreeBuilder module. If you don't know what objects are, see the brief tutorial on objects, classes, and packages in PERL further down this page.

To get back the information we are looking for, we call methods on an object (in this case, the tree that HTML::TreeBuilder builds from our file). Here is a description of a few of those methods:

parse_file
This method reads the file we point it to and builds a tree from the HTML source it finds there.

delete
This method deletes a tree, making it ready for GC (that's techie talk for "Garbage Collection", an automated process in PERL that frees memory the program no longer needs).

as_text
This method returns a string containing all the text found inside a given element and its children (such as everything between the body tags in HTML).

as_HTML
This method returns the HTML source for a given element: its own tags plus all the tags and text inside it.

look_down
This method is probably the most important one we use when trying to find specific information in a file with HTML::TreeBuilder. It starts at the object you call it on and looks down through that object and everything beneath it in the tree, searching for elements that match the criteria you provide. The criteria are set in the look_down argument list. The simplest kind of criterion is a pair of scalar values (scalars are simply single values such as names or numbers): a key and a value. For instance, to look for a tag in HTML, your key would be "_tag" and your value would be "h1". look_down returns the first matching element when you assign the result to a scalar, or every matching element when you assign it to an array.

Here is the simple usage:

#$tree is the tree object that parse_file built
$h1 = $tree->look_down('_tag', 'h1');
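
To see the five methods working together, here is a minimal sketch. It assumes HTML::TreeBuilder is installed from CPAN (it is not part of core Perl), and the sample page, file, and variable names are made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;

# Write a small sample page to a temporary file (the content is made up).
my ($fh, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
print $fh "<html><body><h1>My Heading</h1><p>Some text.</p></body></html>";
close $fh;

# Build the tree from the file.
my $tree = HTML::TreeBuilder->new;
$tree->parse_file($filename);

# In scalar context, look_down returns the first matching element.
my $h1 = $tree->look_down('_tag', 'h1');

my $text = $h1->as_text;   # just the text inside the h1
my $html = $h1->as_HTML;   # the h1 tags plus their contents

print "Text: $text\n";
print "HTML: $html\n";

# Free the tree once we are done with it.
$tree->delete;
```

Notice that the text and HTML are copied into plain strings before the tree is deleted, so they are still usable afterwards.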

Perl Packages, Perl Classes, Perl Objects....OH MY!

When I first started to learn PERL, one of my disadvantages was that I didn't even know the difference between a PERL package, a PERL class, and a PERL object. So what are they??

Think of a perl object as a way of storing a set of data and behaviors in a nice neat little bundle. A perl package is a named container (a namespace) for code. A perl class is simply a package that is used to create and work with objects, and each object belongs to a particular class. Therefore, an object belongs to a class, and that class is a package! Easy? Good! Let's continue.

As you begin writing code, you will call on classes or objects within your program to perform certain functions for you. If you ask a class itself to do something for you, we say that you are calling a "class method". If you ask an object to do something for you, we say that you are calling an "object method". When you ask a class to create and return a new object, you are calling a constructor, which is one type of method!
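
Here is a small sketch of those three kinds of method in one place. The class name "Counter" and its methods are made up for this example; they are not from any real module:

```perl
use strict;
use warnings;

# A class is just a package. "Counter" is a made-up name for this sketch.
package Counter;

# Constructor: a class method that creates and returns a new object.
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} // 0 };
    return bless $self, $class;
}

# Class method: called on the class itself, no object needed.
sub description {
    my ($class) = @_;
    return "$class objects count upward";
}

# Object method: called on one particular object.
sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

package main;

my $counter = Counter->new(start => 5);   # calling the constructor
my $info    = Counter->description;       # calling a class method
my $value   = $counter->increment;        # calling an object method

print "$info\n";
print "Count is now $value\n";
```

The only difference between the calls is what sits to the left of the arrow: a class name or an object.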

PERL File Operators And Their Function

There are a few perl file operators that are very useful in discovering information about a file or directory. Here are four, how they are used, and the information they provide you with:

-e Returns true (which equals 1) if a file or directory exists, and false if it does not
-z Returns true if a file exists and has zero size
-s Returns the size of a file in bytes (or false if the file is empty or missing)
-d Returns true if the object is a directory

You can use these to your advantage while programming PERL to determine whether a file exists, whether it is empty, whether it is a folder, and how large it is.

Usage is simple. Here is an example in PERL code:

print -s "test2/hey.txt";


The above prints the file size in bytes. If you used "-e" instead, it would print "1", indicating that, yes, the file exists. If the file didn't exist, nothing would be printed.
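
To try all four operators at once without touching your real files, here is a minimal sketch that builds its own test file in a temporary folder. The file names are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so the example does not touch real files.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/hey.txt";

# Create a file containing exactly 5 bytes.
open(my $out, '>', $file) or die "Cannot create $file: $!";
print $out "hello";
close $out;

my $exists   = -e $file ? "yes" : "no";   # does it exist?
my $is_empty = -z $file ? "yes" : "no";   # is it zero bytes long?
my $size     = -s $file;                  # size in bytes
my $is_dir   = -d $dir  ? "yes" : "no";   # is this a directory?
my $missing  = -e "$dir/nope.txt" ? "yes" : "no";

print "exists: $exists, empty: $is_empty, size: $size\n";
print "directory: $is_dir, missing file exists: $missing\n";
```

Turning each result into "yes" or "no" makes the false cases visible, since a false result on its own prints as nothing.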

PERL glob Function

The PERL glob function is another useful directory tool for pulling in file information. The thing that makes glob so useful is that you can use it to return files of only a specific extension. Before we get started, in the folder where the program you are going to make will be stored, create a subfolder called "subfolder". Then fill it with a few different file types, like a .txt file and a .jpg file. So let's dive right into the glob code to see how it is used:

#First, we create an array to store the file information and use the
#glob function to pull in information on all files from our subfolder.
@myarray = glob('subfolder/*');

#Then we can print our array to return all files in the subfolder
print "@myarray\n";


The question is, how do we get it to return only the files with a particular extension? Simple, look at the pattern passed to glob in the code below:

#First, we create an array to store the file information and use the
#glob function to pull in information on all .txt files
#from our subfolder.
@myarray = glob('subfolder/*.txt');

#Then we can print our array to return all .txt files in the subfolder
print "@myarray\n";


It's an easy way to pull in only the files of a specific extension using PERL!
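
If you would rather not create the subfolder by hand, here is a minimal sketch that makes a temporary folder, fills it with sample files, and runs both glob patterns against it. The file names are made up, and the code assumes the temporary path contains no spaces:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway folder holding three sample files (names are made up).
# Note: glob splits its pattern on whitespace, so this assumes the
# temporary path contains no spaces.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(notes.txt todo.txt photo.jpg)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}

my @all_files = glob("$dir/*");       # every file in the folder
my @txt_files = glob("$dir/*.txt");   # only the .txt files

print scalar(@all_files), " files in total\n";
print "$_\n" for @txt_files;
```

Each item glob returns includes the folder part of the path, not just the bare file name.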

PERL qw Function

The qw function is useful in saving you time while programming with PERL. It is used when you are declaring arrays. Without the qw function, you would have to put all of your bits of information in quotes and separate each with a comma, like so:

@array = ("This", "Takes", "Too", "Long!");

A much easier way to format this is to do the following while using the qw function:

@array = qw(This Takes Too Long!);

They are both the exact same thing, but one requires significantly less work. I stuck with the first method for a while before moving over to the qw method, but I recommend picking up the qw function immediately as it will increase your productivity, and is easy to learn!

PERL Basics

I don't want to spend a ton of time on the basics of PERL, as they are about the easiest thing you'll ever program, but here are some easy program examples to get you started, and important, yet simple, functions you'll use quite often.

PERL print function
Probably the most widely used function in all of PERL programming - at the beginner level anyways. "print" (note: the lowercase p in print, NOT an uppercase P because PERL is case sensitive) displays whatever you want to on your screen. You can print text using the print command like so:

print "My first line of PERL code! YIPPEE!";

You can also use print to display a string that is stored in a variable, like so:
$first = "My first line of PERL code! YIPPEE!";
print $first;

You can print almost anything, as you'll soon see as you progress through the tutorial!

PERL Variables
What is a variable? A variable is a named container that holds information. PERL has multiple types of variables: scalars (which this tutorial calls strings), arrays, and hashes. Each has its own method of assigning a value to a variable:

PERL String Variables
String variables in PERL are denoted with a "$" preceding whatever name you would like to give the variable. The important thing to remember about a string type variable is that it holds a single value. Say for example you assign a sentence to a string:

$example = "This is an example of a string variable";


This makes $example equal to "This is an example of a string variable".

String variables can be pretty much anything: numbers, single words, sentences, file names & extensions, etc.

What if we wanted PERL to look at a longer list of data, and look at each piece of data in it individually, as opposed to a string that just bunches everything together?

PERL Array Variables
Arrays hold an ordered list of values and allow you to look at each item individually. The array variable is denoted with "@" preceding the variable name. Here is an example of an array:


@array = ("earth", "wind", "fire");


Arrays are a great way of organizing data. You can pull an individual item out of an array and store it in a string variable, merely by selecting the location of the item within the array. The important thing to remember about arrays in PERL is that they begin counting with zero. In other words, if you count the items in the example @array above, earth would be in position 0, wind would be in position 1, and fire would be in position 2!

Let's check out some code on how to pick apart this little array:


@array = ("earth", "wind", "fire");
$array = $array[1];
print $array;


The "[1]" in our above example sets the string "$array" equal to the 2nd item in the array (remember we start counting arrays from zero). This program would return "wind"!
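
Here is a slightly longer sketch of picking items out of the same array. The variable names are made up for the example; note that a single item is written with a "$" in front, as in $array[0], because one item is a single value:

```perl
use strict;
use warnings;

my @array = ("earth", "wind", "fire");

my $first = $array[0];        # position 0
my $last  = $array[-1];       # negative numbers count from the end
my $count = scalar @array;    # how many items the array holds

print "First: $first, last: $last, total: $count\n";

# Visit every item in order ($#array is the last position in the array).
for my $index (0 .. $#array) {
    print "$index: $array[$index]\n";
}
```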

A quick side note. An easier format for creating arrays would be to use the qw function perl provides. It takes the work out of having to add the commas and quotes around each word. In other words, this:


@array = ("earth", "wind", "fire");


Is the exact same as this:


@array = qw(earth wind fire);


It'll start to save you time in the long run!

PERL Hash Variables
So now that we know what a string variable is and what an array is, we'll look at what a hash is! First, a hash variable is denoted with a "%" sign in front of the variable name. Here is an example of a hash in PERL:

%myhash = ("hey", 1, "my", 2, "first", 3, "hash", 4);

A hash is created in much the same way as an array, but its items are stored in pairs: each key ("hey", "my", "first", "hash") is followed by its value (1, 2, 3, 4). The biggest difference is that we can't reference a value by index number in a hash as we can in an array.
You call the values from a hash in a different fashion: by key, using curly braces. For example, $myhash{"my"} gives you 2.
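
Here is a minimal sketch of looking up, adding, and listing hash values, using the same example hash. The extra key "new" and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my %myhash = ("hey", 1, "my", 2, "first", 3, "hash", 4);

# Look up a single value by its key, using curly braces.
my $value = $myhash{"first"};
print "first => $value\n";

# Add a new pair, then check whether keys exist.
$myhash{"new"} = 5;
my $has_new  = exists $myhash{"new"}  ? "yes" : "no";
my $has_nope = exists $myhash{"nope"} ? "yes" : "no";
print "new: $has_new, nope: $has_nope\n";

# Hashes have no built-in order, so sort the keys for a stable listing.
for my $key (sort keys %myhash) {
    print "$key => $myhash{$key}\n";
}
```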



I hope this helps your basic understanding of PERL!

PERL rmdir function

Now that we have learned what the chdir and opendir functions do, it's time to learn how to delete a folder using PERL. It's easy to remove a directory using PERL. The important thing to remember is that PERL can only remove the folder or directory if it is empty!

Here is an example of the code you would use to delete a folder using PERL:

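#First, move into your primary directory (rmdir looks for the
#folder relative to the directory you are currently in)
chdir('test') or die "Cannot move into test: $!";

#Remove the unwanted folder (yes, keep the quotation marks)
rmdir('name-of-unwanted-folder') or die "Cannot remove folder: $!";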

You can now check your directory and see that the folder you wanted to delete is no longer there!
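
To see the "must be empty" rule in action, here is a minimal sketch that works inside a temporary folder so nothing real gets deleted. The file name leftover.txt and the variable names are made up for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Set up a throwaway parent folder with one subfolder holding one file.
my $parent = tempdir(CLEANUP => 1);
my $folder = "$parent/name-of-unwanted-folder";
my $file   = "$folder/leftover.txt";

mkdir $folder or die "Cannot create $folder: $!";
open(my $fh, '>', $file) or die "Cannot create $file: $!";
close $fh;

# rmdir returns false and sets $! when the folder is not empty.
my $first_try = rmdir($folder) ? "removed" : "failed";
print "First try: $first_try\n";

# Delete the file inside, and the same call now succeeds.
unlink $file or die "Cannot delete $file: $!";
my $second_try = rmdir($folder) ? "removed" : "failed";
print "Second try: $second_try\n";
```

Checking what rmdir returns is the only way to know whether the folder was really removed, because it does not stop the program on failure by itself.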