25 November 2005
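Google has a feature called Sitemaps. A sitemap lets you provide a list of all your web pages directly to Google, which should help Google add more of your website to its search index.
The Perl script below collects the pages of a WordPress site into a urllist file, merges in a static list of other URLs, and then runs Google's sitemap generator: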
#!/usr/bin/perl -w
# Get wordpress pages and add to a google sitemap urllist format file
# Also incorporate a static file into the output
# Runs google sitemap script when finished
use strict;
use DBI;
use File::Copy;
my $version = "0.1 devel";
my $outputfile = '/var/www/data/urllist.txt';
my $statics = '/var/www/data/import_staticlist.txt';
# This includes the path to the google provided sitemap-gen program,
# may need to change if this is different
my $sitemapcmd = '/opt/sitemap_gen-1.3/sitemap_gen.py '
. '--config=/var/www/data/website_config.xml';
# Hard-code the DB info, as the WordPress config is in PHP format rather than Perl
my $dbname = 'wordpress';
my $dbuser = 'wordpress';
my $dbpass = 'password';
my $dbhost = 'localhost';
# If prefix is not wp_, then will need to change the following line
my $dbtable = 'wp_posts';
# Priority to give to all pages (gives all the same)
my $priority = '0.7';
# First copy statics to output file, then we can append
copy ($statics, $outputfile) or die "Error copying $statics to $outputfile: $!";
# Open file to append
open (OUTPUT, '>>', $outputfile) or die "Unable to append to $outputfile: $!";
# Make sure we are on a newline
print OUTPUT "\n";
my $dbh = DBI->connect("DBI:mysql:$dbname:$dbhost",$dbuser,$dbpass)
or die "unable to connect to $dbname as $dbuser: $DBI::errstr";
my $query = $dbh->prepare("SELECT guid, post_modified, post_status FROM $dbtable");
# Method calls are not interpolated inside strings, so append errstr explicitly
$query->execute or die "Error getting data from DB: " . $query->errstr;
my ($url, $date, $poststatus, $null);
while (($url, $date, $poststatus) = $query->fetchrow_array())
{
    # Ignore if not yet published
    if ($poststatus ne "publish" && $poststatus ne "static") {next;}
    # Remove time from date
    ($date, $null) = split / /, $date;
    # Otherwise add the details
    print OUTPUT "$url lastmod=$date priority=$priority\n";
}
$query->finish;
$dbh->disconnect;
close OUTPUT;
# Finished creating the file - now run the google program
system($sitemapcmd) == 0 or die "Error running sitemap generator (status $?)";
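For illustration, each line that the loop appends to urllist.txt follows the pattern of the print statement above: the page URL, then lastmod and priority attributes. The URLs and dates below are made up; only the layout comes from the script.

```text
http://www.example.com/?p=12 lastmod=2005-11-25 priority=0.7
http://www.example.com/?page_id=3 lastmod=2005-10-02 priority=0.7
```

Lines copied from the static list appear above these, as that file is copied into place before the database entries are appended.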
The script can be run manually or, as in my case, called from a cron job (here every day at 23:00):
0 23 * * * /opt/googlesitemap/wpresspages.pl
It is fairly basic, but it provides a quick and easy way of keeping the sitemap file updated regularly.
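If you want to check the line format without touching the database, the formatting step can be tried on its own. This is only a minimal sketch: the helper name format_urllist_line is hypothetical and not part of the script above, and the URL and timestamp are invented. It assumes the date arrives in MySQL's "YYYY-MM-DD HH:MM:SS" form, as post_modified does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build one urllist
# line from a URL, a MySQL DATETIME string and a priority.
sub format_urllist_line {
    my ($url, $modified, $priority) = @_;
    # Keep only the date part of "YYYY-MM-DD HH:MM:SS"
    my ($date) = split / /, $modified;
    return "$url lastmod=$date priority=$priority";
}

# Example row; the URL and timestamp are made up for illustration
print format_urllist_line('http://www.example.com/?p=12',
                          '2005-11-25 21:14:03', '0.7'), "\n";
# prints: http://www.example.com/?p=12 lastmod=2005-11-25 priority=0.7
```

Keeping the formatting in one small function would also make it easier to give different priorities to different pages later, rather than the single value used for everything now.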