NAME
HTML::ParseBrowser - Simple interface for User Agent string parsing.
SYNOPSIS
use HTML::ParseBrowser;
my $ua = HTML::ParseBrowser->new($ENV{HTTP_USER_AGENT});
my $browsername = $ua->name;
my $browser = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)';
# BTW: That's IE 5.5 on Windows ME
$ua->Parse($browser);
$browsername = $ua->name;
my $os = $ua->ostype;
$browser = 'Mozilla 3.0 - Mozilla/3.0 (Linux 2.2.19 i686; U) Opera 5.0 [en]';
# BTW: that's Opera 5.0 on Linux, English
$ua->Parse($browser);
my $lingo = $ua->language;
DESCRIPTION
HTML::ParseBrowser is an object-oriented interface for parsing a User Agent
string. It provides simple autoloaded methods for retrieving both the actual
values stored in the string and the interpreted (and, so far, correct)
information that these wildly varying and nonstandardised strings attempt to
convey.
It provides the following methods:
new() (constructor method)
Accepts an optional User Agent string as an argument. If present, the string
will be parsed and the object populated. Either way the base object will be
created.
Parse()
Intended to be given a User Agent string as an argument. If present, it will be
parsed and the object repopulated.
If called without a true argument, or with the argument '-', Parse() will
simply depopulate the object and return undef. (This is useful for parsing
logs, which often fill in a '-' for a null value.)
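As a rough sketch of the log-parsing case, the following pulls the User Agent
field out of access-log lines and applies the same "false or '-'" rule described
above. The helpers ua_field() and is_null_ua() and the sample log lines are
hypothetical, written for illustration only; they are not part of the module,
and the "combined" log layout is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: in a "combined" format access-log line, the last
# double-quoted field is assumed to be the User Agent.
sub ua_field {
    my ($line) = @_;
    my ($field) = $line =~ /"([^"]*)"\s*$/;
    return $field;
}

# Hypothetical helper mirroring the documented rule: a false value or '-'
# means there is no User Agent to parse.
sub is_null_ua {
    my ($field) = @_;
    return !$field || $field eq '-';
}

# Made-up sample lines.
my @log = (
    '127.0.0.1 - - [10/Oct/2001:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"',
    '127.0.0.1 - - [10/Oct/2001:13:55:40 -0700] "GET /robots.txt HTTP/1.0" 404 208 "-" "-"',
);

for my $line (@log) {
    my $field = ua_field($line);
    my $message = is_null_ua($field) ? "no User Agent\n" : "would parse: $field\n";
    print $message;
}
```

Since Parse() already treats '-' as a null value, the extracted field could
also be handed to it directly without the extra check.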
Case-insensitive Access Methods and properties.
Any of the methods below may be called. Properties (->{whatever}) are case
sensitive and are lowercase. Called as methods (the preferred
way ->whatever() ) they are NOT case sensitive. As a result you can say
$ua->NAME, $ua->name, $ua->Name, or $ua->nAMe if you so feel inclined.
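To illustrate how case-insensitive autoloaded accessors can work in general,
here is a minimal sketch. The package My::UA is a hypothetical stand-in and is
not the module's actual implementation; it only shows one way an AUTOLOAD
routine can lowercase the method name before looking up a field.

```perl
use strict;
use warnings;

package My::UA;    # hypothetical stand-in, not HTML::ParseBrowser itself

our $AUTOLOAD;

sub new {
    my ($class, %fields) = @_;
    return bless {%fields}, $class;
}

sub AUTOLOAD {
    my ($self) = @_;
    (my $method = $AUTOLOAD) =~ s/.*:://;    # strip the package prefix
    return if $method eq 'DESTROY';          # never treat DESTROY as a field
    my $key = lc $method;                    # NAME, Name and nAMe all become 'name'
    # exists() guards the lookup, so an unknown field returns undef
    # without creating a new key in the object.
    return exists $self->{$key} ? $self->{$key} : undef;
}

package main;

my $ua = My::UA->new(name => 'Opera');
print $ua->NAME, "\n";
print $ua->nAMe, "\n";
print defined $ua->ostype ? "ostype set\n" : "ostype undef\n";
```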
If an item cannot be parsed, the methods will return undef. Calling things as
methods will not cause autovivification, while checking them as properties
without first testing with exists() in a conditional will cause
autovivification (and, in the case of the version subproperties, even
exists() will do so - Ack!)
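The effect on nested properties is standard Perl behaviour and can be seen with
a plain hashref standing in for an unpopulated object (the variable names here
are only for illustration):

```perl
use strict;
use warnings;

my $obj = {};    # plain hashref standing in for an unpopulated object

# exists() only tests the innermost key, but reaching it creates the
# intermediate 'version' hash along the way.
my $has_v = exists $obj->{version}{v};
print "v present: ", ($has_v ? 'yes' : 'no'), "\n";                            # no
print "version created: ", (exists $obj->{version} ? 'yes' : 'no'), "\n";      # yes

# Testing each level in turn leaves the structure untouched.
my $clean = {};
my $safe  = exists $clean->{version} && exists $clean->{version}{v};
print "version created: ", (exists $clean->{version} ? 'yes' : 'no'), "\n";    # no
```

Calling $ua->v, $ua->major or $ua->minor as methods avoids the issue entirely.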
Note that in some cases it is absolutely impossible to tell certain details.
Nothing is guaranteed to be present -- not even 'name'.
It is also possible for someone to make their browser lie about the operating
system they are using (especially with spiders) -- and in some cases, they may
even be using more than one at the same time (like running Konqueror through an
X-Windows client on a Windows box).
user_agent()
The actual original User Agent string you passed to Parse() or new().
languages()
Returns an arrayref of all languages recognised by placement and context in the
User_Agent string. Uses English names of languages encountered where
comprehended, ANSI code otherwise. Feel free to add to the hash to cover more
languages.
language()
Returns the language of the browser, interpreted as an English language name if
possible, as above. If more than one language is found in the string, chooses
the one most repeated, or the first encountered on any tie.
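The selection rule described here ("most repeated, first encountered on a tie")
can be sketched as follows. pick_language() is a hypothetical helper written
for illustration, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the most frequent entry of the list; when
# counts are equal, the entry seen first wins.
sub pick_language {
    my @seen = @_;
    my %count;
    $count{$_}++ for @seen;

    my $best;
    for my $lang (@seen) {
        # Strictly greater, so a later entry never displaces an earlier tie.
        $best = $lang if !defined $best || $count{$lang} > $count{$best};
    }
    return $best;
}

print pick_language(qw(English German English)), "\n";    # English
print pick_language(qw(French English)), "\n";            # French
```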
langs()
Like languages() above, except uses ANSI standard language codes always.
lang()
Like language() above, but returns only the ANSI language code.
detail()
The stuff inside any parentheses encountered. (Note that if for some really
weird reason some User Agent string has two sets of parens, this string will
contain the entire contents from the first paren to the last, including any
intervening close and open parens. Anyway, they aren't supposed to do that, and
such a case would likely only exist in cases of spiders and homebrewed
browsers.)
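The "first paren to last paren" behaviour matches what a greedy regular
expression capture would produce. A small sketch, assuming such a pattern
(detail_of() is a hypothetical helper and the second string is made up to show
the edge case):

```perl
use strict;
use warnings;

# Greedy capture: everything from the first '(' to the last ')'.
sub detail_of {
    my ($string) = @_;
    my ($detail) = $string =~ /\((.*)\)/;
    return $detail;
}

my $opera = 'Mozilla 3.0 - Mozilla/3.0 (Linux 2.2.19 i686; U) Opera 5.0 [en]';
print detail_of($opera), "\n";    # Linux 2.2.19 i686; U

# Made-up string with two sets of parens.
my $odd = 'HomeBrew/1.0 (alpha; beta) extra (gamma)';
print detail_of($odd), "\n";      # alpha; beta) extra (gamma
```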
useragents()
Returns an arrayref of all intelligible standard User Agent engine/version
pairs, and Opera's too, if applicable. (Please note that this is despite the
fact that Opera's is _not_ intelligible.)
properties()
Returns an arrayref of the stuff in detail() broken up by /;\s+/
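For example, applying that pattern to the parenthesised part of the IE string
from the SYNOPSIS gives four pieces. This is plain Perl split(), shown only to
make the pattern concrete:

```perl
use strict;
use warnings;

my $detail     = 'compatible; MSIE 5.5; Windows 98; Win 9x 4.90';
my @properties = split /;\s+/, $detail;
print scalar(@properties), " properties\n";    # 4 properties
print "$_\n" for @properties;

# The pattern needs whitespace after the semicolon, so 'a;b' stays whole.
my @tight = split /;\s+/, 'a;b';
print scalar(@tight), " piece\n";              # 1 piece
```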
name()
The _interpreted_ name of the browser. This value may not actually appear
anywhere inside the string you handed it. Netscape Communicator provides a good
example of this oddness.
version()
Returns a hashref containing v, major, and minor, as explained below and keyed
as such.
v()
The full version of the useragent (e.g. '5.6.0')
To access as a property, grab $ua->{version}->{v}
major()
The Major version number (e.g. '5')
To access as a property, grab $ua->{version}->{major}
minor()
The Minor version number (e.g. '6.0')
To access as a property, grab $ua->{version}->{minor}
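The relationship between v, major, and minor in the examples above can be
reproduced by splitting at the first dot only. This is a sketch of that
relationship under the assumption that the split happens at the first dot, not
the module's own parsing code:

```perl
use strict;
use warnings;

my $v = '5.6.0';

# A limit of 2 splits at the first dot only, leaving the rest intact.
my ($major, $minor) = split /\./, $v, 2;

# Same shape as the hashref described for version().
my $version = { v => $v, major => $major, minor => $minor };
print "v=$version->{v} major=$version->{major} minor=$version->{minor}\n";
# v=5.6.0 major=5 minor=6.0
```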
os()
The Operating System the browser is running on.
ostype()
The _interpreted_ type of the Operating System. For instance, 'Windows' rather
than 'Windows 9x 4.90'
osvers()
The _interpreted_ version of the Operating System. For instance, 'ME' rather
than '9x 4.90'
Note: Windows NT versions below 5 will show up with ostype 'Windows NT' and
osvers as appropriate. Windows NT versions 5 and up will show up as ostype
'Windows NT' and osvers '2000'. Most of you know, but for those who don't:
Windows 2000 is a version of NT, not of the 9x kernel and filesystem. I'll just
have to wait and see what to expect for XP.
osarc()
While rarely defined, some User Agent strings happily announce some detail or
another about the Architecture they are running under. If this happens, it will
be reflected here. Linux ('i686') and Mac ('PPC') are more likely than Windows
to do this, strangely.
SEE ALSO
Modules
HTTP::BrowserDetect (similar goal but with an opposite approach)
Web Sites
Distribution Site - http://www.dodger.org/modules
AUTHOR
Dodger (aka Sean Cannon)
in association with the Necrosoft Network (www.necrosoft.net)
COPYRIGHT
The HTML::ParseBrowser module and code therein is
Copyright (c)2001 Sean Cannon, Bensalem, Pennsylvania.
All rights reserved. All rites reversed.
You may distribute under the terms of either the GNU General Public
License or the Artistic License, as specified in the Perl README file.