NAME
XML::Pastor - Generate Perl classes with XML bindings starting from a
W3C XSD Schema
SYNOPSIS
use XML::Pastor;
my $pastor = XML::Pastor->new();
# Generate MULTIPLE modules, one module for each class, and put them under destination.
$pastor->generate(
mode =>'offline',
style => 'multiple',
schema=>'/some/path/to/schema.xsd',
class_prefix=>'MyApp::Data::',
destination=>'/tmp/lib/perl/',
);
# Generate a SINGLE module which contains all the classes and put it under destination.
# Note that the schema may be read from a URL too.
$pastor->generate(
mode =>'offline',
style => 'single',
schema=>'http://some/url/to/schema.xsd',
class_prefix=>'MyApp::Data::',
module => 'Module',
destination=>'/tmp/lib/perl/',
);
# Generate classes in MEMORY, and EVALUATE the generated code on the fly.
# (Run Time code generation)
$pastor->generate(
mode =>'eval',
schema=>'/some/path/to/schema.xsd',
class_prefix=>'MyApp::Data::'
);
# Same thing, with a maximum of DEBUG output on STDERR
$pastor->generate(
mode =>'eval',
schema=>'/some/path/to/schema.xsd',
class_prefix=>'MyApp::Data::',
verbose = 9
);
And somewhere in an other place of the code ... (Assuming a global XML
element 'country' existed in you schema and hence been generated by
Pastor).
my $country = MyApp::Data::country->from_xml_file('/some/path/to/country.xml'); # retrieve from a file
$country = MyApp::Data::country->from_xml_url('http://some/url/to/country.xml'); # or from a URL
$country = MyApp::Data::country->from_xml_fh($fh); # or from a file handle
$country = MyApp::Data::country->from_xml_dom($dom); # or from DOM (a XML::LibXML::Node or XML::LibXML::Document)
# or from an XML string
$country = MyApp::Data::country->from_xml_string(<<'EOF');
EOF
# or if you don't know if you have a file, URL, FH, or string
$country = MyApp::Data::country->from_xml('http://some/url/to/country.xml');
# Now you can manipulate your country object.
print $country->name; # prints "France"
print $country->city->[0]->name; # prints "Paris"
# Let's make some changes
$country->code('fr');
$country->name('FRANCE');
my $class=$country->xml_field_class('city');
my $city = $class->new();
$city->code('MRS');
$city->name('Marseille');
push @{$country->city}, $city;
print $country->city->[2]->name; # prints "Marseille"
# Time to validate our XML
$country->xml_validate(); # This one will DIE on failure
if ($country->is_xml_valid()) { # This one will not die.
print "ok\n";
}else {
print "Validation error : $@\n"; # Note that $@ contains the error message
}
# Time to write the the object back to XML
$country->to_xml_file('some/path/to/country.xml'); # To a file
$country->to_xml_url('http://some/url/to/country.xml'); # To a URL
$country->to_xml_fh($fh); # To a FILE HANDLE
my $dom=$country->to_xml_dom(); # To a DOM Node (XML::LibXML::Node)
my $dom=$country->to_xml_dom_document(); # To a DOM Document (XML::LibXML::Document)
my $xml=$country->to_xml_string(); # To a string
my $frag=$country->to_xml_fragment(); # Same thing without the part
DESCRIPTION
Java had CASTOR, and now Perl has XML::Pastor!
If you know what Castor does in the Java world, then XML::Pastor should
be familiar to you. If you have a W3C XSD schema, you can generate Perl
classes with roundtrip XML bindings.
Whereas Castor is limited to offline code generation, XML::Pastor is
able to generate Perl classes either offline or at run-time starting
from a W3C XSD Schema. The generated classes correspond to the global
elements, complex and simple type declarations in the schema. The
generated classes have full XML binding, meaning objects belonging to
them can be read from and written to XML. Accessor methods for
attributes and child elements will be generated automatically.
Furthermore it is possible to validate the objects of generated classes
against the original schema although the schema is typically no longer
accessible.
XML::Pastor defines just one method, '*generate()*', but the classes it
generates define many methods which may be found in the documentation of
XML::Pastor::ComplexType and XML::Pastor::SimpleType from which all
generated classes descend.
In '*offline*' mode, it is possible to generate a single module with all
the generated clasess or multiple modules one for each class. The
typical use of the offline mode is during a 'make' process, where you
have a set of XSD schemas and you generate your modules to be later
installed by the 'make install'. This is very similar to Java Castor's
behaviour. This way your XSD schemas don't have to be accessible during
run-time and you don't have a performance penalty.
Perl philosophy dictates however, that There Is More Than One Way To Do
It. In '*eval*' (run-time) mode, the XSD schema is processed at run-time
giving much more flexibility to the user. This added flexibility has a
price on the other hand, namely a performance penalty and the fact that
the XSD schema needs to be accessible at run-time. Note that the
performance penalty applies only to the code genereration (pastorize)
phase; the generated classes perform the same as if they were generated
offline.
METHODS
new() (CONSTRUCTOR)
The new() constructor method instantiates a new XML::Pastor object.
my $pastor = XML::Pastor->new();
This is currently unnecessary as the only method ('*generate*') is a
class method. However, it is higly recommended to use it and call
'generate' on an object (rather than the class) as in the future,
'generate' may no longer be a class method.
generate(%options)
Currently a CLASS METHOD, but may change to be an OBJECT METHOD in the
future. It works when called on an OBJECT too at this time.
This method is the heart of the module. It will accept a schema file
name or URL as input (among some other parameters) and proceed to code
generation.
This method will parse the schema(s) given by the "schema" parameter and
then proceed to code generation. The generated code will be written to
disk (mode=>"offline") or evaluated at run-time (mode=>"eval") depending
on the value of the "mode" parameter.
In "offline" mode, the generated classes will either all be put in one
"single" big code block, or in "multiple" module files (one for each
class) depending on the "style" parameter. Again in "offline" mode, the
generated modules will be written to disk under the directory prefix
given by the "destination" parameter.
In any case, the names of the generated classes will be prefixed by the
string given by the "class_prefix" parameter. It is possible to indicate
common ancestors for generated classes via the "complex_isa" and
"simple_isa" parameters.
This metod expects the following parameters:
schema
This is the file name or the URL to the W3C XSD schema file to be
processed. Experimentally, it can also be a string containing schema
XSD.
Be careful about the paths that are mentioned for any included
schemas though. If these are relative, they will be taken realtive
to the current schema being processed. In the case of a schema
string, the resolution of relative paths for the included schemas is
undefined.
Currently, it is also possible to pass an array reference to this
parameter, in which case the schemas will be processed in order and
merged to the same model for code generation. Just make sure you
don't have name collisions in the schemas though.
mode
This parameter effects what actuallly will be done by the method.
Either offline code generation, or run-time code evaluation, or just
returning the generated code.
offline
Default.
In this mode, the code generation is done 'offline', that is,
similar to Java's Castor way of doing things, the generated code
will be written to disk on module files under the path given by
the "destination" parameter.
In '*offline*' mode, it is possible to generate a single module
with all the generated clasess or multiple modules one for each
class, depending on the value of the "style" parameter.
The typical use of the *offline* mode is during a 'make'
process, where you have a set of XSD schemas and you generate
your modules to be later installed by 'make install'. This is
very similar to Java Castor's behaviour. This way your XSD
schemas don't have to be accessible during run-time and you
don't have a performance penalty.
# Generate MULTIPLE modules, one module for each class, and put them under destination.
my $pastor = XML::Pastor->new();
$pastor->generate(
mode =>'offline',
style => 'multiple',
schema=>'/some/path/to/schema.xsd',
class_prefix=>'MyApp::Data::',
destination=>'/tmp/lib/perl/',
);
eval
In '*eval*' (run-time) mode, the XSD schema is processed at
run-time giving much more flexibility to the user. In this mode,
no code will be written to disk. Instead, the generated code
(which is necessarily a "single" block) will be evaluated before
returning to the caller.
The added flexibility has a price on the other hand, namely a
performance penalty and the fact that the XSD schema needs to be
accessible at run-time. Note that the performance penalty
applies only to the code genereration (pastorize) phase; the
generated classes perform the same as if they were generated
offline.
Note that '*eval*' mode forces the "style" parameter to have a
value of '*single*';
# Generate classes in MEMORY, and EVALUATE the generated code on the fly.
my $pastor = XML::Pastor->new();
$pastor->generate(
mode =>'eval',
schema=>'/some/path/to/schema.xsd',
class_prefix=>'MyApp::Data::'
);
return
In '*return*' mode, the XSD schema is processed but no code is
written to disk or evaluated. In this mode, the method just
returns the generated block of code as a string, so that you may
use it to your liking. You would typically be evaluating it
though.
Note that '*return*' mode forces the "style" parameter to have a
value of '*single*';
style
This parameter determines if XML::Pastor will generate a single
module where all classes reside ("single"), or multiple modules one
for each class ("multiple").
Some modes (such as "eval" and "return")force the style argument to
be '*single*'.
Possible values are :
single
One block of code containg all the generated classes will be
produced.
multiple
A separate piece of code for each class will be produced.
class_prefix
If present, the names of the generated classes will be prefixed by
this value. You may end the value with '::' or not, it's up to you.
It will be autocompleted. In other words both 'MyApp::Data' and
'MyApp::Data::' are valid.
destination
This is the directory prefix where the produced modules will be
written in *offline* mode. In other modes (*eval* and *return*), it
is ignored.
Note that the trailing slash ('/') is optional. The default value
for this parameter is '/tmp/lib/perl/'.
module
This parameter has sense only when generating one big chunk of code
("style" => "single") in offline "mode".
It denotes the name of the module (without the .pm extension) that
will be written to disk in this case.
complex_isa
Via this parameter, it is possible to indicate a common ancestor (or
ancestors) of all complex types that are generated by XML::Pastor.
The generated complex types will still have XML::Pastor::ComplexType
as their last ancestor in their @ISA, but they will also have the
class whose name is given by this parameter as their first ancestor.
Handy if you would like to add common behaviour to all your
generated classes.
This parameter can have a string value (the usual case) or an array
reference to strings. In the array case, each item is added to the
@ISA array (in that order) of the generated classes.
simple_isa
Via this parameter, it is possible to indicate a common ancestor (or
ancestors) of all simple types that are generated by XML::Pastor.
The generated simple types will still have XML::Pastor::SimpleType
as their last ancestor in their @ISA, but they will also have the
class whose name is given by this parameter as their first ancestor.
Handy if you would like to add common behaviour to all your
generated classes.
This parameter can have a string value (the usual case) or an array
reference to strings. In the array case, each item is added to the
@ISA array (in that order) of the generated classes.
verbose
This parameter indicates the desired level of verbosity of the
output. A value of zero (0), which is the default, indicates
'silent' operation where only a fatal error will result in a 'die'
which will in turn write on STDERR. A higher value of 'verbose'
indicates more and more chatter on STDERR.
SCHEMA SUPPORT
The version 1.0 of W3C XSD schema (2001) is supported almost in full,
albeit with some exceptions (see "BUGS & CAVEATS"). Such things as
complex and simple types, global elements, groups, attributes, and
attribute groups are supported. Type declarations can either be global
or done locally. Complex type derivation by extension and simple type
derivation by restriction is supported. All the basic W3C builtin types
are supported. Unions and lists are supported. Most of the restriction
facets for simple types are supported (*length, minLength, maxLength,
pattern, enumeration, minInclusive, maxInclusive, minExclusive,
maxExclusive, totalDigits, fractionDigits*).
Schema inclusion (include) and redefinition (redefine) are supported,
allthough for 'redefine' not much testing was done.
Namespaces are supported in as much as there is no more than one
namespace for a given schema. 'Import' is not supported because of this.
Neither elements with 'mixed' content nor substitution groups are
supported at this time.
HOW IT WORKS
The source code of the "generate()" method looks like this:
sub generate {
my $self = shift;
my $parser =XML::Pastor::Schema::Parser->new();
my $model = $parser->parse(@_);
$model->resolve(@_);
my $generator = XML::Pastor::Generator->new();
my $result = $generator->generate(@_, model=>$model);
return $result;
}
At code generation time, XML::Pastor will first parse the schema(s) into
a schema model (XML::Pastor::Schema::Model). The model contains all the
schema information in perl data structures. All the global elements,
types, attributes, groups, and attribute groups are put into this model.
Then, the model is 'resolved', i.e. the references ('ref') are resolved,
class names are determined and so on. Then, comes the code generation
stage where your classes are generated according to the given options.
In offline mode, this phase will write out the generated code onto
modules on disk. Otherwise it can also 'eval' the generated code for
you.
The generated classes will contain class data named 'XmlSchemaType'
(thanks to Class::Data::Inheritable), which will contain all the schema
model information that corresponds to this type. For a complex type, it
will contain information about child elements and attributes. For a
simple type it will contain the restriction facets that may exist and so
on.
For complex types, the generated classes will also have accessors for
the attributes and child elements of that type (thanks to
Class::Accessor). However, you can also use direct hash access as the
objects are just blessed hash references. The fields in the has
correspond to attributes and child elements of the complex type. You can
also store additional non-XML data in these objects. Such fields are
silently ignored during validation and XML serialization. This way, your
objects can have state information that is not stored in XML. Just make
sure the names of these fields do not coincide with XML attributes and
child elements though.
The inheritance of classes are also managed by XML::Pastor for you.
Complex types that are derived by extension will automatically be a
descendant of the base class. Same applies to the simple types derived
by restriction. Global elements will always be a descendant of some
type, which may sometimes be implicitely defined. Global elements will
have an added ancestor XML::Pastor::Element and will also contain an
extra class data accessor "XmlSchemaElement" which will contain schema
information about the model. This class data is currently used mainly to
get at the name of the element when an object of this class is stored in
XML (as ComplexTypes don't have an element name).
Then you *use* the generated modules. If the generation was offline, you
actually need a 'use' statement. If it was an 'eval', you can start
using your generated classes immediately. At this time, you can call
many methods on the generated classes that enable you to create,
retrieve and save an object from/to XML. There are also methods that
enable you to validate these objects against schema information.
Furthermore, you can call the accessors that were automagically created
for you on class generation for getting at the fields of complex
objects. Since all the schema information is saved as class data, the
schema is no longer needed at run-time.
NAMING CONVENTIONS FOR GENERATED CLASSES
The generated classes will all be prefixed by the string given by the
"class_prefix" parameter. The rest of this section assumes that
"class_prefix" is "MyApp::Data".
Classes that correspond to global elements will keep the name of the
element. For example, if there is an element called 'country' in the
schema, the corresponding clas will have the name
'MyApp::Data::country'. Note that no change in case occurs.
Classes that correspond to global complex and simple types will be put
under the 'Type' subtree. For example, if there is a complex type called
'City' in the XSD schema, the corresponding class will be called
'MyApp::Data::Type::City'. Note that no change in case occurs.
Implicit types (that is, types that are defined *inline* in the schema)
will have auto-generated names within the 'Type' subtree. For example,
if the 'population' element within 'City' is defined by an implicit
type, its corresponding class will be
'MyApp::Data::Type::City_population'.
Sometimes implicit types need more to disambiguate their names. In that
case, an auto-incremented sequence is used to generate the class names.
In any case, do not count on the names of the classes for implicit
types. The naming convention for those may change. In other words, do
not reference these classes by their names in your program. You have
been warned.
SUGGESTED NAMING CONVENTIONS FOR XML TYPES, ELEMENTS AND ATTRIBUTES IN W3C SCHEMAS
Sometimes you will be forced to use a W3C schema defined by someone
else. In that case, you will not have a choice for the names of types,
elements, and attributes defined in the schema.
But most often, you will be the one who defines the W3C schema itself.
So you will have full power over the names within.
As mentioned earlier, XML::Pastor will generate accesor methods for the
child elements and attributes of each class. Since there exist some
utility methods defined under XML::Pastor::ComplexType and
XML::Pastor::SimpleType that are the ancestors of all the generated
classes from your schema there is a risk of name collisions. Below is a
list of suggestions that will ensure that there are no name collisions
within your schema and with the defined methods.
Avoid child Elements and attributes with the same name
Never use the same name for an attribute and a child element of the
same complex type or element within your schema. For instance, if
you have an attribute called 'title' within a Complex type called
'Person', do not in any circumstance create a child element with the
same name 'title'. Although this is technically possible under W3C
schema, XML::Pastor will be confused in this case. The hash field of
an object will contain one or the other (not both). The behavior of
the accessor 'title' will be undefined in this case. Please do not
count on any behavior that may exist currently on this subjet as it
may change at any time.
Element and attribute names should start with lower case
Element ant attribute names (incuding global ones) should start with
lower case and be uppercased at word boundries. Example :
"firstName", "lastName". Do not use underscore to separate words as
this may open up a possibility for name collisions of accessors with
the names of utility methods defined under XML::Pastor::ComplexType
and XML::Pastor::SimpleType.
Element and attribute names should not coincide with builtin method
names of XML::Pastor::ComplexType
Element ant attribute names (incuding global ones) should not
coincide with builtin method names defined under
XML::Pastor::ComplexType as this will cause a name collision with
the generated accessor method. Extra care should be taken for the
methods called 'get', 'set', and 'grab' as these are one-word
builtin method names. Same goes for 'isa' and 'can' that come from
Perl's UNIVERSAL package. Multiple word method names should not
normally cause trouble if you abide by the principle of not using
underscore for separating words in element and attribute names. See
XML::Pastor::ComplexType for the names of other builtin methods for
the generated classes.
Global complex and simple types should start with upper case
The names of global types (complex and simple) should start with an
upper case and continue with lower case. Word boundries should be
uppercased. This resembles the package name convention in Perl.
Example : 'City', 'Country', 'CountryCode'.
You are free to name global groups and attribute groups to your liking.
BUGS & CAVEATS
There no known bugs at this time, but this doesn't mean there are aren't
any. Note that, although some testing was done prior to releasing the
module, this should still be considered alpha code. So use it at your
own risk.
There are known limitations however:
* Namespaces
The namespace support is somewhat shaky. Currently at most one
*targetNamspace* is supported. Multiple target namespaces are not
supported. That's why schema 'import' facility does not work.
* Schema import
The 'import' element of W3C XSD schema is not supported at this
time. This is basically because of namespace complexities. If you
think of a way to support the 'import' feature, please let me know.
* 'mixed' elements
Elements with 'mixed' content (text and child elements) are not
supported at this time.
* substitution groups
Substitution groups are not supported at this time.
* Encoding
Only the UTF-8 encoding is supported. You should make sure that your
data is in UTF-8 format. It may be possible to read (but not write)
XML from other encodings. But this feature is not tested at this
time.
* Default values for attributes
Default values for attributes are not supported at this time. If you
can think of a simple way to support this, please let me know.
Note that there may be other bugs or limitations that the author is not
aware of.
AUTHOR
Ayhan Ulusoy
COPYRIGHT
Copyright (C) 2006-2008 Ayhan Ulusoy. All Rights Reserved.
This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
DISCLAIMER
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH
YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL
NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE
TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
SEE ALSO
See also XML::Pastor::ComplexType, XML::Pastor::SimpleType
If you are curious about the implementation, see also
XML::Pastor::Schema::Parser, XML::Pastor::Schema::Model,
XML::Pastor::Generator.