perlxstypemap - Perl XS C/Perl type mapping
The more you think about interfacing between two languages, the more you'll realize that the majority of programmer effort has to go into converting between the data structures that are native to either of the languages involved. This trumps other matters such as differing calling conventions because the problem space is so much greater. There are simply more ways to shove data into memory than there are ways to implement a function call.
Perl XS' attempt at a solution to this is the concept of typemaps. At an abstract level, a Perl XS typemap is nothing but a recipe for converting from a certain Perl data structure to a certain C data structure and vice versa. Since there can be C types that are sufficiently similar to one another to warrant converting with the same logic, XS typemaps are represented by a unique identifier, henceforth called an XS type in this document. You can then tell the XS compiler that multiple C types are to be mapped with the same XS typemap.
In your XS code, when you define an argument with a C type or when you are using a CODE: and an OUTPUT: section together with a C return type of your XSUB, it'll be the typemapping mechanism that makes this easy.
In more practical terms, the typemap is a collection of code fragments which are used by the xsubpp compiler to map C function parameters and values to Perl values. The typemap file may consist of three sections labelled TYPEMAP, INPUT, and OUTPUT. An unlabelled initial section is assumed to be a TYPEMAP section. The INPUT section tells the compiler how to translate Perl values into variables of certain C types. The OUTPUT section tells the compiler how to translate the values from certain C types into values Perl can understand. The TYPEMAP section tells the compiler which of the INPUT and OUTPUT code fragments should be used to map a given C type to a Perl value. The section labels TYPEMAP, INPUT, or OUTPUT must begin in the first column on a line by themselves, and must be in uppercase.
Each type of section can appear an arbitrary number of times and does not have to appear at all. For example, a typemap may commonly lack INPUT and OUTPUT sections if all it needs to do is associate additional C types with core XS types like T_PTROBJ.
Lines that start with a hash # are considered comments and ignored in the TYPEMAP section, but are considered significant in INPUT and OUTPUT. Blank lines are generally ignored.
Traditionally, typemaps needed to be written to a separate file, conventionally called typemap in a CPAN distribution. With ExtUtils::ParseXS (the XS compiler) version 3.12 or better which comes with perl 5.16, typemaps can also be embedded directly into XS code using a HERE-doc like syntax:

    TYPEMAP: <<HERE
    ...
    HERE

where HERE can be replaced by other identifiers like with normal Perl HERE-docs. All details below about the typemap textual format remain valid.
The TYPEMAP section should contain one pair of C type and XS type per line as follows. An example from the core typemap file:

    TYPEMAP
    # all variants of char* is handled by the T_PV typemap
    char *          T_PV
    const char *    T_PV
    unsigned char * T_PV
    ...

The INPUT and OUTPUT sections have identical formats, that is, each unindented line starts a new in- or output map respectively. A new in- or output map must start with the name of the XS type to map on a line by itself, followed by the code that implements it indented on the following lines. Example:

    INPUT
    T_PV
        $var = ($type)SvPV_nolen($arg)
    T_PTR
        $var = INT2PTR($type,SvIV($arg))

We'll get to the meaning of those Perlish-looking variables in a little bit.
Finally, here's an example of the full typemap file for mapping C strings of the char * type to Perl scalars/strings:

    TYPEMAP
    char *  T_PV

    INPUT
    T_PV
        $var = ($type)SvPV_nolen($arg)

    OUTPUT
    T_PV
        sv_setpv((SV*)$arg, $var);

Here's a more complicated example: suppose that you wanted struct netconfig to be blessed into the class Net::Config. One way to do this is to use underscores (_) to separate package names, as follows:

    typedef struct netconfig * Net_Config;

And then provide a typemap entry T_PTROBJ_SPECIAL that maps underscores to double-colons (::), and declare Net_Config to be of that type:

    TYPEMAP
    Net_Config      T_PTROBJ_SPECIAL

    INPUT
    T_PTROBJ_SPECIAL
        if (sv_derived_from($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")){
          IV tmp = SvIV((SV*)SvRV($arg));
          $var = INT2PTR($type, tmp);
        }
        else
          croak(\"$var is not of type ${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")

    OUTPUT
    T_PTROBJ_SPECIAL
        sv_setref_pv($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\",
                     (void*)$var);

The INPUT and OUTPUT sections replace underscores with double-colons on the fly, giving the desired effect. This example demonstrates some of the power and versatility of the typemap facility.
The INT2PTR macro (defined in perl.h) casts an integer to a pointer of a given type, taking care of the possibly different sizes of integers and pointers. There are also PTR2IV, PTR2UV, and PTR2NV macros to map the other way, which may be useful in OUTPUT sections.
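The round trip that INT2PTR and PTR2IV perform can be modelled in plain C without perl.h. The sketch below is only an illustration: MODEL_PTR2IV, MODEL_INT2PTR, handle_t and the two helper functions are hypothetical names invented here, and intptr_t stands in for Perl's IV on the assumption that it is wide enough to hold a pointer. Real XS code should use the macros from perl.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; they are NOT perl's macros.
 * The cast goes through an integer type wide enough to hold a pointer. */
#define MODEL_PTR2IV(p)        ((intptr_t)(p))
#define MODEL_INT2PTR(type, i) ((type)(intptr_t)(i))

/* Hypothetical payload type that an extension might hand to Perl. */
typedef struct { int fd; } handle_t;

/* What an OUTPUT map does conceptually: turn the pointer into an integer. */
static intptr_t handle_to_integer(handle_t *h)
{
    return MODEL_PTR2IV(h);
}

/* What an INPUT map does conceptually: recover the typed pointer. */
static handle_t *integer_to_handle(intptr_t iv)
{
    return MODEL_INT2PTR(handle_t *, iv);
}
```

Converting a pointer with handle_to_integer and back with integer_to_handle should yield the original pointer, which is the property the T_PTR family of typemaps relies on.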
The default typemap in the lib/ExtUtils directory of the Perl source contains many useful types which can be used by Perl extensions. Some extensions define additional typemaps which they keep in their own directory. These additional typemaps may reference INPUT and OUTPUT maps in the main typemap. The xsubpp compiler will allow the extension's own typemap to override any mappings which are in the default typemap. Instead of using an additional typemap file, typemaps may be embedded verbatim in XS with a heredoc-like syntax. See the documentation on the TYPEMAP: XS keyword.
For CPAN distributions, you can assume that the XS types defined by the perl core are already available. Additionally, the core typemap has default XS types for a large number of C types. For example, if you simply return a char * from your XSUB, the core typemap will have this C type associated with the T_PV XS type. That means your C string will be copied into the PV (pointer value) slot of a new scalar that will be returned from your XSUB to Perl.
If you're developing a CPAN distribution using XS, you may add your own file called typemap to the distribution. That file may contain typemaps that either map types that are specific to your code or that override the core typemap file's mappings for common C types.
Starting with ExtUtils::ParseXS version 3.13_01 (comes with perl 5.16 and better), it is rather easy to share typemap code between multiple CPAN distributions. The general idea is to share it as a module that offers a certain API and have the dependent modules declare that as a build-time requirement and import the typemap into the XS. An example of such a typemap-sharing module on CPAN is ExtUtils::Typemaps::Basic. Two steps to getting that module's typemaps available in your code:

1. Declare ExtUtils::Typemaps::Basic as a build-time dependency in Makefile.PL (use BUILD_REQUIRES), or in your Build.PL (use build_requires).

2. Include the following line in the XS section of your XS file (don't break the line):

    INCLUDE_COMMAND: $^X -MExtUtils::Typemaps::Cmd -e "print embeddable_typemap(q{Basic})"

Each INPUT or OUTPUT typemap entry is a double-quoted Perl string that will be evaluated in the presence of certain variables to get the final C code for mapping a certain C type.
This means that you can embed Perl code in your typemap (C) code using constructs such as ${ perl code that evaluates to scalar reference here }. A common use case is to generate error messages that refer to the true function name even when using the ALIAS XS feature:

    ${ $ALIAS ? \q[GvNAME(CvGV(cv))] : \qq[\"$pname\"] }

For many typemap examples, refer to the core typemap file that can be found in the perl source tree at lib/ExtUtils/typemap.
The Perl variables that are available for interpolation into typemaps are the following:
$var - the name of the input or output variable, e.g. RETVAL for return values.
$type - the raw C type of the parameter, with any : replaced with _. e.g. for a type of Foo::Bar, $type is Foo__Bar
$ntype - the supplied type with * replaced with Ptr. e.g. for a type of Foo*, $ntype is FooPtr
$arg - the stack entry that the parameter is input from or output to, e.g. ST(0)
$argoff - the argument stack offset of the argument, i.e. 0 for the first argument, etc.
$pname - the full name of the XSUB, including the PACKAGE name, with any PREFIX stripped. This is the non-ALIAS name.
$Package - the package specified by the most recent PACKAGE keyword.
$ALIAS - non-zero if the current XSUB has any aliases declared with ALIAS.
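To make the $ntype rule concrete, here is a minimal plain C model of the normalization. It is a sketch, not the code xsubpp runs: normalize_ctype is a hypothetical helper, and dropping the whitespace in front of each '*' is an assumption inferred from this document's own examples (foo_t * leading to XS_pack_foo_tPtr).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of xsubpp): writes the normalized form of
 * ctype into out, replacing every '*' with "Ptr".
 * Returns 0 on success, -1 if out is too small. */
static int normalize_ctype(const char *ctype, char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return -1;

    for (const char *p = ctype; *p != '\0'; p++) {
        if (*p == '*') {
            /* Assumption: whitespace directly before '*' is dropped. */
            while (n > 0 && isspace((unsigned char)out[n - 1]))
                n--;
            if (n + 3 >= outlen)
                return -1;
            memcpy(out + n, "Ptr", 3);
            n += 3;
        } else {
            if (n + 1 >= outlen)
                return -1;
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return 0;
}
```

With this model, Foo* becomes FooPtr and foo_t ** becomes foo_tPtrPtr, matching the names used elsewhere in this document.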
Each C type is represented by an entry in the typemap file that is responsible for converting perl variables (SV, AV, HV, CV, etc.) to and from that type. The following sections list all XS types that come with perl by default.
Note that this typemap does not decrement the reference count when returning the reference to an SV*. See also: T_SVREF_REFCOUNT_FIXED
Note that this typemap does not decrement the reference count when returning an AV*. See also: T_AVREF_REFCOUNT_FIXED
Note that this typemap does not decrement the reference count when returning an HV*. See also: T_HVREF_REFCOUNT_FIXED
Note that this typemap does not decrement the reference count when returning a CV*. See also: T_CVREF_REFCOUNT_FIXED
This is a fixed variant of T_CVREF that decrements the refcount appropriately when returning a CV*. Introduced in perl 5.15.4.
System calls return -1 on error (setting ERRNO with the reason) and (usually) 0 on success. If the return value is -1 this typemap returns undef. If the return value is not -1, this typemap translates a 0 (perl false) to "0 but true" (which is perl true) or returns the value itself, to indicate that the command succeeded.
The POSIX module makes extensive use of this type.
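The return-value rule described above can be modelled in plain C. This is a sketch under stated assumptions rather than the typemap itself: sysret_to_perl is a hypothetical helper, a NULL return stands in for Perl's undef, and the Perl value is represented by its string form.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the mapping, not the real typemap code.
 * Returns NULL to stand for undef, otherwise the string form of the
 * value Perl would see. buf is used only for non-zero results. */
static const char *sysret_to_perl(int rc, char *buf, size_t buflen)
{
    if (rc == -1)
        return NULL;                  /* failure: Perl sees undef */
    if (rc == 0)
        return "0 but true";          /* success: numerically 0, yet true */
    snprintf(buf, buflen, "%d", rc);  /* any other value is passed through */
    return buf;
}
```

The design point is that a Perl caller can write a plain truth test on the result: only the -1 error case is false, while a successful 0 still evaluates as true.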
int type on the current platform). When returning the value to perl it is processed in the same way as for T_IV.
Its behaviour is identical to using an int type in XS with T_IV.
unsigned int. The default type for unsigned int is T_UV.
short. The default typemap for short is T_IV.
unsigned short. The default typemap for unsigned short is T_UV.
T_U_SHORT is used for type U16 in the standard typemap.
long. The default typemap for long is T_IV.
unsigned long. The default typemap for unsigned long is T_UV.
T_U_LONG is used for type U32 in the standard typemap.
float.
double.
void * type.
The typemap checks that a scalar reference is passed from perl to XS.
The pointer is blessed into a class that is derived from the name of the type of the pointer but with all '*' in the name replaced with 'Ptr'.
For DESTROY XSUBs only, a T_PTROBJ is optimized to a T_PTRREF. This means the class check is skipped.
The pointer is blessed into a class that is derived from the name of the type of the pointer but with all '*' in the name replaced with 'Ptr'.
For DESTROY XSUBs only, a T_REF_IV_PTR is optimized to a T_PTRREF. This means the class check is skipped.
Only the INPUT part of this is implemented (Perl to XSUB) and there are no known users in core or on CPAN.
For DESTROY XSUBs only, a T_REFOBJ is optimized to a T_REFREF. This means the class check is skipped.
length() will report a value of 8). This entry is similar to T_OPAQUE.
In principle the unpack() command can be used to convert the bytes back to a number (if the underlying type is known to be a number).
This entry can be used to store a C structure (the number of bytes to be copied is calculated using the C sizeof operator) and can be used as an alternative to T_PTRREF without having to worry about a memory leak (since Perl will clean up the SV).
The data may be retrieved using the unpack function if the underlying type of the byte stream is known.
T_OPAQUE supports input and output of simple types. T_OPAQUEPTR can be used to pass these bytes back into C if a pointer is acceptable.
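The idea of storing a C structure as an opaque run of sizeof bytes can be modelled in plain C. The sketch below makes assumptions that are not in the text above: record_t, pack_record and unpack_record are hypothetical names, and an unsigned char buffer stands in for the string slot of the SV.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure whose bytes are to be stored opaquely. */
typedef struct { int id; double weight; } record_t;

/* Copy the raw bytes of *rec into buf, as the OUTPUT side would copy them
 * into the scalar. Returns the byte count, or 0 if buf is too small. */
static size_t pack_record(const record_t *rec, unsigned char *buf, size_t buflen)
{
    if (buflen < sizeof(record_t))
        return 0;
    memcpy(buf, rec, sizeof(record_t));
    return sizeof(record_t);
}

/* Rebuild a record from the stored bytes, as the INPUT side would.
 * Returns 0 on success, -1 if the length does not match the structure. */
static int unpack_record(const unsigned char *buf, size_t len, record_t *out)
{
    if (len != sizeof(record_t))
        return -1;
    memcpy(out, buf, sizeof(record_t));
    return 0;
}
```

Because the bytes are copied rather than referenced, the stored copy has no ties to the original allocation, which is why no separate cleanup of the C side is needed in this scheme. The byte layout is platform specific, so such data should not be assumed portable between machines.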
array(type, nelem)
xsubpp will copy the contents of nelem * sizeof(type) bytes from RETVAL to an SV and push it onto the stack. This is only really useful if the number of items to be returned is known at compile time and you don't mind having a string of bytes in your SV. Use T_ARRAY to push a variable number of arguments onto the return stack (they won't be packed as a single string though).
This is similar to using T_OPAQUEPTR but can be used to process more than one element.
For OUTPUT (XSUB to Perl), a function named XS_pack_$ntype is called with the output Perl scalar and the C variable to convert from. $ntype is the normalized C type that is to be mapped to Perl. Normalized means that all * are replaced by the string Ptr. The return value of the function is ignored.
Conversely for INPUT (Perl to XSUB) mapping, the function named XS_unpack_$ntype is called with the input Perl scalar as argument and the return value is cast to the mapped C type and assigned to the output C variable.
An example conversion function for a typemapped struct foo_t * might be:

    static void
    XS_pack_foo_tPtr(SV *out, foo_t *in)
    {
      dTHX; /* alas, signature does not include pTHX_ */
      HV* hash = newHV();
      hv_stores(hash, "int_member", newSViv(in->int_member));
      hv_stores(hash, "float_member", newSVnv(in->float_member));
      /* ... */

      /* mortalize as thy stack is not refcounted */
      sv_setsv(out, sv_2mortal(newRV_noinc((SV*)hash)));
    }

The conversion from Perl to C is left as an exercise to the reader, but the prototype would be:

    static foo_t *
    XS_unpack_foo_tPtr(SV *in);

Instead of an actual C function that has to fetch the thread context using dTHX, you can define macros of the same name and avoid the overhead. Also, keep in mind to possibly free the memory allocated by XS_unpack_foo_tPtr.
The INPUT (Perl to XSUB) typemap is identical, but the OUTPUT typemap passes an additional argument to the XS_pack_$ntype function. This third parameter indicates the number of elements in the output so that the function can handle C arrays sanely. The variable needs to be declared by the user and must have the name count_$ntype where $ntype is the normalized C type name as explained above. The signature of the function would be for the example above and foo_t **:

    static void
    XS_pack_foo_tPtrPtr(SV *out, foo_t **in, UV count_foo_tPtrPtr);

The type of the third parameter is arbitrary as far as the typemap is concerned. It just has to be in line with the declared variable.
Of course, unless you know the number of elements in the sometype ** C array, within your XSUB, the return value from foo_t ** XS_unpack_foo_tPtrPtr(...) will be hard to decipher. Since the details are all up to the XS author (the typemap user), there are several solutions, none of which is particularly elegant. The most commonly seen solution has been to allocate memory for N+1 pointers and assign NULL to the (N+1)th to facilitate iteration.
Alternatively, using a customized typemap for your purposes in the first place is probably preferable.
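The N+1 pointer convention mentioned above can be sketched in plain C. The names here are assumptions made for illustration: foo_t mirrors the example type used earlier with a single hypothetical member, and copy_terminated and count_terminated are hypothetical helpers, not part of any Perl API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type, mirroring the foo_t used in the text. */
typedef struct { int int_member; } foo_t;

/* Allocate room for n pointers plus one trailing NULL sentinel and copy
 * the n pointers in. Returns NULL on allocation failure; the caller
 * releases the array with free(). */
static foo_t **copy_terminated(foo_t *const *items, size_t n)
{
    foo_t **arr = malloc((n + 1) * sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        arr[i] = items[i];
    arr[n] = NULL;              /* the (N+1)th slot marks the end */
    return arr;
}

/* Recover the element count by walking to the sentinel. */
static size_t count_terminated(foo_t *const *arr)
{
    size_t n = 0;
    while (arr[n] != NULL)
        n++;
    return n;
}
```

The sentinel lets the XSUB iterate without a separate length variable, at the cost of forbidding NULL as a legitimate element.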
The usual calling signature is

    @out = array_func( @in );

Any number of arguments can occur in the list before the array but the input and output arrays must be the last elements in the list.
When used to pass a perl list to C the XS writer must provide a function (named after the array type but with 'Ptr' substituted for '*') to allocate the memory required to hold the list. A pointer should be returned. It is up to the XS writer to free the memory on exit from the function. The variable ix_$var is set to the number of elements in the new array.
When returning a C array to Perl the XS writer must provide an integer variable called size_$var containing the number of elements in the array. This is used to determine how many elements should be pushed onto the return argument stack. This is not required on input since Perl knows how many arguments are on the stack when the routine is called. Ordinarily this variable would be called size_RETVAL.
Additionally, the type of each element is determined from the type of the array. If the array uses type intArray * xsubpp will automatically work out that it contains variables of type int and use that typemap entry to perform the copy of each element. All pointer '*' and 'Array' tags are removed from the name to determine the subtype.
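The subtype rule in the last paragraph can be modelled in plain C. This is a sketch of the rule as stated in the text, not the code xsubpp uses: array_subtype is a hypothetical helper, and it assumes that the 'Array' tag appears once, at the end of the type name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: derives the element type name from an array type
 * name by removing every '*', trimming trailing whitespace, and then
 * removing a trailing "Array" tag.
 * Returns 0 on success, -1 if out is too small. */
static int array_subtype(const char *ctype, char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return -1;

    for (const char *p = ctype; *p != '\0'; p++) {
        if (*p == '*')
            continue;                       /* drop pointer markers */
        if (n + 1 >= outlen)
            return -1;
        out[n++] = *p;
    }
    while (n > 0 && isspace((unsigned char)out[n - 1]))
        n--;                                /* trim what preceded the '*' */
    if (n >= 5 && memcmp(out + n - 5, "Array", 5) == 0)
        n -= 5;                             /* drop the trailing tag */
    out[n] = '\0';
    return 0;
}
```

For the example in the text, intArray * yields int, which is then the typemap entry used to copy each element.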
FILE * structures.
PerlIO * structures. The file handle can be used for reading and writing. This corresponds to the +< mode, see also T_IN and T_OUT.
See the perliol manpage for more information on the Perl IO abstraction layer. Perl must have been built with -Duseperlio.
There is no check to assert that the filehandle passed from Perl to C was created with the right open() mode.
Hint: The perlxstut tutorial covers the T_INOUT, T_IN, and T_OUT XS types nicely.
<).
+>.