In this quickstart we define an additional data schema, construct a matching additional-data.xml file, and then attach it to, and extract it from, an existing PDF file.

defining a simple additional data schema

For our example we look at the ZUGFeRD [1] 1.0 standard. This data format lets you send your invoice data in a hybrid file format consisting of a PDF file and an embedded XML file. Your recipients can then process the invoice either by viewing the rendered pages of the PDF or by processing the embedded structured metadata file. The example PDF files used here, as well as a formal and technical description of the ZUGFeRD data standard, can be downloaded here.

To get our example started, let’s say we have partners who need an individual order number for every invoice position, to help them clear the invoice data against their order database. On top of that, they would like special charges such as tolls to be split across the positions as well. That makes two new data items, which we add to our spreadsheet.


Before we define how to model this information technically, we ask our partners and co-workers what these data actually look like. After all, we want to define a data structure that is not only for our own use. It turns out that all order numbers in our field of work are alphanumeric constructs: mostly digits, with some letters in the mix. As for special charges, we find that everybody measures them as a decimal number combined with a currency identifier, although the number of decimal places varies. Both data items occur only once per position.

Now we are ready to fill in the technical description of our data. As a first step, we come up with two technical identifiers: onpos shall be the technical alias for the order number on a position, and pscpos shall identify the partial special charges on a position. Both may occur at most once. To be a little more precise, we define that order numbers shall be represented by a String type with at least one and at most 40 characters, whereas the special charges shall be given as the combined information of an amount and a currency [2].
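The constraints just described can be sketched as a few lines of plain Python, independent of the package. This is only an illustration: the names is_valid_onpos, PartialSpecialCharge and parse_pscpos are hypothetical, and the restriction to ASCII letters and digits as well as the three-letter currency code are assumptions that the spreadsheet may define differently.

```python
import re
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Assumption: order numbers consist of ASCII letters and digits, 1 to 40 characters.
ONPOS_PATTERN = re.compile(r"[A-Za-z0-9]{1,40}")
# Assumption: currencies are given as three upper-case letters such as "EUR".
CURRENCY_PATTERN = re.compile(r"[A-Z]{3}")


def is_valid_onpos(value: str) -> bool:
    """Return True if value fits the onpos description (1 to 40 alphanumerics)."""
    return ONPOS_PATTERN.fullmatch(value) is not None


@dataclass(frozen=True)
class PartialSpecialCharge:
    """Combined information of an amount and a currency (pscpos)."""
    amount: Decimal
    currency: str


def parse_pscpos(amount: str, currency: str) -> PartialSpecialCharge:
    """Build a pscpos value; the number of decimal places is left unrestricted."""
    if CURRENCY_PATTERN.fullmatch(currency) is None:
        raise ValueError(f"not a currency identifier: {currency!r}")
    try:
        parsed = Decimal(amount)
    except InvalidOperation:
        raise ValueError(f"not a decimal number: {amount!r}") from None
    if not parsed.is_finite():
        # Decimal accepts "NaN" and "Infinity", which make no sense as charges.
        raise ValueError(f"not a finite amount: {amount!r}")
    return PartialSpecialCharge(parsed, currency)
```

Using Decimal instead of float keeps the amount exactly as written, which matters because partners use different numbers of decimal places.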


Both data items are considered additions, not alternatives.


Lastly, we define the valid references for our additional data. In this case we only care about ZUGFeRD 1.0 and Factur-X reference points, so we define only those two in the first draft of our standard. If the need for other basic data structures arises later on, we can add reference points for them then. Our finished spreadsheet now looks like this


constructing a simple ad xml for a ZUGFeRD-invoice.xml

We consider the basic xsd as sufficient for our purposes and leave it untouched. The last thing we need to do now is construct an example file.

from additional_data import Additional_data_xml
from lxml import etree

# reference to the second line item of the ZUGFeRD 1.0 invoice, used by both entries
LINE_ITEM_2 = (
    "{urn:ferd:CrossIndustryDocument:invoice:1p0}SpecifiedSupplyChainTradeTransaction/"
    "{urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:12}IncludedSupplyChainTradeLineItem[2]"
)

def __main__():
    # fill the data structure with the additional information for one position
    my_xml = Additional_data_xml(schema_identifier="http://4s4u.de/additional_data/adcollection/base_all_1.0")
    # simple type: the order number is passed as a plain string
    my_xml.insert_additional_data("onpos", "123456", LINE_ITEM_2)
    # complex type: amount and currency are passed as an xml tree
    as_xml = etree.fromstring("<pscpos><currency>EUR</currency><amount>1.23</amount></pscpos>").getroottree()
    my_xml.insert_additional_data("pscpos", as_xml, LINE_ITEM_2)
    print("is valid against the schema:", my_xml.check_schema('additional_data_base_schema.xsd'))

if __name__ == '__main__':
    __main__()

This constructs our first simple additional data XML, tests it against the base schema and prints the resulting structure to the console. What we see in stdout also gets dumped to an XML file.

$ python3 example1_create_ad.py
is valid against the schema: True
<adt:additionalOrAlternateData xmlns:adt="http://4s4u.de/additional_data/adcollection/base_all_1.0">
  <adt:referencedData type="URI">test.pdf</adt:referencedData>
    <adt:data type="{http://4s4u.de/additional_data/adcollection/base_all_1.0}onpos">123456</adt:data>
    <adt:referencedElement type="XPath">{urn:ferd:CrossIndustryDocument:invoice:1p0}SpecifiedSupplyChainTradeTransaction/{urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:12}IncludedSupplyChainTradeLineItem[2]</adt:referencedElement>
    <adt:data type="{http://4s4u.de/additional_data/adcollection/base_all_1.0}pscpos">
    <adt:referencedElement type="XPath">{urn:ferd:CrossIndustryDocument:invoice:1p0}SpecifiedSupplyChainTradeTransaction/{urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:12}IncludedSupplyChainTradeLineItem[2]</adt:referencedElement>
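To inspect such a file programmatically, a minimal sketch with Python's standard library might look like the following. It is not part of the package: read_entries is a hypothetical helper, the sample document is a shortened, hand-written stand-in for the output above, and the pairing rule (each adt:data entry belongs to the next adt:referencedElement in document order) is an assumption about the layout.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the schema identifier used in this quickstart.
ADT_NS = "http://4s4u.de/additional_data/adcollection/base_all_1.0"

# Hypothetical, shortened sample; the reference path is abbreviated for readability.
SAMPLE = f"""<adt:additionalOrAlternateData xmlns:adt="{ADT_NS}">
  <adt:referencedData type="URI">test.pdf</adt:referencedData>
  <adt:data type="{{{ADT_NS}}}onpos">123456</adt:data>
  <adt:referencedElement type="XPath">IncludedSupplyChainTradeLineItem[2]</adt:referencedElement>
</adt:additionalOrAlternateData>"""


def read_entries(xml_text):
    """Collect alias, text value and reference for every adt:data element."""
    root = ET.fromstring(xml_text)
    entries = []
    current = None
    # iter() walks the tree in document order, so a referencedElement is
    # attached to the most recent data element seen before it.
    for element in root.iter():
        if element.tag == f"{{{ADT_NS}}}data":
            # The type attribute looks like "{namespace}alias"; keep the alias only.
            alias = element.get("type", "").rsplit("}", 1)[-1]
            current = {
                "alias": alias,
                "value": (element.text or "").strip(),
                "reference": None,
            }
            entries.append(current)
        elif element.tag == f"{{{ADT_NS}}}referencedElement" and current is not None:
            current["reference"] = (element.text or "").strip()
    return entries


if __name__ == "__main__":
    for entry in read_entries(SAMPLE):
        print(entry["alias"], entry["value"], entry["reference"])
```

For complex types such as pscpos the value would be held in child elements rather than in the text, so this sketch would need to be extended to read them.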

We can now embed this neat little file in a PDF of our liking (although it should be named “test.pdf”, since we set the basic object reference that way). The package comes with a handy shell command that fits our needs here, assuming you have a test.pdf file at hand (e.g. one from the ZUGFeRD 1.0 package examples).

$ additional_data_util test.pdf additional_data-logistics-invoice-1.0-uniquepart.xml additional_data_base_schema.xsd

You will find your self-created additional data XML file embedded in test.pdf, ready to be sent to your business partner. This concludes the quickstart; you are now ready to explore and experiment with the package. Any comments, errors or bugs may be reported to mail_andreas.

[1] To learn more about the ZUGFeRD data structure, please visit https://www.ferd-net.de/zugferd/definition/index.html .
[2] Complex types should not use attributes but store all qualified data in the text of their own tags, which makes it easy to define a fitting XSD for validation purposes.