What if you have an HTTP API based on protocol buffers and you want to add XML support to it? You'd like to keep a single source of truth, and you already have a large set of protocol buffer message definitions. Writing the XSD schema by hand is an option, but an unattractive and error-prone one. Here I'll show how you can automate XSD schema generation from existing protocol buffer message definitions.
The first step is to parse the protocol buffer message definitions into a format that is more convenient for processing.
Fortunately, the protoc compiler has a switch, --descriptor_set_out=FILE, which writes a FileDescriptorSet (a protocol buffer, defined in descriptor.proto) containing all of the input files to FILE.
The FileDescriptorSet contains all parsed message definitions in a structured way. But a protocol buffer is not a very convenient format for processing, so we can translate it to XML. As a C# developer, I can:
- generate C# classes from descriptor.proto
- deserialize the FileDescriptorSet output from the protoc compiler into these classes
- serialize them to XML with the System.Xml.Serialization.XmlSerializer class
With an XML representation of the protocol buffer message definitions, the full power of XSLT transformations is available to us.
All of this is already implemented in protogen, a great utility from protobuf-net. Unfortunately, this utility is no longer supported, but it is included in the protobuf-net v1.0.0.280 NuGet package. Download this package and you'll find protogen in the tools subdirectory.
If you want to get an XML view of protocol buffer message definitions, call protogen with xml.xslt like this:
protogen -i:descriptor.proto -o:descriptor.xml -t:xml -d
The output file will look like this:
<FileDescriptorSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <file>
    <FileDescriptorProto>
      <name>descriptor.proto</name>
      <package>google.protobuf</package>
      <dependency />
      <message_type>
        <DescriptorProto>
          <name>FileDescriptorSet</name>
          <field>
            <FieldDescriptorProto>
              <name>file</name>
              <number>1</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FileDescriptorProto</type_name>
            </FieldDescriptorProto>
          </field>
          <extension />
          <nested_type />
          <enum_type />
          <extension_range />
        </DescriptorProto>
        <DescriptorProto>
          <name>FileDescriptorProto</name>
          <field>
            <FieldDescriptorProto>
              <name>name</name>
              <number>1</number>
              <type>TYPE_STRING</type>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>package</name>
              <number>2</number>
              <type>TYPE_STRING</type>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>dependency</name>
              <number>3</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_STRING</type>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>message_type</name>
              <number>4</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.DescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>enum_type</name>
              <number>5</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.EnumDescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>service</name>
              <number>6</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.ServiceDescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>extension</name>
              <number>7</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FieldDescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>options</name>
              <number>8</number>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FileOptions</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>source_code_info</name>
              <number>9</number>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.SourceCodeInfo</type_name>
            </FieldDescriptorProto>
          </field>
          <extension />
          <nested_type />
          <enum_type />
          <extension_range />
        </DescriptorProto>
        ...
      </message_type>
    </FileDescriptorProto>
  </file>
</FileDescriptorSet>
This mechanism is fully extensible: you can write a custom XSLT transformation and execute it with the -t:<transformation> switch. Thanks to Marc Gravell!
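Before writing an XSLT against this XML view, it can help to see how little is needed to walk it. The following is a minimal Python sketch, not part of protogen: the function name list_messages and the trimmed SAMPLE document are made up for illustration, and it assumes that a field without a <label> element is an optional one, as the output above suggests. Nested message types are ignored.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written sample in the same shape as the protogen XML output.
SAMPLE = """
<FileDescriptorSet>
  <file>
    <FileDescriptorProto>
      <name>descriptor.proto</name>
      <message_type>
        <DescriptorProto>
          <name>FileDescriptorSet</name>
          <field>
            <FieldDescriptorProto>
              <name>file</name>
              <number>1</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FileDescriptorProto</type_name>
            </FieldDescriptorProto>
          </field>
        </DescriptorProto>
        <DescriptorProto>
          <name>FileDescriptorProto</name>
          <field>
            <FieldDescriptorProto>
              <name>name</name>
              <number>1</number>
              <type>TYPE_STRING</type>
            </FieldDescriptorProto>
          </field>
        </DescriptorProto>
      </message_type>
    </FileDescriptorProto>
  </file>
</FileDescriptorSet>
"""


def list_messages(xml_text):
    """Return {message name: [(field name, number, label, type), ...]}."""
    root = ET.fromstring(xml_text)
    messages = {}
    # Top-level messages live under file/FileDescriptorProto/message_type.
    path = "./file/FileDescriptorProto/message_type/DescriptorProto"
    for msg in root.iterfind(path):
        fields = []
        for field in msg.iterfind("./field/FieldDescriptorProto"):
            fields.append((
                field.findtext("name"),
                int(field.findtext("number")),
                # Assumption: a missing <label> means the field is optional.
                field.findtext("label", default="LABEL_OPTIONAL"),
                field.findtext("type"),
            ))
        messages[msg.findtext("name")] = fields
    return messages


if __name__ == "__main__":
    for name, fields in list_messages(SAMPLE).items():
        print(name, fields)
```

The same element paths (file, message_type, field and so on) are what a custom XSLT template would match on.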
I needed to create an XSD schema from message definitions with the following requirements:
- use XML attributes for simple types (like strings)
- correct handling of required and optional for attributes, with the use="required" and use="optional" XSD attributes
- correct handling of optional for elements, with the minOccurs="0" attribute
- allow elements in any order (using <xs:all /> instead of the more common <xs:sequence />)
- correct handling of repeated, with maxOccurs="unbounded"
The result is a simple XSLT transformation, which you can see on GitHub.
Note that I have tried it on Windows only. It is possible to run .NET code under Mono, but for now the XSLT uses a Microsoft-specific extension to call the msxsl:node-set() function, just to make XslCompiledTransform happy with this XSLT. I'm not sure that the msxsl extensions are included in Mono.
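To make the requirements listed above concrete, here is a minimal Python sketch of the mapping rules. It is not the XSLT from GitHub, only an illustration under stated assumptions: the names xsd_declaration, render and SIMPLE_TYPES are hypothetical, the scalar type table is deliberately partial, a field with no label is treated as optional, and repeated fields are emitted as plain repeating elements with no container around them.

```python
# Hypothetical, partial mapping from protobuf scalar types to XSD built-ins.
SIMPLE_TYPES = {
    "TYPE_STRING": "xs:string",
    "TYPE_BOOL": "xs:boolean",
    "TYPE_INT32": "xs:int",
    "TYPE_INT64": "xs:long",
    "TYPE_DOUBLE": "xs:double",
}


def xsd_declaration(name, label, proto_type, type_name=""):
    """Return a (tag, attributes) pair describing one field's XSD declaration."""
    if label == "LABEL_REPEATED":
        # repeated -> element with maxOccurs="unbounded"
        item_type = SIMPLE_TYPES.get(proto_type) or type_name.lstrip(".")
        return "xs:element", {
            "name": name,
            "type": item_type,
            "maxOccurs": "unbounded",
        }
    if proto_type in SIMPLE_TYPES:
        # simple type -> attribute with use="required" or use="optional"
        use = "required" if label == "LABEL_REQUIRED" else "optional"
        return "xs:attribute", {
            "name": name,
            "use": use,
            "type": SIMPLE_TYPES[proto_type],
        }
    # Anything else is treated as a message -> element.
    attrs = {"name": name, "type": type_name.lstrip(".")}
    if label != "LABEL_REQUIRED":
        # Assumption: a missing label means optional -> minOccurs="0".
        attrs["minOccurs"] = "0"
    return "xs:element", attrs


def render(tag, attrs):
    """Format a declaration as an empty XML element."""
    body = " ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f"<{tag} {body} />"


if __name__ == "__main__":
    print(render(*xsd_declaration("name", None, "TYPE_STRING")))
    print(render(*xsd_declaration(
        "options", None, "TYPE_MESSAGE", ".google.protobuf.FileOptions")))
    print(render(*xsd_declaration(
        "file", "LABEL_REPEATED", "TYPE_MESSAGE",
        ".google.protobuf.FileDescriptorProto")))
```

One design point the sketch leaves out: XSD 1.0 limits the children of xs:all to maxOccurs of at most 1, so a repeated element cannot sit directly under xs:all and has to go into its own container, for example a nested xs:sequence. The output of this sketch may therefore differ in detail from what the real transformation produces.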
If you apply the transformation to descriptor.proto with
protogen -i:descriptor.proto -o:descriptor.xsd -t:xsd-attributes -d
you'll get an XSD that looks like this:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
  <!--Generated from: descriptor.proto-->
  <!--Namespace: google.protobuf-->
  <xs:complexType name="google.protobuf.FileDescriptorSet">
    <xs:all>
      <xs:element name="files">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="file" type="google.protobuf.FileDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="google.protobuf.FileDescriptorProto">
    <xs:all>
      <xs:element name="message_types">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="message_type" type="google.protobuf.DescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="enum_types">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="enum_type" type="google.protobuf.EnumDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="services">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="service" type="google.protobuf.ServiceDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="extensions">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="extension" type="google.protobuf.FieldDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="options" type="google.protobuf.FileOptions" />
      <xs:element name="source_code_info" type="google.protobuf.SourceCodeInfo" />
    </xs:all>
    <xs:attribute name="name" use="optional" type="xs:string" />
    <xs:attribute name="package" use="optional" type="xs:string" />
  </xs:complexType>
  ...
</xs:schema>
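A quick way to sanity-check a generated schema is to list which fields ended up as attributes and which as elements. The Python sketch below does only that; summarize_schema and the XSD_SAMPLE fragment are hypothetical names introduced for illustration, and it looks only at direct children of xs:all, so the contents of nested container elements are not listed.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hand-trimmed fragment in the shape of the schema shown above.
XSD_SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="google.protobuf.FileDescriptorProto">
    <xs:all>
      <xs:element name="options" type="google.protobuf.FileOptions" />
      <xs:element name="source_code_info" type="google.protobuf.SourceCodeInfo" />
    </xs:all>
    <xs:attribute name="name" use="optional" type="xs:string" />
    <xs:attribute name="package" use="optional" type="xs:string" />
  </xs:complexType>
</xs:schema>"""


def summarize_schema(xsd_text):
    """Map each top-level complex type to its attribute and element names."""
    root = ET.fromstring(xsd_text)
    summary = {}
    for complex_type in root.findall("xs:complexType", NS):
        summary[complex_type.get("name")] = {
            "attributes": [
                a.get("name") for a in complex_type.findall("xs:attribute", NS)
            ],
            # Only direct children of xs:all are inspected here.
            "elements": [
                e.get("name") for e in complex_type.findall("xs:all/xs:element", NS)
            ],
        }
    return summary


if __name__ == "__main__":
    for type_name, parts in summarize_schema(XSD_SAMPLE).items():
        print(type_name, parts)
```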
Update: I have also added an xsd.xslt transformation, which generates XSD schemas for C# protobuf classes (generated by protogen.exe) that are serialized to XML with the XmlSerializer class.
Hope this helps someone! You can customize XSLT for your needs.
Happy coding!