There is no right or wrong way to create an XML Schema that represents a data format, but this page gives some tips from my own experience that might help. If you are fuzzing XML then you very often already have an XML Schema. If you don't, you can use xsd.exe, a tool that ships with Microsoft Visual Studio, to create an XML Schema from an XML file. If you are not fuzzing XML then you may need to create the XML Schema yourself, and this page goes through an example of how to do this.
Let's try to create an XML Schema for the Bitmap (BMP) file format. Conveniently, Wikipedia gives an excellent description of the file format.
Since I typically use Microsoft Visual Studio to create my XML Schema, I am going to walk through that as an example. So open Visual Studio and from the menu select File->New->File. Select the General Category and then select the XML Schema Template. Below is what you should see:
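For readers without the screenshot, the new file should look roughly like the sketch below. The attribute values are the defaults discussed in the following paragraphs; the exact layout is an assumption, and Visual Studio may add other attributes (such as an id) depending on the version.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Approximate default XML Schema template; layout may differ by version -->
<xs:schema
    targetNamespace="http://tempuri.org/XMLSchema.xsd"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/XMLSchema.xsd"
    xmlns:mstns="http://tempuri.org/XMLSchema.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>
```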
By default Visual Studio sets the namespace for the XML Schema to "http://tempuri.org/XMLSchema.xsd", but this should be changed to something relevant to the data format that this XSD will describe. The namespace can be any URI; I tend to use the format urn:Fuzzware.Projects.FormatName (where FormatName is the name of the format). So go ahead and replace "http://tempuri.org/XMLSchema.xsd" with "urn:Fuzzware.Projects.BMP". We should also use a meaningful prefix for our namespace, so change 'mstns' to 'bmp'.
A note on what the above attributes mean as well:
- targetNamespace - this is the namespace that all the elements, attributes and types of this schema will belong to
- elementFormDefault - Fuzzware will work best if this always has the value 'qualified'. Basically using 'qualified' allows Fuzzware to easily know the namespace of an element regardless of where it is declared.
- xmlns="…" - This declares the default namespace of the schema, that is any element, attribute or type that is referenced without a prefix will be assumed to come from this namespace.
- xmlns:mstns="…" - This declares a namespace and associates a prefix with it that can be used as shorthand reference to the namespace. The prefix in this case is 'mstns' but it can be any value.
- xmlns:xs="http://www.w3.org/2001/XMLSchema" - The namespace for XML Schema itself. By convention the prefix is either 'xs' or 'xsd', and it is best to stick to one of these.
After our changes we should have (feel free to save the XSD at this stage).
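A sketch of the schema header after those changes is shown below. The namespace is written with the 'urn:' scheme separator so that it is a well-formed URI; any other attributes Visual Studio generated can be left as they are.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Schema header after renaming the namespace and its prefix -->
<xs:schema
    targetNamespace="urn:Fuzzware.Projects.BMP"
    elementFormDefault="qualified"
    xmlns="urn:Fuzzware.Projects.BMP"
    xmlns:bmp="urn:Fuzzware.Projects.BMP"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>
```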
Now we are ready to start defining the elements of our XML schema. We will always require a root element for the schema, this is the element that will be the root node of our XML document and I tend to give it the same name as the format I am defining. We know our root node will contain other nodes so it needs to be a complex type, and the nodes it contains will usually occur in a sequence.
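As a minimal sketch, a root element of this kind could be declared as follows. The name 'BMP' is an assumption that follows the habit of naming the root after the format; the declaration goes directly under xs:schema.

```xml
<!-- Root element: a complex type whose children occur in sequence -->
<xs:element name="BMP">
  <xs:complexType>
    <xs:sequence>
      <!-- child elements for each part of the format go here -->
    </xs:sequence>
  </xs:complexType>
</xs:element>
```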
So those are the basics, before we have even looked at our data format.
On the Bitmap format definition page, it says there are 4 main parts to a BMP file:
- BMP File Header
- Bitmap Information
- Colour Palette
- Bitmap Data
We will need to define each of these in turn.
The BMP File Header is defined as
| Offset# || Size || Purpose |
| 0000h || 2 bytes || the magic number used to identify the BMP file: 0x42 0x4D (hex code points for B and M). The following entries are possible: BM (Windows 3.1x, 95, NT, etc.), BA (OS/2 Bitmap Array), CI (OS/2 Color Icon), CP (OS/2 Color Pointer), IC (OS/2 Icon), PT (OS/2 Pointer) |
| 0002h || 4 bytes || the size of the BMP file in bytes |
| 0006h || 2 bytes || reserved; actual value depends on the application that creates the image |
| 0008h || 2 bytes || reserved; actual value depends on the application that creates the image |
| 000Ah || 4 bytes || the offset, i.e. starting address, of the byte where the bitmap data can be found |
So let's define a new element called 'FileHeader' and add child elements for each part of the header above. We are actually going to cheat a little here: we make the magic number a string (because it's easier to read) and combine the two 2-byte reserved parts into one 4-byte reserved part. Generally speaking, though, your XML schemas should mirror the parts of the format, and the types of these elements should be appropriate for the number of bytes required.
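A sketch of what the FileHeader element might look like is given below. 'Signature' is the name used on this page; 'FileSize', 'Reserved' and 'DataOffset' are hypothetical names chosen for illustration. The element sits inside the xs:sequence of the root element.

```xml
<xs:element name="FileHeader">
  <xs:complexType>
    <xs:sequence>
      <!-- 2-byte magic number, written as a readable string -->
      <xs:element name="Signature" type="xs:string" fixed="BM" />
      <!-- 4 bytes: size of the BMP file in bytes (name is an assumption) -->
      <xs:element name="FileSize" type="xs:unsignedInt" />
      <!-- the two 2-byte reserved fields combined into one 4-byte field -->
      <xs:element name="Reserved" type="xs:unsignedInt" />
      <!-- 4 bytes: offset of the bitmap data (name is an assumption) -->
      <xs:element name="DataOffset" type="xs:unsignedInt" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```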
So we set the Signature type to string and gave it a fixed value. This will produce the required 2-byte output, but only if the output format specified in Fuzzware (Project->Project Properties) is set to 'us-ascii'. We could have set the signature to be of type 'xs:hexBinary' and given it a fixed value of "424D", but it's easier to read a string.
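For comparison, the hexBinary alternative mentioned above would look something like this one-line sketch:

```xml
<!-- Same 2-byte signature, expressed as hex instead of an ASCII string -->
<xs:element name="Signature" type="xs:hexBinary" fixed="424D" />
```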
If you are not sure what XML Schema type to use for a given byte size, Fuzzware uses .NET, so the mapping is
| Byte Size || XML Schema Type |
| 1 || xs:byte or xs:unsignedByte |
| 2 || xs:short or xs:unsignedShort |
| 4 || xs:int or xs:unsignedInt |
| 8 || xs:long or xs:unsignedLong |
XML Schema also has a type xs:integer, but this should never be used because it represents an arbitrarily large integer and hence does not correspond to a fixed number of bytes.
The next part of the BMP format is Bitmap Information, and this is defined on the BMP definition page as:
| Offset# || Size || Purpose |
| 000Eh|| 4 bytes || the size of this header (40 bytes) |
| 0012h|| 4 bytes|| the bitmap width in pixels (signed integer).|
| 0016h|| 4 bytes|| the bitmap height in pixels (signed integer).|
| 001Ah|| 2 bytes|| the number of color planes being used. Must be set to 1.|
| 001Ch|| 2 bytes|| the number of bits per pixel, which is the color depth of the image. Typical values are 1, 4, 8, 16, 24 and 32.|
| 001Eh|| 4 bytes|| the compression method being used. See the next table for a list of possible values.|
| 0022h|| 4 bytes|| the image size. This is the size of the raw bitmap data (see below), and should not be confused with the file size.|
| 0026h|| 4 bytes|| the horizontal resolution of the image. (pixel per meter, signed integer)|
| 002Ah|| 4 bytes|| the vertical resolution of the image. (pixel per meter, signed integer)|
| 002Eh|| 4 bytes|| the number of colors in the color palette, or 0 to default to 2^n.|
| 0032h|| 4 bytes|| the number of important colors used, or 0 when every color is important; generally ignored.|
This part of the format is a sequence of 2-byte and 4-byte signed and unsigned integers, so we add them in the same way we did before. Create a new element called InfoHeader to hold these elements.
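The InfoHeader element could be sketched along the following lines. All child element names here are hypothetical; the types follow the byte sizes and signedness given in the table above, using the byte-size mapping from earlier on this page.

```xml
<xs:element name="InfoHeader">
  <xs:complexType>
    <xs:sequence>
      <!-- 4 bytes: size of this header (40 bytes) -->
      <xs:element name="HeaderSize" type="xs:unsignedInt" />
      <!-- 4 bytes each: width and height in pixels (signed) -->
      <xs:element name="Width" type="xs:int" />
      <xs:element name="Height" type="xs:int" />
      <!-- 2 bytes each: colour planes and bits per pixel -->
      <xs:element name="Planes" type="xs:unsignedShort" />
      <xs:element name="BitsPerPixel" type="xs:unsignedShort" />
      <!-- 4 bytes each: compression method and raw image size -->
      <xs:element name="Compression" type="xs:unsignedInt" />
      <xs:element name="ImageSize" type="xs:unsignedInt" />
      <!-- 4 bytes each: resolution in pixels per metre (signed) -->
      <xs:element name="XPixelsPerMetre" type="xs:int" />
      <xs:element name="YPixelsPerMetre" type="xs:int" />
      <!-- 4 bytes each: palette size and number of important colours -->
      <xs:element name="ColoursUsed" type="xs:unsignedInt" />
      <xs:element name="ColoursImportant" type="xs:unsignedInt" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```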
The next element in BMP format is the Colour Palette. This is a variable length array of bytes where each byte can take on a full range of values, 0 to 255. Since we are defining this format for the purposes of fuzzing and it doesn't seem like it is going to be worthwhile fuzzing that data, we will just define the Colour Palette part as opaque binary data. In fact exactly the same thing is true for the final part of the BMP format, the Bitmap Data, so we define that as opaque binary data as well.
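One way to sketch the two opaque parts is shown below, assuming that xs:hexBinary is the type used for opaque binary data (it is the binary type mentioned earlier for the signature). The element names are hypothetical.

```xml
<!-- Both parts are treated as opaque blobs of bytes -->
<xs:element name="ColourPalette" type="xs:hexBinary" />
<xs:element name="BitmapData" type="xs:hexBinary" />
```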
And that's it! That is how to define a data format using XML Schema!
It is important to note that we have just defined the data format here; we have not added the information the XML schema needs for Fuzzware to faithfully reproduce the data format byte for byte. These are two different steps in the process of using Fuzzware. The first step is always to define the data format, and from there we add Schema Attribute Commands (see Tutorial 2) to control how Fuzzware serialises the data format to file.
It is also important to never forget that you are defining the XML schema for the purpose of fuzzing, and that Fuzzware will never violate the schema definition. Hence the more loosely you define the format, the greater the range of fuzzing Fuzzware will be able to try, although the fuzzing will take longer. It is generally better to accurately define the elements of the format, but loosely define the types of those elements.
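To make that trade-off concrete, here is a sketch of two alternative declarations for a hypothetical 'BitsPerPixel' element. Only one of them would appear in a real schema. The assumption is that the tightly typed version confines the element to the listed values, while the loosely typed version allows any 2-byte unsigned value.

```xml
<!-- Alternative A: tightly typed, only the typical colour depths are allowed -->
<xs:element name="BitsPerPixel">
  <xs:simpleType>
    <xs:restriction base="xs:unsignedShort">
      <xs:enumeration value="1" />
      <xs:enumeration value="4" />
      <xs:enumeration value="8" />
      <xs:enumeration value="16" />
      <xs:enumeration value="24" />
      <xs:enumeration value="32" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- Alternative B: loosely typed, any value from 0 to 65535 is allowed -->
<xs:element name="BitsPerPixel" type="xs:unsignedShort" />
```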
XML Schema offers a very rich language for describing a data format, however the vast majority of the time you only need to use a small subset of this language. The examples that come with Fuzzware provide a good starting point for understanding the different ways you can use XML Schema to define the data format, and if you are trying to define a similar format to one of the examples, it is probably worthwhile understanding the approach the example uses.
Here are some other common parts of XML Schema you may end up using when defining your data format:
- xs:choice - sometimes only one child element from a range of child elements can occur
- minOccurs and maxOccurs - these attributes (written without the 'xs:' prefix) allow a child element to be repeated a number of times, either a specific amount or within some range
- xs:enumeration - these have the form
    <xs:element name="elementname">
      <xs:simpleType>
        <xs:restriction base="xs:unsignedInt">
          <xs:enumeration value="1" />
          <xs:enumeration value="2" />
          <xs:enumeration value="3" />
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
  where 'base' can be any simple type (xs:unsignedInt is only an example), but all the enumeration values must be of that type.
- xs:restriction - the above enumeration is only one type of restriction. There are others such as xs:length, xs:maxLength and xs:minLength. There is even xs:pattern, which lets you define a regular expression.
- xs:simpleType and xs:complexType - give either of these a name attribute to define your own types, which can be re-used throughout your schema (or in other schemas) via the type attribute of an element
- xs:import - declare at the beginning of the schema to import elements, attributes or types of a schema located in a different file.
Don't forget that you don't have to define your schema as one large blob like we have done here. You can separate out each element and define it above or below the root node (but still under xs:schema) and then use <xs:element ref="elementname" /> to reference it from the root node.
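To illustrate a few of these constructs together, here is a small self-contained sketch. It is not part of the BMP schema: the namespace and every element and type name in it are hypothetical. It shows a root element that references top-level elements with ref, a repeated child using minOccurs and maxOccurs, an xs:choice, and a named simple type built with xs:restriction.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
    targetNamespace="urn:Fuzzware.Projects.Example"
    elementFormDefault="qualified"
    xmlns="urn:Fuzzware.Projects.Example"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named type: a string of exactly 4 characters, reusable by name -->
  <xs:simpleType name="Tag4">
    <xs:restriction base="xs:string">
      <xs:length value="4" />
    </xs:restriction>
  </xs:simpleType>

  <!-- Root element references elements declared elsewhere under xs:schema -->
  <xs:element name="Example">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Header" />
        <!-- zero or more records may follow the header -->
        <xs:element ref="Record" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Header" type="Tag4" />

  <!-- Each record holds exactly one of the two possible children -->
  <xs:element name="Record">
    <xs:complexType>
      <xs:choice>
        <xs:element name="ShortValue" type="xs:unsignedShort" />
        <xs:element name="IntValue" type="xs:unsignedInt" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the default namespace is the same as the target namespace, the unprefixed names in ref="Header" and type="Tag4" resolve to the declarations in this schema.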