It might seem logical to cover input before output, but Fuzzware's conversion abilities are closely tied to how it converts XML to a different format during output. If you haven't gone through Tutorial 2, this tutorial will not make much sense.
Tutorial 3 is all about one of the most powerful features of Fuzzware: automatically converting a data format to its XML representation. This is a slightly complicated process, but if you understand how Fuzzware converts XML to a data format, then with some extra explanation it's not too bad. The process also has some limitations, but to understand these you need to understand how the process works. Remember, though, that you can always do this process manually; an introduction on how to do this is given on the Create an XML Schema page.
Fuzzware uses the XSD file, with its description of the data format, to read in the data and assign the bytes it reads to XML nodes. Fuzzware needs to be told the root node of the XML, since it needs a starting point from which to try to match the data to the data format described in the XSD file. Given the root node, Fuzzware will begin trying to match the XSD elements to the data, starting at the beginning of the file. The matching process basically involves looking at the current XSD element, and sometimes the element immediately after it, and trying to determine from the type of the current element how many bytes of data to read.
This gives rise to the major requirement that the XSD must fulfil: Fuzzware must be able to determine the size of every element in the XSD file. Fuzzware attempts to be smart about this requirement, and the ways to satisfy it are listed below in the order they are tried:
1. Does the element have a fixed value? If so, generate that value and do a byte comparison to the current location in the data.
2. Does the element have an enumeration value? If so, generate all enumeration values and do a byte comparison to the current location in the data.
3. Does the element have a length specified by another element we have read in already? If so, read in the appropriate number of bytes.
4. Does the element have a fixed length (via a restriction)? If so, read in the appropriate number of bytes.
5. Is the element a number type that does not have an 'outputAs' attribute? If so, read in characters and try to convert them to a number until the conversion fails.
6. Is the current element followed by an element that has a fixed value or has an enumerated list of values? If so, search for the possible values and assign to this element the in-between bytes.
7. Does this element have a regular expression defining it? If so, add bytes until we get a match, then keep on adding more bytes until we don’t get a match (we try to match the longest expression possible).
8. Does the parent of this element have a bounded length? If so, read in the remaining bytes of the length into this element.
9. If the current element is a string and the next element value is a number, read in ASCII up until a non-ASCII character is found.
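The "first rule that applies wins" ordering above can be sketched in a few lines of Python. This is only an illustration of the idea, not Fuzzware's implementation: elements are modelled as plain dictionaries, only rules 1, 2 and 4 are covered, and the names match_element, convert and EnumString are made up for this example.

```python
from typing import List, Optional, Tuple

def match_element(element: dict, data: bytes, pos: int) -> Optional[bytes]:
    """Return the bytes claimed by `element` at `pos`, or None if nothing matches."""
    # Rule 1: a fixed value is compared byte for byte at the current position.
    fixed = element.get("fixed")
    if fixed is not None:
        return fixed if data.startswith(fixed, pos) else None
    # Rule 2: each enumeration value is tried in turn.
    enumeration = element.get("enumeration")
    if enumeration:
        for candidate in enumeration:
            if data.startswith(candidate, pos):
                return candidate
        return None
    # Rule 4: a fixed length simply claims that many bytes.
    length = element.get("length")
    if length is not None and pos + length <= len(data):
        return data[pos:pos + length]
    return None

def convert(elements: List[dict], data: bytes) -> List[Tuple[str, bytes]]:
    """Walk the elements in order, advancing through the data as each one matches."""
    pos = 0
    result = []
    for element in elements:
        matched = match_element(element, data, pos)
        if matched is None:
            raise ValueError(f"cannot size element {element['name']} at offset {pos}")
        result.append((element["name"], matched))
        pos += len(matched)
    return result

# Hypothetical in-memory stand-in for a schema with one fixed and one enumerated element.
schema = [
    {"name": "FixedString", "fixed": b"Hello"},
    {"name": "EnumString", "enumeration": [b"World", b"Universe"]},
]
print(convert(schema, b"HelloWorld"))
```

The point of the sketch is that an element the matcher cannot size stops the whole conversion, which is why every element in the XSD must satisfy at least one rule.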
Let's look at some examples of these matching rules in action. Consider the following XSD file:
<?xml version="1.0" encoding="utf-8"?>
<xs:element name="RootNode" sac:markup="removeIncludingChildNodes">
<xs:element name="FixedString" type="xs:string" fixed="Hello" />
<xs:enumeration value="World" />
<xs:enumeration value="Universe" />
Let our input data file have the following contents:
Create a new project (Tutorial3 in the Tutorials directory) and select 'Fuzz a non-XML file' as input. Add the schema (Tutorial3.xsd), specify RootNode as the root node and choose Tutorial3.bin as the input file. Click Test Conversion and run the conversion process (click Initialise and then Start). You should see the following output:
Here Fuzzware has successfully converted the raw file using rules 1 & 2 from the list.
Let's add some more elements to the sequence under RootNode:
<xs:length value="3" />
<xs:element name="StringNumber" type="xs:unsignedInt" />
<xs:element name="NullTerminatedString" type="xs:string" />
<xs:element name="NullTerminator" type="xs:hexBinary" fixed="00" sac:outputAs="Decoded" />
Run the conversion process again with this input:
HelloUniverseAbC180This is a string
(Note that there needs to be a 0x00 byte at the end of this string.)
You should see this output:
FixedLengthString was converted by rule 4, StringNumber by rule 5, NullTerminatedString by rule 6 and NullTerminator by rule 1 (note that sac:outputAs="Decoded" is required as Fuzzware looks for the value in the data that corresponds to how it would output the value, which is controlled by the outputAs attribute).
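To make the rule assignments concrete, here is a rough Python walk-through of the same input. It is a sketch of rules 5 and 6 only, written under the assumption that "conversion fails" means "the next character is not an ASCII digit"; the helper names read_number and read_until are invented for this example and are not part of Fuzzware.

```python
from typing import List

def read_number(data: bytes, pos: int) -> bytes:
    """Rule 5 sketch: keep consuming characters while they still form a number."""
    end = pos
    while end < len(data) and data[end:end + 1].isdigit():
        end += 1
    return data[pos:end]

def read_until(data: bytes, pos: int, next_values: List[bytes]) -> bytes:
    """Rule 6 sketch: the following element is fixed or enumerated, so search
    for its possible values and take the in-between bytes."""
    hits = [i for i in (data.find(v, pos) for v in next_values) if i != -1]
    if not hits:
        raise ValueError("none of the following element's values were found")
    return data[pos:min(hits)]

data = b"HelloUniverseAbC180This is a string\x00"
pos = len(b"Hello") + len(b"Universe")    # already consumed by rules 1 and 2

fixed_length_string = data[pos:pos + 3]   # rule 4: xs:length value="3"
pos += len(fixed_length_string)

string_number = read_number(data, pos)    # rule 5: stops at the 'T'
pos += len(string_number)

null_terminated = read_until(data, pos, [b"\x00"])  # rule 6: up to the fixed 00
pos += len(null_terminated)

null_terminator = data[pos:pos + 1]       # rule 1: fixed value 00, output decoded
print(fixed_length_string, string_number, null_terminated, null_terminator)
```

Note that rule 6 only works here because the element after NullTerminatedString has a fixed value for the search to find.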
So how does Fuzzware handle nodes that need to be updated? We can't use XML Processing Instructions (PI), as they apply to XML but not to XSD; in fact, we want Fuzzware to add PI to certain nodes so that they are updated. Since all we can change is the XSD file, Fuzzware relies on a naming convention to identify nodes that need to be updated. The naming convention takes one of the functions in Schemas\XmlProcInstCommands (listed here) and combines it with the name of the source node, e.g. ByteLengthOfNodeName. Note that the word 'Of' is added between the function and the source node. Let's look at an example:
<xs:element name="ByteLengthOfArbitraryString" type="xs:unsignedInt" />
<xs:element name="ArbitraryString" type="xs:string" />
Add this to the existing schema and run the test conversion on this input:
HelloUniverseAbC180This is a string 58Some arbitrary string with its length given somewhere else
(Note that there needs to be a 0x00 byte before the 58, not a space.)
This produces the following output and is an example of rule 3 at work. (Note the XML PI that have been automatically added to the XML.)
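The naming convention and rule 3 can be illustrated with a small Python sketch. It assumes the convention is a plain prefix split on the word 'Of'; KNOWN_FUNCTIONS holds only the one function named in this tutorial (the real list is in Schemas\XmlProcInstCommands), and split_convention is a hypothetical helper, not a Fuzzware function.

```python
from typing import Optional, Tuple

# Only the function used in this tutorial; the full list is not reproduced here.
KNOWN_FUNCTIONS = ("ByteLength",)

def split_convention(node_name: str) -> Optional[Tuple[str, str]]:
    """Split e.g. 'ByteLengthOfArbitraryString' into (function, source node)."""
    for function in KNOWN_FUNCTIONS:
        prefix = function + "Of"
        if node_name.startswith(prefix) and len(node_name) > len(prefix):
            return function, node_name[len(prefix):]
    return None

print(split_convention("ByteLengthOfArbitraryString"))

# The part of the input that follows the 0x00 terminator.
tail = b"58Some arbitrary string with its length given somewhere else"

# Rule 5 reads the length node as an ASCII number...
digits = 0
while tail[digits:digits + 1].isdigit():
    digits += 1
length = int(tail[:digits].decode("ascii"))

# ...and rule 3 then reads exactly that many bytes into the source node.
arbitrary_string = tail[digits:digits + length]
print(length, arbitrary_string)
```

The string after the 58 is exactly 58 bytes long, so the length node accounts for the whole of the remaining input.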
I'll leave rules 7, 8 and 9 as an exercise for the reader (they are used far less often).
OK, now that we have covered how the conversion process works, let's take a look at some of its limitations:
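As a starting point for rule 7, the "add bytes until we get a match, then keep adding until we don't" idea might look like the following Python sketch. It uses Python's re module, whose syntax differs from XSD pattern facets, and longest_regex_match is a name made up for this example, so treat it as an illustration of the growing-window approach rather than of Fuzzware's behaviour.

```python
import re

def longest_regex_match(pattern: str, data: bytes, pos: int) -> bytes:
    """Grow a window from `pos` until it matches, then until it stops matching."""
    compiled = re.compile(pattern.encode("ascii"))
    end = pos + 1
    # Phase 1: add bytes until the window matches the whole pattern.
    while end <= len(data) and not compiled.fullmatch(data[pos:end]):
        end += 1
    if end > len(data):
        raise ValueError("pattern never matched")
    # Phase 2: keep adding bytes while the longer window still matches.
    while end < len(data) and compiled.fullmatch(data[pos:end + 1]):
        end += 1
    return data[pos:end]

print(longest_regex_match("[A-Z][a-z]+", b"Hello World", 0))
```

One consequence of this approach is that the match stops at the first byte that breaks the pattern, even if a longer window further on would match again.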
○ The XML PI give us ultimate flexibility, but the naming convention does not. If Fuzzware comes across a node that obeys the naming convention, like ByteLengthOfArbitraryString, it will pair it with the next node called ArbitraryString, regardless of whether or not that’s the correct ArbitraryString node to get the length of. Compare this to adding XML PI directly to the XML, where we can link any two nodes, no matter where they are in the XML.
○ The XML PI function target nodes (the ones that get updated) cannot come after the source node (the input to the function), but they can occur inside the source node (as when referencing the entire file length). For example, if the data format stores the lengths of its data blobs in a table at the end of the format, Fuzzware will not be able to convert it.
The other aspect not covered is how Fuzzware handles minOccurs and maxOccurs, and deeply nested sequences and choices. Basically, Fuzzware respects minOccurs and maxOccurs, and can handle arbitrary nesting of sequences and choices. Note also that the Conversion window has a slider that affects the speed of the conversion process. If the conversion is failing, it is sometimes useful to slow the process down to see which node it is failing to convert.
The ultimate test for the conversion process is not really whether it can convert an input file to XML, but whether the output file created is identical to the input file. It is therefore usually a good idea to specify the original input file on the 'Configure and Run fuzzer' page, run Fuzzware in Test Mode and confirm that the output is identical to the input. Bear in mind that byte equivalence is not always necessary; it depends on the format and its tolerance of things like whitespace characters.
When the fuzzer is started, the conversion process writes its result to a file (named after the input file but with an '.xml' extension). As long as 'Fuzz a non-XML file' is chosen as the input source, the conversion process occurs every time the fuzzer runs. Once the conversion has been successful, the input source can be changed to 'Fuzz an XML file', with the input XML set to the result of the conversion.
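If you want to check the round trip outside of Fuzzware, a byte-for-byte comparison is easy to script. The Python sketch below is one way to do it; first_difference and files_round_trip are names invented here, and the output path is whatever file your Test Mode run produced, which this tutorial does not specify.

```python
from pathlib import Path
from typing import Optional

def first_difference(original: bytes, regenerated: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, regenerated)):
        if a != b:
            return offset
    if len(original) != len(regenerated):
        # One file is a prefix of the other; they diverge where the shorter ends.
        return min(len(original), len(regenerated))
    return None

def files_round_trip(input_path: str, output_path: str) -> Optional[int]:
    """Compare the original input file with the file written in Test Mode."""
    return first_difference(Path(input_path).read_bytes(),
                            Path(output_path).read_bytes())

print(first_difference(b"HelloWorld", b"HelloWorld"))
print(first_difference(b"HelloWorld", b"HelloWorlD"))
```

Knowing the offset of the first difference usually points straight at the element whose outputAs setting or length does not match the original data.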
1. Change Tutorial3.xsd so the ByteLengthOf node refers to the whole file rather than just the next node. Can you figure out which rule applies since the conversion still works, even though there is no obvious length given for the ArbitraryString node?
2. Change the StringNumber node in Tutorial3.xsd to use sac:outputAs="BinaryLittleEndian" and update Tutorial3.bin so the conversion still succeeds. (Hint: Make sure you get the number of bytes right.)
3. Add other integer types, e.g. shorts, unsignedBytes etc., to both Tutorial3.xsd and Tutorial3.bin, and experiment with sac:outputAs="BinaryLittleEndian" and sac:outputAs="BinaryBigEndian".
4. Open the PDF example and try a test conversion to see a complicated example in practice. Note the time difference in the conversion process between the test conversion and the conversion when the fuzzer is executed.
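For exercises 2 and 3 it can help to see what a number looks like as raw bytes before editing Tutorial3.bin. The Python sketch below uses the standard struct module to show the value 180 in both byte orders. It assumes, without confirmation from this tutorial, that a binary xs:unsignedInt occupies 4 bytes, an xs:unsignedShort 2 bytes and an xs:unsignedByte 1 byte, which matches the value ranges of those XSD types.

```python
import struct

value = 180  # the StringNumber value used earlier in this tutorial

little_uint = struct.pack("<I", value)    # 4 bytes, least significant byte first
big_uint = struct.pack(">I", value)       # 4 bytes, most significant byte first
little_ushort = struct.pack("<H", value)  # 2 bytes, little-endian
ubyte = struct.pack("B", value)           # 1 byte, byte order is irrelevant

for label, raw in [("unsignedInt LE", little_uint),
                   ("unsignedInt BE", big_uint),
                   ("unsignedShort LE", little_ushort),
                   ("unsignedByte", ubyte)]:
    print(f"{label}: {raw.hex()}")
```

These bytes are not printable text, so a hex editor (or a short script) is the safest way to put them into the .bin file.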