Converting an Xml file from text format to binary format can be divided into six steps

Date: 2020-07-19

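Step 1. Collect the name strings of attributes that have resource IDs
　　Besides collecting the name strings of the Xml element attributes that have resource IDs, this step also collects the corresponding resource IDs into an array. The attribute name strings collected here are stored in a string pool, and they correspond one-to-one with the collected array of resource IDs.
　　For main.xml, the name strings of Xml element attributes with resource IDs are "orientation", "layout_width", "layout_height", "gravity", "id" and "text". Assume their resource IDs are 0x010100c4, 0x010100f4, 0x010100f5, 0x010100af, 0x010100d0 and 0x0101014f respectively. The first six slots of the resulting string pool then map to the resource ID array as shown in Figure 11: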
　　

　　Figure 11: Correspondence between attribute name strings and attribute resource IDs
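　　The one-to-one correspondence can be pictured as two parallel arrays that share an index. The following is only a minimal sketch: AttrNamePool and buildMainXmlAttrPool are hypothetical names made up for illustration and are not types or functions from aapt, and the IDs are the ones assumed in the text above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for illustration; not a type from aapt.
struct AttrNamePool {
    std::vector<std::string> names;     // front of the string pool
    std::vector<uint32_t> resourceIds;  // resourceIds[i] belongs to names[i]

    // Append a name/ID pair so that both arrays keep the same index.
    void add(const std::string& name, uint32_t resId) {
        names.push_back(name);
        resourceIds.push_back(resId);
    }
};

// Builds the pool for the main.xml example, using the IDs assumed in the text.
AttrNamePool buildMainXmlAttrPool() {
    AttrNamePool pool;
    pool.add("orientation",   0x010100c4);
    pool.add("layout_width",  0x010100f4);
    pool.add("layout_height", 0x010100f5);
    pool.add("gravity",       0x010100af);
    pool.add("id",            0x010100d0);
    pool.add("text",          0x0101014f);
    return pool;
}
```

　　Because both arrays are appended to together, slot i of the string pool and slot i of the resource ID array always describe the same attribute.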
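　　Step 2. Collect the other strings
　　This step collects all remaining strings in the Xml file. The name strings of Xml element attributes with resource IDs were already collected in Step 1, so they are not collected again in this step. For main.xml, the strings collected in this step are shown in Figure 12: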
　　

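　　Figure 12: Other strings
　　Here, "android" is the android namespace prefix, "http://schemas.android.com/apk/res/android" is the android namespace uri, "LinearLayout" is the tag of the LinearLayout element, and "Button" is the tag of the Button element.
　　Step 3. Write the Xml file header
　　The compiled binary Xml file consists of a series of chunks. Each chunk has a header that describes the chunk's metadata. The whole binary Xml file can itself be regarded as one overall chunk, with a header of type ResXMLTree_header.
　　ResXMLTree_header is defined in frameworks/base/include/utils/ResourceTypes.h, as follows: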

　　/**
　　* Header that appears at the front of every data chunk in a resource.
　　*/
　　struct ResChunk_header
　　{
　　// Type identifier for this chunk. The meaning of this value depends
　　// on the containing chunk.
　　uint16_t type;
　　// Size of the chunk header (in bytes). Adding this value to
　　// the address of the chunk allows you to find its associated data
　　// (if any).
　　uint16_t headerSize;
　　// Total size of this chunk (in bytes). This is the chunkSize plus
　　// the size of any data associated with the chunk. Adding this value
　　// to the chunk allows you to completely skip its contents (including
　　// any child chunks). If this value is the same as chunkSize, there is
　　// no data associated with the chunk.
　　uint32_t size;
　　};
　　/**
　　* XML tree header. This appears at the front of an XML tree,
　　* describing its content. It is followed by a flat array of
　　* ResXMLTree_node structures; the hierarchy of the XML document
　　* is described by the occurrance of RES_XML_START_ELEMENT_TYPE
　　* and corresponding RES_XML_END_ELEMENT_TYPE nodes in the array.
　　*/
　　struct ResXMLTree_header
　　{
　　struct ResChunk_header header;
　　};


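　　ResXMLTree_header embeds a header of type ResChunk_header. In fact, every header type embeds a ResChunk_header base header, and that ResChunk_header is always the first member. So when parsing a binary Xml file, it is enough to read the first sizeof(ResChunk_header) bytes and inspect the type value to learn the concrete type of the chunk being processed.
　　For the ResXMLTree_header, the members of the embedded ResChunk_header take the following values:
　　--type: equal to RES_XML_TYPE, indicating that this is an Xml file header.
　　--headerSize: equal to sizeof(ResXMLTree_header), the size of the header.
　　--size: equal to the size of the whole binary Xml file, including the headerSize bytes of the header.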
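　　The "read the base header first, then dispatch on type" idea can be sketched as below. This is not the framework's parser: readChunkHeader and chunkTypeName are hypothetical helpers, the numeric type values are assumed to match ResourceTypes.h, and the file is assumed to be little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Same layout as the ResChunk_header quoted above.
struct ResChunk_header {
    uint16_t type;
    uint16_t headerSize;
    uint32_t size;
};

// Chunk type values, assumed here to match ResourceTypes.h.
enum : uint16_t {
    RES_STRING_POOL_TYPE      = 0x0001,
    RES_XML_TYPE              = 0x0003,
    RES_XML_RESOURCE_MAP_TYPE = 0x0180,
};

// Little-endian readers, so the sketch does not depend on host byte order.
static uint16_t readU16(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
static uint32_t readU32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Reads the base header at the start of `data` and sanity-checks its sizes.
bool readChunkHeader(const uint8_t* data, size_t len, ResChunk_header* out) {
    if (len < 8) return false;  // 8 == sizeof(ResChunk_header)
    out->type = readU16(data);
    out->headerSize = readU16(data + 2);
    out->size = readU32(data + 4);
    return out->headerSize >= 8 && out->headerSize <= out->size && out->size <= len;
}

// Maps a type value to a readable name (only the types used in this article).
std::string chunkTypeName(uint16_t type) {
    switch (type) {
        case RES_STRING_POOL_TYPE:      return "RES_STRING_POOL_TYPE";
        case RES_XML_TYPE:              return "RES_XML_TYPE";
        case RES_XML_RESOURCE_MAP_TYPE: return "RES_XML_RESOURCE_MAP_TYPE";
        default:                        return "unknown";
    }
}
```

　　A parser built this way can skip any chunk it does not understand by advancing size bytes from the start of that chunk.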
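　　Step 4. Write the string pool
　　The strings originally defined in the Xml file were all collected in Step 1 and Step 2, so they can now be written into the final binary Xml file. Note that the strings are written strictly in the order they occupy in the string pool. For main.xml, the strings written, in order, are "orientation", "layout_width", "layout_height", "gravity", "id", "text", "android", "http://schemas.android.com/apk/res/android", "LinearLayout" and "Button". The reason for this strict ordering is that the resource ID array collected in Step 1 is written into the binary Xml file next, and that array must stay aligned with the first six strings of the string pool.
　　The string pool chunk also has a header. Its type is ResStringPool_header, defined in frameworks/base/include/utils/ResourceTypes.h, as follows: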

　　/**
　　* Definition for a pool of strings. The data of this chunk is an
　　* array of uint32_t providing indices into the pool, relative to
　　* stringsStart. At stringsStart are all of the UTF-16 strings
　　* concatenated together; each starts with a uint16_t of the string's
　　* length and each ends with a 0x0000 terminator. If a string is >
　　* 32767 characters, the high bit of the length is set meaning to take
　　* those 15 bits as a high word and it will be followed by another
　　* uint16_t containing the low word.
　　*
　　* If styleCount is not zero, then immediately following the array of
　　* uint32_t indices into the string table is another array of indices
　　* into a style table starting at stylesStart. Each entry in the
　　* style table is an array of ResStringPool_span structures.
　　*/
　　struct ResStringPool_header
　　{
　　struct ResChunk_header header;
　　// Number of strings in this pool (number of uint32_t indices that follow
　　// in the data).
　　uint32_t stringCount;
　　// Number of style span arrays in the pool (number of uint32_t indices
　　// follow the string indices).
　　uint32_t styleCount;
　　// Flags.
　　enum {
　　// If set, the string index is sorted by the string values (based
　　// on strcmp16()).
　　SORTED_FLAG = 1<<0,
　　// String pool is encoded in UTF-8
　　UTF8_FLAG = 1<<8
　　};
　　uint32_t flags;
　　// Index from header of the string data.
　　uint32_t stringsStart;
　　// Index from header of the style data.
　　uint32_t stylesStart;
　　};


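　　The members of the ResChunk_header embedded in ResStringPool_header take the following values:
　　--type: equal to RES_STRING_POOL_TYPE, indicating that this is a string pool.
　　--headerSize: equal to sizeof(ResStringPool_header), the size of the header.
　　--size: the size of the whole string pool chunk, including the headerSize bytes of the header.
　　The other members of ResStringPool_header take the following values:
　　--stringCount: equal to the number of strings.
　　--styleCount: equal to the number of string styles.
　　--flags: equal to 0, SORTED_FLAG, UTF8_FLAG, or a combination of them. It describes properties of the string pool: a SORTED_FLAG bit of 1 means the strings are sorted, and a UTF8_FLAG bit of 1 means the strings are UTF-8 encoded; otherwise they are UTF-16 encoded.
　　--stringsStart: equal to the offset of the string data block from the start of the header.
　　--stylesStart: equal to the offset of the style data block from the start of the header.
　　Whether the encoding is UTF-8 or UTF-16, each string is preceded by 2 bytes that give its length and is followed by a NULL terminator. For UTF-8 strings the NULL character is the single byte 0x00; for UTF-16 strings it is the two bytes 0x0000.
　　If the length of a string exceeds 32767, more bytes are used to represent it. In that case the highest bit of the first two bytes is set to 1, meaning that the next two bytes also belong to the length: the remaining 15 bits of the first two bytes hold the high word, and the following two bytes hold the low 16 bits.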
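　　The length prefix of a UTF-16 string can be decoded roughly as follows. This sketch follows the comment on ResStringPool_header quoted above (high bit set means the length continues in a second uint16_t); decodeUtf16Length and encodeUtf16Length are hypothetical helper names, not framework functions.

```cpp
#include <cstddef>
#include <cstdint>

// Decodes the length prefix of a UTF-16 pool string.
// `units` points at the first uint16_t of the string entry.
// On return, *consumed holds how many uint16_t units the prefix occupied.
uint32_t decodeUtf16Length(const uint16_t* units, size_t* consumed) {
    const uint32_t first = units[0];
    if ((first & 0x8000u) == 0) {  // high bit clear: length fits in 15 bits
        *consumed = 1;
        return first;
    }
    // High bit set: the low 15 bits are the high word, the next unit the low word.
    *consumed = 2;
    return ((first & 0x7FFFu) << 16) | units[1];
}

// Inverse operation: writes the prefix into out[] and returns the units used.
size_t encodeUtf16Length(uint32_t len, uint16_t out[2]) {
    if (len <= 0x7FFFu) {
        out[0] = static_cast<uint16_t>(len);
        return 1;
    }
    out[0] = static_cast<uint16_t>(0x8000u | ((len >> 16) & 0x7FFFu));
    out[1] = static_cast<uint16_t>(len & 0xFFFFu);
    return 2;
}
```

　　A length of 5 therefore takes one unit (0x0005), while a length of 40000 takes two units (0x8000, 0x9C40).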
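　　Besides the ResStringPool_header, the string data block and the style data block, there are two offset arrays: the string offset array and the style offset array. Their sizes equal stringCount and styleCount respectively, and each element is an unsigned integer. The overall layout of the string pool is shown in Figure 13: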
　　

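　　Figure 13: Structure of the string pool
　　Note that the values in the string offset array and the style offset array are relative to stringsStart and stylesStart respectively. When parsing a binary Xml file, these two offset arrays together with the values of stringsStart and stylesStart make it possible to locate the i-th string quickly.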
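　　Locating the i-th string then comes down to one array lookup plus one addition. The sketch below assumes a little-endian file and takes field positions from the ResStringPool_header struct quoted earlier; locateString is a hypothetical helper, not a framework function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian readers over a byte buffer.
static uint16_t readU16(const std::vector<uint8_t>& b, size_t pos) {
    return static_cast<uint16_t>(b[pos] | (b[pos + 1] << 8));
}
static uint32_t readU32(const std::vector<uint8_t>& b, size_t pos) {
    return static_cast<uint32_t>(b[pos]) | (static_cast<uint32_t>(b[pos + 1]) << 8) |
           (static_cast<uint32_t>(b[pos + 2]) << 16) |
           (static_cast<uint32_t>(b[pos + 3]) << 24);
}

// Byte positions of ResStringPool_header fields, derived from the struct layout.
constexpr size_t kHeaderSizePos   = 2;
constexpr size_t kStringCountPos  = 8;
constexpr size_t kStringsStartPos = 20;
constexpr size_t kPoolHeaderBytes = 28;

// Finds where string i's data (starting with its length prefix) begins,
// measured in bytes from the start of the string pool chunk.
bool locateString(const std::vector<uint8_t>& chunk, uint32_t i, uint32_t* dataPos) {
    if (chunk.size() < kPoolHeaderBytes) return false;
    const uint32_t headerSize   = readU16(chunk, kHeaderSizePos);
    const uint32_t stringCount  = readU32(chunk, kStringCountPos);
    const uint32_t stringsStart = readU32(chunk, kStringsStartPos);
    if (i >= stringCount) return false;

    // The string offset array starts right after the header.
    const uint64_t entryPos = static_cast<uint64_t>(headerSize) + 4ull * i;
    if (entryPos + 4 > chunk.size()) return false;

    // Each entry is relative to stringsStart.
    const uint64_t pos = static_cast<uint64_t>(stringsStart) +
                         readU32(chunk, static_cast<size_t>(entryPos));
    if (pos >= chunk.size()) return false;
    *dataPos = static_cast<uint32_t>(pos);
    return true;
}
```

　　No scanning of earlier strings is needed, which is why the lookup is fast regardless of i.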
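　　Next, let us focus on what a string style is. Suppose a string pool has five strings: "apple", "banana", "orange", "mango" and "pear". Take the fourth string, "mango": its text is "mango", but its first three characters "man" are marked as bold by a b tag, and its last two characters "go" are marked as italic by an i tag. The string pool therefore actually contains seven strings: "apple", "banana", "orange", "mango", "pear", "b" and "i". The fourth string "mango" has two styles: the first says that characters 1 to 3 are bold, and the second says that characters 4 to 5 are italic.
　　Strings and their style descriptions correspond by position. That is, if the i-th string has a style description, that description sits at position i of the style data block. In the pool above, because the 4th string has a style description, the 3 strings before it must also be treated as having style descriptions in order to keep this correspondence, but with their number of styles set to 0. In this case the number of strings is 7 and the number of style descriptions is 4: strings 1 to 3 have 0 styles each, and string 4 has 2.
　　Suppose a string has N style descriptions. In the style data block it then has N ResStringPool_span structures plus one ResStringPool_ref: the N ResStringPool_span structures come first and describe each style, and the ResStringPool_ref is an end placeholder. For example, "mango" has 2 ResStringPool_span structures and 1 ResStringPool_ref. "apple", "banana" and "orange" each have 0 ResStringPool_span structures but still 1 ResStringPool_ref. The last three strings, "pear", "b" and "i", have 0 ResStringPool_span structures and 0 ResStringPool_ref.
　　ResStringPool_span and ResStringPool_ref are defined in frameworks/base/include/utils/ResourceTypes.h, as follows: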

　　/**
　　* Reference to a string in a string pool.
　　*/
　　struct ResStringPool_ref
　　{
　　// Index into the string pool table (uint32_t-offset from the indices
　　// immediately after ResStringPool_header) at which to find the location
　　// of the string data in the pool.
　　uint32_t index;
　　};
　　/**
　　* This structure defines a span of style information associated with
　　* a string in the pool.
　　*/
　　struct ResStringPool_span
　　{
　　enum {
　　END = 0xFFFFFFFF
　　};
　　// This is the name of the span -- that is, the name of the XML
　　// tag that defined it. The special value END (0xFFFFFFFF) indicates
　　// the end of an array of spans.
　　ResStringPool_ref name;
　　// The range of characters in the string that this span applies to.
　　uint32_t firstChar, lastChar;
　　};


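　　Since ResStringPool_ref appears here only as the end placeholder of a style description, its sole member, index, is fixed to ResStringPool_span::END.
　　Now let us see how a ResStringPool_span represents a style description. Taking the first style description of "mango" as an example, the members of the corresponding ResStringPool_span are:
　　--name: equal to the position of the string "b" in the string pool.
　　--firstChar: equal to 0, pointing at the character "m".
　　--lastChar: equal to 2, pointing at the character "n".
　　Together, these say that the substring "man" is bold.
　　Taking the second style description of "mango" as an example, the members of the corresponding ResStringPool_span are:
　　--name: equal to the position of the string "i" in the string pool.
　　--firstChar: equal to 3, pointing at the character "g".
　　--lastChar: equal to 4, pointing at the character "o".
　　Together, these say that the substring "go" is italic.
　　One more thing to note: the style data ends with 8 bytes, with each group of 4 bytes filled with ResStringPool_span::END, to mark the end of the style data. This end marker can be used for error checking during parsing.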
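　　Putting the pieces together, the style data for the seven-string example can be laid out as a flat array of uint32_t words. This is only an illustrative sketch: Span, StyleBlock and buildStyleBlock are hypothetical names, and the style offsets are assumed to be byte offsets relative to stylesStart.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t kSpanEnd = 0xFFFFFFFF;  // ResStringPool_span::END

// Mirrors the three uint32_t fields of ResStringPool_span.
struct Span {
    uint32_t name;       // index of the tag name ("b", "i") in the string pool
    uint32_t firstChar;  // first styled character, 0-based
    uint32_t lastChar;   // last styled character, inclusive
};

// Hypothetical holder for the style offset array and the style data block.
struct StyleBlock {
    std::vector<uint32_t> offsets;  // one entry per styled slot, in bytes
    std::vector<uint32_t> words;    // spans, END placeholders, final 8-byte marker
};

// stylesPerString[i] lists the spans of string i (possibly none).
StyleBlock buildStyleBlock(const std::vector<std::vector<Span>>& stylesPerString) {
    StyleBlock block;
    for (const std::vector<Span>& spans : stylesPerString) {
        block.offsets.push_back(
            static_cast<uint32_t>(block.words.size() * sizeof(uint32_t)));
        for (const Span& s : spans) {
            block.words.push_back(s.name);
            block.words.push_back(s.firstChar);
            block.words.push_back(s.lastChar);
        }
        block.words.push_back(kSpanEnd);  // the ResStringPool_ref placeholder
    }
    // Trailing 8 bytes: two END words that close the whole style data block.
    block.words.push_back(kSpanEnd);
    block.words.push_back(kSpanEnd);
    return block;
}
```

　　For the example pool, the input has four slots ("apple", "banana", "orange" with no spans, then "mango" with spans {5, 0, 2} and {6, 3, 4}, where 5 and 6 are the 0-based pool positions of "b" and "i"). The result is 12 words: three lone END placeholders, two spans followed by END, and the two closing END words.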
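　　Step 5. Write the resource IDs
　　In Step 1 we collected the resource IDs of the attributes. These resource IDs are written into the final binary Xml file as a separate chunk. This chunk follows the string pool, and its header is described by a ResChunk_header whose members take the following values:
　　--type: equal to RES_XML_RESOURCE_MAP_TYPE, indicating that this is the header of a mapping from the string pool to resource IDs.
　　--headerSize: equal to sizeof(ResChunk_header), the size of the header.
　　--size: equal to headerSize plus sizeof(uint32_t) * count, where count is the number of resource IDs collected.
　　Taking main.xml as an example, the first string in the string pool is "orientation" and the first value recorded in the resource ID chunk is 0x010100c4, which means that the attribute name string "orientation" has the resource ID 0x010100c4.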
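　　The size arithmetic for this chunk can be checked with a small serializer. This is a sketch under stated assumptions, not aapt's writer: buildResourceMapChunk is a hypothetical helper, the value 0x0180 for RES_XML_RESOURCE_MAP_TYPE is assumed to match ResourceTypes.h, and output is little-endian.

```cpp
#include <cstdint>
#include <vector>

// Assumed to match the value in ResourceTypes.h.
constexpr uint16_t RES_XML_RESOURCE_MAP_TYPE = 0x0180;

// Little-endian writers.
static void putU16(std::vector<uint8_t>& out, uint16_t v) {
    out.push_back(static_cast<uint8_t>(v & 0xFF));
    out.push_back(static_cast<uint8_t>(v >> 8));
}
static void putU32(std::vector<uint8_t>& out, uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xFF));
    }
}

// Serializes a ResChunk_header followed by the resource ID array.
std::vector<uint8_t> buildResourceMapChunk(const std::vector<uint32_t>& ids) {
    const uint16_t headerSize = 8;  // sizeof(ResChunk_header)
    const uint32_t size = headerSize + 4u * static_cast<uint32_t>(ids.size());

    std::vector<uint8_t> out;
    out.reserve(size);
    putU16(out, RES_XML_RESOURCE_MAP_TYPE);  // type
    putU16(out, headerSize);                 // headerSize
    putU32(out, size);                       // size = headerSize + 4 * count
    for (uint32_t id : ids) {
        putU32(out, id);  // ids[i] pairs with string i of the string pool
    }
    return out;
}
```

　　With the six IDs assumed for main.xml, the chunk is 8 + 4 * 6 = 32 bytes long, and the first ID after the header is 0x010100c4, matching "orientation" at index 0 of the string pool.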
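　　Step 6. Flatten the Xml file
　　Flattening the Xml file means replacing the strings in each of its Xml elements. Each string is replaced either with an index into the string pool or with a typed value of some other kind. We use main.xml as an example to walk through the flattening process.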