Lua Xml

The following is sample code for handling XML. It is divided into four sections: Toolkits, Lua-only XML parsers, C bindings, and XML-based protocols.

Authors are credited where appropriate.

Toolkits

LazyKit is a collection of XML processing tools. Its primary purpose is to provoke discussion of XML tools in Lua.

PenlightLibraries provides an XML module [See Docs] which uses the LOM defined by LuaExpat and provides pretty-printing, template matching and Orbit-style 'htmlification'. It uses LuaExpat if available and otherwise falls back on a pure-Lua parser based on Roberto's (see below).

Lua-only XML parsers

Lua XML Parser

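LuaXML-0.0.0, from Paul Chakravarti [1] (original link broken; see [2]; a Lua 5 version is available at [3]).

There is a bug in the DTD parsing. Replace the line

self:_parseDTD(string,pos)
with
self:_parseDTD(str,pos)

Here string refers to Lua's standard string library rather than the text being parsed, which is held in str.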

The module implements a non-validating XML stream parser with a handler-based event API (conceptually similar to SAX), which can be used to post-process the event data as required (e.g. into a tree).
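As a rough illustration of the handler-based event style described above, the sketch below feeds start-tag, end-tag and text events to a handler table that builds a tree. The names parse_events, new_tree_handler, starttag, endtag and text are made up for this example and are not claimed to match the LuaXML-0.0.0 API. The tokenizer assumes Lua 5.1 or later and ignores declarations, comments, DTDs, CDATA, entities and any text after the last tag.

```lua
-- Minimal event-driven tokenizer: calls handler methods as tags and text are seen.
-- All names here are illustrative assumptions, not the LuaXML-0.0.0 API.
local function parse_attrs(s)
  local attrs = {}
  for name, _, value in string.gmatch(s, "([%w:%-_]+)%s*=%s*([\"'])(.-)%2") do
    attrs[name] = value
  end
  return attrs
end

local function parse_events(xml, handler)
  local pos = 1
  while true do
    local s, e, close, name, rest, empty =
      string.find(xml, "<(%/?)([%w:%-_]+)(.-)(%/?)>", pos)
    if not s then break end
    local text = string.sub(xml, pos, s - 1)
    if not string.find(text, "^%s*$") then handler:text(text) end
    if close == "/" then
      handler:endtag(name)
    else
      handler:starttag(name, parse_attrs(rest))
      if empty == "/" then handler:endtag(name) end  -- <x/> opens and closes
    end
    pos = e + 1
  end
end

-- A handler that turns the event stream into nested tables.
local function new_tree_handler()
  local root = { children = {} }
  local h = { root = root, stack = { root } }
  function h:starttag(name, attrs)
    local node = { name = name, attrs = attrs, children = {} }
    local parent = self.stack[#self.stack]
    parent.children[#parent.children + 1] = node
    self.stack[#self.stack + 1] = node
  end
  function h:endtag(name)
    local node = table.remove(self.stack)
    assert(node.name == name, "mismatched end tag: " .. name)
  end
  function h:text(s)
    local parent = self.stack[#self.stack]
    parent.children[#parent.children + 1] = s
  end
  return h
end

local handler = new_tree_handler()
parse_events('<doc id="1"><item>one</item><item kind="x"/></doc>', handler)
local doc = handler.root.children[1]
print(doc.name, doc.attrs.id, #doc.children)   --> doc  1  2
print(doc.children[1].children[1])             --> one
```

Keeping the tokenizer separate from the handler means the same event stream can drive a tree builder, a printer, or a filter without changing the parsing code.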

The current functionality is -

The limitations are -

The distribution also includes sample event handlers to convert the SAX event stream into a Lua table -

SLAXML

Another pure-Lua, non-validating, SAX-like streaming processor. It also includes a simple DOM parser (parsing into a hierarchy of tables).

https://github.com/Phrogz/SLAXML

Features:

Classic Lua-only version

From: Roberto Ierusalimschy

I have this basic skeleton that parses the "main" part of an XML string (it does not handle meta-data like "<?" and "<!"...). -- Roberto

[!] VersionNotice: The below code pertains to an older Lua version, Lua 4. It does not run as is under Lua 5.
[!] This implementation apparently does not correctly parse colons (:), as used for XML namespaces. See below for a proposed fix.

function parseargs (s)
  local arg = {}
  gsub(s, "(%w+)=([\"'])(.-)%2", function (w, _, a)
    %arg[w] = a
  end)
  return arg
end

function collect (s)
  local stack = {n=0}
  local top = {n=0}
  tinsert(stack, top)
  local ni,c,label,args, empty
  local i, j = 1, 1
  while 1 do
    ni,j,c,label,args, empty = strfind(s, "<(%/?)([%w:]+)(.-)(%/?)>", i)
    if not ni then break end
    local text = strsub(s, i, ni-1)
    if not strfind(text, "^%s*$") then
      tinsert(top, text)
    end
    if empty == "/" then  -- empty element tag
      tinsert(top, {n=0, label=label, args=parseargs(args), empty=1})
    elseif c == "" then   -- start tag
      top = {n=0, label=label, args=parseargs(args)}
      tinsert(stack, top)   -- new level
    else  -- end tag
      local toclose = tremove(stack)  -- remove top
      top = stack[stack.n]
      if stack.n < 1 then
        error("nothing to close with "..label)
      end
      if toclose.label ~= label then
        error("trying to close "..toclose.label.." with "..label)
      end
      tinsert(top, toclose)
    end
    i = j+1
  end
  local text = strsub(s, i)
  if not strfind(text, "^%s*$") then
    tinsert(stack[stack.n], text)
  end
  if stack.n > 1 then
    error("unclosed "..stack[stack.n].label)
  end
  return stack[1]
end

-- example

x = collect[[
     <methodCall kind="xuxu">
      <methodName>examples.getStateName</methodName>
      <params>
         <param>
            <value><i4>41</i4></value>
            </param>
         </params>
      </methodCall>
]]




Updated for Lua 5.1
[!] This implementation apparently does not correctly parse colons (:), as used for XML namespaces. See below for a proposed fix.
function parseargs(s)
  local arg = {}
  string.gsub(s, "([%-%w]+)=([\"'])(.-)%2", function (w, _, a)
    arg[w] = a
  end)
  return arg
end

function collect(s)
  local stack = {}
  local top = {}
  table.insert(stack, top)
  local ni,c,label,xarg, empty
  local i, j = 1, 1
  while true do
    ni,j,c,label,xarg, empty = string.find(s, "<(%/?)([%w:]+)(.-)(%/?)>", i)
    if not ni then break end
    local text = string.sub(s, i, ni-1)
    if not string.find(text, "^%s*$") then
      table.insert(top, text)
    end
    if empty == "/" then  -- empty element tag
      table.insert(top, {label=label, xarg=parseargs(xarg), empty=1})
    elseif c == "" then   -- start tag
      top = {label=label, xarg=parseargs(xarg)}
      table.insert(stack, top)   -- new level
    else  -- end tag
      local toclose = table.remove(stack)  -- remove top
      top = stack[#stack]
      if #stack < 1 then
        error("nothing to close with "..label)
      end
      if toclose.label ~= label then
        error("trying to close "..toclose.label.." with "..label)
      end
      table.insert(top, toclose)
    end
    i = j+1
  end
  local text = string.sub(s, i)
  if not string.find(text, "^%s*$") then
    table.insert(stack[#stack], text)
  end
  if #stack > 1 then
    error("unclosed "..stack[#stack].label)
  end
  return stack[1]
end


Colon parsing fix (proposal)
In the code above, parseargs captures attribute names with "(%w+)" (or "([%-%w]+)" in the 5.1 version). XML namespaces introduce a colon (:) into names, so a namespaced attribute name loses its prefix. Tag names are not affected, because collect already matches them with "([%w:]+)". To handle colons in attribute names as well, try replacing the name capture in parseargs with "([%w:]+)", or with "([%-%w:]+)" in the 5.1 version.
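The effect of the proposed change can be checked in isolation. The sketch below runs the same attribute string through both patterns; the helper parseargs_with and the sample attribute string are made up for this illustration.

```lua
-- Compare attribute-name captures with and without ':' in the character class.
-- parseargs_with is a hypothetical helper that takes the pattern as a parameter.
local function parseargs_with(pattern, s)
  local arg = {}
  string.gsub(s, pattern, function (w, _, a) arg[w] = a end)
  return arg
end

local attrs = ' xlink:href="a.png" data-id="7"'

local before = parseargs_with("([%-%w]+)=([\"'])(.-)%2", attrs)
local after  = parseargs_with("([%-%w:]+)=([\"'])(.-)%2", attrs)

print(before["xlink:href"], before["href"])  --> nil     a.png
print(after["xlink:href"],  after["href"])   --> a.png   nil
```

Without the colon in the class, the match simply starts later in the string, so the attribute is stored under the unprefixed name instead of raising an error.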

Original version

From: Yutaka Ueno

That is my test program, which is now revised [4] (link broken). It has only been tested on a few XML files used in biology. Roberto's code probably provides a better skeleton than mine, but the two differ in how XML tags are described with Lua tables.


XML :           <methodCall kind="xuxu">
Lua by Roberto: { label="methodCall", args={kind="xuxu"} }
Lua by Ueno:    { xml="methodCall", kind="xuxu" }

This works because an attribute named "xml" never appears in XML, so the key cannot clash with a real attribute. This representation is somewhat more convenient for the very deeply nested XML tags proposed in biology.
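To make the difference between the two table layouts concrete, the sketch below converts a node in Roberto's shape into the flat shape described by Ueno. The function name to_flat is made up, and keeping child nodes in the array part of the flat table is an assumption of this sketch; the original revised program is not available to confirm how it stores children.

```lua
-- Convert a node in Roberto's shape ({label=..., xarg=..., children...})
-- into the flat shape Ueno describes ({xml=..., attr1=..., ...}).
-- Storing children in the array part is an assumption of this sketch.
local function to_flat(node)
  if type(node) ~= "table" then return node end   -- text child: keep the string
  local out = { xml = node.label }
  -- the 5.1 version calls the attribute table xarg, the Lua 4 version args
  for name, value in pairs(node.xarg or node.args or {}) do
    out[name] = value
  end
  for i, child in ipairs(node) do
    out[i] = to_flat(child)
  end
  return out
end

local roberto = {
  label = "methodCall", xarg = { kind = "xuxu" },
  { label = "methodName", xarg = {}, "examples.getStateName" },
}
local flat = to_flat(roberto)
print(flat.xml, flat.kind)          --> methodCall  xuxu
print(flat[1].xml, flat[1][1])      --> methodName  examples.getStateName
```

The flat form saves one level of indexing per attribute access (node.kind instead of node.xarg.kind), which adds up in deeply nested documents.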


C bindings

Kino

From: Eckhart Koeppen

Well, there is something I programmed which might help, the Kino XML processor. It has wrappers for Tcl and Lua via SWIG. An Xt widget and an experimental Gtk widget for displaying XML with CSS are also available. Take a look at it at: [5] (link broken)

It is under constant development but tries to stick to the DOM, so I hope that the interface changes remain small.

libxml

[luagnome] (link broken; use [6]) wraps libxml-1.8.x, which is considered part of Gnome. It can parse and generate XML files through a simple object-oriented API.

[lua-xmlreader] is an implementation of the XmlReader API using libxml2.

Expat

For Lua 5.0/5.1, use [LuaExpat], which is full-featured.

For Lua 4.0:

From: Jay Carlson

I've put a simple binding of expat, James Clark's C stream-based XML parser up at [7]. No, not everything is bound, but it should be obvious how to bind more stuff to it.

LuaXML

[LuaXML] is a lean yet complete module for the direct mapping between XML data and Lua tables.

XML-DOM parser

PugXML is a small, fast, non-validating C++ DOM XML parser contained in a single header, with no dependencies other than the standard C libraries and <iostream> (KERNEL32.DLL on WIN32). It segments a given string in situ (like strtok), performing scanning/tokenization and parsing in a single pass.

Here is an example of the parser use in Lua:

-- create xml_parser object
parser = pug.xml_parser( pug.xml_parser.parse_default, true, 4);

-- parse string
xml_string = '<xml><child>some data </child><child2 attr="value"/></xml>';
print('parsing string: ' .. xml_string );
parser:parse(xml_string, pug.xml_parser.parse_noset);
print( tostring(parser:document()) );

-- Testing xml_node
-- getting root
root=parser:document();

-- add an element child
child=root:append_child( pug.xml_node_type.element );
print( tostring(root) );

-- name the new child 'child'
child:name('child');
print('child name is ' .. child:name() );
print( tostring(root) );

-- adding attributes
child:append_attribute('attribute','value');
child:append_attribute('attribute2','value2');

-- adding a nested child
child2=child:append_child( pug.xml_node_type.element );
child2:name('child2');
print( tostring(root) );

A wrapper around this parser has been written with [LuaBind] and is available at [8] (link broken). The original article about PugXML is located at [9].

TinyXML

For Lua 5.0:

From: Robert Noll

Just a plain "Parse File to Lua array" function in C++, using the [TinyXML] (2.4.3) lib.


// header

struct lua_State;	// lua.h declares lua_State as a struct

/// register parser functions to lua
void	RegisterLuaXML (lua_State *L);


// sourcefile

#include "tinyxml.h"

extern "C" {
	#include "lua.h"
	#include "lauxlib.h"
	#include "lualib.h"
}

// PROFILE is a profiling macro from the author's code base; it is defined
// as empty here (an assumption) so that the snippet compiles on its own.
#ifndef PROFILE
#define PROFILE
#endif

/// fills the table on top of the stack with the contents of pNode
void LuaXML_ParseNode (lua_State *L,TiXmlNode* pNode) { PROFILE
	if (!pNode) return;
	// resize stack if necessary
	luaL_checkstack(L, 5, "LuaXML_ParseNode : recursion too deep");

	TiXmlElement* pElem = pNode->ToElement();
	if (pElem) {
		// element name
		lua_pushstring(L,"name");
		lua_pushstring(L,pElem->Value());
		lua_settable(L,-3);

		// parse attributes
		TiXmlAttribute* pAttr = pElem->FirstAttribute();
		if (pAttr) {
			lua_pushstring(L,"attr");
			lua_newtable(L);
			for (;pAttr;pAttr = pAttr->Next()) {
				lua_pushstring(L,pAttr->Name());
				lua_pushstring(L,pAttr->Value());
				lua_settable(L,-3);
			}
			lua_settable(L,-3);
		}
	}

	// children
	TiXmlNode *pChild = pNode->FirstChild();
	if (pChild) {
		int iChildCount = 0;
		for(;pChild;pChild = pChild->NextSibling()) {
			switch (pChild->Type()) {
				case TiXmlNode::DOCUMENT: break;
				case TiXmlNode::ELEMENT:
					// normal element, parse recursive
					lua_newtable(L);
					LuaXML_ParseNode(L,pChild);
					lua_rawseti(L,-2,++iChildCount);
				break;
				case TiXmlNode::COMMENT: break;
				case TiXmlNode::TEXT:
					// plaintext, push raw
					lua_pushstring(L,pChild->Value());
					lua_rawseti(L,-2,++iChildCount);
				break;
				case TiXmlNode::DECLARATION: break;
				case TiXmlNode::UNKNOWN: break;
			};
		}
		// store the child count in field "n"
		lua_pushstring(L,"n");
		lua_pushnumber(L,iChildCount);
		lua_settable(L,-3);
	}
}

static int LuaXML_ParseFile (lua_State *L) { PROFILE
	const char* sFileName = luaL_checkstring(L,1);
	TiXmlDocument doc(sFileName);
	// NOTE: the result of LoadFile() is not checked; if loading fails,
	// the function returns a table without children
	doc.LoadFile();
	lua_newtable(L);
	LuaXML_ParseNode(L,&doc);
	return 1;
}

void	RegisterLuaXML (lua_State *L) {
	lua_register(L,"LuaXML_ParseFile",LuaXML_ParseFile);
}
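On the Lua side, the C++ code above yields tables with a name field, an optional attr table, children in the array part and a child count in n. The sketch below walks a table of that shape; the table is written by hand as a stand-in for the result of LuaXML_ParseFile, and find_all is a hypothetical helper, not part of the original code.

```lua
-- Hand-written stand-in for the table LuaXML_ParseFile would return for:
--   <config><item id="a">first</item><item id="b"/></config>
-- The top-level table is the document node, so it has no name field.
local doc = {
  n = 1,
  { name = "config", n = 2,
    { name = "item", attr = { id = "a" }, n = 1, "first" },
    { name = "item", attr = { id = "b" } },   -- no children, so no n field
  },
}

-- Collect every element called `name`, depth first.
local function find_all(node, name, found)
  found = found or {}
  for i = 1, (node.n or 0) do
    local child = node[i]
    if type(child) == "table" then            -- strings are text nodes
      if child.name == name then table.insert(found, child) end
      find_all(child, name, found)
    end
  end
  return found
end

for _, item in ipairs(find_all(doc, "item")) do
  print(item.attr.id, item[1])
end
```

The loop relies on n rather than the length operator so that it also runs under Lua 5.0, which the code above targets.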

pugilua

A binding to the neat [pugixml], supporting DOM and XPath 1.0. The binding is made using [LuaBridge].

https://github.com/d-led/pugilua

Example:

require 'pugilua'

---- reading ----
local doc=pugi.xml_document()
local res=doc:load_file [[..\..\scripts\pugilua\pugilua.vcxproj]]

print(res.description)

local node1=doc:root():child('Project')
local query1=doc:root():select_nodes('Project/PropertyGroup')

local n=query1.size
for i=0,n-1 do
	local node=query1:get(i):node()
	local attribute=query1:get(i):attribute()
	print(node.valid,node.path)
	local a=node:first_attribute()
	while a.valid do
		print(a.name)
		a=a:next_attribute()
	end
end

---- creating ----
doc:reset()
--- from the tutorial
-- add node with some name
local node = doc:root():append_child("node")

-- add description node with text child
local descr = node:append_child("description")
descr:append(pugi.node_pcdata):set_value("Simple node")

-- add param node before the description
local param = node:insert_child_before("param", descr)

-- add attributes to param node
param:append_attribute("name"):set_value("version")
param:append_attribute("value"):set_value(1.1)
param:insert_attribute_after("type", param:attribute("name")):set_value("float")

doc:save_file("tutorial.xml")

xerceslua

As a supplement to pugilua, there is an effort to provide a minimal binding to [Xerces-C++] (xerces.apache.org/xerces-c/) in order to validate XML documents:

https://github.com/d-led/xerceslua

Example:

assert(require 'xerceslua')

local parser=xerces.XercesDOMParser()
parser:loadGrammar("Employee.dtd",xerces.GrammarType.DTDGrammarType)
parser:setValidationScheme(xerces.ValSchemes.Val_Auto)
local log=parser:parse("Employeexy.xml")
print(log.Ok)
if not log.Ok then
    print(log.Count)
    for i=0,log.Count-1 do
        local err=log:GetLogEntry(i)
        print(err.SystemId..', l:'..err.LineNumber..', c:'..err.ColumnNumber..', e:'..err.Message,err.LogType)
    end
end


XML-based protocols

XML-RPC

For Lua 5.0/5.1, use the [LuaXMLRPC] library developed by [The Kepler Project].

For Lua 4.0:

From: Jay Carlson

I've put an initial release of client/server bindings for Lua for XML-RPC at [10]. It contains my lxp expat binding, and uses LuaSocket for client transport.

For more information on XML-RPC, see [11].

Although the packaging and documentation are scant, this package successfully passes the validation tests at [12].
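For readers unfamiliar with the wire format, the sketch below builds the body of an XML-RPC request from a method name and a list of simple Lua values. It is independent of the packages mentioned above: the names escape, encode_value and method_call are made up, it assumes Lua 5.1 or later, and it covers only booleans, numbers and strings (no arrays, structs, dates or base64).

```lua
-- Build an XML-RPC <methodCall> payload from a method name and a list of
-- simple Lua values. Only booleans, numbers and strings are handled here.
local function escape(s)
  s = string.gsub(s, "&", "&amp;")
  s = string.gsub(s, "<", "&lt;")
  s = string.gsub(s, ">", "&gt;")
  return s
end

local function encode_value(v)
  local t = type(v)
  if t == "boolean" then
    return "<boolean>" .. (v and "1" or "0") .. "</boolean>"
  elseif t == "number" then
    -- XML-RPC integers are 32-bit signed; everything else is sent as a double
    if v == math.floor(v) and v >= -2147483648 and v <= 2147483647 then
      return "<int>" .. string.format("%d", v) .. "</int>"
    end
    return "<double>" .. tostring(v) .. "</double>"
  elseif t == "string" then
    return "<string>" .. escape(v) .. "</string>"
  end
  error("unsupported XML-RPC value type: " .. t)
end

local function method_call(name, params)
  local parts = {
    '<?xml version="1.0"?>',
    "<methodCall>",
    "<methodName>" .. escape(name) .. "</methodName>",
    "<params>",
  }
  for _, p in ipairs(params) do
    parts[#parts + 1] = "<param><value>" .. encode_value(p) .. "</value></param>"
  end
  parts[#parts + 1] = "</params>"
  parts[#parts + 1] = "</methodCall>"
  return table.concat(parts)
end

local request = method_call("examples.getStateName", { 41 })
print(request)
```

The resulting string would be sent as the body of an HTTP POST; the transport itself is outside the scope of this sketch.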

SOAP

[LuaSOAP] is a Lua library to ease the use of SOAP.

Lua-only XmlParser

For Lua 5.1:

From: Alexander Makeev

This XmlParser builds an object similar to the C# XmlDocument, made up of XmlNodes. See the example for details.

-----------------------------------------------------------------------------------------
-- LUA only XmlParser from Alexander Makeev
-----------------------------------------------------------------------------------------
XmlParser = {};

function XmlParser:ToXmlString(value)
	value = string.gsub (value, "&", "&amp;");		-- '&' -> "&amp;"
	value = string.gsub (value, "<", "&lt;");		-- '<' -> "&lt;"
	value = string.gsub (value, ">", "&gt;");		-- '>' -> "&gt;"
	--value = string.gsub (value, "'", "&apos;");	-- '\'' -> "&apos;"
	value = string.gsub (value, "\"", "&quot;");	-- '"' -> "&quot;"
	-- replace non printable char -> "&#xD;"
   	value = string.gsub(value, "([^%w%&%;%p%\t% ])",
       	function (c) 
       		return string.format("&#x%X;", string.byte(c)) 
       		--return string.format("&#x%02X;", string.byte(c)) 
       		--return string.format("&#%02d;", string.byte(c)) 
       	end);
	return value;
end

function XmlParser:FromXmlString(value)
  	value = string.gsub(value, "&#x([%x]+)%;",
      	function(h) 
      		return string.char(tonumber(h,16)) 
      	end);
  	value = string.gsub(value, "&#([0-9]+)%;",
      	function(h) 
      		return string.char(tonumber(h,10)) 
      	end);
	value = string.gsub (value, "&quot;", "\"");
	value = string.gsub (value, "&apos;", "'");
	value = string.gsub (value, "&gt;", ">");
	value = string.gsub (value, "&lt;", "<");
	value = string.gsub (value, "&amp;", "&");
	return value;
end

function XmlParser:ParseArgs(s)
  local arg = {}
  string.gsub(s, "(%w+)=([\"'])(.-)%2", function (w, _, a)
    	arg[w] = self:FromXmlString(a);
  	end)
  return arg
end

function XmlParser:ParseXmlText(xmlText)
  local stack = {}
  local top = {Name=nil,Value=nil,Attributes={},ChildNodes={}}
  table.insert(stack, top)
  local ni,c,label,xarg, empty
  local i, j = 1, 1
  while true do
    ni,j,c,label,xarg, empty = string.find(xmlText, "<(%/?)([%w:]+)(.-)(%/?)>", i)
    if not ni then break end
    local text = string.sub(xmlText, i, ni-1);
    if not string.find(text, "^%s*$") then
      top.Value=(top.Value or "")..self:FromXmlString(text);
    end
    if empty == "/" then  -- empty element tag
      table.insert(top.ChildNodes, {Name=label,Value=nil,Attributes=self:ParseArgs(xarg),ChildNodes={}})
    elseif c == "" then   -- start tag
      top = {Name=label, Value=nil, Attributes=self:ParseArgs(xarg), ChildNodes={}}
      table.insert(stack, top)   -- new level
      --log("openTag ="..top.Name);
    else  -- end tag
      local toclose = table.remove(stack)  -- remove top
      --log("closeTag="..toclose.Name);
      top = stack[#stack]
      if #stack < 1 then
        error("XmlParser: nothing to close with "..label)
      end
      if toclose.Name ~= label then
        error("XmlParser: trying to close "..toclose.Name.." with "..label)
      end
      table.insert(top.ChildNodes, toclose)
    end
    i = j+1
  end
  local text = string.sub(xmlText, i);
  if not string.find(text, "^%s*$") then
      stack[#stack].Value=(stack[#stack].Value or "")..self:FromXmlString(text);
  end
  if #stack > 1 then
    -- stack.n does not exist in Lua 5.1; use the length operator
    error("XmlParser: unclosed "..stack[#stack].Name)
  end
  return stack[1].ChildNodes[1];
end

function XmlParser:ParseXmlFile(xmlFileName)
	local hFile,err = io.open(xmlFileName,"r");
	if (not err) then
		local xmlText=hFile:read("*a"); -- read file content
		io.close(hFile);
        return self:ParseXmlText(xmlText),nil;
	else
		return nil,err;
	end
end
------------------------------------------------------------------------------------------
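The nodes built by the parser above have the fields Name, Value, Attributes and ChildNodes. As a sketch of how such a tree can be walked, the code below serializes a node back to XML text. The helpers escape and to_xml are made up for this illustration and are not part of the original module; the tree is written by hand so that the block runs on its own under Lua 5.1 or later.

```lua
-- Serialize a node of the shape {Name=..., Value=..., Attributes={...},
-- ChildNodes={...}} back to XML text. Not part of the original XmlParser.
local function escape(s)
  s = string.gsub(s, "&", "&amp;")
  s = string.gsub(s, "<", "&lt;")
  s = string.gsub(s, ">", "&gt;")
  s = string.gsub(s, '"', "&quot;")
  return s
end

local function to_xml(node)
  local names = {}
  for name in pairs(node.Attributes) do names[#names + 1] = name end
  table.sort(names)                       -- deterministic attribute order
  local out = { "<", node.Name }
  for _, name in ipairs(names) do
    out[#out + 1] = " " .. name .. '="' .. escape(node.Attributes[name]) .. '"'
  end
  if not node.Value and #node.ChildNodes == 0 then
    out[#out + 1] = "/>"                  -- empty element
    return table.concat(out)
  end
  out[#out + 1] = ">"
  if node.Value then out[#out + 1] = escape(node.Value) end
  for _, child in ipairs(node.ChildNodes) do
    out[#out + 1] = to_xml(child)
  end
  out[#out + 1] = "</" .. node.Name .. ">"
  return table.concat(out)
end

-- Hand-built tree in the same shape the parser produces.
local tree = {
  Name = "EntityList", Attributes = {}, ChildNodes = {
    { Name = "Entity", Value = "innerText",
      Attributes = { value = '1"2"3' }, ChildNodes = {} },
    { Name = "Entity", Attributes = { value = "456" }, ChildNodes = {} },
  },
}
local xml = to_xml(tree)
print(xml)
```

Because the parser concatenates all text of an element into a single Value string, a round trip through this serializer places the text before the child elements and does not preserve mixed-content order.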

Example:

-- NOTE: log() is not defined in this listing; it is assumed to be a logging
-- function supplied by the host application. Substituting print should work.
function dump(_class, no_func, depth)
	if(not _class) then 
		log("nil");
		return;
	end
	
	if(depth==nil) then depth=0; end
	local str="";
	for n=0,depth,1 do
		str=str.."\t";
	end
    
	log(str.."["..type(_class).."]");
	log(str.."{");
    
	for i,field in pairs(_class) do
		if(type(field)=="table") then
			log(str.."\t"..tostring(i).." =");
			dump(field, no_func, depth+1);
		else 
			if(type(field)=="number") then
				log(str.."\t"..tostring(i).."="..field);
			elseif(type(field) == "string") then
				log(str.."\t"..tostring(i).."=".."\""..field.."\"");
			elseif(type(field) == "boolean") then
				log(str.."\t"..tostring(i).."=".."\""..tostring(field).."\"");
			else
				if(not no_func)then
					if(type(field)=="function")then
						log(str.."\t"..tostring(i).."()");
					else
						log(str.."\t"..tostring(i).."<userdata=["..type(field).."]>");
					end
				end
			end
		end
	end
	log(str.."}");
end

--local obj,err = XmlParser:ParseXmlFile("test.xml");
--if(not err) then
--	dump(obj);
--else
--	log("ERROR: "..err);		
--end

local xmlTree=XmlParser:ParseXmlText([[<?xml version="1.0" encoding="utf-8"?>
<Config>
	<EntityList>
		<Entity value="1&quot;2&quot;3">innerText</Entity>	
		<Entity value="456"/>
	</EntityList>
</Config>
]])
for i,xmlNode in pairs(xmlTree.ChildNodes) do
	if(xmlNode.Name=="EntityList") then
		for i,subXmlNode in pairs(xmlNode.ChildNodes) do
			if(subXmlNode.Name=="Entity") then
				log("Entity value=\""..subXmlNode.Attributes.value.."\"");
				if(subXmlNode.Value) then
					log("   Content=\""..subXmlNode.Value.."\"");
				end
			end
		end
	end
end
dump(xmlTree)

Result:

<log>Entity value="1"2"3"
<log>   Content="innerText"
<log>Entity value="456"
<log>	[table]
<log>	{
<log>		Attributes =
<log>		[table]
<log>		{
<log>		}
<log>		Name="Config"
<log>		ChildNodes =
<log>		[table]
<log>		{
<log>			1 =
<log>			[table]
<log>			{
<log>				Attributes =
<log>				[table]
<log>				{
<log>				}
<log>				Name="EntityList"
<log>				ChildNodes =
<log>				[table]
<log>				{
<log>					1 =
<log>					[table]
<log>					{
<log>						Value="innerText"
<log>						Attributes =
<log>						[table]
<log>						{
<log>							value="1"2"3"
<log>						}
<log>						Name="Entity"
<log>						ChildNodes =
<log>						[table]
<log>						{
<log>						}
<log>					}
<log>					2 =
<log>					[table]
<log>					{
<log>						Attributes =
<log>						[table]
<log>						{
<log>							value="456"
<log>						}
<log>						Name="Entity"
<log>						ChildNodes =
<log>						[table]
<log>						{
<log>						}
<log>					}
<log>				}
<log>			}
<log>		}
<log>	}


Last edited February 17, 2014 12:27 pm GMT