Courtesy of TJ at devnet
function formatXmlString($xml) {

  // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
  $xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

  // now indent the tags
  $token   = strtok($xml, "\n");
  $result  = '';      // holds formatted version as it is built
  $pad     = 0;       // initial indent
  $matches = array(); // returns from preg_match()

  // scan each line and adjust indent based on opening/closing tags
  while ($token !== false) :

    // test for the various tag states

    // 1. open and closing tags on same line - no change
    if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) :
      $indent = 0;
    // 2. closing tag - outdent now
    elseif (preg_match('/^<\/\w/', $token, $matches)) :
      $pad--;
    // 3. opening tag - don't pad this one, only subsequent tags
    elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches)) :
      $indent = 1;
    // 4. no indentation needed
    else :
      $indent = 0;
    endif;

    // pad the line with the required number of leading spaces
    $line    = str_pad($token, strlen($token) + $pad, ' ', STR_PAD_LEFT);
    $result .= $line . "\n"; // add to the cumulative result, with linefeed
    $token   = strtok("\n"); // get the next token
    $pad    += $indent;      // update the pad size for subsequent lines
  endwhile;

  return $result;
}
Hi,
I just wanted to say that that is a fantastic bit of code. Great stuff.
Also, would there be a way to exclude certain tags such as and tags from creating a line break and formatting?
Thanks for the feedback matt – unfortunately there’s currently no way to selectively exclude tags from being formatted and indented… out of interest, what kind of tags did you have in mind? I could possibly put exceptions in if there was a need for them.
Thanks! Exactly what I was looking for.
Simple and powerful. Tom
Very nice code here!
Nice stuff! Would be great to have tabs instead of spaces. Something like “formatXmlString($xml, $indent = "\t")”.
Cheers
Enrico
Doesn’t work for empty tags like – you need to remove the newline between them afterwards:
// add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
$xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);
// remove lines between an empty tag
$xml = preg_replace(”|/ ] )(\s([^>] ))?>\n</\1|”, “”, $xml);
Seems my comment got cut off:
// remove lines between an empty tag
$xml = preg_replace(”|/ ] )(\s([^>] ))?>\n</\1|”, “”, $xml);
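The pattern in the comment above was mangled when it was posted, so it cannot be copied as-is. As a rough sketch of the same idea (rejoin an opening tag with the closing tag that immediately follows it, once the marker linefeeds are in place), something like the following might do. The helper name collapseEmptyTags and the exact pattern are assumptions for illustration, not the commenter's original code.

```php
<?php
// Hypothetical helper, not the commenter's original pattern.
// Turns  <name attrs>\n</name>  back into  <name attrs></name>
// so that empty elements stay on a single line.
function collapseEmptyTags($xml)
{
    return preg_replace(
        '~<([\w:.-]+)(\s[^<>]*)?>\n</\1>~', // opening tag, linefeed, matching closing tag
        '<$1$2></$1>',                      // same tags with the linefeed removed
        $xml
    );
}

// Example: the marker step has already split the tags onto separate lines.
$marked = "<root>\n<empty>\n</empty>\n<item>1</item>\n</root>";
echo collapseEmptyTags($marked), "\n";
```

Run it straight after the marker-linefeed preg_replace and before the indent loop; the rejoined line then falls under case 1 (open and closing tags on the same line).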
Custom padding
1. Modify the function declaration this way: function formatXmlString($xml, $padstr = ' ')
2. Change the $line declaration this way:
$line = str_repeat($padstr, $pad) . $token;
You can also use the Linux xmllint command, like so:
public static function formatXmlString($xml) {
  $xml = escapeshellarg($xml);
  $result = array();
  exec('echo ' . $xml . ' | xmllint --format -', $result);
  $result = join("\n", $result);
  return $result;
}
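If shelling out is not an option, PHP's bundled DOM extension can also re-indent a document. The sketch below is an alternative to the regex function in the post, not the author's method; the function name formatXmlWithDom is made up for illustration, and it assumes the DOM extension is enabled and the input is well-formed XML.

```php
<?php
// Hypothetical alternative using PHP's DOM extension.
function formatXmlWithDom($xml)
{
    $dom = new DOMDocument('1.0');
    $dom->preserveWhiteSpace = false; // drop existing indentation; set before loading
    $dom->formatOutput = true;        // ask libxml to indent on output

    if (!$dom->loadXML($xml)) {       // returns false if the XML is not well-formed
        return false;
    }
    return $dom->saveXML();
}

echo formatXmlWithDom('<root><item>1</item><group><item>2</item></group></root>');
```

Unlike the regex approach this parses the document, so malformed input is rejected rather than indented, and an XML declaration is added to the output.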
very nice, thanks!
Good one, but I had a few small problems:
- I added \s* to get rid of existing whitespace:
// add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
$xml = preg_replace('/(>)\s*(<)(\/*)/', "$1\n$2$3", $xml);
- I also added $indent=0; because the indentation sometimes failed for me:
// 2. closing tag - outdent now
elseif (preg_match('/^<\/\w/', $token, $matches)) :
$pad--; $indent=0;
Thanks for this, it saved me a lot of time.
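To illustrate the \s* tweak on its own: with it, compact input and previously indented input both end up as the same one-tag-per-line text before the indent loop runs, which is what makes re-formatting an already formatted file harmless. A minimal sketch, where the helper name addMarkerLinefeeds is an assumption for illustration:

```php
<?php
// Hypothetical wrapper around the modified marker step.
function addMarkerLinefeeds($xml)
{
    // replace any whitespace between two tags with exactly one linefeed
    return preg_replace('/(>)\s*(<)(\/*)/', "$1\n$2$3", $xml);
}

$compact  = '<root><item>1</item></root>';
$indented = "<root>\n    <item>1</item>\n</root>";

// Both calls produce the same three lines.
echo addMarkerLinefeeds($compact), "\n";
echo addMarkerLinefeeds($indented), "\n";
```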
Thanks! Just added as a handy php macro for use in EditPlus text editor.
Really great function you wrote here.
Hi!
Your script is very useful, but I have a question: is there any way to check whether the string is already indented, and not re-indent it? I have an XML file that is often modified, and it gets re-indented on every change… Any solution? ^^’
Thanks!
Excellent. Just what I needed. Why doesn’t PHP do this naturally?!
Thanks.
Rich
Awesome job. With a small modification to the function you can have customizable padding:
function formatXmlString($xml, $indentBase = 0, $indentString = ' ')
{
…
$pad = ($indentBase > 0) ? $indentBase : 0; // initial indent
…
$line = str_pad($token, strlen($token) + $pad, $indentString, STR_PAD_LEFT);
}
$indentBase is how many extra indents the XML should get, which is wonderful for combining multiple XML snippets.
$indentString is the string to use for indentation, for people (like myself) who enjoy using tabs instead of spaces.
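One thing to watch with the str_pad variant: str_pad pads to a total length in characters, so it only behaves as expected when the indent string is a single character such as a space or a tab. For a multi-character indent string, str_repeat (as in the earlier "Custom padding" comment) is the safer choice. A small sketch, where the helper name padLine is an assumption for illustration:

```php
<?php
// Hypothetical helper: prefix a line with $pad copies of the indent string.
function padLine($token, $pad, $indentString = "\t")
{
    // max() guards against a negative pad from unbalanced closing tags
    return str_repeat($indentString, max(0, $pad)) . $token;
}

// Two levels of a two-character indent string:
echo padLine('<item/>', 2, '--'), "\n";

// The same request through str_pad only adds two characters in total:
echo str_pad('<item/>', strlen('<item/>') + 2, '--', STR_PAD_LEFT), "\n";
```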
Very good, greetings from Argentina
Thanks for this function. It saved my lazy ass from having to write it myself. It works perfect!
Thanks! I use it in my work!
Works like shit. You got the idea right, but it’s no good sorry.
This code has been very useful. To make one correction, change option 3 so that it captures singleton tags like as follows (note combining the < and \/ into the 'not' selector):
// 3. opening tag …
elseif( preg_match( '/^<\w[^>\/]*>.*$/', $token, $matches ) ) : $indent=1;
AND, to remove existing indentation on imported text first (and thus getting a totally clean result), do this FIRST (change $xml to your text variable):
$xml = preg_replace( ‘/(?:(?)|(?))(\s )(?=\<\/?)/', "", $xml );
whoops — my previous post should read:
… singleton tags like …
also in the preg_replace the parser has turned single apostrophe ‘ which wraps the pattern into ` so beware when copying and pasting! d:)
1st result on Google “php format xml” and “php indent xml” and it doesn’t work correctly :(
When step 3 is immediately followed by step 2 in the while loop (an opening tag followed directly by a closing tag on the next line), $indent carries over and the rest of the result is not ‘outdented’. The $indent variable needs to be set to zero at the beginning of each iteration (or in every step except the 3rd).
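Pulling together the fixes suggested in this thread (reset $indent on every line, strip existing whitespace between tags, configurable indent string), a corrected version might look like the sketch below. The name formatXmlStringFixed and the $indentString parameter are assumptions for illustration, not the original author's code. The step 3 pattern is unchanged, so a single-character tag name without attributes (for example a bare a element) still falls through to "no indentation".

```php
<?php
// Hypothetical corrected variant of the function from the post.
function formatXmlStringFixed($xml, $indentString = ' ')
{
    // one tag per line; \s* also discards any existing indentation
    $xml = preg_replace('/(>)\s*(<)(\/*)/', "$1\n$2$3", $xml);

    $result = '';
    $pad    = 0;
    $token  = strtok($xml, "\n");

    while ($token !== false) {
        $indent = 0; // reset every iteration so step 3 cannot leak into step 2

        if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
            // 1. opening and closing tag on the same line: no change
        } elseif (preg_match('/^<\/\w/', $token)) {
            // 2. closing tag: outdent now (never below zero)
            $pad = max(0, $pad - 1);
        } elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token)) {
            // 3. opening tag: indent the lines that follow
            $indent = 1;
        }
        // 4. anything else: no change

        $result .= str_repeat($indentString, $pad) . $token . "\n";
        $token   = strtok("\n");
        $pad    += $indent;
    }

    return $result;
}

// The [3,2] sequence from the last comment: an empty element split over two lines.
echo formatXmlStringFixed('<root><empty></empty><item>1</item></root>', '  ');
```

With the reset in place, the item element after the empty pair comes out at the same depth as the pair itself, and the closing root tag returns to column zero.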