Format XML with PHP

Courtesy of TJ at devnet

function formatXmlString($xml) {

  // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
  $xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

  // now indent the tags
  $token      = strtok($xml, "\n");
  $result     = ''; // holds formatted version as it is built
  $pad        = 0; // initial indent
  $matches    = array(); // returns from preg_match()

  // scan each line and adjust indent based on opening/closing tags
  while ($token !== false) :

    // reset on every line so an indent set by an earlier opening tag
    // cannot carry over past a closing tag
    $indent = 0;

    // test for the various tag states

    // 1. open and closing tags on same line - no change
    if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) :
      $indent = 0;
    // 2. closing tag - outdent now
    elseif (preg_match('/^<\/\w/', $token, $matches)) :
      $pad--;
    // 3. opening tag - don't pad this one, only subsequent tags
    //    (the optional group lets one-letter names such as <a> match too)
    elseif (preg_match('/^<\w(?:[^>]*[^\/])?>.*$/', $token, $matches)) :
      $indent = 1;
    // 4. no indentation needed
    else :
      $indent = 0;
    endif;

    // pad the line with the required number of leading spaces
    $line    = str_pad($token, strlen($token) + $pad, ' ', STR_PAD_LEFT);
    $result .= $line . "\n"; // add to the cumulative result, with linefeed
    $token   = strtok("\n"); // get the next token
    $pad    += $indent; // update the pad size for subsequent lines
  endwhile;

  return $result;
}



25 Responses to “Format XML with PHP”  

  1. 1 Matt

    Hi,

    I just wanted to say that that is a fantastic bit of code. Great stuff

  2. 2 Matt

    Also, would there be a way to exclude certain tags such as and tags from creating a line break and formatting.

  3. 3 dave

    Thanks for the feedback matt – unfortunately there’s currently no way to selectively exclude tags from being formatted and indented… out of interest, what kind of tags did you have in mind? I could possibly put exceptions in if there was a need for them.

  4. 4 tomh

    Thanks! Exactly what I was looking for.
    Simple and powerful. Tom

  5. 5 Rolf

    Very nice code here!

  6. 6 Enrico

    Nice stuff! Would be great to have tabs instead of spaces. Something like “formatXmlString($xml, $indent = "\t")”.

    Cheers
    Enrico

  7. 7 Zapalenie Łojotokowe

    Doesn’t work for empty tags like – you need to remove the newline between them afterwards:

    // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
    $xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

    // remove lines between an empty tag
    $xml = preg_replace(”|/ ] )(\s([^>] ))?>\n</\1|”, “”, $xml);

  8. 8 Zapalenie Łojotokowe

    Seems my comment got cut off:

    // remove lines between an empty tag
    $xml = preg_replace(”|/ ] )(\s([^>] ))?>\n</\1|”, “”, $xml);
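
The pattern in the two comments above did not survive the comment form, so it cannot be quoted reliably. As a rough sketch of the idea only, and assuming the intent was to pull an empty element's closing tag back onto the same line as its opening tag, something like the following might work. The helper name collapseEmptyElements() and the pattern are assumptions, not the commenter's original code.

```php
<?php
// Hypothetical helper (not from the original post or comments): rejoin an
// empty element that the marker-linefeed step has split across two lines.
function collapseEmptyElements($xml) {
  // "<name attrs>\n</name>"  becomes  "<name attrs></name>"
  // group 1 = tag name, group 2 = optional attributes (with leading space)
  return preg_replace('|<([^\s/>]+)(\s[^>]*)?>\n</\1>|', '<$1$2></$1>', $xml);
}

$xml = '<root><empty></empty><item>x</item></root>';
// same marker step as in the post: one tag per line
$xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);
echo collapseEmptyElements($xml), "\n";
```

Run between the marker step and the indenting loop, this leaves the empty element on one line, where the loop's first test (open and closing tag on the same line) already handles it.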

  9. 9 zaker

    Custom padding
    1. Modify the function declaration this way: function formatXmlString($xml, $padstr = ' ')

    2. Change the $line declaration this way:
    $line = str_repeat($padstr, $pad).$token;
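
Put together, the two changes above might look like the sketch below. It is one possible reading, not the post author's code: the name formatXmlStringWithPad() is made up so it does not clash with the original function, the per-line $indent reset and the \s* tweak come from other comments in this thread, and the max() guard is an added assumption because str_repeat() rejects a negative count.

```php
<?php
// Sketch only: formatXmlStringWithPad() is a hypothetical name.
function formatXmlStringWithPad($xml, $padstr = ' ') {
  // one tag per line, dropping any whitespace already sitting between tags
  $xml = preg_replace('/>\s*</', ">\n<", $xml);

  $result = '';
  $pad    = 0;
  $token  = strtok($xml, "\n");

  while ($token !== false) {
    $indent = 0; // reset for every line

    if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
      // opening and closing tag on the same line: no change
    } elseif (preg_match('/^<\/\w/', $token)) {
      $pad = max(0, $pad - 1); // closing tag: outdent now, never below zero
    } elseif (preg_match('/^<\w(?:[^>]*[^\/])?>.*$/', $token)) {
      $indent = 1; // opening tag: indent the lines that follow
    }

    // repeat the pad string once per nesting level
    $result .= str_repeat($padstr, $pad) . $token . "\n";
    $token   = strtok("\n");
    $pad    += $indent;
  }

  return $result;
}

// tabs instead of spaces
echo formatXmlStringWithPad('<root><item>one</item><group><item>two</item></group></root>', "\t");
```

Unlike str_pad() with STR_PAD_LEFT, str_repeat() also behaves predictably when the pad string is longer than one character.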

  10. 10 kg23

    You can also use the Linux xmllint command, like so…

    public static function formatXmlString($xml) {
    $xml = escapeshellarg($xml);
    $result = array();
    exec('echo '.$xml.' | xmllint --format -', $result);
    $result = join("\n", $result);
    return $result;
    }

  11. 11 emzet

    very nice, thanks!

  12. 12 gg2laba

    Good one, some little problems for me:

    - I added \s* to get rid of existing whitespace:

    // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
    $xml = preg_replace('/(>)\s*(<)(\/*)/', "$1\n$2$3", $xml);

    - I added $indent=0; too, because the indent sometimes fails for me:

    // 2. closing tag - outdent now
    elseif (preg_match('/^<\/\w/', $token, $matches)) :
    $pad--;$indent=0;

    Thanks for this, it saved me a lot of time.

  13. 13 Jamie

    Thanks! Just added as a handy php macro for use in EditPlus text editor.

  14. 14 David

    Really great function you wrote here.

  15. 15 Zephur

    Hi!
    Your script is very useful, but I have a question: is there any way to check if the string is already indented and not re-indent it? Indeed, I have an XML file that is often modified and re-indented on every change… Any solution? ^^’

    Thanks!
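
One way to approach the question above, offered as a sketch rather than a tested answer: instead of detecting earlier indentation, strip it before formatting so that every pass starts from the same input. Lines that begin with spaces never match the formatter's ^< tests, which is probably why re-indenting goes wrong. The helper name stripInterTagWhitespace() is hypothetical.

```php
<?php
// Hypothetical helper: undo earlier formatting so a second pass starts clean.
function stripInterTagWhitespace($xml) {
  // remove whitespace that sits only between one tag's ">" and the next tag's "<"
  return preg_replace('/>\s+</', '><', trim($xml));
}

$indented = "<root>\n  <item>one</item>\n  <item>two</item>\n</root>\n";
echo stripInterTagWhitespace($indented), "\n";
```

Note that this also discards whitespace-only text between tags, so it is only suitable where that whitespace carries no meaning.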

  16. 16 Richard Plant

    Excellent. Just what I needed. Why doesn’t PHP do this naturally?!
    Thanks.
    Rich

  17. 17 zzzzBov

    Awesome job. With a little modification to the function you can have customizable padding:

    function formatXmlString($xml, $indentBase = 0, $indentString = ' ')
    {

    $pad = ($indentBase > 0) ? $indentBase : 0; // initial indent

    $line = str_pad($token, strlen($token) + $pad, $indentString, STR_PAD_LEFT);
    }
    $indentBase is how many extra indents the xml should get, which is wonderful for combining multiple xml snippets

    $indentString is the string to use for indentation, for people (like myself) who enjoy using tabs instead of spaces

  18. 18 Javier

    Very good, greetings from Argentina

  19. 19 Eric

    Thanks for this function. It saved my lazy ass from having to write it myself. It works perfect!

  20. 20 Timur

    Thanks! I use it in my work!

  21. 21 Solow

    Works like shit. You got the idea right, but it’s no good sorry.

  22. 22 ekpyrotic

    This code has been very useful. To make one correction, change option 3 so that it captures singleton tags like as follows (note combining the < and \/ into the 'not' selector):
    // 3. opening tag …
    elseif( preg_match( '/^\/]*>.*$/’, $token, $matches ) ) : $indent=1;

    AND, to remove existing indentation on imported text first (and thus getting a totally clean result), do this FIRST (change $xml to your text variable):

    $xml = preg_replace( ‘/(?:(?)|(?))(\s )(?=\<\/?)/', "", $xml );

  23. 23 ekpyrotic

    whoops — my previous post should read:

    … singleton tags like

    also in the preg_replace the parser has turned single apostrophe ‘ which wraps the pattern into ` so beware when copying and pasting! d:)

  24. 24 hon2a

    1st result on Google “php format xml” and “php indent xml” and it doesn’t work correctly :(

    In case of step sequence [3,2] in the while cycle, $indent carries over and rest of the result is not ‘outdented’. Variable $indent needs to be set to zero at the beginning of the cycle (or in all steps except for 3rd).
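
The carry-over described above can be seen in a small trace. The sketch below keeps the four tests from the post but only records the pad level used for each line; indentLevels() and its $resetIndent switch are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper, for illustration only: returns the pad level that
// would be applied to each line, with or without the per-line reset.
function indentLevels(array $lines, $resetIndent = true) {
  $pad    = 0;
  $indent = 0;
  $levels = array();

  foreach ($lines as $token) {
    if ($resetIndent) {
      $indent = 0; // the fix: start every line with no pending indent
    }

    if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
      $indent = 0;   // step 1: open and close on the same line
    } elseif (preg_match('/^<\/\w/', $token)) {
      $pad--;        // step 2: $indent is left untouched, as in the post
    } elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token)) {
      $indent = 1;   // step 3: opening tag
    } else {
      $indent = 0;   // step 4
    }

    $levels[] = $pad;
    $pad     += $indent;
  }

  return $levels;
}

$lines = array('<root>', '<item>', '</item>', '<item>x</item>', '</root>');
echo implode(',', indentLevels($lines, false)), "\n"; // without the reset: 0,1,1,2,1
echo implode(',', indentLevels($lines, true)), "\n";  // with the reset:    0,1,1,1,0
```

Without the reset, the opening `<item>` leaves $indent at 1, the following `</item>` does not clear it, and every later line ends up one level too deep.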


