Format XML with PHP

Courtesy of TJ at devnet

function formatXmlString($xml) {  
  
  // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
  $xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);
  
  // now indent the tags
  $token      = strtok($xml, "\n");
  $result     = ''; // holds formatted version as it is built
  $pad        = 0; // current indent level
  $indent     = 0; // change applied to $pad after the current line
  $matches    = array(); // filled in by preg_match()
  
  // scan each line and adjust indent based on opening/closing tags
  while ($token !== false) : 
  
    // test for the various tag states
    
    // 1. open and closing tags on same line - no change
    if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) : 
      $indent=0;
    // 2. closing tag - outdent now, and don't carry over the previous line's indent
    elseif (preg_match('/^<\/\w/', $token, $matches)) :
      $pad--;
      $indent = 0;
    // 3. opening tag - don't pad this one, only subsequent tags
    //    (the optional group lets single-letter tag names such as <a> match too)
    elseif (preg_match('/^<\w([^>]*[^\/])?>.*$/', $token, $matches)) :
      $indent=1;
    // 4. no indentation needed
    else :
      $indent = 0; 
    endif;
    
    // pad the line with the required number of leading spaces
    $line    = str_pad($token, strlen($token)+$pad, ' ', STR_PAD_LEFT);
    $result .= $line . "\n"; // add to the cumulative result, with linefeed
    $token   = strtok("\n"); // get the next token
    $pad    += $indent; // update the pad size for subsequent lines    
  endwhile; 
  
  return $result;
}
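
The first step of the function, splitting the input at every tag boundary, can be tried on its own. This is a minimal sketch; the sample XML string is an assumption made up for illustration.

```php
<?php
// Sample input, invented for this example.
$xml = '<root><item>one</item><item>two</item></root>';

// Same pattern as in formatXmlString(): insert a linefeed wherever a '>'
// is immediately followed by a '<'. Text between tags is left alone.
$split = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

echo $split, "\n";
// <root>
// <item>one</item>
// <item>two</item>
// </root>
```

Each resulting line is then classified by the tests in the loop, which decide whether the indent grows, shrinks, or stays the same.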



9 Responses to “Format XML with PHP”  

  1. 1 Matt

    Hi,

    I just wanted to say that that is a fantastic bit of code. Great stuff.

  2. 2 Matt

    Also, would there be a way to exclude certain tags such as and tags from creating a line break and formatting.

  3. 3 dave

    Thanks for the feedback, Matt – unfortunately there’s currently no way to selectively exclude tags from being formatted and indented. Out of interest, what kind of tags did you have in mind? I could possibly put exceptions in if there was a need for them.

  4. 4 tomh

    Thanks! Exactly what I was looking for.
    Simple and powerful. Tom

  5. 5 Rolf

    Very nice code here!

  6. 6 Enrico

    Nice stuff! Would be great to have tabs instead of spaces. Something like formatXmlString($xml, $indent = "\t").

    Cheers
    Enrico

  7. 7 Zapalenie Łojotokowe

    Doesn’t work for empty tags like – you need to remove the newline between them afterwards:

    // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
    $xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

    // remove lines between an empty tag
    $xml = preg_replace(”|/ ] )(\s([^>] ))?>\n

  8. 8 Zapalenie Łojotokowe

    Seems my comment got cut off:

    // remove lines between an empty tag
    $xml = preg_replace(”|/ ] )(\s([^>] ))?>\n

  9. 9 zaker

    Custom padding
    1. Modify the function declaration this way: function formatXmlString($xml, $padstr = " ")

    2. Change the $line assignment this way:
    $line = str_repeat($padstr, $pad).$token;
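
The suggestions in comments 6, 7 and 9 can be combined into one sketch. Everything specific here is an assumption rather than code from the post or its commenters: the function name formatXmlStringPadded, the single-space default for $padstr, and the empty-element pattern. That pattern is a stand-in with the same intent as the regex in comment 7, which was cut off, and is not a reconstruction of it.

```php
<?php
// Hypothetical variant of formatXmlString(); name and defaults are assumptions.
function formatXmlStringPadded($xml, $padstr = ' ') {
    // put every tag boundary on its own line
    $xml = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $xml);

    // rejoin empty elements: "<name ...>\n</name>" becomes "<name ...></name>"
    // (the lookbehind skips self-closing tags such as <name/>)
    $xml = preg_replace('/<([\w:.-]+)((?:\s[^>]*)?)(?<!\/)>\n<\/\1>/', '<$1$2></$1>', $xml);

    $result = '';
    $pad    = 0;
    $token  = strtok($xml, "\n");

    while ($token !== false) {
        $indent = 0; // reset for every line
        if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
            // opening and closing tag on the same line: no change
        } elseif (preg_match('/^<\/\w/', $token)) {
            // closing tag: outdent this line, never below zero
            $pad = max(0, $pad - 1);
        } elseif (preg_match('/^<\w([^>]*[^\/])?>.*$/', $token)) {
            // opening tag: indent the lines that follow
            $indent = 1;
        }
        $result .= str_repeat($padstr, $pad) . $token . "\n";
        $pad    += $indent;
        $token   = strtok("\n");
    }
    return $result;
}

// Sample input invented for this example; indent with tabs.
$out = formatXmlStringPadded(
    '<root><item id="1">one</item><empty></empty><group><leaf/></group></root>',
    "\t"
);
echo $out;
```

With this input, the item and empty elements should each stay on one line at depth one, and leaf should be indented by two tabs inside group.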
