Saturday, March 9, 2013

Instant Crappy XML in TextWrangler with AppleScript

One of the things I've been thinking about recently is JanusNode-style output in Quartz Composer, so that, I dunno, streams of nonsense can come out of goatse's ass or something. The JanusNode comes with a whole bunch of ready-made material in text files, so it'd save a whole lot of work if we could get those into QC.

QC will do a lot of stuff, but one of the things it won't do without massaging is get the contents of a text file. There are third-party plugins out there that'll do such things, but I like being able to export to a movie that'll play in Quicktime 7: from there those will render to honest-to-god video instead of remaining a collection of widgets executing graphical instructions in a proprietary environment.

XML files, as opposed to text files, are a different matter; .dae files are just big XML files and that annoying Mac screensaver that displays RSS feeds is parsing the same kind of thing. I know nearly nothing about XML, but it seems to me that the XML desired by QC is pretty damned dumb, and this works in an otherwise blank file:

<XML>
     <words>
          <data>goatse</data>
     </words>
     <words>
          <data>pygmy goat</data>
     </words>
</XML>


From there it's not a problem to get QC to choose between the two items, because it can count the number of "words" tags and get what's in the "data" tag (or whatever other tag you wanna specify at that third level) according to some number-choosing operation: which would you like?
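For the curious, the count-the-tags-then-pick-one idea can be sketched outside QC. This is a rough Python illustration, not what QC does internally: the choice of Python, the helper name pick_word and the random pick are all assumptions, and only the sample XML comes from above.

```python
import random
import xml.etree.ElementTree as ET

# The sample file from above, verbatim.
SAMPLE = """<XML>
     <words>
          <data>goatse</data>
     </words>
     <words>
          <data>pygmy goat</data>
     </words>
</XML>"""


def pick_word(xml_text, index=None):
    """Return the <data> text of one <words> entry (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = root.findall("words")  # every second-level <words> element
    if index is None:
        # Stand-in for "some number-choosing operation": a random index.
        index = random.randrange(len(entries))
    return entries[index].findtext("data")


print(len(ET.fromstring(SAMPLE).findall("words")))  # how many entries there are
print(pick_word(SAMPLE, 1))                         # the second entry's data
```

Swapping the random pick for a counter or a timer would be the equivalent of wiring a different number-producing patch into the index.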

The problem here is that some of these lists are long. Like this one:



There are about 7000 lines in that file, many of them varieties of zombie.

Fortunately, though, that screenshot is of wonderful Mac freebie TextWrangler, and it can do shit, like crazy grep replacements across multiple files. First I thought I could be a smarty-pants and write one grep for a whole file, but I'm not that good, so I broke it up. If you do a find for "^.*$" and replace that result with "\t<words>\r\t\t\t<data>&</data>\r\t</words>" (the & stands for whatever the find matched, which here is the whole line) then every line in the file gets wrapped in the "words" and "data" tags and appropriately indented.



And, quite wonderfully, TextWrangler can save that search pattern in the little g-for-grep drop-down menu on the right and reapply it to anything you wanna deal with in future.
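The line-wrapping grep can also be sketched in Python, for anyone without TextWrangler handy. This is an illustration under assumptions, not a port of the saved pattern: wrap_lines is a made-up name, "\n" stands in for TextWrangler's "\r", the pattern is "^.+$" rather than "^.*$" so blank lines are skipped, and the text is XML-escaped so a stray & or < in the word list doesn't break the file.

```python
import re
from xml.sax.saxutils import escape


def wrap_lines(text):
    """Wrap every non-empty line in <words>/<data> tags (hypothetical helper)."""

    def wrap(match):
        # match.group(0) is the whole matched line, the role "&" plays
        # in the TextWrangler replacement string.
        line = escape(match.group(0))  # turn &, < and > into entities
        return "\t<words>\n\t\t\t<data>" + line + "</data>\n\t</words>"

    # MULTILINE makes ^ and $ anchor at every line, not just the whole text.
    return re.sub(r"^.+$", wrap, text, flags=re.MULTILINE)


print(wrap_lines("goatse\npygmy goat"))
```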

Then, since going to the start and end of a file and typing a few characters is backbreaking labour worthy only of the salt mines, you add saved find/replaces like so: find "\A^" and replace with "<XML>\r&", then find "\Z$" and replace with "&\r</XML>" and voila, list to XML file in three steps.
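The two bookend replacements work because \A and \Z are anchors that match a position rather than any characters. A small Python sketch of the same idea follows; add_root is a hypothetical helper name, "\n" again stands in for "\r", and Python's \Z only ever means the absolute end of the text, which may not be identical to TextWrangler's behaviour around a trailing line break.

```python
import re
import xml.etree.ElementTree as ET


def add_root(body):
    """Add <XML> as a new first line and </XML> as a new last line (hypothetical helper)."""
    body = re.sub(r"\A", "<XML>\n", body)   # \A: the very start of the text
    body = re.sub(r"\Z", "\n</XML>", body)  # \Z: the very end of the text
    return body


# One already-wrapped entry, in the shape the line-wrapping step produces.
wrapped = "\t<words>\n\t\t\t<data>goatse</data>\n\t</words>"
document = add_root(wrapped)
print(document)

# Parsing the result is a cheap check that the output is well-formed XML.
root = ET.fromstring(document)
print(root.tag, len(root.findall("words")))
```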

But wait! You say you are as unskilled as I am and EVEN LAZIER? Why then you use AppleScript, you lazy person, and you can use TextWrangler's capabilities to keep a script in the script menu, making it a one-step process. This last thing is a pain in the ass to get running in a satisfactory manner, thus this post for slugabeds everywhere. Who have Macs. And TextWrangler. And want to make word lists into XML files. Hello, possible person who may be me forgetting something! Remember to be less boring.

With a little bit of cribbing from this post and snippets of a script recorded within TextWrangler, I managed this very satisfying piece of work:

tell application "TextWrangler"
	activate
	
	-- Step 1: wrap every line in <words> and <data> tags
	replace "^.*$" using "\\t<words>\\r\\t\\t\\t<data>&</data>\\r\\t</words>" searching in document 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false, showing results:no} saving no
	
	-- Step 2: put the opening <XML> tag on a new first line
	replace "\\A^" using "<XML>\\r&" searching in document 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false, showing results:no} saving no
	
	-- Step 3: put the closing </XML> tag on a new last line
	replace "\\Z$" using "&\\r</XML>" searching in document 1 options {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false, showing results:no} saving no
end tell


And now that script (note the doubled backslashes in the AppleScript grep: AppleScript strings need a backslash escaped with another backslash) lives here in TextWrangler and works fine:





I might make it a drag-and-drop converter thing later if HEY IF I UPGRADE I GET PONIES.

11 comments:

mikey said...

XML is an awesomely powerful export or staging format. The problem that the enterprise has discovered is that dumping databases to XML (at one point considered the DUH answer to realtime ad-hoc queries) produces a file many times LARGER than the binary/compact native SQL file. Storage isn't the end of the world, but efficiency is a bit of a nightmare.

But for specific applications (from Microsoft file formats to getting word lists into Quartz) XML is hard to beat. And, of course, in the web services world, you have everything from SOAP to REST to use your XML in conjunction with your Java apps on Tomcat and your data in MySQL (or Cassandra if you're so inclined) to make the future...

M. Bouffant said...

Where my eyes glazed over:
but I like being able to export to a movie that'll play in Quicktime 7: from there those will render to honest-to-god video instead of remaining a collection of widgets executing graphical instructions in a proprietary environment.
"Proprietary environment" gets me every time.

Substance McGravitas said...

Man, I figured you'd be gone well before then.

In a way I think it's pretty weird that QC hasn't busted out into the Linux world.

OBS said...

You could do this a helluvalot easier with PERL.

Big Bad Bald Bastard said...

That accursed rug really brought that accursed room together.

Substance McGravitas said...

You could do this a helluvalot easier with PERL.

It's clear that OTHER people can do this a hell of a lot easier with PERL.

I gave PERL a shot once and it just didn't work out for some reason.

This code is mostly a recording via AppleScript, so the effort is in accord with my laziness principles.

OBS said...

Helpful beer snob is helpful.

Bonus: my solution handles multiple files at once. I hope that also appeals to your laziness principles.

Substance McGravitas said...

Thank you! I may bug you for a PERL script from time to time. It's still fairly opaque to me.

TextWrangler is awesome enough to do finds and replaces across files and folders; once I'd figured out and saved the proper three greps there wasn't really a need to go through the step of consolidation; I was just irked that I couldn't type a list and one-click it into XML.

OBS said...

I'd be glad to help with PERL stuff, 'specially if it's for profane and/or goatse-esque purposes.

The great thing about PERL is that there are a million different ways to accomplish stuff with it. That is also the worst thing about it.

Substance McGravitas said...

Yeah, the vast array of typos I could make seemed like my own personal time-bomb.

zombie rotten mcdonald said...

I'd be glad to help with PERL stuff, 'specially if it's for profane and/or goatse-esque purposes.

Great. Now Substance has an accomplice.