Posting del.icio.us Links to WordPress: Finishing Up

On Wednesday, I posted more info about how to clean up my weekly del.icio.us links. There are a few things I’d like to do before I wrap this up.

  1. change all tags and attributes to lowercase
  2. close every dt element
  3. close every dd element
  4. make things a bit more automatic

If we take a closer look at the code for each entry we will see a pattern.


One line has a <DT> followed by the anchor. The next line has a <DD> followed by my comments.

<DT><A HREF="url" LAST_VISIT="1238086010" ADD_DATE="1238086010" TAGS="tagone,tagtwo">Link text</A>
<DD>comments

The only tricky part is that the comments sometimes span more than one line. We can get around this fairly easily, though. All we need to do is put a closing </dd> before every <DT> tag except the first one. Let’s make that easier by changing the first one to lowercase, so the later replacement no longer matches it. We’ll change part of what we did on Wednesday to accomplish this. Instead of replacing

<DL><p>

with

<dl>

we will replace

<DL><p><DT><A HREF=

with

<dl><dt><a href=

The rest of the cleanup is pretty straightforward.


Replace

<DT><A HREF="

with

</dd><dt><a href="

and

</A>

with

</a></dt>

and then

LAST_VISIT=[^<]*TAGS="

with

tags="

since I don’t need two of those attributes anyway.

And I almost forgot: replace

<DD>

with

<dd>

Wrap it all up and we have

grep '^> ' < links.diff |awk '{sub(/<DL><p><DT><A HREF=/,"<dl><dt><a href=")};{sub(/<\/A>/,"</a></dt>")};{sub(/<DT><A HREF=/,"</dd><dt><a href=")};{sub(/<DD>/,"<dd>")}{sub(/LAST_VISIT[^<]*TAGS=/,"tags=")};{sub(/^> /, "")};!/<\/DL>/{print}' > foo.html;echo "</dl>" >> foo.html
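To see what the one-liner does, here is a small sketch that feeds it a made-up diff with two new bookmarks. The sample URLs, tags, and comments are invented for illustration, and the function names sample_diff and clean_links are hypothetical. The pipeline reads from stdin and writes to stdout instead of using links.diff and foo.html.

```shell
#!/bin/sh
# Made-up sample of what the diff might contain for two new bookmarks.
# The URLs, tags, and comments are invented for illustration.
sample_diff() {
cat <<'EOF'
1c1,5
< <dl>
---
> <DL><p><DT><A HREF="http://example.com/one" LAST_VISIT="1238086010" ADD_DATE="1238086010" TAGS="tagone,tagtwo">First link</A>
> <DD>a comment that spans
> two lines
> <DT><A HREF="http://example.com/two" LAST_VISIT="1238086011" ADD_DATE="1238086011" TAGS="tagthree">Second link</A>
> <DD>another comment
EOF
}

# Same substitutions as the one-liner above, but reading the diff from
# stdin and writing to stdout (hypothetical helper name).
clean_links() {
  grep '^> ' | awk '{sub(/<DL><p><DT><A HREF=/,"<dl><dt><a href=")};{sub(/<\/A>/,"</a></dt>")};{sub(/<DT><A HREF=/,"</dd><dt><a href=")};{sub(/<DD>/,"<dd>")};{sub(/LAST_VISIT[^<]*TAGS=/,"tags=")};{sub(/^> /, "")};!/<\/DL>/{print}'
  echo "</dl>"
}

sample_diff | clean_links
```

With this sample, the first entry keeps its lowercase <dl><dt> opening, the second entry gets a </dd> in front of its <dt>, and both anchors keep only the href and tags attributes. The last <dd> stays open because no <DT> follows it; echoing "</dd></dl>" at the end instead would be one way to close it, assuming every entry has a comment.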

All we need now is to make the whole process more automatic. Since we have to add that line break in the old export file we can change things up once again to do that automatically. And since we will probably want to save this as a shell script, we can go ahead and make it more readable. I changed a couple of things I didn’t detail here and this is what I ended up with:

First I generalize a bit so I can change things later if I want to

diff $OLDLINKS $NEWLINKS |grep '^> ' |awk '{sub(/<\/A>/,"</a></dt>")};{sub(/<DL><p><DT><A HREF=/,"<dl><dt><a href=")}{sub(/<DT><A HREF=/,"</dd><dt><a href=")};{sub(/<DD>/,"<dd>")}{sub(/LAST_VISIT[^<]*TAGS=/,"tags=")};{sub(/^> /, "")};!/<\/DL>/{print}' > $MYLINKS;echo "</dl>" >> $MYLINKS

then decide on path names. (I like to let Firefox save in Downloads automatically, and I’m going to delete the new links file anyway, so I set the path names accordingly.) In the script itself, these lines go before the diff line, since it uses the variables.

export LINKSDIR=$HOME/Documents/Personal/blogging
export OLDLINKS=$LINKSDIR/old-delicious.htm
export NEWLINKS=$HOME/Downloads/delicious-`date "+%Y%m%d"`.htm
export MYLINKS=$LINKSDIR/mylinks.html

then we make our new links file the old one for next week. We should also add that line break while we’re at it (and remove the new links file)

awk '{sub(/<DL><p>/,"<dl>\n")};{print}' < $NEWLINKS > $OLDLINKS
rm $NEWLINKS

and I like to go ahead and open my links file so I can make any quick edits and then post

mate $MYLINKS
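Put together in the order they need to run, the pieces above might look like the sketch below. It is a reconstruction from the commands in this post and not necessarily identical to the downloadable script. Wrapping everything in a preplinks function, quoting the variables, and splitting the awk program across lines are additions for illustration.

```shell
#!/bin/sh
# Sketch of the weekly link-prep steps, assembled from the commands above.
# The function wrapper is an assumption made for this example.
preplinks() {
  # Path names, as chosen in the post.
  LINKSDIR=$HOME/Documents/Personal/blogging
  OLDLINKS=$LINKSDIR/old-delicious.htm
  NEWLINKS=$HOME/Downloads/delicious-$(date "+%Y%m%d").htm
  MYLINKS=$LINKSDIR/mylinks.html

  # Keep only the lines added since last week and clean up the markup.
  diff "$OLDLINKS" "$NEWLINKS" | grep '^> ' | awk '
    { sub(/<\/A>/, "</a></dt>") }
    { sub(/<DL><p><DT><A HREF=/, "<dl><dt><a href=") }
    { sub(/<DT><A HREF=/, "</dd><dt><a href=") }
    { sub(/<DD>/, "<dd>") }
    { sub(/LAST_VISIT[^<]*TAGS=/, "tags=") }
    { sub(/^> /, "") }
    !/<\/DL>/ { print }
  ' > "$MYLINKS"
  echo "</dl>" >> "$MYLINKS"

  # The new export becomes the old one for next week, with the line break added.
  awk '{ sub(/<DL><p>/, "<dl>\n") }; { print }' < "$NEWLINKS" > "$OLDLINKS"
  rm "$NEWLINKS"

  # Open the result in TextMate for a last check before posting.
  mate "$MYLINKS"
}
```

Saved as a standalone script, the file would end with a call to preplinks, or the function wrapper could simply be dropped.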

I save it as preplinks, then move it into a directory in my PATH and make it executable

sudo mv preplinks /usr/bin/
sudo chmod 755 /usr/bin/preplinks

You can grab the script here and do the same.

Now every week I go to del.icio.us and export my bookmarks as html and then I run

preplinks

and TextMate launches with my html all ready to be checked and posted.

Works for me.

This is the last in a series of posts. The first two posts are here and here.

Cleaning Up My del.icio.us Links

On Monday, I posted some info about how I am thinking of posting my weekly links.

Today I want to make one correction to the process, go into detail about how to clean up the diff file, and then put together a quick script to do that part automatically. Once again, I am going to do this for the first time as I write this. I will summarize the process below.

First, the correction. After my first use of this method I discovered that one more quick edit to the html export will make the parsing of the diff file much easier. Before I move ~/delicious.htm to ~/delicious-old.htm I need to add a line break just after <DL><p>. It may not seem like much but it makes a big difference.
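One way to add that line break without opening an editor is a single awk substitution. This is only a sketch: the function name add_break is hypothetical, and the sample line is made up.

```shell
#!/bin/sh
# add_break: insert a newline right after <DL><p> (hypothetical helper name).
add_break() {
  awk '{ sub(/<DL><p>/, "<DL><p>\n") }; { print }'
}

# Made-up sample line in the style of the export file.
printf '%s\n' '<DL><p><DT><A HREF="http://example.com/">Example</A>' | add_break
```

Something like add_break < ~/delicious.htm > ~/delicious-old.htm would then add the break and create the old file in one step, assuming the file names used above.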

As it turns out, cleaning up the diff file is fairly easy to do with awk and grep. Let’s first take a look at exactly what I want to do.

I am only interested in lines that start with > and a space so I start with

grep '^> ' < links.diff

I want to replace the <DL> with <dl> and I don’t need the <p> at all. So now I have

grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{print}'

Now we get rid of the > and the space at the beginning of each line.

grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{sub(/^> /, "")};{print}'

Then we don’t print the last line at all.

grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{sub(/^> /, "")};!/<\/DL>/{print}'
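Here is a quick sketch of that filter run on a made-up diff, to show what survives each stage. The sample entry is invented, the function name filter_new_links is hypothetical, and the trailing </DL><p> line is included only to show the last condition at work.

```shell
#!/bin/sh
# Same grep/awk filter as above, reading from stdin (hypothetical helper name).
filter_new_links() {
  grep '^> ' | awk '{sub(/<DL><p>/,"<dl>")};{sub(/^> /, "")};!/<\/DL>/{print}'
}

# Made-up diff: one new bookmark plus a closing line that should be dropped.
cat <<'EOF' | filter_new_links
1c1,3
< <DL><p>
---
> <DL><p><DT><A HREF="http://example.com/" LAST_VISIT="1238086010" ADD_DATE="1238086010" TAGS="tagone,tagtwo">Link text</A>
> <DD>comments
> </DL><p>
EOF
```

Only the two bookmark lines come through: the "1c1,3", "<" and "---" lines are dropped by grep, the leading "> " is stripped, <DL><p> becomes <dl>, and the </DL> line is skipped.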

This gives me everything I need, but I still have uppercase tags and attributes, some attributes I don’t really care about, and none of the elements are closed. We can take care of closing the <dl> with a simple echo "</dl>" after it.

echo "</dl>"

So, if we want to save all this to a file we can do this.

grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{sub(/^> /, "")};!/<\/DL>/{print}' > foo.html;echo "</dl>" >> foo.html

Now all I need to do is clean up those uppercase letters and close all the other elements. I’ll take a look at that on Friday.

This is the second in a series of posts. The first post is here and the next one is here.
