Posting del.icio.us Links to WordPress: Finishing Up
On Wednesday, I posted more info about how to clean up my weekly del.icio.us links. There are a few things I’d like to do before I wrap this up.
- change all tags and attributes to lowercase
- close every dt element
- close every dd element
- make things a bit more automatic
If we take a closer look at the code for each entry we will see a pattern.
One line has a <DT> followed by the anchor. The next line has a <DD> followed by my comments.
<DT><A HREF="url" LAST_VISIT="1238086010" ADD_DATE="1238086010" TAGS="tagone,tagtwo">Link text</A>
<DD>comments
The only tricky part is that the comments sometimes span more than one line. We can get around this fairly easily, though: all we need to do is put the closing </dd> before every <DT> tag except the first one. Let’s make that easier by changing the first one to lowercase, so the later replacement skips it. We’ll change part of what we did on Wednesday to accomplish this. Instead of replacing
<DL><p>
with
<dl>
we will replace
<DL><p><DT><A HREF=
with
<dl><dt><a href=
The rest of the cleanup is pretty straightforward.
Replace
<DT><A HREF="
with
</dd><dt><a href="
and
</A>
with
</a></dt>
and then
LAST_VISIT=[^<]*TAGS="
with
tags="
since I don’t need two of those attributes anyway.
And I almost forgot: replace
<DD>
with
<dd>
Wrap it all up and we have
grep '^> ' < links.diff |awk '{sub(/<DL><p><DT><A HREF=/,"<dl><dt><a href=")};{sub(/<\/A>/,"</a></dt>")};{sub(/<DT><A HREF=/,"</dd><dt><a href=")};{sub(/<DD>/,"<dd>")}{sub(/LAST_VISIT[^<]*TAGS=/,"tags=")};{sub(/^> /, "")};!/<\/DL>/{print}' > foo.html;echo "</dl>" >> foo.html
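To see what the one-liner does, here is a small sketch that runs it against a made-up links.diff. The two example.com bookmarks, their timestamps, and the scratch directory are assumptions for illustration; only the grep and awk command comes from above.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten (demo assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A made-up diff: the "> " lines are the additions from the new export.
cat > links.diff <<'EOF'
1c1,5
< <dl>
---
> <DL><p><DT><A HREF="http://example.com/a" LAST_VISIT="1238086010" ADD_DATE="1238086010" TAGS="one,two">First</A>
> <DD>first comment
> <DT><A HREF="http://example.com/b" LAST_VISIT="1238086011" ADD_DATE="1238086011" TAGS="three">Second</A>
> <DD>second comment
> </DL><p>
EOF

# The one-liner from the post, unchanged.
grep '^> ' < links.diff |awk '{sub(/<DL><p><DT><A HREF=/,"<dl><dt><a href=")};{sub(/<\/A>/,"</a></dt>")};{sub(/<DT><A HREF=/,"</dd><dt><a href=")};{sub(/<DD>/,"<dd>")}{sub(/LAST_VISIT[^<]*TAGS=/,"tags=")};{sub(/^> /, "")};!/<\/DL>/{print}' > foo.html;echo "</dl>" >> foo.html

# Show the cleaned-up result.
cat foo.html
```

On this sample the first entry comes out as <dl><dt><a href="http://example.com/a" tags="one,two">First</a></dt> and the second one starts with </dd><dt>, which closes the previous comment. The very last <dd> stays open, because </dd> is only ever added in front of a following <DT>; echoing "</dd></dl>" at the end instead would presumably take care of that.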
All we need now is to make the whole process more automatic. Since we have to add that line break in the old export file we can change things up once again to do that automatically. And since we will probably want to save this as a shell script, we can go ahead and make it more readable. I changed a couple of things I didn’t detail here and this is what I ended up with:
First I generalize a bit, using variables (set in the next step) so I can change things later if I want to
diff $OLDLINKS $NEWLINKS |grep '^> ' |awk '{sub(/<\/A>/,"</a></dt>")};{sub(/<DL><p><DT><A HREF=/,"<dl><dt><a href=")}{sub(/<DT><A HREF=/,"</dd><dt><a href=")};{sub(/<DD>/,"<dd>")}{sub(/LAST_VISIT[^<]*TAGS=/,"tags=")};{sub(/^> /, "")};!/<\/DL>/{print}' > $MYLINKS;echo "</dl>" >> $MYLINKS
then decide on path names (I like to let Firefox save to Downloads automatically, and I’m going to delete the new links file anyway, so I set the pathnames accordingly)
export LINKSDIR=$HOME/Documents/Personal/blogging
export OLDLINKS=$LINKSDIR/old-delicious.htm
export NEWLINKS=$HOME/Downloads/delicious-`date "+%Y%m%d"`.htm
export MYLINKS=$LINKSDIR/mylinks.html
then we make our new links file the old one for next week. We should also add that line break while we’re at it (and remove the new links file)
awk '{sub(/<DL><p>/,"<dl>\n")};{print}' < $NEWLINKS > $OLDLINKS
rm $NEWLINKS
and I like to go ahead and open my links file so I can make any quick edits and then post
mate $MYLINKS
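Put in the order they have to run (paths first, then the diff, then the rotation), the pieces above might be assembled something like this. This is only a sketch, not necessarily the script I actually use: the function wrapper, the quoting, the check for a missing export, and the EDITOR_CMD override are all additions made for illustration.

```shell
#!/bin/sh
# Sketch of a preplinks-style function. Defaults mirror the paths in the post.
# EDITOR_CMD is a hypothetical override; the post simply calls mate.
preplinks() {
    linksdir="${LINKSDIR:-$HOME/Documents/Personal/blogging}"
    oldlinks="${OLDLINKS:-$linksdir/old-delicious.htm}"
    newlinks="${NEWLINKS:-$HOME/Downloads/delicious-$(date "+%Y%m%d").htm}"
    mylinks="${MYLINKS:-$linksdir/mylinks.html}"

    # Stop early if this week's export has not been downloaded yet.
    [ -f "$newlinks" ] || { echo "missing export: $newlinks" >&2; return 1; }

    # 1. Keep only the lines added since last week and clean up the markup.
    diff "$oldlinks" "$newlinks" | grep '^> ' | awk '
        {sub(/<\/A>/,"</a></dt>")}
        {sub(/<DL><p><DT><A HREF=/,"<dl><dt><a href=")}
        {sub(/<DT><A HREF=/,"</dd><dt><a href=")}
        {sub(/<DD>/,"<dd>")}
        {sub(/LAST_VISIT[^<]*TAGS=/,"tags=")}
        {sub(/^> /, "")}
        !/<\/DL>/{print}' > "$mylinks"
    echo "</dl>" >> "$mylinks"

    # 2. This week's export becomes next week's baseline, with the
    #    line break added after the opening tag; then drop the download.
    awk '{sub(/<DL><p>/,"<dl>\n")};{print}' < "$newlinks" > "$oldlinks"
    rm "$newlinks"

    # 3. Open the result for a quick edit before posting.
    ${EDITOR_CMD:-mate} "$mylinks"
}
```

Calling preplinks with no variables set uses the default paths; setting OLDLINKS, NEWLINKS, or MYLINKS beforehand points it at other files, which is handy for a dry run against copies.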
I save it, put it somewhere in my PATH, and make it executable
sudo mv preplinks /usr/bin/
sudo chmod 755 /usr/bin/preplinks
You can grab the script here and do the same.
Now every week I go to del.icio.us and export my bookmarks as html and then I run
preplinks
and TextMate launches with my html all ready to be checked and posted.
Works for me.
This is the last in a series of posts. The first two posts are here and here.
Cleaning Up My del.icio.us Links
On Monday, I posted some info about how I am thinking of posting my weekly links.
Today I want to make one correction to the process, talk details about how to clean up the diff file, and then put together a quick script to do that part automatically. Once again, I am going to do this for the first time as I write this. I will summarize the process below.
First, the correction. After my first use of this method I discovered that one more quick edit to the html export will make the parsing of the diff file much easier. Before I move ~/delicious.htm to ~/delicious-old.htm I need to add a line break just after <DL><p>. It may not seem like much but it makes a big difference.
As it turns out, cleaning up the diff file is fairly easy to do with grep and awk. Let’s take a look at exactly what I want to do first.
I am only interested in lines that start with > and a space so I start with
grep '^> ' < links.diff
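For reference, here is roughly what that filter does to diff's default output. The sample lines are made up for illustration. In that format, lines that exist only in the first file start with "< ", lines that exist only in the second start with "> ", and everything else is a change command or a separator.

```shell
#!/bin/sh
# Made-up sample in diff's default output format (an assumption for illustration).
sample='1c1,2
< <DL><p>
---
> <DL><p><DT><A HREF="http://example.com/new">New link</A>
> <DD>my comment'

# Keep only the lines that exist in the new export.
added=$(printf '%s\n' "$sample" | grep '^> ')
printf '%s\n' "$added"
```

Only the last two lines of the sample survive; the change command, the "< " line, and the --- separator are dropped.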
I want to replace the <DL> with <dl> and I don’t need the <p> at all. So now I have
grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{print}'
Now we get rid of the > and the space at the beginning of each line.
grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{sub(/^> /, "")};{print}'
Then we don’t print the last line at all.
grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{sub(/^> /, "")};!/<\/DL>/{print}'
This gives me everything I need, but I still have uppercase tags and attributes, some attributes I don’t really care about, and none of the elements are closed. We can take care of closing the <dl> with a simple echo after the pipeline.
echo "</dl>"
So, if we want to save all this to a file we can do this.
grep '^> ' < links.diff |awk '{sub(/<DL><p>/,"<dl>")};{sub(/^> /, "")};!/<\/DL>/{print}' > foo.html;echo "</dl>" >> foo.html
Now all I need to do is clean up those uppercase letters and close all the other elements. I’ll take a look at that on Friday.
This is the second in a series of posts. The first post is here and the next one is here.
