Tomboy: Bulk import files with the D-Bus interface and Python
Last month I mentioned I wanted to import a bunch of notes from my old PIM into Tomboy, but expected a lot of copying and pasting busywork since I didn't know how to do a mass import. Fortunately, a real live Tomboy developer dropped by to clue me in on the D-Bus interface with which I could use Python to script something up. (Thanks, Sandy!)
Having a starting point, my first search turned up a great Ars Technica article by Ryan Paul that gave me all the information I needed: "Using the Tomboy D-Bus interface." Read it for an explanation of what D-Bus is about, and some good tips for using Tomboy's API.
With this post I'm just going to focus on the simple task of loading a bunch of flat files into Tomboy, with some elaboration on character set issues I ran into along the way.
(Python was great for this stuff. I'm just getting started with learning the language, but was able to experiment and figure out a lot of things in the interactive shell on the way to the rewarding dbus.Boolean(True) in response to tomboy.SetNoteContents(note, s).)
Tomboy?
Tomboy is a popular (and awesome) GNOME note-taking application for GNU/Linux, by the way, which may have been helpful to mention earlier for the readers who have dropped out by now because they had no idea what I'm talking about. (Although I guess they might have clicked on a link or two.) For you that remain, obviously you are familiar with the program and just want me to get on with it.
The D-Bus interface is available in version 0.8, which is included with Ubuntu 7.10/Gutsy Gibbon. I have 7.04/Feisty Fawn and Tomboy 0.6.3 on my main machine, but was able to do the import on my 7.10 laptop and then manually copy the *.note files into ~/.tomboy for the older version, with no apparent problems.
Get the magic tomboy object
From Ars:
import dbus, gobject, dbus.glib
import os
# get the d-bus session bus
bus = dbus.SessionBus()
# access the tomboy d-bus object
obj = bus.get_object("org.gnome.Tomboy", "/org/gnome/Tomboy/RemoteControl")
# access the tomboy remote control interface
tomboy = dbus.Interface(obj, "org.gnome.Tomboy.RemoteControl")
(Except import os was added by me for the file system stuff below.)
Import your files
My meager contribution (which I'm placing in the Public Domain for simplicity's sake):
# some directory/folder...
path = os.path.expanduser('~/Desktop/notable-files/')
dirlist = os.listdir(path)
dirlist.sort()
for fname in dirlist:
    print(fname)
    f = open(path + fname)
    # d-bus complains if string params aren't valid UTF-8
    title = unicode(f.readline(), 'iso8859_1')
    # reset to start of file and read whole file
    f.seek(0)
    s = f.read()
    # replace left and right curly single quotes with '
    s = s.replace('\x91', "'").replace('\x92', "'")
    # replace left and right curly double quotes with "
    s = s.replace('\x93', '"').replace('\x94', '"')
    # replace en and emdash with --
    s = s.replace('\x96', '--').replace('\x97', '--')
    s = unicode(s, 'iso8859_1')
    # creating named notes seems to prevent notes
    # from showing up as "New Note NNN"
    note = tomboy.CreateNamedNote(title)
    tomboy.SetNoteContents(note, s)
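One caveat with the loop above: os.listdir also returns subdirectories, and open() fails on those. Here is a minimal sketch of a filter that keeps only regular files, written for Python 3. The helper name list_note_files is made up for illustration and is not part of Tomboy or the original script.

```python
import os

def list_note_files(path):
    """Return sorted full paths of the regular files directly inside path."""
    full_paths = (os.path.join(path, name) for name in sorted(os.listdir(path)))
    # os.path.isfile is False for directories, so nested folders are skipped
    return [p for p in full_paths if os.path.isfile(p)]
```

The loop would then iterate over list_note_files(path) and open each returned path directly.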
Notes about the notes
My files happened to be in a state where I could use the first line of the file as a title. You might alternatively use the name of the file as the title, but make sure to prepend it to the data when setting the note contents, probably followed by \n\n.
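If the file name is to serve as the title, the text handed to SetNoteContents could be assembled something like this. It is a small sketch in Python 3; note_text_from_file is a hypothetical helper, and dropping the extension is just my assumption about what makes a sensible title.

```python
import os

def note_text_from_file(fname, body):
    """Build note contents whose first line (the title) is the file name."""
    # assumption: strip the directory and the extension, e.g. "todo.txt" -> "todo"
    title = os.path.splitext(os.path.basename(fname))[0]
    # the title becomes the first line, followed by a blank line, then the data
    return title, title + "\n\n" + body
```

The returned title would go to CreateNamedNote and the returned text to SetNoteContents.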
SetNoteContents will overwrite the title with the first line of the data passed to it, so this may seem redundant to first set the title with CreateNamedNote and then set the contents where my first line is the same as the title, but in my experience, setting note contents after CreateNote results in notes named something like "New Note 539," and this doesn't get corrected even after restarting Tomboy. I've seen other odd behavior in 0.6 with note titles, where a note shows up as "New Note #" in search results even though the first line is different. I've had to change the title to force the correct display in listings.
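The create-then-fill sequence described above, pulled into a function. This is only a sketch: add_note is a name invented here, and FakeTomboy is a stand-in that records calls so the snippet runs without D-Bus. With the real thing you would pass in the tomboy interface object from earlier.

```python
def add_note(tomboy, title, text):
    """Create a note with its title already set, then fill in the contents."""
    # naming the note up front avoids the "New Note NNN" titles
    note = tomboy.CreateNamedNote(title)
    # text should begin with the same title, since the first line becomes the title
    return tomboy.SetNoteContents(note, text)


class FakeTomboy(object):
    """Hypothetical stand-in for the D-Bus interface, for dry runs only."""

    def __init__(self):
        self.notes = {}

    def CreateNamedNote(self, title):
        uri = "note://fake/%d" % len(self.notes)
        self.notes[uri] = title
        return uri

    def SetNoteContents(self, uri, text):
        self.notes[uri] = text
        return True
```

A dry run against FakeTomboy is a cheap way to check titles and contents before touching real notes.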
Character set stuff
So what's the deal with the string replacements and unicode conversions?
When I first tried importing my four hundred files, I ran into an error like this:
>>> tomboy.SetNoteContents(note, s)
ERROR:dbus.connection:Unable to set arguments (dbus.String(u'note://tomboy/afe
70879-5b43-455d-8a28-352ff4c3d806'), 'char test \n\nI\x92m testing stuff.\n\n
\x93Blah blah blah\x94. \n') according to signature u'ss':
<type 'exceptions.UnicodeError'>: String parameters to be sent over D-Bus must be valid UTF-8
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/var/lib/python-support/python2.5/dbus/proxies.py", line 135, in __call__
**keywords)
File "/var/lib/python-support/python2.5/dbus/connection.py", line 593, in call_blocking
message.append(signature=signature, *args)
UnicodeError: String parameters to be sent over D-Bus must be valid UTF-8
That's from a test file I created later, but the first instance of this had to do with a file that contained the "Registered" trademark symbol. These were files I had originally created in Windows, so I poked around and learned something about converting to unicode. It seemed likely that my Windows files were ISO-8859-1. The R symbol showed up as hex in a Python string: '\xae'. I could get it to print correctly with print u'\xae'. To convert a string variable holding the whole file, I found that the unicode conversion unicode(s, 'iso8859_1') worked out for the trademark and copyright symbols. (A table of Python standard encodings was helpful.)
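For reference, here is the same conversion in Python 3, where the bytes read from a file are decoded explicitly. This is a short sketch, and the sample bytes are made up for illustration.

```python
# bytes as they might sit in a file saved on Windows (made-up sample)
raw = b"Acme\xae Widgets, \xa9 2008"

# in Python 3, bytes.decode plays the role of Python 2's unicode(s, 'iso8859_1')
text = raw.decode("iso8859_1")

# once the text is unicode, it can be encoded as valid UTF-8
utf8 = text.encode("utf-8")
```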
The conversion to unicode worked fine for the Registered and Copyright symbols, but not so great for curly quotes. They went through D-Bus without complaint, but turned into these funny little boxes when viewed in Tomboy:


With the enlarged view, you can see the numbers associated with this character set mismatch. So, a single right curly quote (otherwise known as an apostrophe) is 92, whatever that means.
Let's look at the Tomboy .note XML file.
- With cat and Python interactive print, these characters show up as blanks.
- In vi, the apostrophe shows up as <92>. (With the other squares following suit.)
- In Python interactive mode "non" print (e.g. >>> s): \xc2\x92.
- Out of curiosity, I later copied a curly apostrophe from a web page and pasted it into Tomboy (so, bypassing D-Bus and removing Windows-created files from the equation) and it shows up in Python as \xe2\x80\x99.
Rather than dig further into this behavior, I added the replace statements in the code above.
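A likely explanation, for what it's worth: bytes 0x91 through 0x97 are curly quotes and dashes in Windows-1252, while ISO-8859-1 assigns that range to unprintable control characters, hence the boxes. Below is a sketch in Python 3 comparing the two decodings. Python's cp1252 codec is standard, but that the files were really Windows-1252 is an assumption on my part.

```python
raw = b"I\x92m testing stuff."

# decoded as ISO-8859-1, byte 0x92 becomes the control character U+0092
as_latin1 = raw.decode("iso8859_1")

# decoded as Windows-1252, the same byte becomes a right curly quote, U+2019
as_cp1252 = raw.decode("cp1252")

# UTF-8 form of the control character: contains \xc2\x92
latin1_utf8 = as_latin1.encode("utf-8")

# UTF-8 form of the curly quote: contains \xe2\x80\x99
cp1252_utf8 = as_cp1252.encode("utf-8")
```

If that assumption holds, decoding with cp1252 would make the replace calls unnecessary, at the cost of keeping the curly characters rather than plain ASCII ones.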
Something else to look out for: if your string ("s") is already unicode, you may get an error like this:
>>> s.replace('\x92', 'YYZ')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 0: ordinal not in range(128)
In that case, you want this instead: s.replace(u'\x92', 'YYZ').
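Python 3 is stricter about the same mix-up: instead of attempting an implicit ASCII decode, it refuses to combine text and bytes at all. Here is a small sketch of what that looks like, and of the matching-types fix.

```python
s = b"I\x92m".decode("iso8859_1")  # a text string containing U+0092

try:
    s.replace(b"\x92", b"'")       # bytes arguments on a text string
    mixed_ok = True
except TypeError:
    mixed_ok = False               # Python 3 rejects the mix outright

# with text arguments on a text string, the replacement goes through
fixed = s.replace("\x92", "'")
```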
Finally, here is another screenshot demonstrating what things might look like in Python interactive mode when experimenting with this stuff:

Question marks are a common placeholder for character set hiccups. I've also experienced some headaches with Windows filenames that didn't cross over to GNU/Linux very well, with ??? as a symptom.
I don't know if this post should properly be categorized in the internationalization ("i18n") department, but I'm going to use those terms to potentially ensnare future searchers, in the hopes that this may be of some benefit to them/you. :-)
by Scott Carpenter on 23 February 2008 at 4:11 pm
Comments
-
Thank you! The import script dies if it has subdirectory (e.g. I am trying to import from MS Outlook, which allows nested Notes folders). Gives error like so:
----------------
nickj@redux:~/move to linux box/mail export$ python import-notes.py
DreamHost
Traceback (most recent call last):
File "import-notes.py", line 20, in
f = open(path + fname)
IOError: [Errno 21] Is a directory
----------------
... Could it perhaps create a new Notebook (named after the subdirectory), and put the subdirectory's notes into that Notebook?
Also, I am currently getting blank notes when I do the import (the title comes through okay), so I may be doing something wrong there....
It would be great if this functionality could be built into Tomboy, so that people could easily import and export notes.
-- All the best,
Nick.
Posted by Nick Jenkins on 2 September 2008 at 3:43 am
-
Hey thanks a heap for this, it just saved me a ton of time! I've been through the basics of learning Python, and its good to be able to use it like this.
Posted by Sam Hassell on 3 November 2008 at 7:50 am