Hagen Fritsch

Announcing Osmatravel

[Figure: Illustration of a map created using osmatravel]

Quite frequently I refer to Wikitravel for information on places that my guidebook does not cover, but one key element is almost always missing: maps. Of course there is OpenStreetMap, but when I see a listing in a Wikitravel article, I also want to know where it is.
So someone else came up with a great idea, based on two facts:

  1. The listings in Wikitravel articles are mostly in an XML format, i.e. machine-readable.
  2. Most names of these objects also appear as entries in the OSM data.

Thus, he created a set of scripts and magic that can semi-automatically generate a map for a Wikitravel article and place all the listings on it, so that they can be located easily. The project is based on osmarender, which renders raw OSM data into an image, hence the name osmatravel. Unfortunately, there are some issues, and the project has not received any attention since 2009, which makes it practically unusable.
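
The matching idea behind the two facts above can be sketched in a few lines of Python. This is only an illustration, not osmatravel's actual code: the listing tag names (`see`, `eat`, `sleep`, ...), the sample article, the sample OSM extract and all function names are assumptions made for the example.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical inputs: a snippet of article text and a tiny OSM XML extract.
ARTICLE = '''
* <see name="Old Town Hall" address="Market Square 1">Gothic building.</see>
* <eat name="Cafe Central" phone="+00 000">Good coffee.</eat>
* <sleep name="Hotel Nowhere">Not mapped yet.</sleep>
'''
OSM = '''<osm version="0.6">
  <node id="1" lat="48.1370" lon="11.5750"><tag k="name" v="Old Town Hall"/></node>
  <node id="2" lat="48.1380" lon="11.5760"><tag k="name" v="Café Central"/></node>
</osm>'''

# Assumed listing tags; opening tags only, attributes captured as raw text.
LISTING_RE = re.compile(r'<(see|do|buy|eat|drink|sleep|listing)\b([^>]*)>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def normalize(name):
    """Lowercase, strip accents and collapse whitespace for tolerant matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())

def parse_listings(wikitext):
    """Return (kind, name) pairs for every listing that has a name."""
    listings = []
    for kind, raw_attrs in LISTING_RE.findall(wikitext):
        attrs = dict(ATTR_RE.findall(raw_attrs))
        if attrs.get("name"):
            listings.append((kind, attrs["name"]))
    return listings

def osm_name_index(osm_xml):
    """Map normalized OSM node names to (lat, lon) tuples."""
    index = {}
    for node in ET.fromstring(osm_xml).iter("node"):
        for tag in node.findall("tag"):
            if tag.get("k") == "name":
                index[normalize(tag.get("v"))] = (
                    float(node.get("lat")), float(node.get("lon")))
    return index

def locate(wikitext, osm_xml):
    """Pair each listing with its OSM position, or None if no name matches."""
    index = osm_name_index(osm_xml)
    return [(kind, name, index.get(normalize(name)))
            for kind, name in parse_listings(wikitext)]

result = locate(ARTICLE, OSM)
```

Listings that come back with `None` are the ones that need manual cleanup, which is why the process is only semi-automatic.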

However, I invested some time, made a fork (github:osmatravel) and a lot of changes and adjustments, so that anyone can create cool maps again (instructions).

What’s next?
During my time working on the project I discovered some limitations. The main one is that osmarender relies on XSLT only, and its maps don’t look as nice as those rendered by mapnik. However, mapnik requires a database to work, which is a drawback in terms of easy setup and usability.
Another major problem is data quality: some articles don’t use the XML listings yet, or the names don’t quite match the OSM ones, so cleanup is needed before maps can actually be produced.
I imagine, though, that these ideas could be used to implement a web service that automatically generates the SVG files for articles. This would make it much easier for people to actually use this project.

Dropbox Bytecode Decryption Tool

Dropbox is actually just a Python application, so it ships the bytecode of its modules, which one could theoretically use in other applications. Building a more lightweight Dropbox client that does not come with its own interpreter might be another goal. Apparently, though, Dropbox does not want this and makes it slightly harder to get to the bytecode.
So here is a project I worked on quite some time ago, which converts the encrypted Python modules of Dropbox into real Python 2.5 modules usable in a normal interpreter. This works just fine, but as I don’t have the time to pursue it any further, I’ll just provide the results (or the source) and hope that others use this as a base to continue.

Background
The encryption scheme is actually quite simple. It uses the TEA cipher along with an RNG seeded by some values in the code object of each Python module. They adjusted the interpreter accordingly so that it a) decrypts the modules and b) prevents access to the decrypted code objects. Without b), the straightforward path would have been to let Dropbox decrypt everything and then dump the modules using the built-in marshaller.
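
To make the building block concrete, here is a minimal sketch of plain TEA block encryption and decryption in Python. It shows the textbook cipher only: how Dropbox derives its keys from the RNG and the code object is not covered, and the key, the sample block and the big-endian byte order are assumptions chosen purely for illustration.

```python
import struct

MASK = 0xFFFFFFFF      # keep all arithmetic in 32 bits
DELTA = 0x9E3779B9     # TEA's key schedule constant
ROUNDS = 32

def tea_encrypt_block(block, key):
    """Encrypt one 8-byte block with a 16-byte key (byte order is an assumption)."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return struct.pack(">2I", v0, v1)

def tea_decrypt_block(block, key):
    """Invert tea_encrypt_block by running the rounds backwards."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return struct.pack(">2I", v0, v1)

# Hypothetical key and data, just to show the round trip.
key = bytes(bytearray(range(16)))
plain = b"dropbox!"
cipher = tea_encrypt_block(plain, key)
restored = tea_decrypt_block(cipher, key)
```

Decryption simply undoes the two half-round updates in reverse order, starting from the final value of the running sum.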
Another trick is the manual scrambling of the opcodes. Unfortunately, this could only be undone semi-automatically, so their monoalphabetic substitution cipher proved quite effective at buying some time.
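
Undoing such an opcode substitution can be sketched as follows, assuming the mapping from scrambled to standard opcodes is already known. The table below is made up for the example (the real mapping is not given here), and the sketch also assumes that whether an instruction carries an argument can be decided from the standard opcode after translation, as in stock CPython 2.x.

```python
HAVE_ARGUMENT = 90   # CPython 2.x: opcodes >= 90 are followed by a 2-byte argument
LOAD_CONST = 100     # standard CPython 2.5 opcode numbers
RETURN_VALUE = 83

def unscramble(co_code, table):
    """Translate scrambled opcodes back to standard ones.

    co_code: raw bytecode string of a code object
    table:   dict mapping scrambled opcode -> standard opcode (hypothetical here)
    """
    out = bytearray(co_code)
    i = 0
    while i < len(out):
        op = table[out[i]]
        out[i] = op
        # Skip argument bytes so that they are never mistaken for opcodes.
        i += 3 if op >= HAVE_ARGUMENT else 1
    return bytes(out)

# Made-up substitution table and a two-instruction program:
# LOAD_CONST 32; RETURN_VALUE
table = {0x10: LOAD_CONST, 0x20: RETURN_VALUE}
scrambled = bytes(bytearray([0x10, 0x20, 0x00, 0x20]))
fixed = unscramble(scrambled, table)
```

Note that the argument byte `0x20` is left untouched even though it appears in the table, which is why the bytecode has to be walked instruction by instruction rather than translated byte by byte.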

Usage
You’ll find the source at github/dropboxdec

Grab and unpack the prerequisites::

    wget -nv https://github.com/rumpeltux/dropboxdec/tarball/master -O - | tar xzv
    wget -nv http://dl-web.dropbox.com/u/17/dropbox-lnx.x86-1.1.45.tar.gz -O - | tar xzv
    # use dropbox-lnx.x86_64-1.1.45.tar.gz if you're running a 64bit os
    cd .dropbox-dist; unzip library.zip; chmod a+rw -R .; cd ..

Run the decryption tool::

    python dropboxdec*/dec.py .dropbox-dist

From here
The decrypted modules are Python 2.5 bytecode and will therefore only work in a Python 2.5 interpreter. There are decompilers for other Python versions, which would need some adjustments to decompile the code, in case anyone wants to dive deeper into the protocol.
The decryption also only works for version 1.1.45 of Dropbox. In the 1.2 branch the simple RNG was replaced by the Mersenne Twister, so the decryption program would need to be adjusted accordingly.

If you do anything cool with it, I’d very much appreciate it if you dropped me a line and let me know :) Other than that, have fun hacking!

Protein Design

Protein Design Cycle
In my efforts towards world domination, I discovered that the key technology to master is biology. In a recent seminar I had to dive quite deep into protein design, trying to figure out how it works and what the fundamental concepts of this bioinformatics technique are.

So here are:

which I hope provide some insight into this fantastic technique and its promises.

The slides were actually made in HTML5 using html5slides, but that project does not yet use all the power HTML5 has to offer, so the slides remain pretty basic and required a lot of layout overhead.

lxml-based BeautifulSoup loader

With ElementSoup there is already a tool that lets you create an etree document using the more fault-tolerant BeautifulSoup parser. However, the opposite direction (i.e. creating a BeautifulSoup document using the lxml parser) was not yet possible.
In my experience, BeautifulSoup’s API is much more intuitive and useful, especially for quick scraping and data manipulation tasks. So the only reason to use lxml in the first place is that its parser is much quicker and consumes less memory.
Recently I had a workflow built around BeautifulSoup-based documents, but found that BeautifulSoup was too slow to parse my document of several MB. So here is lxmlsouper, a tool that uses lxml to parse the document and creates the BeautifulSoup DOM from it, which is at least much quicker than the native way.

Notes: feel free to swap the etree implementation for whatever you like best. Also, this does not emulate the BeautifulSoup API on top of etree, but uses the etree data to create a BeautifulSoup document from scratch, copying everything.
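
The "copy everything" approach can be illustrated with a small sketch. It is not the code from lxmlsouper.py: it uses the standard library's ElementTree instead of lxml, and `Node` is a hypothetical stand-in for a BeautifulSoup tag, so that the example runs without any third-party packages.

```python
import xml.etree.ElementTree as ET

class Node(object):
    """Hypothetical stand-in for a BeautifulSoup tag (not the real API)."""
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.contents = []   # child Nodes and text strings, in document order

def copy_element(elem):
    """Recursively copy an etree element into a Node tree."""
    node = Node(elem.tag, elem.attrib)
    if elem.text:
        node.contents.append(elem.text)
    for child in elem:
        node.contents.append(copy_element(child))
        # etree keeps the text that follows a child in that child's .tail
        if child.tail:
            node.contents.append(child.tail)
    return node

root = ET.fromstring('<p class="x">Hello <b>big</b> world</p>')
tree = copy_element(root)
```

The main subtlety is etree's `text`/`tail` model: text after a child element belongs to the child in etree, but to the parent's list of contents in a BeautifulSoup-style tree.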

Files: lxmlsouper.py

Usage:
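import io
import lxmlsouper
# Read the document as unicode text (io.open is available since Python 2.6)
with io.open("bigfile.html", encoding="utf8") as f:
    data = f.read()
soup = lxmlsouper.fastSoupLoader(data)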

Goodbye studiVZ

Our dearly beloved studiVZ has long been little more than a big graveyard full of dead data. More and more people rightly decide to finally bury their own data corpse and delete their account. Unfortunately, this always comes with a drop of bitterness: you have to leave behind everything that has accumulated there over time, be it messages, wall posts, photo albums or your friends’ group lists.
Well, not quite: there is a studiVZ plugin for the generic POP3 wrapper freepops that lets you download all your messages and your own wall with your favorite email program. Great stuff!
It would also be nice, though, to somehow build an archive of your data there. That is exactly what I have done now: I wrote a script that downloads the data you have stored there and parses it in a rudimentary way. The result is a structured JSON file with all the important information you left there. On request, in addition to your friends’ profiles, their wall posts, photo tags and photo albums can be downloaded as well, which takes somewhat longer and produces a remarkably large pile of data.

To make the exit completely risk-free, it is a good idea to send everyone on your friends list whom you still know somehow a message with your new contact details.

If you want to use the script yourself, that is very easy:

$ git clone git://github.com/rumpeltux/vz-backup.git
$ cd vz-backup
$ ./studivz email password

Then sit back and wait. If errors occur, I am of course interested in a diagnosis that is as precise as possible :)
At the end you get a zip file with all scraped HTML pages (which is also reused if you repeat the process after an error, so that not all pages have to be downloaded again), as well as an .img-list file with the URLs of all images. These still have to be downloaded separately, e.g. with wget:

$ mkdir images; cd images
$ sort -u ../*.img-list | wget -i -

For everyone who finds this too complicated, there is also a web service that does it for you. Just go to http://studivz.irgendwo.org/goodbye/. Depending on the server load, the process can take more or less time, and you will get an email with a link to your archive.

What I would still like to see:

  • You cannot do much with the data yet. It is “there”, but it is not presented nicely. If someone musters some enthusiasm and builds a viewer page for the JSON data, ideally HTML-only, that would be great!
  • If someone writes a Diaspora import tool, that would be great too. It could then be turned into a kind of migration service.

So, I hope the tool is useful to you and that you too can finally say “Goodbye” to studiVZ :)
