Alpha Maps: a mashup of Wolfram|Alpha and Google Maps

Hands down, Wolfram is awesome. Mathematica has been my favorite computer algebra system* from the start (okay, I took a few wrong turns towards Maple in the very beginning, but hey, everybody makes mistakes); and Wolfram|Alpha is a big step (and still stepping) towards a huge semantic web.

Now I just came across the pretty new Wolfram|Alpha API and thought I had to do something with it. Having been a maps guy since the tripedia days, I had the idea of mashing up W|A with Google Maps.

A user can enter an arbitrary Wolfram|Alpha query (usually a city); the query is sent to W|A and the result is parsed on the client side. Every item in the result that refers to a geolocation is then queried as well, which yields a set of markers on the map, each with a nice info window showing the data that W|A returned for it.

W|A sends quite a lot of information, but for now I was only interested in plain facts. These can be accessed using the plaintext field for every so-called “subpod”. Each of these plaintexts actually represents a table encoded by newlines and | separators. One could argue about the elegance of this representation in the API, but whatever.
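To make the table encoding concrete, here is a minimal sketch of how such a plaintext could be split into rows and cells. It is written in Python for illustration only (the site itself does this in JavaScript), the function name parse_plaintext_table is made up, and the sample string is invented rather than real W|A output.

```python
def parse_plaintext_table(plaintext):
    """Split a newline- and |-separated plaintext into rows of stripped cells."""
    rows = []
    for line in plaintext.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


# Invented sample in the "label | value" shape described above.
sample = "city population | 3.4 million people\nelevation | 34 m"
for row in parse_plaintext_table(sample):
    print(row)
```

Splitting on every | is the simplest reading of the format; a value that itself contains a | would need extra care.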

The W|A API is a server-side-only API, as it doesn’t set the Access-Control-Allow-Origin HTTP header to *, so accessing it directly from any modern browser will violate the same-origin policy and thus fail. Therefore, a thin PHP wrapper on a server is needed. The good thing is that this wrapper can also cache queries to avoid reaching the free API limit of 2000 monthly calls too soon.
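The caching idea can be sketched roughly as follows. This is not the PHP wrapper itself, just a Python illustration of the principle: cached_query, fake_fetch and the one-JSON-file-per-query layout are assumptions, and fetch stands in for whatever actually performs the HTTP request to W|A.

```python
import hashlib
import json
import os
import tempfile


def cached_query(query, fetch, cache_dir):
    """Return the stored response for query; call fetch(query) only on a cache miss."""
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = fetch(query)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result


# Demonstration with a stand-in fetcher that records how often it is called.
calls = []


def fake_fetch(query):
    calls.append(query)
    return {"query": query, "pods": []}


cache_dir = tempfile.mkdtemp()
first = cached_query("Berlin", fake_fetch, cache_dir)
second = cached_query("Berlin", fake_fetch, cache_dir)
print(len(calls))  # the second lookup is served from disk
```

Only cache misses count against the monthly quota, so repeated queries for the same city cost a single API call.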

I don’t want to go into the details of the implementation here; it’s basically a bit of JavaScript using jQuery. If you’re interested, the code is online on GitHub, and the resulting site is Alpha Maps.

It’s been a nice day of hacking. Any thoughts on how to turn this into something (even more) useful?

* I also like Sage, of course, and—surprise!—Mathics.

Converting text files to UTF-8

In a rather old project I’m working on again now, there used to be a lot of Latin-1-encoded files. Yuck! I don’t even want to know why anybody ever created or used a character encoding other than UTF-8. So I thought, let’s get these old-school files a decent encoding.

iconv can do the job:

iconv -f L1 -t UTF-8 filename >filename.converted

This will convert the file filename from Latin-1 to UTF-8 and save it as filename.converted.

To find all relevant files in the project directory, we use find, of course. The only issue with this is that a simple for x in `find ...` loop will not handle filenames containing spaces correctly, so we apply while read to it, as in:

find . -name '*.php' | while IFS= read -r x; do #...

This will execute the rest of the loop once for every PHP file in the current directory and its subdirectories, with the variable x holding the filename. (There are other approaches to this as well, of course.)

Now there’s only one problem left to deal with: some files in the directory are already UTF-8-encoded, and of course we don’t want to encode them a second time. (Decoding from Latin-1 and encoding to UTF-8 is not idempotent for characters beyond ASCII: applied to a file that is already UTF-8, it garbles every non-ASCII character.) There might be other solutions, but I decided to use Python and the chardet package to determine whether a file is already UTF-8-encoded:

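import chardet

# data holds the raw bytes of the file
encoding = chardet.detect(data)['encoding'] or ''  # None if nothing was detected
if encoding.lower() == 'utf-8':
    print('UTF-8')
else:
    print('L1')

This will print UTF-8 if chardet detects the byte string data as UTF-8, and L1 otherwise.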
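To see why converting an already converted file is harmful, here is a small sketch of what happens to a single non-ASCII character when UTF-8 bytes are mistaken for Latin-1 and encoded again. The variable names are purely illustrative.

```python
original = "é"

# Correct conversion: the character becomes two bytes in UTF-8.
once = original.encode("utf-8")  # b'\xc3\xa9'

# Mistaking those UTF-8 bytes for Latin-1 and converting again
# turns each of the two bytes into a character of its own.
twice = once.decode("latin-1").encode("utf-8")  # b'\xc3\x83\xc2\xa9'

print(once, twice, twice.decode("utf-8"))

# Pure ASCII text is unaffected, which is why only non-ASCII characters break.
assert "abc".encode("utf-8").decode("latin-1").encode("utf-8") == b"abc"
```

The doubly encoded file now reads "Ã©" where it used to say "é", which is the familiar mojibake pattern.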

Adding some code to output the current file and to remove the original file and replace it by the converted one, we get the following script:

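find . -name '*.php' | while IFS= read -r x; do
    # Detect the encoding; the filename is passed as an argument so quotes in it do no harm
    e=$(python3 -c "import sys, chardet; enc = chardet.detect(open(sys.argv[1], 'rb').read())['encoding'] or ''; print('UTF-8' if enc.lower() == 'utf-8' else 'L1')" "$x")
    echo "converting $x: $e"
    # Replace the original only if the conversion succeeded
    iconv -f "$e" -t UTF-8 "$x" > "$x.utf8" && mv "$x.utf8" "$x"
done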

We can also assemble this into a bash one-liner if we prefer:

find . -name '*.php' | while IFS= read -r x; do e=$(python3 -c "import sys, chardet; enc = chardet.detect(open(sys.argv[1], 'rb').read())['encoding'] or ''; print('UTF-8' if enc.lower() == 'utf-8' else 'L1')" "$x"); echo "converting $x: $e"; iconv -f "$e" -t UTF-8 "$x" > "$x.utf8" && mv "$x.utf8" "$x"; done
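For comparison, the whole job can also be sketched in Python alone. Note that this variant swaps chardet for a different check: a file counts as UTF-8 if its bytes decode strictly as UTF-8, and as Latin-1 otherwise. The function names convert_to_utf8 and convert_tree are made up for this sketch, and the .utf8 temporary suffix simply mirrors the shell script above.

```python
from pathlib import Path


def convert_to_utf8(path):
    """Re-encode a Latin-1 file as UTF-8 in place; return True if it was rewritten."""
    data = path.read_bytes()
    try:
        data.decode("utf-8")  # valid UTF-8 (including pure ASCII): leave untouched
        return False
    except UnicodeDecodeError:
        pass
    text = data.decode("latin-1")  # Latin-1 maps every byte, so this cannot fail
    tmp = path.with_name(path.name + ".utf8")
    tmp.write_bytes(text.encode("utf-8"))
    tmp.replace(path)  # swap the converted file into place
    return True


def convert_tree(root, pattern="*.php"):
    """Convert every matching file below root and report what happened."""
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            status = "converted" if convert_to_utf8(path) else "already UTF-8"
            print(f"{path}: {status}")
```

Calling convert_tree(".") would process the current directory. Filenames with spaces need no special handling here because no shell word splitting is involved. The strict-decoding check can misjudge a Latin-1 file whose bytes happen to form valid UTF-8, but that is rare in real text.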