Death and reincarnation of some bitten fruit

Usually, when I turn on my computer, I don’t spend a second thinking about whether it will actually turn on. From now on I probably will.

I just wanted to resume from sleep (which is the usual mode of non-operation for it), but the screen remained black and nothing happened. That’s not totally unusual, as sometimes something in the computer decides it doesn’t want to wake up, in which case you have to turn it off and on again. That’s what I did, but in this unique case, it still remained black. The front light was on and there was some noise from the fan, but nothing more. No Mac startup sound, which I never particularly liked, but in this case would have loved to hear.

Letting it rest for a while, removing the battery, resetting the PRAM, switching RAM modules, nothing helped. Finally, I made an appointment with the Apple Genius Bar, conveniently located a few blocks away in the Stanford Shopping Center.

After a few experiments by my nice “genius,” he told me it would be $310 to have my notebook (a mid-2008 15″ MacBook Pro 2.4 GHz Intel Core 2 Duo with 4 GB RAM running Leopard, by the way) repaired. Not too cheap, but I would have paid anything to be able to work on my stuff again. Luckily though, the guy continued his research and finally told me the repair would be free, even though the warranty had expired about two years ago!

Apparently, it was another case of this well-known graphics issue. Actually, I had occasionally noticed some of the described symptoms and was aware of this issue; I just had not thought that it could also keep the computer from starting up at all.

Even better, the guy told me I could probably get my computer back in one or two days (which would have been a Sunday, amazingly enough). And I would not have to worry about my data: the hard drive was safe.

As if that had not been good enough, they called me the same day at 6.30 pm to say that my computer was ready for pickup! (I had been in the store at 10.45 am.) I got there at around 8 pm (yes, my Austrian friends, they’re still open at that time, usually even on Sundays!) and got back my ready-to-rock shiny-new MacBook again. Apparently, they had even polished it.

One could argue that this problem shouldn’t have occurred in the first place, but still, well done, Apple, awesome support! Even if I had concerns about Apple’s policy recently (and not only recently), right now, I love you again.

Had some good steaks to celebrate this quick reincarnation.

Working and networking

With the slowly-becoming-usual delay, here’s what happened last week: On Monday, I was invited to the Innovation Ecosystems Summit by Martha Russell. Have a look at the authors of Semantic Analysis of Energy-Related Conversations in Social Media: A Twitter Case Study and you’ll know why I was invited. ;)

As expected, the conference was targeted more towards economics, which is of course not my primary scientific field of interest, but nevertheless it was very interesting. Listening to people like the director of an IBM research center at the world-wide hub of high-tech innovation feels just right. I particularly liked the session about data mining and visualization, starring charismatic Sean Gourley from Quid (a nice combination of data mining and visualization to locate innovative startups) and Mathieu Bastian from LinkedIn (who’s working on the awesome graph exploration tool Gephi). The food was pretty good as well. :)

On Tuesday, I continued working on stuff we (Mark, Tania, Markus, me) would discuss in our meeting on Wednesday. Basically, my work here aims at two things:

  1. understanding the collaboration of experts in the development of ICD-11, for instance by finding models to predict which concepts are likely to be changed (thereby measuring their “maturity”), and
  2. creating tools to examine and ideally improve collaboration.

While the first part naturally involves all sorts of statistics and machine learning stuff, the second part is more of an engineering task. Interestingly enough, my TwitterExplorer (“twex”) comes into play again here, as it already provides the basics for browsing the ontology of diseases, which is basically a huge graph of diseases and the categories they are in. People seem to like it.

I kept talking about it on Thursday, when I attended a presentation at the Triple Helix Conference by Camilla Yu, who used twex to analyze “branding and reputation of innovation hubs”. Moreover, I had a nice conversation with Jukka Huhtamäki from Finland about the potential future of a general interactive, Web-based network explorer. I’m really excited about working on it—as soon as I have time.

Batch-converting CSV to Google Maps KML to illustrate Slovenian name signs in Carinthia

Inspired by a tweet by DaddyD (which is totally true: the “graphic” by Kleine Zeitung is just a failed table), I figured there should be an easy way of generating maps from a set of places. In this case, to illustrate the places in Carinthia affected by the new agreements concerning Slovenian name signs. Google Maps to the rescue! It allows you to create custom maps with markers (and lines, polygons, etc.) at arbitrary positions—the problem is just that it’s pretty complicated to do that by hand for a bunch of places.

Being a geek, I wanted to automate that, of course. The workflow would be as follows:

  1. create a spreadsheet with columns for the place name, additional text (in this case, the Slovenian place name), additional information for the Google geocoder (region etc.), and the desired marker color, using OpenOffice or whatever,
  2. export that spreadsheet to CSV,
  3. have a Python script batch-geocode all the places and put corresponding markers in a KML file,
  4. import the KML file to Google Maps.

You can download the UTF-8-encoded CSV file for the Slovenian place names.
However, the interesting part is the Python script, of course:

from geopy import geocoders
import csv

geocoder = geocoders.Google()
data = csv.reader(open('data.csv', 'r'), delimiter='\t', quotechar='"')
kml = []
print "Geocoding places"
for line in data:
	place, place_extra, sub_region, region, country, color = [value.decode('utf-8') for value in line]
	if not color:
		color = 'blue'
	query = place
	if sub_region: query += u", " + sub_region
	if region: query += u", " + region
	if country: query += u", " + country
	try:
		name, (lat, lng) = geocoder.geocode(query.encode('utf-8'), exactly_one=True)
		kml.append((u"<Placemark><name>%s/%s</name>" + \
			u"<Style><IconStyle><Icon><href>http://www.google.com/intl/en_us/mapfiles/ms/icons/%s-dot.png</href></Icon></IconStyle></Style>" + \
			u"<Point><coordinates>%s,%s</coordinates></Point></Placemark>") % \
			(place, place_extra, color, lng, lat))
	except ValueError:
		print u"Place not found: %s/%s" % (place, place_extra)
print "Writing data.kml"
kml = u"\n".join(kml)
kml = u"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name>Exported KML data</name>
%s
</Document>
</kml>
""" % kml
out = open('data.kml', 'w')
out.write(kml.encode('utf-8'))
out.close()
print "Done"

You can also download the whole Python script. Given the file data.csv, it looks up all the place names using the geopy Google geocoder and exports the places to a KML file data.kml. You can download the resulting KML file.
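The script above is Python 2 and relies on the geopy Google geocoder as it existed at the time. As a minimal Python 3 sketch of just the KML-writing step, assuming the coordinates are already known: the helper names `placemark` and `kml_document`, the place names, and the coordinates below are all hypothetical placeholders, not data from the real CSV file. Unlike the original, this version escapes the names, so a place containing `&` or `<` should still give well-formed XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ICON_URL = "http://www.google.com/intl/en_us/mapfiles/ms/icons/%s-dot.png"
KML_NS = "http://www.opengis.net/kml/2.2"


def placemark(place, place_extra, color, lat, lng):
    """Return one KML <Placemark> element as a string (hypothetical helper)."""
    name = escape("%s/%s" % (place, place_extra))
    icon = escape(ICON_URL % (color or "blue"))  # default marker color: blue
    # KML expects longitude first, then latitude.
    return (
        "<Placemark><name>%s</name>"
        "<Style><IconStyle><Icon><href>%s</href></Icon></IconStyle></Style>"
        "<Point><coordinates>%s,%s</coordinates></Point></Placemark>"
    ) % (name, icon, lng, lat)


def kml_document(rows, title="Exported KML data"):
    """Wrap the placemarks for all rows in a KML document (hypothetical helper)."""
    body = "\n".join(placemark(*row) for row in rows)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="%s">\n'
        "<Document>\n<name>%s</name>\n%s\n</Document>\n</kml>\n"
    ) % (KML_NS, escape(title), body)


# Placeholder rows: (place, extra text, color, latitude, longitude).
rows = [
    ("Place A", "Kraj A", "blue", 46.5, 14.5),
    ("B & C", "B in C", "", 46.6, 14.6),
]
kml = kml_document(rows)

# Parse the result back to check that it is well-formed XML.
root = ET.fromstring(kml.encode("utf-8"))
placemarks = list(root.iter("{%s}Placemark" % KML_NS))
```

Writing `kml` to a file with `open('data.kml', 'w', encoding='utf-8')` would give a file in the same shape as the one the original script produces.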

The KML file can then be easily imported into a custom Google Maps map, which results in this map:

View Kärntner Ortstafeln on a larger map

Blue markers denote places with Slovenian names according to the old regulation and new VfGH decisions since 2001; red markers represent places with a Slovenian population of more than 17.5%; and yellow markers are considered “room for negotiation”.

The general CSV-to-KML converter could be used for many more beautiful maps, of course.

UPDATE: Made some manual corrections to the markers, thanks to comments on Twitter and derStandard.at.

Mathics goes open source

I’ve been working a lot on Mathics again during the last weeks. All towards the goal that I’ve had in mind for a long time: to release it as open source. After all, that’s the only thing that could set Mathics apart from Mathematica—the only thing that’s not so good about Mathematica is that it’s not free, neither as in freedom nor as in free beer.

If you wonder what Mathics is: it’s a general-purpose computer algebra system implementing the awesome Mathematica mathematics/programming language. Some of its most important features are

  • a powerful functional programming language,
  • a system driven by pattern matching and rules application,
  • rationals, complex numbers, and arbitrary-precision arithmetic,
  • lots of list and structure manipulation routines,
  • an interactive graphical user interface right in the Web browser using MathML (apart from a command line interface),
  • creation of graphics (e.g. plots) and display in the browser using SVG,
  • an online version at www.mathics.net for instant access,
  • export of results to LaTeX (using Asymptote for graphics),
  • a very easy way of defining new functions in Python,
  • an integrated documentation and testing system.

Read the manual to learn more about the over 350 built-in functions and symbols in Mathics.

The actual heavy math stuff (like integration) is mostly done by the Python package SymPy. There is optional support for functions depending on Sage as well. Despite “out-sourcing” most mathematical functions, Mathics has more than 20,000 lines of Python code already, dealing with a lot of non-trivial stuff such as parsing Mathematica input, pattern matching, graphics generation, etc.

Unfortunately, Firefox is the only browser supported so far, because no other browser supports MathML yet. However, this is expected to change pretty soon when WebKit (used by Safari and Chrome) adds MathML support. I wonder whether Internet Explorer will ever get that far.

I set up a project homepage at www.mathics.org for organizational stuff, while www.mathics.net is still the place for the online interface of Mathics. The source code is hosted at GitHub.

I hope that one day there will be developers joining me. Contact me if you want to get involved!


TwitterExplorer

Finished my work on TwitterExplorer (as far as work can ever be finished). It now features a huge, interactive clustered network of Twitter #hashtags. There are detailed statistics, timelines, and a classification (attempt) for each hashtag, plus a separate clustered network of the 3-hop network around it.
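To make the "3-hop network" idea concrete, here is a minimal Python 3 sketch of collecting everything within k hops of a node via breadth-first search. It is only an illustration, not how TwitterExplorer actually does it: the function name `k_hop_neighborhood`, the toy graph, and the assumption that hashtags are linked when they occur in the same tweet are all made up for this example.

```python
from collections import deque


def k_hop_neighborhood(graph, start, k=3):
    """Return {node: distance} for all nodes within k hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand nodes that are already k hops away
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Toy graph (assumption: an edge means two hashtags share a tweet).
graph = {
    "#a": ["#b"],
    "#b": ["#a", "#c"],
    "#c": ["#b", "#d"],
    "#d": ["#c", "#e"],
    "#e": ["#d"],
}
nearby = k_hop_neighborhood(graph, "#a", k=3)
```

In this toy chain, "#e" is four hops from "#a" and therefore stays out of the result.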

This is one more Python/Django application (for details, see the about page). Especially the interactive graph involved a lot of JavaScript tweaks with the JavaScript InfoVis Toolkit, although I’m not really sure whether it was really worth using it; in the end, I might have needed less code by starting from scratch, using just jQuery and the HTML5 canvas element.

This is also the first time I placed a flattr button somewhere. Let’s see how much it generates—I don’t have any expectations.

iPhone SDK for MacOS X 10.5 Leopard

Thanks to Drop The Nerd, I found a way of installing a version of the iPhone SDK (3.1.3) that still supports Leopard: Download from apple.com. Maybe I should’ve already switched to Snow Leopard, but actually I’m not missing much in Leopard (at least not much that Snow Leopard provides, I think), so I feel like I can still wait for “Lion”.

Now I will first create a GUI mockup for our software engineering project (“car sharing application”), and then, some day, there might be a tripedia iPhone app…

Code line count

I just asked myself the question, how much code have I written so far in my life? I wanted to break it down into individual projects and programming languages. A shell script using wc -l could have done it, but I decided to write a short script in my “mother tongue” Python.

To set up projects and programming languages, the following variables are used:

import os
import types

PROJECTS = {
	'tripedia.org': '/Users/Jan/Projekte/tripedia.org/tripedia',
	'-BatchWorks': ['/Users/Jan/Projekte/BatchWorks/Src', '/Users/Jan/Projekte/BatchWorks/Components'],
	'-Mathador': '/Users/Jan/Projekte/Mathador',
}

EXCLUDE_DIRS = set(['_Kopien_', 'prototype', 'scriptaculous', 'innerdom', 'livepipe', 'gmapsutil', 'TBP', ])
EXCLUDE_FILES = set(['prototype.js', 'carousel.js', 'printf.js'])

TYPES = {
	'Pascal': 'pas',
	'C/C++': ['h', 'hpp', 'cpp', 'c'],
	'HTML': ['html', 'htm', 'xhtml', 'xhtm'],
	'CSS': 'css',
	'Python': 'py',
	'Java': 'java',
	'JavaScript': 'js',
	'PHP': 'php',
}

A minus at the beginning of a project name indicates that this is an old, discontinued project. An inverse dictionary of file types to programming languages is built using the following code:

def flatten(seq):
	""" flattens lists and sequences to one dimension """

	if isinstance(seq, (list, types.GeneratorType)):
		for item in seq:
			for sub_item in flatten(item):
				yield sub_item
	else:
		yield seq

TYPES_INV = dict(flatten(((ext, type) for ext in ([exts] if isinstance(exts, basestring) else exts)) for type, exts in TYPES.iteritems()))
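The one-liner above is compact but hard to read. A more explicit Python 3 sketch of the same inversion could look like this; the function name `invert_types` is made up for this example, and `TYPES` is shortened to three entries.

```python
TYPES = {
    "C/C++": ["h", "hpp", "cpp", "c"],
    "CSS": "css",
    "Python": "py",
}


def invert_types(types_by_language):
    """Map each file extension to the name of its programming language."""
    inverse = {}
    for language, exts in types_by_language.items():
        if isinstance(exts, str):
            exts = [exts]  # a single extension may be given as a plain string
        for ext in exts:
            inverse[ext] = language
    return inverse


TYPES_INV = invert_types(TYPES)
```

With plain loops there is no need for the `flatten` helper, at the cost of a few more lines.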

The core of the script is the following loop over projects, directories, and files:

result = {}
for project, dirs in PROJECTS.iteritems():
	if isinstance(dirs, basestring):
		dirs = [dirs]
	lines = {}
	for dir in dirs:
		for dirpath, dirnames, filenames in os.walk(dir):
			for exclude in EXCLUDE_DIRS:
				try:
					dirnames.remove(exclude)
				except ValueError:
					pass
			for filename in filenames:
				if filename not in EXCLUDE_FILES:
					basename, ext = os.path.splitext(filename)
					ext = ext[1:]	# remove leading '.'
					type = TYPES_INV.get(ext)
					if type is not None:
						lc = linecount(os.path.join(dirpath, filename))
						inc_dict(lines, type, lc)
	result[project] = lines
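Excluding directories works because os.walk (in its default top-down mode) looks at dirnames again after each step, so names removed from that list in place are never visited. Here is a minimal Python 3 sketch of the same pruning with a slice assignment instead of repeated remove calls. The function name `walk_files` and the throwaway directory tree are made up for this example.

```python
import os
import tempfile

EXCLUDE_DIRS = {"prototype"}


def walk_files(top, exclude_dirs):
    """Yield the paths of all files below top, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment changes the list in place, so os.walk will not
        # descend into the directories that were filtered out.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            yield os.path.join(dirpath, filename)


# Throwaway directory tree to try it out (hypothetical file names).
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "src"))
    os.makedirs(os.path.join(top, "prototype"))
    for relative in ("src/main.py", "prototype/prototype.js"):
        with open(os.path.join(top, relative), "w") as f:
            f.write("x\n")
    found = sorted(os.path.relpath(p, top) for p in walk_files(top, EXCLUDE_DIRS))
```

Note that rebinding the name (`dirnames = [...]`) would not prune anything, because os.walk keeps its own reference to the original list.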

The actual line counting for a single file is done by the following very simple function:

def linecount(filename):
	""" determine the number of lines in a file """

	return sum(1 for line in open(filename, 'r'))

There are probably faster methods, but this is just very easy and “pythonic”, I think. The function inc_dict simply increments a dictionary entry:

def inc_dict(d, k, i):
	""" increment the dictionary entry d[k] by i, or set it to i if not present already """

	try:
		d[k] += i
	except KeyError:
		d[k] = i
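For comparison, here is a Python 3 sketch of two alternatives, offered as assumptions rather than as part of the original script: counting newline bytes in binary chunks, which is typically faster on large files, and collections.Counter, which makes a helper like inc_dict unnecessary. One difference to keep in mind: this linecount counts newline characters, so a last line without a trailing newline is not counted.

```python
import os
import tempfile
from collections import Counter


def linecount(filename, chunk_size=1 << 20):
    """Count newline characters by reading the file in binary chunks."""
    count = 0
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count


lines = Counter()  # missing keys start at 0, so no inc_dict helper is needed

# Small throwaway file to try it out (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.py")
    with open(path, "w") as f:
        f.write("a = 1\nb = 2\nprint(a + b)\n")
    lines["Python"] += linecount(path)
    lines["Python"] += linecount(path)  # counting it twice just to show accumulation
```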

Finally, a table with all the line counts is produced:

types_total = dict((type, sum(lines.get(type, 0) for project, lines in result.iteritems())) for type in TYPES)
types = sorted(type for type, count in types_total.iteritems() if count > 0)
table = []
table.append(["Project"] + types + ["Total"])
for project, lines in sorted(result.iteritems(), key=lambda (p, l): project_key(p)):
	table.append([project.lstrip('-')] + [lines.get(type, "") for type in types] + [sum(lines.values())])
table.append(["Total"] + [types_total[type] for type in types] + [sum(types_total.values())])
print html_table(table)
print text_table(table)
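The sort key project_key is not shown in this excerpt. A plausible Python 3 sketch, purely a guess: put active projects first and discontinued ones (with the leading minus) last, each group sorted alphabetically regardless of case.

```python
def project_key(name):
    """Guessed sort key: active projects first, then discontinued ones."""
    discontinued = name.startswith("-")
    # Tuples compare element by element, and False sorts before True.
    return (discontinued, name.lstrip("-").lower())


projects = ["-Mathador", "tripedia.org", "-BatchWorks", "Mathics"]
ordered = sorted(projects, key=project_key)
```

In Python 3 the tuple-unpacking lambda from the loop above (`lambda (p, l): ...`) is no longer valid syntax, so the call would become something like `sorted(result.items(), key=lambda item: project_key(item[0]))`.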

The functions for outputting the table in HTML and text format are also pretty simple:


def html_table(table):
	html = ["<table>"]
	html.append("<thead><tr>" + "".join(("<th>%s</th>" % cell) for cell in table[0]) + "</tr></thead>")
	html.append("<tbody>")
	for row in table[1:]:
		html.append("<tr><th>%s</th>" % row[0] + "".join(("<td>%s</td>" % cell) for cell in row[1:]) + "</tr>")
	html.append("</tbody>")
	html.append("</table>")
	return "\n".join(html)

def text_table(table):
	row_widths = []
	for row in table:
		for index, cell in enumerate(row):
			if len(row_widths) <= index:
				row_widths += [0] * (index + 1 - len(row_widths))
			row_widths[index] = max(row_widths[index], len(unicode(cell)))
	text = []
	for row in table:
		text.append(' '.join((('%' + str(row_widths[index]) + 's') % cell) for index, cell in enumerate(row)))
	return "\n".join(text)

You can also download the whole Python script.

Finally, here is my result:

Project                C/C++   CSS   HTML  JavaScript    PHP  Pascal  Python   Total
Auro Kubelka               -   599      -          19  13350       -       -   13968
Exkursionsbauernhoefe      -   192      -           -   3357       -       -    3549
Mathics                    -   827    489        1971      -       -   24215   27502
oekosozialmarkt.com        -  3019   9201        2685      -       -   24443   39348
Rindfleischfest            -   240      -           -    966       -       -    1206
Stanzoptimierung        7976     -      -           -      -       -       -    7976
tripedia.org               -  1881   7542        4117      -       -   17590   31130
BatchWorks                 -     -      -           -      -    8325       -    8325
livelynet.net              -   296   3379         357      -       -    8409   12441
Mathador                8857     -    343           -      -       -       -    9200
Preisdetektiv              -   119    277         728      -       -     952    2076
RLS-Info                   -   362  23098         325      -       -       -   23785
Total                  16833  7535  44329       10202  17673    8325   75609  180506


Over 180,000 lines of code written so far! And that includes only most of my major projects. Smaller stuff written for university assignments (especially a lot of Mathematica code), small tools and tests are not included in this statistic.

Stanzoptimierung

Finished the optimization project I’ve worked on for the last few weeks: together with Eranda Dragoti-Cela, Elisabeth Gassner, Johannes Hatzl and Bettina Klinz, I created a C++ program that optimizes the configuration and punching plan of a punching machine. The machine will start to operate in a Polish factory this summer. Below are some pictures of it.

Punching machine 1

Punching machine 2

Punching machine 3

Rindfleischfest

The homepage about the Styrian “Rindfleischfest” for the Chamber of Agriculture is finished and online at www.rindfleischfest.at. It’s a rather simple, but beautiful and hopefully informative PHP website presenting the schedule of the festival and the various companies and organisations participating.

Rindfleischfest

New design

Right on time for Christmas, tripedia.org has got its new design and layout! Looks quite sexy, doesn’t it? ;-)

To illustrate the development of tripedia’s style, here are some screenshots from October 2007, March and November 2008, and now.


(Oct 07)


(March 08)


(Nov 08)


(Dec 08)

Moreover, I’ve completely rewritten large parts of the code and “refactored” many features. There are also some new features, e.g. a small map for every photo where you can specify where it was taken. This particular feature will be extended considerably in the future.