Code line count

I recently asked myself how much code I have written so far in my life. I wanted to break the answer down by project and by programming language. A shell script using wc -l could have done it, but I decided to write a short script in my “mother tongue”, Python.

To set up the projects and programming languages, the script uses the following variables:

PROJECTS = {
	'': '/Users/Jan/Projekte/',
	'-BatchWorks': ['/Users/Jan/Projekte/BatchWorks/Src', '/Users/Jan/Projekte/BatchWorks/Components'],
	'-Mathador': '/Users/Jan/Projekte/Mathador',
}

EXCLUDE_DIRS = set(['_Kopien_', 'prototype', 'scriptaculous', 'innerdom', 'livepipe', 'gmapsutil', 'TBP', ])
EXCLUDE_FILES = set(['prototype.js', 'carousel.js', 'printf.js'])

TYPES = {
	'Pascal': 'pas',
	'C/C++': ['h', 'hpp', 'cpp', 'c'],
	'HTML': ['html', 'htm', 'xhtml', 'xhtm'],
	'CSS': 'css',
	'Python': 'py',
	'Java': 'java',
	'JavaScript': 'js',
	'PHP': 'php',
}

A minus sign at the beginning of a project name marks an old, discontinued project. The following code builds an inverse dictionary that maps each file extension to its programming language:

import types

def flatten(seq):
	""" flattens nested lists and generators to one dimension """

	if isinstance(seq, (list, types.GeneratorType)):
		for item in seq:
			for sub_item in flatten(item):
				yield sub_item
	else:
		# anything else (e.g. a tuple) counts as a single item
		yield seq

TYPES_INV = dict(flatten(((ext, type) for ext in ([exts] if isinstance(exts, basestring) else exts)) for type, exts in TYPES.iteritems()))
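
The nested generator expression is compact but dense. As a minimal sketch, assuming Python 3 (where basestring and iteritems no longer exist) and a shortened sample of TYPES, the same inverse mapping can be built with two plain loops. The name invert_types is a hypothetical helper, not part of the original script.

```python
# Shortened sample in the same shape as TYPES:
# language -> one extension or a list of extensions.
TYPES = {
    'Pascal': 'pas',
    'C/C++': ['h', 'hpp', 'cpp', 'c'],
    'Python': 'py',
}

def invert_types(types):
    """Map each file extension to its language (hypothetical helper)."""
    inverse = {}
    for language, exts in types.items():
        # A bare string stands for a single extension.
        if isinstance(exts, str):
            exts = [exts]
        for ext in exts:
            inverse[ext] = language
    return inverse

TYPES_INV = invert_types(TYPES)
print(TYPES_INV['hpp'])  # C/C++
```

This trades the one-liner for a few more lines, but it needs no flatten helper.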

The core of the script is the following loop over projects, directories, and files:

import os

result = {}
for project, dirs in PROJECTS.iteritems():
	if isinstance(dirs, basestring):
		dirs = [dirs]
	lines = {}	# line count per language for this project
	for dir in dirs:
		for dirpath, dirnames, filenames in os.walk(dir):
			# prune excluded directories in place so os.walk skips them
			for exclude in EXCLUDE_DIRS:
				try:
					dirnames.remove(exclude)
				except ValueError:
					pass
			for filename in filenames:
				if filename not in EXCLUDE_FILES:
					basename, ext = os.path.splitext(filename)
					ext = ext[1:]	# remove leading '.'
					type = TYPES_INV.get(ext)
					if type is not None:
						lc = linecount(os.path.join(dirpath, filename))
						inc_dict(lines, type, lc)
	result[project] = lines
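
Excluding directories during the walk relies on a documented property of os.walk: in its default top-down mode it only descends into the subdirectories that are still listed in dirnames. The following is a minimal sketch of that traversal, assuming Python 3. The name walk_source_files is a hypothetical helper that does not appear in the original script.

```python
import os

def walk_source_files(root, exclude_dirs, exclude_files):
    """Yield paths of files under root, skipping excluded directories and files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment mutates the very list os.walk holds, so excluded
        # directories are never visited (requires topdown=True, the default).
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            if filename not in exclude_files:
                yield os.path.join(dirpath, filename)
```

Rebinding the name (dirnames = [...]) would not work here, because os.walk would keep using its own, unmodified list.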

The actual line counting for a single file is done by the following very simple function:

def linecount(filename):
	""" determine the number of lines in a file """

	with open(filename, 'r') as f:	# closes the file when done
		return sum(1 for line in f)
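
For very large files, one possible alternative is to read the file in binary chunks and count newline bytes. This is only a sketch, assuming Python 3. The name linecount_fast and the chunk_size parameter are hypothetical, and no timing was measured for this post. Unlike linecount, it does not count a final line that lacks a trailing newline.

```python
def linecount_fast(filename, chunk_size=1 << 20):
    """Count newline bytes by reading the file in binary chunks."""
    count = 0
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b'\n')
    return count
```
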

There are probably faster ways to count lines, but linecount is very easy to read and “pythonic”, I think. The function inc_dict simply increments a dictionary entry:

def inc_dict(d, k, i):
	""" increment the dictionary entry d[k] by i, or set it to i if not present already """

	try:
		d[k] += i
	except KeyError:
		d[k] = i
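
The standard library offers the same behavior without a helper function. As a small sketch, assuming Python 3 and made-up sample counts, collections.defaultdict(int) treats a missing key as 0, so a plain += is enough:

```python
from collections import defaultdict

lines = defaultdict(int)   # missing keys start at 0
lines['Python'] += 120     # sample numbers, not real results
lines['Python'] += 30
lines['CSS'] += 7
print(dict(lines))  # {'Python': 150, 'CSS': 7}
```

One difference to keep in mind: merely reading a missing key from a defaultdict creates it, so lookups such as lines.get(type, "") remain the safer way to read.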

Finally, a table with all the line counts is produced:

types_total = dict((type, sum(lines.get(type, 0) for project, lines in result.iteritems())) for type in TYPES)
types = sorted(type for type, count in types_total.iteritems() if count > 0)
table = []
table.append(["Project"] + types + ["Total"])
for project, lines in sorted(result.iteritems(), key=lambda (p, l): project_key(p)):
	table.append([project.lstrip('-')] + [lines.get(type, "") for type in types] + [sum(lines.values())])
table.append(["Total"] + [types_total[type] for type in types] + [sum(types_total.values())])
print html_table(table)
print text_table(table)
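
The function project_key is not shown in this post. Judging from the minus-sign convention and from the row order in the result (active projects first, discontinued ones after), one plausible definition is sketched below. This is an assumption, written for Python 3, and the sample names are only for illustration.

```python
def project_key(name):
    """Sort active projects before discontinued ones (leading '-'),
    each group alphabetically. Assumed behavior, not the original code."""
    return (name.startswith('-'), name.lstrip('-').lower())

names = ['-Mathador', 'Mathics', '-BatchWorks', 'Auro Kubelka']
print(sorted(names, key=project_key))
# ['Auro Kubelka', 'Mathics', '-BatchWorks', '-Mathador']

# Python 3 no longer allows "lambda (p, l): ...", so the sort over
# (project, lines) pairs would index into the pair instead:
result = {'-Mathador': {}, 'Mathics': {}}
ordered = sorted(result.items(), key=lambda item: project_key(item[0]))
```
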

The functions for outputting the table in HTML and text format are also pretty simple:

def html_table(table):
	html = ["<table>"]
	html.append("<thead><tr>" + "".join(("<th>%s</th>" % cell) for cell in table[0]) + "</tr></thead>")
	for row in table[1:]:
		html.append("<tr><th>%s</th>" % row[0] + "".join(("<td>%s</td>" % cell) for cell in row[1:]) + "</tr>")
	html.append("</table>")	# close the table element
	return "\n".join(html)

def text_table(table):
	row_widths = []
	for row in table:
		for index, cell in enumerate(row):
			if len(row_widths) <= index:
				row_widths += [0] * (index + 1 - len(row_widths))
			row_widths[index] = max(row_widths[index], len(unicode(cell)))
	text = []
	for row in table:
		text.append(' '.join((('%' + str(row_widths[index]) + 's') % cell) for index, cell in enumerate(row)))
	return "\n".join(text)
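
Note that html_table inserts cell values into the markup unescaped. That is fine for the names and numbers used here, but a project name containing & or < would produce invalid HTML. A minimal sketch of an escaping row renderer, assuming Python 3 and its html.escape function, could look like this. The name html_row is a hypothetical helper.

```python
from html import escape

def html_row(cells, tag='td'):
    """Render one table row, escaping every cell value."""
    return '<tr>' + ''.join(
        '<%s>%s</%s>' % (tag, escape(str(cell)), tag) for cell in cells
    ) + '</tr>'

print(html_row(['R&D', 42]))
# <tr><td>R&amp;D</td><td>42</td></tr>
```
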

You can also download the whole Python script.

Finally, here is my result:

Project                C/C++   CSS   HTML  JavaScript    PHP  Pascal  Python   Total
Auro Kubelka               -   599      -          19  13350       -       -   13968
Exkursionsbauernhoefe      -   192      -           -   3357       -       -    3549
Mathics                    -   827    489        1971      -       -   24215   27502
(name missing)             -  3019   9201        2685      -       -   24443   39348
Rindfleischfest            -   240      -           -    966       -       -    1206
Stanzoptimierung        7976     -      -           -      -       -       -    7976
(name missing)             -  1881   7542        4117      -       -   17590   31130
BatchWorks                 -     -      -           -      -    8325       -    8325
(name missing)             -   296   3379         357      -       -    8409   12441
Mathador                8857     -    343           -      -       -       -    9200
Preisdetektiv              -   119    277         728      -       -     952    2076
RLS-Info                   -   362  23098         325      -       -       -   23785
Total                  16833  7535  44329       10202  17673    8325   75609  180506

Over 180,000 lines of code written so far! And that covers only most of my major projects. Smaller things such as university assignments (including a lot of Mathematica code), small tools, and tests are not included in these statistics.

Blog continued

This is my first blog entry from Stockholm and the first one in a long time. I still have no plans to maintain a personal blog about my experiences here. However, I do keep a diary so that I don’t forget the many things that happen. So if you want to know what is happening in Stockholm, just ask me!

I definitely want to blog more in the future, probably less about personal matters and more about techie things. So expect more soon!