I recently asked myself how much code I have written in my life so far, broken down by project and programming language. A shell script using wc -l could have done the job, but I decided to write a short script in my “mother tongue”, Python.
The projects and programming languages are configured with the following variables:
PROJECTS = {
    'tripedia.org': '/Users/Jan/Projekte/tripedia.org/tripedia',
    '-BatchWorks': ['/Users/Jan/Projekte/BatchWorks/Src', '/Users/Jan/Projekte/BatchWorks/Components'],
    '-Mathador': '/Users/Jan/Projekte/Mathador',
}

EXCLUDE_DIRS = set(['_Kopien_', 'prototype', 'scriptaculous', 'innerdom', 'livepipe', 'gmapsutil', 'TBP'])
EXCLUDE_FILES = set(['prototype.js', 'carousel.js', 'printf.js'])

TYPES = {
    'Pascal': 'pas',
    'C/C++': ['h', 'hpp', 'cpp', 'c'],
    'HTML': ['html', 'htm', 'xhtml', 'xhtm'],
    'CSS': 'css',
    'Python': 'py',
    'Java': 'java',
    'JavaScript': 'js',
    'PHP': 'php',
}
A minus sign at the beginning of a project name marks an old, discontinued project. An inverse dictionary that maps file extensions to programming languages is built with the following code:
import types

def flatten(seq):
    """ flattens lists and sequences to one dimension """
    if isinstance(seq, (list, types.GeneratorType)):
        for item in seq:
            for sub_item in flatten(item):
                yield sub_item
    else:
        yield seq

# one (extension, language) pair per extension, flattened into a single dict
TYPES_INV = dict(flatten(
    ((ext, type) for ext in ([exts] if isinstance(exts, basestring) else exts))
    for type, exts in TYPES.iteritems()))
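On Python 3, where `basestring` and `iteritems` no longer exist, the same inverse mapping can be built with a dict comprehension and no `flatten` helper. This is only a minimal sketch, not the script's original code; the helper name `as_list` is an assumption, and `TYPES` is shortened for illustration.

```python
# Python 3 sketch; TYPES is shortened, as_list is a hypothetical helper.
TYPES = {
    'Pascal': 'pas',
    'C/C++': ['h', 'hpp', 'cpp', 'c'],
    'Python': 'py',
}


def as_list(value):
    """Wrap a single string in a list; copy any other sequence into a list."""
    return [value] if isinstance(value, str) else list(value)


# Map every file extension back to its language name.
TYPES_INV = {ext: language
             for language, exts in TYPES.items()
             for ext in as_list(exts)}

print(TYPES_INV['hpp'])      # C/C++
print(TYPES_INV.get('txt'))  # None: unknown extensions are simply skipped later
```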
The core of the script is the following loop over projects, directories, and files:
import os

result = {}
for project, dirs in PROJECTS.iteritems():
    if isinstance(dirs, basestring):
        dirs = [dirs]
    lines = {}
    for dir in dirs:
        for dirpath, dirnames, filenames in os.walk(dir):
            # removing entries from dirnames keeps os.walk out of those directories
            for exclude in EXCLUDE_DIRS:
                try:
                    dirnames.remove(exclude)
                except ValueError:
                    pass
            for filename in filenames:
                if filename not in EXCLUDE_FILES:
                    basename, ext = os.path.splitext(filename)
                    ext = ext[1:]  # remove leading '.'
                    type = TYPES_INV.get(ext)
                    if type is not None:
                        lc = linecount(os.path.join(dirpath, filename))
                        inc_dict(lines, type, lc)
    result[project] = lines
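The `dirnames.remove(exclude)` calls work because `os.walk`, in its default top-down mode, lets the caller modify `dirnames` in place to prune directories it should not descend into. The following self-contained Python 3 sketch shows the same pruning with a slice assignment instead of repeated `remove` calls. The function name `count_lines_by_ext` and its parameters are hypothetical, and it groups by raw file extension rather than by language to stay short.

```python
import os
from collections import Counter


def count_lines_by_ext(root, exclude_dirs=(), exclude_files=()):
    """Return a Counter mapping file extension -> total line count below root."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment changes the very list os.walk holds on to,
        # so excluded directories are never visited.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            if filename in exclude_files:
                continue
            ext = os.path.splitext(filename)[1][1:]  # drop the leading '.'
            path = os.path.join(dirpath, filename)
            # Binary mode avoids decoding errors in files of unknown encoding.
            with open(path, 'rb') as f:
                totals[ext] += sum(1 for _ in f)
    return totals
```

A call such as `count_lines_by_ext('/some/project', exclude_dirs={'prototype'})` (the path is a placeholder) would then skip every `prototype` directory in the tree.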
The actual line counting for a single file is done by the following very simple function:
def linecount(filename):
    """ determine the number of lines in a file """
    return sum(1 for line in open(filename, 'r'))
There are probably faster methods, but this is just very easy and “pythonic”, I think. The function inc_dict simply increments a dictionary entry:
def inc_dict(d, k, i):
    """ increment the dictionary entry d[k] by i, or set it to i if not present already """
    try:
        d[k] += i
    except KeyError:
        d[k] = i
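In Python 2.7 and later, `collections.Counter` already behaves like `inc_dict`: a missing key counts as 0, so `+=` works without a `try`/`except`. A small sketch with invented numbers (they are illustrative only, not measurements from my projects):

```python
from collections import Counter

lines = Counter()
lines['Python'] += 120   # first use: the missing key starts at 0
lines['Python'] += 30
lines['CSS'] += 15

print(lines['Python'])   # 150
print(lines['Pascal'])   # 0; reading a missing key does not store it
print(dict(lines))       # {'Python': 150, 'CSS': 15}
```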
Finally, a table with all the line counts is produced:
types_total = dict((type, sum(lines.get(type, 0) for project, lines in result.iteritems())) for type in TYPES)
types = sorted(type for type, count in types_total.iteritems() if count > 0)

table = []
table.append(["Project"] + types + ["Total"])
for project, lines in sorted(result.iteritems(), key=lambda (p, l): project_key(p)):
    table.append([project.lstrip('-')] + [lines.get(type, "") for type in types] + [sum(lines.values())])
table.append(["Total"] + [types_total[type] for type in types] + [sum(types_total.values())])

print html_table(table)
print text_table(table)
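The sort above calls `project_key`, which is not shown in this post. One plausible version, purely as an assumption, sorts active projects before discontinued ones (those with a leading minus sign) and ignores case within each group, which would match the row order in the result table. The sketch is written for Python 3, where tuple-unpacking lambdas such as `lambda (p, l): ...` are no longer allowed; the sample data is invented.

```python
def project_key(name):
    """Hypothetical sort key: active projects first, then case-insensitive name."""
    discontinued = name.startswith('-')
    return (discontinued, name.lstrip('-').lower())


# Invented sample data in the shape of `result`.
result = {
    '-Mathador': {'C/C++': 10},
    'tripedia.org': {'Python': 20},
    '-BatchWorks': {'Pascal': 30},
    'Mathics': {'Python': 40},
}

# Python 3 form of the sort: index the (project, lines) pair instead of unpacking it.
ordered = [project for project, lines in
           sorted(result.items(), key=lambda item: project_key(item[0]))]
print(ordered)
# ['Mathics', 'tripedia.org', '-BatchWorks', '-Mathador']
```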
The functions for outputting the table in HTML and text format are also pretty simple:
def html_table(table):
    html = ["<table>"]
    html.append("<thead><tr>" + "".join(("<th>%s</th>" % cell) for cell in table[0]) + "</tr></thead>")
    html.append("<tbody>")
    for row in table[1:]:
        html.append("<tr><th>%s</th>" % row[0] + "".join(("<td>%s</td>" % cell) for cell in row[1:]) + "</tr>")
    html.append("</tbody>")
    html.append("</table>")
    return "\n".join(html)

def text_table(table):
    row_widths = []
    for row in table:
        for index, cell in enumerate(row):
            if len(row_widths) <= index:
                row_widths += [0] * (index + 1 - len(row_widths))
            row_widths[index] = max(row_widths[index], len(unicode(cell)))
    text = []
    for row in table:
        text.append(' '.join((('%' + str(row_widths[index]) + 's') % cell) for index, cell in enumerate(row)))
    return "\n".join(text)
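For comparison, here is a shorter Python 3 sketch of the text table that computes the column widths with `zip(*table)` and pads with `str.rjust`. Unlike the function above, it assumes that every row has the same number of cells; the sample rows are invented.

```python
def text_table(table):
    """Render rows as right-aligned, space-separated columns (Python 3 sketch)."""
    # zip(*table) turns rows into columns; assumes all rows have equal length.
    widths = [max(len(str(cell)) for cell in column) for column in zip(*table)]
    return "\n".join(
        " ".join(str(cell).rjust(width) for cell, width in zip(row, widths))
        for row in table
    )


sample = [
    ["Project", "CSS", "Total"],
    ["demo", 12, 12],       # invented numbers
    ["Total", 12, 12],
]
print(text_table(sample))
```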
You can also download the whole Python script.
Finally, here is my result:
Project                C/C++   CSS   HTML  JavaScript    PHP  Pascal  Python   Total
Auro Kubelka               -   599      -          19  13350       -       -   13968
Exkursionsbauernhoefe      -   192      -           -   3357       -       -    3549
Mathics                    -   827    489        1971      -       -   24215   27502
oekosozialmarkt.com        -  3019   9201        2685      -       -   24443   39348
Rindfleischfest            -   240      -           -    966       -       -    1206
Stanzoptimierung        7976     -      -           -      -       -       -    7976
tripedia.org               -  1881   7542        4117      -       -   17590   31130
BatchWorks                 -     -      -           -      -    8325       -    8325
livelynet.net              -   296   3379         357      -       -    8409   12441
Mathador                8857     -    343           -      -       -       -    9200
Preisdetektiv              -   119    277         728      -       -     952    2076
RLS-Info                   -   362  23098         325      -       -       -   23785
Total                  16833  7535  44329       10202  17673    8325   75609  180506
Over 180,000 lines of code written so far! And that covers only most of my major projects. Smaller things written for university assignments (in particular a lot of Mathematica code), small tools, and tests are not included in these statistics.
