I recently asked myself: how much code have I written so far in my life? I wanted to break the answer down by project and by programming language. A shell script using wc -l could have done it, but I decided to write a short script in my “mother tongue”, Python.
To set up projects and programming languages, the following variables are used:
PROJECTS = {
    'tripedia.org': '/Users/Jan/Projekte/tripedia.org/tripedia',
    '-BatchWorks': ['/Users/Jan/Projekte/BatchWorks/Src', '/Users/Jan/Projekte/BatchWorks/Components'],
    '-Mathador': '/Users/Jan/Projekte/Mathador',
}
EXCLUDE_DIRS = set(['_Kopien_', 'prototype', 'scriptaculous', 'innerdom', 'livepipe', 'gmapsutil', 'TBP'])
EXCLUDE_FILES = set(['prototype.js', 'carousel.js', 'printf.js'])
TYPES = {
    'Pascal': 'pas',
    'C/C++': ['h', 'hpp', 'cpp', 'c'],
    'HTML': ['html', 'htm', 'xhtml', 'xhtm'],
    'CSS': 'css',
    'Python': 'py',
    'Java': 'java',
    'JavaScript': 'js',
    'PHP': 'php',
}
A minus sign at the beginning of a project name marks an old, discontinued project. An inverse dictionary that maps file extensions to programming languages is built using the following code:
import types

def flatten(seq):
    """ flattens lists and sequences to one dimension """
    if isinstance(seq, (list, types.GeneratorType)):
        for item in seq:
            for sub_item in flatten(item):
                yield sub_item
    else:
        yield seq

# one (extension, language) pair per extension; a bare string counts as a one-element list
TYPES_INV = dict(flatten(
    ((ext, type) for ext in ([exts] if isinstance(exts, basestring) else exts))
    for type, exts in TYPES.iteritems()))
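The one-liner is fairly dense. For readers on Python 3, where basestring and iteritems no longer exist, the same inversion can be sketched as a plain loop. The helper name invert_types and the trimmed SAMPLE_TYPES dictionary are assumptions made for this illustration; they are not part of the original script.

```python
def invert_types(types_map):
    """Map each file extension to its language name."""
    inverse = {}
    for language, exts in types_map.items():
        # A single extension may be given as a bare string.
        if isinstance(exts, str):
            exts = [exts]
        for ext in exts:
            inverse[ext] = language
    return inverse


# Trimmed sample of the TYPES dictionary from the post.
SAMPLE_TYPES = {
    'Pascal': 'pas',
    'C/C++': ['h', 'hpp', 'cpp', 'c'],
    'Python': 'py',
}

SAMPLE_TYPES_INV = invert_types(SAMPLE_TYPES)
print(SAMPLE_TYPES_INV['cpp'])  # C/C++
```

The loop trades brevity for readability and needs no flatten helper, since the nesting is only one level deep.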
The core of the script is the following loop over projects, directories, and files:
import os

result = {}
for project, dirs in PROJECTS.iteritems():
    if isinstance(dirs, basestring):
        dirs = [dirs]
    lines = {}
    for dir in dirs:
        for dirpath, dirnames, filenames in os.walk(dir):
            # removing entries from dirnames in place keeps os.walk out of them
            for exclude in EXCLUDE_DIRS:
                try:
                    dirnames.remove(exclude)
                except ValueError:
                    pass
            for filename in filenames:
                if filename not in EXCLUDE_FILES:
                    basename, ext = os.path.splitext(filename)
                    ext = ext[1:]  # remove leading '.'
                    type = TYPES_INV.get(ext)
                    if type is not None:
                        lc = linecount(os.path.join(dirpath, filename))
                        inc_dict(lines, type, lc)
    result[project] = lines
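The exclusion step relies on a documented property of os.walk: in its default top-down mode, names removed from dirnames in place are not descended into. A minimal Python 3 sketch of just that step follows, using slice assignment instead of repeated remove calls. The function name iter_source_files and its parameters are hypothetical and do not appear in the original script.

```python
import os


def iter_source_files(root, exclude_dirs, exclude_files):
    """Yield paths of files under root, skipping excluded directories and files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Modify dirnames in place so os.walk does not descend into excluded dirs.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            if filename not in exclude_files:
                yield os.path.join(dirpath, filename)
```

Rebinding the name (dirnames = [...]) would not work here, because os.walk keeps its own reference to the original list.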
The actual line counting for a single file is done by the following very simple function:
def linecount(filename):
    """ determine the number of lines in a file """
    return sum(1 for line in open(filename, 'r'))
There are probably faster methods, but this one is very easy and “pythonic”, I think. The function inc_dict simply increments a dictionary entry:
def inc_dict(d, k, i):
    """ increment the dictionary entry d[k] by i, or set it to i if not present already """
    try:
        d[k] += i
    except KeyError:
        d[k] = i
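Two small variations on these helpers are possible, sketched here for Python 3 and not taken from the original script: opening the file in a with block so that it is closed promptly, and letting collections.Counter supply the missing-key default that inc_dict implements by hand. The name linecount_closing is made up for this example, and reading in binary mode is an assumption chosen to avoid decoding errors on files with mixed encodings.

```python
from collections import Counter


def linecount_closing(filename):
    """Count lines, reading bytes so odd encodings cannot raise decode errors."""
    with open(filename, 'rb') as handle:
        return sum(1 for _ in handle)


lines = Counter()
lines['Python'] += 120   # a missing key starts at 0
lines['Python'] += 30
lines['CSS'] += 7
print(dict(lines))  # {'Python': 150, 'CSS': 7}
```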
Finally, a table with all the line counts is produced:
types_total = dict((type, sum(lines.get(type, 0) for project, lines in result.iteritems())) for type in TYPES)
types = sorted(type for type, count in types_total.iteritems() if count > 0)
table = []
table.append(["Project"] + types + ["Total"])
for project, lines in sorted(result.iteritems(), key=lambda (p, l): project_key(p)):
table.append([project.lstrip('-')] + [lines.get(type, "") for type in types] + [sum(lines.values())])
table.append(["Total"] + [types_total[type] for type in types] + [sum(types_total.values())])
print html_table(table)
print text_table(table)
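The helper project_key used in the sort is not shown in this post. One guess that is consistent with the row order of the result table (current projects first, discontinued ones after, each group alphabetical regardless of case) would be the following. Treat it as an assumption, not as the code from the actual script.

```python
def project_key(name):
    """Sort key: active projects first, then discontinued ('-' prefix), each A-Z."""
    discontinued = name.startswith('-')
    return (discontinued, name.lstrip('-').lower())


names = ['-Mathador', 'tripedia.org', '-BatchWorks', 'Mathics']
print(sorted(names, key=project_key))
# ['Mathics', 'tripedia.org', '-BatchWorks', '-Mathador']
```

Because False sorts before True, the tuple puts every active project ahead of every discontinued one before the names are compared.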
The functions for outputting the table in HTML and text format are also pretty simple:
def html_table(table):
    html = ["<table>"]
    html.append("<thead><tr>" + "".join(("<th>%s</th>" % cell) for cell in table[0]) + "</tr></thead>")
    html.append("<tbody>")
    for row in table[1:]:
        html.append("<tr><th>%s</th>" % row[0] + "".join(("<td>%s</td>" % cell) for cell in row[1:]) + "</tr>")
    html.append("</tbody>")
    html.append("</table>")
    return "\n".join(html)

def text_table(table):
    row_widths = []
    for row in table:
        for index, cell in enumerate(row):
            if len(row_widths) <= index:
                row_widths += [0] * (index + 1 - len(row_widths))
            row_widths[index] = max(row_widths[index], len(unicode(cell)))
    text = []
    for row in table:
        text.append(' '.join((('%' + str(row_widths[index]) + 's') % cell) for index, cell in enumerate(row)))
    return "\n".join(text)
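To show what the text formatter does with real input, here is a compact Python 3 rendition of the same idea: find the widest cell per column, then right-align every cell to that width. The name text_table_py3 is hypothetical, and str.rjust stands in for the '%Ns' format string; the two demo rows are taken from the result table.

```python
def text_table_py3(table):
    """Right-align every cell to the width of the widest cell in its column."""
    widths = {}
    for row in table:
        for index, cell in enumerate(row):
            widths[index] = max(widths.get(index, 0), len(str(cell)))
    return "\n".join(
        " ".join(str(cell).rjust(widths[index]) for index, cell in enumerate(row))
        for row in table
    )


demo = [
    ["Project", "CSS", "PHP", "Total"],
    ["Rindfleischfest", 240, 966, 1206],
    ["Exkursionsbauernhoefe", 192, 3357, 3549],
]
print(text_table_py3(demo))
```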
You can also download the whole Python script.
Finally, here is my result:
| Project | C/C++ | CSS | HTML | JavaScript | PHP | Pascal | Python | Total |
|---|---|---|---|---|---|---|---|---|
| Auro Kubelka | | 599 | | 19 | 13350 | | | 13968 |
| Exkursionsbauernhoefe | | 192 | | | 3357 | | | 3549 |
| Mathics | | 827 | 489 | 1971 | | | 24215 | 27502 |
| oekosozialmarkt.com | | 3019 | 9201 | 2685 | | | 24443 | 39348 |
| Rindfleischfest | | 240 | | | 966 | | | 1206 |
| Stanzoptimierung | 7976 | | | | | | | 7976 |
| tripedia.org | | 1881 | 7542 | 4117 | | | 17590 | 31130 |
| BatchWorks | | | | | | 8325 | | 8325 |
| livelynet.net | | 296 | 3379 | 357 | | | 8409 | 12441 |
| Mathador | 8857 | | 343 | | | | | 9200 |
| Preisdetektiv | | 119 | 277 | 728 | | | 952 | 2076 |
| RLS-Info | | 362 | 23098 | 325 | | | | 23785 |
| Total | 16833 | 7535 | 44329 | 10202 | 17673 | 8325 | 75609 | 180506 |
Over 180,000 lines of code written so far! And that covers only most of my major projects. Smaller things written for university assignments (in particular a lot of Mathematica code), small tools, and tests are not included in these statistics.