Hey there,
I have been a hobbyist programmer for quite some years and have a few smaller projects under my belt: mostly smaller GUI applications that have a few classes at maximum, make use of one or two external libraries and are very thoroughly documented and commented.
Since I love the free software movement and philosophy, I wanted to start contributing to projects I like and help them out.
The thing is, the jump from “hobbyist” to “being able to understand super-efficient compact established repos”… seems to be very hard?
Like, looking into some of these projects, I see dozens upon dozens of classes and header files, with most of them being totally opaque to me. They use syntactic constructs I cannot decipher very well because they have been optimized to irrecognizability, and sometimes I cannot even properly find the starting point of a program. The code bases are decades old, use half of the obscure compiler and language features out there, and the maintainers seem to be so intimately familiar with everything that I don’t even know what’s what or where to start. My projects were usually like four source files or so, not massive repositories with hundreds of scattered files, external configurations, edge cases, factories of factories, and so on.
If I want to change a simple thing like a placement of a button or - god knows! - introduce a new feature, I would not even remotely know where to start.
Is it just an extreme difficulty spike at this point that I have to trial-and-error through, or am I doing anything wrong?
I’ve been working with software for 15 years and still feel like this when faced with a new codebase - it simply doesn’t want to make sense to me. As others have stated, codebases are living things, and are as much a map of previous developers’ minds as they are about being functional. The older a project is, the more convoluted and obscure the structure becomes due to changes, adaptations, new features and changing contributors.
Some developers seem to enjoy making their code obscenely difficult to understand, either because it actually makes sense to them that way, or because it makes them feel smarter. These projects are better left alone for the sake of your own sanity. If you encounter dozens of header files, walk away. C and C++ are high-performance languages, and projects use them for a reason. If you have no experience with them, the code is very unlikely to make any sense to you.
I’ve also found it quite difficult to find any project small enough to help on. The large projects have many contributors, and any manageable bugs are quickly fixed, leaving only the stuff that no one wants to touch.
Is there some sort of hobby you enjoy, where an open source tool is (or could be) used? The more obscure the better! Having some prior understanding of the subject usually makes understanding the codebase a little easier.
Be wary about this mindset. This type of explanation sets you up for conflicts with existing developers. Several times I’ve seen developers come into a team and complain about the code, creating conflicts that can last the entire working relationship for no good reason.
Much of the time the people who constantly work with the code are already aware of the problems and may not be happy with it, but there’s no time or big benefit in improving working code. Or it’s complicated for good reasons which may not be immediately apparent (i.e. inherent complexity).
Here are a couple of benign reasons which probably will serve you much better.
It’s much more difficult and time consuming to make code that is easy to understand. Even in open source, there’s a limited amount of time to spend on any particular thing. This explanation is a variation of the line often attributed to Twain, “I didn’t have time to write a short letter, so I wrote a long one instead”, or more abrasively of Hanlon’s razor, “Never attribute to malice that which is adequately explained by stupidity”, with “time pressure” in place of “stupidity”.
When writing the code, the developer has the entire context of their thought process available. You don’t have that, and that’s also the reason why your own code can make no sense a while later. Also, it’s just much harder to read code than to write it.
And sometimes coding habits are opaque to people with different coding habits. These habits aren’t bad per se, but can be difficult to grok.
While I agree with all of the above in principle (and even I have trouble reading my own code at times), this part was specifically in response to the section about ‘code optimized to irrecognizability’ and should not be taken as a general statement on finding other people’s code incomprehensible. Deliberately using non-descriptive naming is unfortunately a thing, although thankfully I rarely seem to encounter it anymore.