How to approach an unfamiliar codebase
When you see a new codebase โ whether it's AI-generated or inherited from a colleague โ there's a systematic way to read it that takes 10 minutes and gives you 80% of the picture.
Step 1: Read the entry point first
Every program has an entry point โ the place where execution begins. In Python, it's usually the if __name__ == "__main__": block, or the main API file in a web service. Start there. Don't read random files โ follow the flow from where it starts.
Step 2: Trace the data
Pick one key piece of data โ a user ID, a flight record, a request body โ and follow it through the system. Where does it come in? What happens to it? Is it validated before use? Does it get stored? Where? This trace reveals the architecture and most of the important bugs.
Step 3: Look at the structure
How many files? How are they named? Is there clear separation of concerns (data access vs. business logic vs. presentation)? Or is everything mixed together in one 800-line file? The structure tells you whether the code will be maintainable.
Step 4: Read the tests
If there are tests, read them before the implementation. Tests tell you what the code is supposed to do โ they're better documentation than comments. If there are no tests, that's already a flag.
What to look for: the flaw checklist
This is the practical checklist you'll use in every code review. Build it into your mental model now.
๐ด Security: the highest-priority category
SQL injection โ any time user input is directly concatenated into a SQL string, the database is vulnerable to being completely compromised.
# โ SQL injection โ attacker can input "'; DROP TABLE users; --"
query = f"SELECT * FROM users WHERE email = '{user_input}'"
cursor.execute(query)
# โ Parameterized query โ input is treated as data, not code
cursor.execute("SELECT * FROM users WHERE email = %s", (user_input,))
Hardcoded secrets โ API keys, passwords, database credentials in the code. Any time you see a string that looks like sk-, Bearer , a connection string with a password, or anything labelled _key, _secret, _token with a real value, flag it immediately.
No authentication check โ an endpoint that takes a user ID and returns their data without checking that the caller is authorized to see that user's data. Classic: GET /users/{id}/profile with no auth check. Any logged-in user could see any other user's data.
Missing input validation โ any user-provided data that goes directly into business logic or a database without being validated for type, length, and allowed values.
# โ No validation โ what if age is -1? Or "drop table"? Or 999999?
def create_user(name: str, age: int) -> User:
return db.insert("users", name=name, age=age)
# โ Validate before using
def create_user(name: str, age: int) -> User:
if not name.strip():
raise ValidationError("name", "cannot be empty")
if not (0 <= age <= 150):
raise ValidationError("age", f"must be 0-150, got {age}")
return db.insert("users", name=name.strip(), age=age)
๐ Reliability: errors that cause silent failures
Bare except / eating errors โ from Module A1. Any except: pass or except Exception: return None without logging is hiding a problem. Future-you will spend hours debugging a failure that left no trace because someone ate the exception.
Resource leaks โ files, database connections, network sockets that are opened but might not be closed if an exception occurs.
# โ File won't be closed if process() raises an exception
f = open("data.txt")
data = f.read()
process(data) # if this raises, f.close() never runs
f.close()
# โ Context manager guarantees closure even on exception
with open("data.txt") as f:
data = f.read()
process(data) # f is closed whether or not this raises
Off-by-one errors โ boundary conditions: is it < or <=? Should the loop go to len(items) or len(items) - 1? Does the range start at 0 or 1? These are responsible for a disproportionate share of real bugs.
None/null dereference โ accessing an attribute or calling a method on something that might be None. Look for result.field where result could be None. Common in AI-generated code that doesn't handle the "not found" case.
# โ Crashes with AttributeError if find_user returns None
user = find_user(user_id)
print(user.name) # boom if user is None
# โ Check first
user = find_user(user_id)
if user is None:
raise NotFoundError("User", user_id)
print(user.name)
๐ก Design: code that works today but creates problems tomorrow
Magic strings and numbers โ bare strings or numbers in logic that have no obvious meaning. What does status == "a" mean? What does if score > 85 mean? Use Enums and named constants.
Functions doing too much โ a 200-line function that validates input, fetches from a database, applies business logic, formats a response, and logs everything. Each function should do one thing. When you see a function this long, ask: what are the natural split points?
Inconsistent error handling โ some functions return None on error, some raise exceptions, some return error strings. A codebase should have one consistent pattern โ pick one and stick to it.
Writing a useful code review
A code review that says "looks good ๐" teaches nothing. A review that lists 20 style nitpicks without addressing real issues is noise. Aim for something in between: a review that addresses the most important things, explains why they matter, and suggests a fix rather than just identifying the problem.
The priority order in a review:
- Correctness bugs โ code that will produce wrong results in certain inputs
- Security issues โ any of the patterns from the security section above
- Error handling โ silent failures, resource leaks
- Design issues โ things that will be painful to change in 3 months
- Readability โ naming, comments, structure. Important, but last.
Use "we" instead of "you." "We should validate the input here" not "you forgot to validate." Ask questions when unsure: "Could this be None if the user doesn't exist?" is better than "this will crash." Distinguish blocking issues ("this is a security bug, must fix") from suggestions ("this could be cleaner as a generator").
The bus factor test
Ask this about any piece of code: "If the person who wrote this got hit by a bus tomorrow, how long would it take someone else to understand it?" Good code minimises that time through clear naming, logical structure, and helpful comments where the "why" isn't obvious.
Apply this test to AI-generated code especially. AI produces code that works but often has poor naming (generic variable names like data, result, temp) and no explanation of non-obvious decisions.