Using Graphify and NetworkX to Map Python Codebase Structure with God Nodes, Communities, and Architecture Visualizations

In this tutorial, we build a Graphify workflow that turns a realistic multi-module Python application into a knowledge graph, with all analysis running locally once the packages are installed. We start by installing Graphify and supporting graph libraries, then generate a small but connected sample application with configuration, database, authentication, service, API, cache, model, and SQL layers. We extract the graph locally using Graphify’s tree-sitter-based analysis, so we do not need an API key or any LLM backend. After loading the generated graph.json into NetworkX, we analyze the codebase’s structure using file types, relationship types, centrality scores, community detection, and shortest paths among important symbols. We also create static and interactive visualizations, making it easier to see how modules, classes, functions, and database objects connect across the project.

Installing Graphify and NetworkX

import subprocess, sys

def pip(*pkgs):
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", *pkgs], check=False)

pip("graphifyy[sql]", "pyvis", "networkx", "matplotlib")

import os, json, glob, textwrap, warnings
import networkx as nx
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")

We install Graphify along with the graph analysis and visualization libraries needed for the tutorial. Because the helper calls pip with check=False, a failed install does not stop the notebook, so any problem surfaces at the imports that follow. We import the required Python modules, including NetworkX for graph processing and Matplotlib for static plotting. We also suppress all warnings so the notebook output stays clean and focused.
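Since the install helper does not raise on failure, a quick import check can confirm that everything landed. The following is a minimal sketch, not part of the original notebook: missing_modules is a hypothetical helper, and the import name graphify is an assumption inferred from the python -m graphify call used in the extraction step.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names are assumptions; adjust if a package exposes a different module name.
required = ["graphify", "pyvis", "networkx", "matplotlib"]
missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Running this right after the install cell gives a clearer message than a later ImportError.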

Building the Sample Codebase

ROOT = "sample_app"
os.makedirs(ROOT, exist_ok=True)
FILES = {
    "config.py": '''
# Central settings object, used everywhere (expect this to be a "god node").
class Settings:
    def __init__(self):
        self.db_dsn = "postgresql://localhost/app"
        self.jwt_secret = "change-me"
        self.rate_limit = 100
settings = Settings()
''',
    "database.py": '''
from config import settings
class DatabasePool:
    """Connection pool. WHY: reuse sockets instead of reconnecting per query."""
    def __init__(self, dsn):
        self.dsn = dsn
        self._conns = []
    def acquire(self):
        return {"dsn": self.dsn}
pool = DatabasePool(settings.db_dsn)
def get_connection():
    return pool.acquire()
''',
    "models.py": '''
class User:
    def __init__(self, user_id, email):
        self.user_id = user_id
        self.email = email
class Session:
    def __init__(self, user, token):
        self.user = user
        self.token = token
''',
    "cache.py": '''
from config import settings
class RateLimiter:
    # NOTE: naive in-memory limiter; swap for Redis in prod.
    def __init__(self, limit):
        self.limit = limit
        self.hits = {}
    def allow(self, key):
        self.hits[key] = self.hits.get(key, 0) + 1
        return self.hits[key] <= self.limit
limiter = RateLimiter(settings.rate_limit)
''',
    "auth.py": '''
from config import settings
from database import get_connection
from models import User, Session
def hash_password(raw):
    return f"hashed::{raw}"
def verify_password(raw, hashed):
    return hash_password(raw) == hashed
class AuthService:
    def __init__(self):
        self.secret = settings.jwt_secret
    def login(self, email, password):
        conn = get_connection()
        user = User(user_id=1, email=email)
        return Session(user=user, token=self.secret + email)
''',
    "services.py": '''
from database import get_connection
from models import User
from auth import AuthService
class UserService:
    def __init__(self):
        self.auth = AuthService()
    def register(self, email, password):
        conn = get_connection()
        return User(user_id=2, email=email)
    def authenticate(self, email, password):
        return self.auth.login(email, password)
''',
    "api.py": '''
from cache import limiter
from services import UserService
from auth import verify_password
svc = UserService()
def signup_route(email, password):
    if not limiter.allow(email):
        return {"error": "rate limited"}
    return svc.register(email, password)
def login_route(email, password):
    if not limiter.allow(email):
        return {"error": "rate limited"}
    return svc.authenticate(email, password)
''',
    "main.py": '''
from api import signup_route, login_route
from database import pool
def run():
    signup_route("a@x.com", "pw")
    return login_route("a@x.com", "pw")
if __name__ == "__main__":
    run()
''',
    "schema.sql": '''
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE sessions (
    token TEXT PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id)
);
CREATE VIEW active_sessions AS
SELECT s.token, u.email
FROM sessions s JOIN users u ON s.user_id = u.user_id;
''',
}
for name, body in FILES.items():
    with open(os.path.join(ROOT, name), "w") as f:
        f.write(textwrap.dedent(body).lstrip())
print(f"Wrote {len(FILES)} files to ./{ROOT}/")

We create a realistic sample application with multiple Python modules and one SQL schema file. We design the files to include meaningful cross-module relationships, such as imports, function calls, service dependencies, authentication logic, database access, and rate limiting. We then write all these files to a local sample_app directory, giving Graphify a complete mini-codebase to analyze.
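Before running the extractor, it can help to know which cross-module import edges to expect. The sketch below uses the standard-library ast module as an independent cross-check; it is not how Graphify works internally, and import_edges is a hypothetical helper. It takes a dict of filename to source text, shaped like the FILES dict.

```python
import ast

def import_edges(sources):
    """Map each module name to the sorted list of local modules it imports."""
    local = {name[:-3] for name in sources if name.endswith(".py")}
    edges = {}
    for filename, code in sources.items():
        if not filename.endswith(".py"):
            continue  # skip non-Python files such as schema.sql
        targets = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.ImportFrom) and node.module in local:
                targets.add(node.module)
            elif isinstance(node, ast.Import):
                targets.update(a.name for a in node.names if a.name in local)
        edges[filename[:-3]] = sorted(targets)
    return edges

# Two-file stand-in shaped like the FILES dict.
demo = {
    "config.py": "settings = object()\n",
    "database.py": "from config import settings\n",
}
print(import_edges(demo))  # {'config': [], 'database': ['config']}
```

A module that many others import, such as config in the sample application, is a natural candidate for a high-degree node in the extracted graph.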

Extracting the Knowledge Graph

res = subprocess.run(
    [sys.executable, "-m", "graphify", "extract", ROOT, "--no-cluster"],
    capture_output=True, text=True
)
print(res.stdout[-1500:] or res.stderr[-1500:])

# Pick the most recently written graph.json anywhere under the working directory.
graph_paths = glob.glob("**/graph.json", recursive=True)
assert graph_paths, "graph.json not found; check the extract output above."
GRAPH_JSON = sorted(graph_paths, key=os.path.getmtime)[-1]
print("Graph file:", GRAPH_JSON)

def load_graphify(path):
    with open(path) as f:
        data = json.load(f)
    # Node-link JSON may store the edge list under "links" or "edges".
    ekey = "links" if "links" in data else "edges"
    G = nx.DiGraph() if data.get("directed") else nx.Graph()
    for n in data.get("nodes", []):
        nid = n.get("id")
        G.add_node(nid, **{k: v for k, v in n.items() if k != "id"})
    for e in data.get(ekey, []):
        G.add_edge(e.get("source"), e.get("target"),
                   **{k: v for k, v in e.items() if k not in ("source", "target")})
    G.graph.update(data.get("graph", {}))
    return G

G = load_graphify(GRAPH_JSON)
UG = G.to_undirected()
print(f"\nGraph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

def label(n):
    # Fall back to the raw node id when no "label" attribute is present.
    return G.nodes[n].get("label", str(n))

We run Graphify locally on the generated application and extract the project knowledge graph without using any API key or LLM backend. We locate the most recently written graph.json file and load it into NetworkX with a node-link loader that accepts the edge list under either the links or the edges key, so it tolerates both layouts. We then convert the graph into an undirected form for easier structural analysis and define a helper function to display readable node labels.
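The loader's key handling can be illustrated without NetworkX. This is a minimal sketch, assuming a node-link payload in which every edge carries source and target fields; edge_list and both sample payloads are hypothetical and are not taken from Graphify's actual output.

```python
def edge_list(data):
    """Return (source, target, attrs) triples from a node-link style dict."""
    key = "links" if "links" in data else "edges"
    triples = []
    for e in data.get(key, []):
        attrs = {k: v for k, v in e.items() if k not in ("source", "target")}
        triples.append((e["source"], e["target"], attrs))
    return triples

# Made-up payloads: the ids and the "relation" value are for illustration only.
nodes = [{"id": "a"}, {"id": "b"}]
edge = {"source": "a", "target": "b", "relation": "imports"}
with_links = {"nodes": nodes, "links": [edge]}
with_edges = {"nodes": nodes, "edges": [edge]}

print(edge_list(with_links))  # [('a', 'b', {'relation': 'imports'})]
print(edge_list(with_links) == edge_list(with_edges))  # True
```

Both layouts yield the same triples, which is the property the loader relies on when it builds the NetworkX graph.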

Analyzing Centrality and Communities

from collections import Counter

# Summarize node and edge attributes; "?" marks a missing attribute.
ftypes = Counter(d.get("file_type", "?") for _, d in G.nodes(data=True))
rels = Counter(d.get("relation", "?") for *_, d in G.edges(data=True))
conf = Counter(d.get("confidence", "?") for *_, d in G.edges(data=True))
print("\nNodes by file_type :", dict(ftypes))
print("Edges by relation :", dict(rels))
print("Edges by confidence:", dict(conf))

deg = nx.degree_centrality(UG)
btw = nx.betweenness_centrality(UG)
print("\nTop 'god nodes' by degree centrality:")
for n, c in sorted(deg.items(), key=lambda x: -x[1])[:8]:
    print(f"  {label(n):<22} deg={c:.3f} betweenness={btw.get(n, 0):.3f}")

try:
    communities = nx.community.louvain_communities(UG, seed=42)
except Exception:
    # Fall back when Louvain is unavailable in the installed NetworkX.
    communities = list(nx.community.greedy_modularity_communities(UG))
node_comm = {n: i for i, com in enumerate(communities) for n in com}
print(f"\nDetected {len(communities)} communities:")
for i, com in enumerate(communities):
    members = ", ".join(sorted(label(n) for n in com))[:90]
    print(f"  Community {i}: {members}")

def find(substr):
    # Return the first node whose label contains substr (case-insensitive).
    for n in G.nodes:
        if substr.lower() in label(n).lower():
            return n
    return None

a, b = find("api"), find("DatabasePool")
if a is not None and b is not None and nx.has_path(UG, a, b):
    path = nx.shortest_path(UG, a, b)
    print(f"\nPath {label(a)} -> {label(b)}:")
    print("  " + " → ".join(label(p) for p in path))

We analyze the extracted graph by summarizing node types, edge relationships, and confidence levels. We compute degree centrality and betweenness centrality to identify important “god nodes” that connect many parts of the application. We also detect communities in the graph and trace a shortest path between key components to understand how parts of the codebase are connected.
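To make these measures concrete, here is a small pure-Python sketch on a made-up six-node graph. It assumes the same normalization NetworkX applies for degree centrality (degree divided by n - 1) and uses breadth-first search for the unweighted shortest path. The adjacency dict is illustrative only and is not the graph Graphify extracts.

```python
from collections import deque

# Hypothetical undirected dependency graph as an adjacency dict.
adj = {
    "config": {"database", "cache", "auth"},
    "database": {"config", "auth", "services"},
    "cache": {"config", "api"},
    "auth": {"config", "database", "services"},
    "services": {"database", "auth", "api"},
    "api": {"cache", "services"},
}

def degree_centrality(adj):
    """Degree of each node divided by (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in sorted(adj[node]):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None

print(degree_centrality(adj)["config"])    # 0.6
print(shortest_path(adj, "api", "config"))  # ['api', 'cache', 'config']
```

Here config touches three of the five other nodes, so its score is 3/5. On the real graph, the nodes with the highest scores are the "god node" candidates.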

Visualizing the Code Graph

plt.figure(figsize=(13, 9))
pos = nx.spring_layout(UG, k=0.7, seed=42)
nx.draw_networkx_edges(UG, pos, alpha=0.25)
nx.draw_networkx_nodes(
    UG, pos,
    node_color=[node_comm.get(n, 0) for n in UG.nodes],
    node_size=[300 + 4000 * deg.get(n, 0) for n in UG.nodes],
    cmap=plt.cm.tab20, alpha=0.9,
)
# Label only the 14 most central nodes to keep the plot readable.
top = {n for n, _ in sorted(deg.items(), key=lambda x: -x[1])[:14]}
nx.draw_networkx_labels(UG, pos, {n: label(n) for n in top}, font_size=8)
plt.title("Graphify knowledge graph: size=centrality, color=community")
plt.axis("off"); plt.tight_layout()
plt.savefig("graph_static.png", dpi=130); plt.show()

try:
    from pyvis.network import Network
    net = Network(height="650px", width="100%", bgcolor="#111", font_color="white",
                  notebook=True, cdn_resources="in_line", directed=G.is_directed())
    palette = ["#e6194B", "#3cb44b", "#4363d8", "#f58231", "#911eb4",
               "#42d4f4", "#f032e6", "#bfef45", "#fabed4", "#469990"]
    for n, d in G.nodes(data=True):
        c = node_comm.get(n, 0)
        net.add_node(n, label=label(n),
                     title=f"{d.get('file_type', '?')} · {d.get('source_file', '')}",
                     color=palette[c % len(palette)], size=12 + 60 * deg.get(n, 0))
    for s, t, d in G.edges(data=True):
        net.add_edge(s, t, title=d.get("relation", ""))
    net.save_graph("graph_interactive.html")
    print("\nSaved interactive graph -> graph_interactive.html")
    from IPython.display import HTML, display
    with open("graph_interactive.html") as f:
        display(HTML(f.read()))
except Exception as e:
    print("Interactive viz skipped:", e)

# Run Graphify's CLI against the extracted graph.
for cmd in (
    ["query", "what connects auth to the database?", "--graph", GRAPH_JSON],
    ["path", "AuthService", "DatabasePool", "--graph", GRAPH_JSON],
    ["explain", "RateLimiter", "--graph", GRAPH_JSON],
):
    print("\n$ graphify " + " ".join(cmd))
    r = subprocess.run([sys.executable, "-m", "graphify", *cmd],
                       capture_output=True, text=True)
    print((r.stdout or r.stderr)[:1200])

print("\nDone. Artifacts: graph_static.png, graph_interactive.html,",
      "and graphify-out/ (graph.json, GRAPH_REPORT.md).")

We visualize the knowledge graph using both static and interactive methods. We first create a Matplotlib graph where node size represents centrality and node color represents community membership. We then build an interactive Pyvis visualization and run Graphify’s CLI commands to query the graph, find paths, and explain selected symbols.
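The visual encoding reduces to a small mapping from analysis results to drawing attributes. The sketch below isolates that mapping; node_style is a hypothetical helper, the three-color PALETTE is a shortened stand-in, and the base and scale defaults simply mirror the constants used for the Pyvis nodes.

```python
PALETTE = ["#e6194B", "#3cb44b", "#4363d8"]

def node_style(centrality, community, base=12, scale=60):
    """Return (size, color): size grows with centrality, color cycles by community."""
    size = base + scale * centrality
    color = PALETTE[community % len(PALETTE)]
    return size, color

print(node_style(0.5, 4))  # (42.0, '#3cb44b')
```

The modulo keeps the color lookup valid when there are more communities than palette entries, at the cost of reusing colors.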

Conclusion

In conclusion, we have a complete local pipeline for converting source code into a useful knowledge graph and studying it with graph analytics. We saw how Graphify extracts meaningful relationships from a Python and SQL codebase, and we used NetworkX to identify central “god nodes,” detect communities, and trace paths between components such as authentication and database logic. We also generated visual outputs that let us inspect the architecture both as a static overview and interactively. This workflow lets us reason about code structure, dependency flow, architectural hotspots, and cross-file connections without relying on external APIs, making it useful for codebase exploration, documentation, refactoring, and software architecture analysis.


The post Using Graphify and NetworkX to Map Python Codebase Structure with God Nodes, Communities, and Architecture Visualizations appeared first on MarkTechPost.
