deepcode by hkuds

Description: "DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"

View hkuds/deepcode on GitHub ↗

Summary Information

Updated 30 minutes ago
Added to GitGenius on August 31st, 2025
Created on May 14th, 2025
Open Issues/Pull Requests: 36 (+0)
Number of forks: 1,963
Total Stargazers: 14,621 (+1)
Total Subscribers: 134 (+0)

Detailed Description

DeepCode is an open-source agentic coding project from HKUDS, the Data Intelligence Lab at the University of Hong Kong. It is a code generation system, not a static analyzer: as the repository description states, it covers three workflows, Paper2Code, Text2Web, and Text2Backend.

Paper2Code takes a research paper as input and produces a code implementation of the methods it describes. Text2Web and Text2Backend take a natural-language description and generate front-end web code and back-end code, respectively. In each case the input is a document or a plain-text request, and the output is source code.

The "agentic" label reflects how the work is carried out. Rather than producing code from a single prompt, DeepCode is built as a multi-agent system in which agents driven by large language models divide the job into steps such as understanding the request, planning the implementation, and writing the code.

Because the system generates code with language models, the quality of its output depends on the underlying model and on how clearly the paper or text request specifies what is wanted. Generated code should be reviewed and tested before use like any other contribution.

The project is actively maintained: the repository was updated within the last hour of this snapshot and has 36 open issues and pull requests. Installation and usage instructions are provided in the repository, which also accepts community contributions such as bug reports, feature requests, and code improvements through GitHub.
