Skip to content

feat: include_submodules option #313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 3, 2025

Conversation

jpotw
Copy link
Contributor

@jpotw jpotw commented Jun 23, 2025

history: #221

Add support for Git submodules

This PR adds the ability to include and analyze git submodules when ingesting repositories, available through CLI.

Changes

CLI Interface

  • New --include-submodules flag that recursively clones all submodules
  • Defaults to false for backward compatibility
# Include submodules
gitingest https://github.com/user/repo --include-submodules

# Standard behavior (no submodules)
gitingest https://github.com/user/repo

Web UI for this option is not implemented for now (#313 (comment))

Web Interface

- Added "Include submodules" checkbox in repository form

- JavaScript handles state synchronization between form elements

Implementation

The --include-submodules CLI flag's value is propagated through the ingestion pipeline, affecting CloneConfig and IngestionQuery schemas. The core cloning logic has been updated to append --recurse-submodules to the git clone command when this flag is enabled.


Added # pylint: disable=too-many-instance-attributes to the CloneConfig class. This was done to address the R0902: Too many instance attributes warning, as all attributes are necessary and refactoring for fewer would increase unnecessary complexity.


Additionally introduced a helper function (_checkout_partial_clone) to reduce cyclomatic complexity in clone_repo function.
This change was made to resolve a ruff C901 complexity warning, after adding the include_submodules feature. 
Tried to keep the code changes minimal and non-intrusive.

ruff check...............................................................Failed
- hook id: ruff-check
- exit code: 1
warning: `incorrect-blank-line-before-class` (D203) and `no-blank-line-before-class` (D211) are incompatible. Ignoring `incorrect-blank-line-before-class`.
warning: `multi-line-summary-first-line` (D212) and `multi-line-summary-second-line` (D213) are incompatible. Ignoring `multi-line-summary-second-line`.
src/gitingest/clone.py:26:11: C901 `clone_repo` is too complex (11 > 10)
   |
25 | @async_timeout(DEFAULT_TIMEOUT)
26 | async def clone_repo(config: CloneConfig, token: str | None = None) -> None:
   |           ^^^^^^^^^^ C901
27 |     """Clone a repository to a local path based on the provided configuration.
   |
Found 1 error.

This solution is intended as a minimal yet temporary fix and can be revisited for further refactoring in the future if needed.

Testing

Tests have been updated to ensure proper handling and inclusion of submodules.

@jpotw
Copy link
Contributor Author

jpotw commented Jun 23, 2025

I also added an additional commit(fix: disable rate limiting in tests using environment detectionminor) to fix the rate limit exceeded errors, which I will explain in more detail below:

Problem

Integration tests were failing with 429 "Rate limit exceeded" errors:

❯ pytest tests
...
tests/test_flow_integration.py:265: AssertionError
---------------------------------------------------- Captured log call ----------------------------------------------------
WARNING  slowapi:extension.py:510 ratelimit 10 per 1 minute (testclient) exceeded at endpoint: /
================================================= short test summary info =================================================
FAILED tests/test_flow_integration.py::test_repository_with_submodules - AssertionError: Request failed: {"error":"Rate limit exceeded: 10 per 1 minute"}
FAILED tests/test_flow_integration.py::test_include_submodules_true_propagation - AssertionError: Form submission failed: {"error":"Rate limit exceeded: 10 per 1 minute"}
FAILED tests/test_flow_integration.py::test_include_submodules_false_propagation - AssertionError: Form submission failed: {"error":"Rate limit exceeded: 10 per 1 minute"}
============================================= 3 failed, 121 passed in 26.06s ==============================================

Root Cause

The problem was that new submodule integration tests increased total POST requests in the test suite from 10 to 13, exceeding the production rate limit of 10 requests/minute. All test requests appear to come from the same "testclient" IP address.

Solution

Added conditional rate limiter that disables limiting in test environments:

# In server/server_utils.py
if os.getenv("TESTING") or "pytest" in sys.modules:
    class NoOpLimiter:
        def limit(self, *args, **kwargs):
            return lambda func: func  # No-op decorator
    limiter = NoOpLimiter()
else:
    limiter = Limiter(key_func=get_remote_address)

collected 124 items                                                               

tests/query_parser/test_git_host_agnostic.py ...............                [ 12%]
tests/query_parser/test_query_parser.py ................................... [ 40%]
........                                                                    [ 46%]
tests/test_cli.py ....                                                      [ 50%]
tests/test_flow_integration.py .........                                    [ 57%]
tests/test_git_utils.py .................                                   [ 70%]
tests/test_ingestion.py ........                                            [ 77%]
tests/test_notebook_utils.py .........                                      [ 84%]
tests/test_repository_clone.py ...................                          [100%]

============================== 124 passed in 26.16s ===============================

All integration tests now pass reliably.

@jpotw
Copy link
Contributor Author

jpotw commented Jun 29, 2025

Hi @filipchristiansen. Just wanted to follow up on this PR. It's been about five days since the review, and I've incorporated your feedback into the latest commit. 80e818d Could you please let me know if there are any further changes required, or if it's ready for merge?

@filipchristiansen
Copy link
Contributor

Hi @jpotw — thanks for your work!

  • Checkbox: Including submodules is uncommon and typically push digests beyond LLM context limits, so we’d rather leave a UI toggle out for now.

  • Future: We can revisit a checkbox under some Advanced settings later.

  • To-do for this PR:

    1. Keep the --include-submodules flag in the CLI and Python API.
    2. Drop the web-UI checkbox.
    3. Rebase / resolve conflicts with main.

Thanks again — looking forward to the update! 🙂

jpotw added a commit to jpotw/gitingest that referenced this pull request Jul 2, 2025
- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
@jpotw jpotw force-pushed the feature/pr-221-fix branch from 9c5f6aa to 28198d6 Compare July 2, 2025 01:26
jpotw added a commit to jpotw/gitingest that referenced this pull request Jul 3, 2025
- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
@jpotw jpotw force-pushed the feature/pr-221-fix branch from a67ec81 to b89aa0d Compare July 3, 2025 04:30
jpotw added a commit to jpotw/gitingest that referenced this pull request Jul 3, 2025
- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
@jpotw jpotw force-pushed the feature/pr-221-fix branch from b89aa0d to 96e2b21 Compare July 3, 2025 04:52
@jpotw
Copy link
Contributor Author

jpotw commented Jul 3, 2025

Hi @filipchristiansen. I've rebased all the necessary changes into a single commit here b89aa0d
and updated the changes I've made #313 (comment)

Please let me know if you need anything else from my side.

@filipchristiansen
Copy link
Contributor

Thank you for your patience @jpotw! Please see my comment above. After that I will just look at it one last time, do some testing and then we should be good to go! Could you see if you can provide a link to a sample repo (you can just put it here in the comments) which includes submodules – will help me with testing.

Thanks again!

@filipchristiansen filipchristiansen self-requested a review July 3, 2025 05:31
jpotw added a commit to jpotw/gitingest that referenced this pull request Jul 3, 2025
- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
@jpotw jpotw force-pushed the feature/pr-221-fix branch from 96e2b21 to 78e7755 Compare July 3, 2025 06:33
@jpotw
Copy link
Contributor Author

jpotw commented Jul 3, 2025

I have checked and implied to this comit 78e7755! The force push was just to make the commit history cleaner.

@jpotw
Copy link
Contributor Author

jpotw commented Jul 3, 2025

Could you see if you can provide a link to a sample repo (you can just put it here in the comments) which includes submodules – will help me with testing.

Here are some sample repos I used that worked as intended:

https://github.com/unittest-cpp/unittest-cpp

❯ gitingest https://github.com/unittest-cpp/unittest-cpp -o - | grep "Files analyzed"
gitingest https://github.com/unittest-cpp/unittest-cpp --include-submodules -o - | grep "Files analyzed"
Analyzing source, preparing output for stdout...

--- Summary ---
Repository: unittest-cpp/unittest-cpp
Files analyzed: 99

Estimated tokens: 72.8k
--- End Summary ---
Analysis complete! Output sent to stdout.
Analyzing source, preparing output for stdout...

--- Summary ---
Repository: unittest-cpp/unittest-cpp
Files analyzed: 107

Estimated tokens: 77.9k
--- End Summary ---
Analysis complete! Output sent to stdout.

https://github.com/ptrks/cpp_cmake_submodule_example

❯ gitingest https://github.com/ptrks/cpp_cmake_submodule_example -o - | grep "Files analyzed"
gitingest https://github.com/ptrks/cpp_cmake_submodule_example --include-submodules -o - | grep "Files analyzed"
Analyzing source, preparing output for stdout...

--- Summary ---
Repository: ptrks/cpp_cmake_submodule_example
Files analyzed: 3

Estimated tokens: 560
--- End Summary ---
Analysis complete! Output sent to stdout.
Analyzing source, preparing output for stdout...

--- Summary ---
Repository: ptrks/cpp_cmake_submodule_example
Files analyzed: 33

Estimated tokens: 662.0k
--- End Summary ---
Analysis complete! Output sent to stdout.

https://github.com/robocorp/example-use-git-submodule-for-shared-code

❯ gitingest https://github.com/robocorp/example-use-git-submodule-for-shared-code -o - | grep "Files analyzed"
gitingest https://github.com/robocorp/example-use-git-submodule-for-shared-code --include-submodules -o - | grep "Files analyzed"
Analyzing source, preparing output for stdout...

--- Summary ---
Repository: robocorp/example-use-git-submodule-for-shared-code
Files analyzed: 4

Estimated tokens: 915
--- End Summary ---
Analysis complete! Output sent to stdout.
Analyzing source, preparing output for stdout...

--- Summary ---
Repository: robocorp/example-use-git-submodule-for-shared-code
Files analyzed: 15

Estimated tokens: 1.6k
--- End Summary ---
Analysis complete! Output sent to stdout.

or we can use diff command to only see the changed lines:

❯ diff <(gitingest https://github.com/unittest-cpp/unittest-cpp -o - 2>&1 | grep -E "Files analyzed|Estimated tokens") <(gitingest https://github.com/unittest-cpp/unittest-cpp --include-submodules -o - 2>&1 | grep -E "Files analyzed|Estimated tokens")
1,2c1,2
< Files analyzed: 99
< Estimated tokens: 72.8k
---
> Files analyzed: 107
> Estimated tokens: 77.9k

- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
@jpotw jpotw force-pushed the feature/pr-221-fix branch from 78e7755 to 7958427 Compare July 3, 2025 07:26
…idy tests

* **CLI**

  * Add example using `--include-submodules` in help text
  * Shorten `include_submodules` param description in `_async_main`
  * Re-order args in `ingest_async` call for readability

* **Clone**

  * Convert `_checkout_partial_clone` to **async** and update docstring accordingly

* **Schemas / Entrypoint**

  * Tighten wording in docstrings for `include_submodules`

* **Tests**

  * Drop redundant `test_cli` case for options mix
  * Rename `test_clone_commit_without_branch` → `test_clone_commit`
  * Strengthen `test_clone_with_include_submodules` assertions
Copy link
Contributor

@filipchristiansen filipchristiansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @jpotw !

@filipchristiansen filipchristiansen merged commit 38c2317 into coderamp-labs:main Jul 3, 2025
19 checks passed
BareninVitalya pushed a commit to BareninVitalya/gitingest that referenced this pull request Jul 5, 2025
* feat: add optional --include-submodules flag to CLI and ingestion

- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Adds include_submodules example in README.md
- Web UI for this option is not implemented for now (coderamp-labs#313 (comment))

---------

Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
ix-56h pushed a commit to ix-56h/gitingest that referenced this pull request Jul 7, 2025
* feat: add optional --include-submodules flag to CLI and ingestion

- Adds --include-submodules CLI flag to control submodule analysis
- Propagates include_submodules through ingestion, schemas, and clone logic
- Updates tests to cover submodule inclusion
- Adds a helper function (_checkout_partial_clone) to avoid repetition
- Adds include_submodules example in README.md
- Web UI for this option is not implemented for now (coderamp-labs#313 (comment))

---------

Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
@coderamp-ci coderamp-ci bot mentioned this pull request Jul 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants