-
Notifications
You must be signed in to change notification settings - Fork 858
feat: include_submodules
option
#313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: include_submodules
option
#313
Conversation
I also added an additional commit( ProblemIntegration tests were failing with 429 "Rate limit exceeded" errors: ❯ pytest tests
...
tests/test_flow_integration.py:265: AssertionError
---------------------------------------------------- Captured log call ----------------------------------------------------
WARNING slowapi:extension.py:510 ratelimit 10 per 1 minute (testclient) exceeded at endpoint: /
================================================= short test summary info =================================================
FAILED tests/test_flow_integration.py::test_repository_with_submodules - AssertionError: Request failed: {"error":"Rate limit exceeded: 10 per 1 minute"}
FAILED tests/test_flow_integration.py::test_include_submodules_true_propagation - AssertionError: Form submission failed: {"error":"Rate limit exceeded: 10 per 1 minute"}
FAILED tests/test_flow_integration.py::test_include_submodules_false_propagation - AssertionError: Form submission failed: {"error":"Rate limit exceeded: 10 per 1 minute"}
============================================= 3 failed, 121 passed in 26.06s ============================================== Root CauseThe problem was that new submodule integration tests increased total POST requests in the test suite from 10 to 13, exceeding the production rate limit of 10 requests/minute. All test requests appear to come from the same "testclient" IP address. SolutionAdded conditional rate limiter that disables limiting in test environments: # In server/server_utils.py
if os.getenv("TESTING") or "pytest" in sys.modules:
class NoOpLimiter:
def limit(self, *args, **kwargs):
return lambda func: func # No-op decorator
limiter = NoOpLimiter()
else:
limiter = Limiter(key_func=get_remote_address) collected 124 items
tests/query_parser/test_git_host_agnostic.py ............... [ 12%]
tests/query_parser/test_query_parser.py ................................... [ 40%]
........ [ 46%]
tests/test_cli.py .... [ 50%]
tests/test_flow_integration.py ......... [ 57%]
tests/test_git_utils.py ................. [ 70%]
tests/test_ingestion.py ........ [ 77%]
tests/test_notebook_utils.py ......... [ 84%]
tests/test_repository_clone.py ................... [100%]
============================== 124 passed in 26.16s =============================== All integration tests now pass reliably. |
Hi @filipchristiansen. Just wanted to follow up on this PR. It's been about five days since the review, and I've incorporated your feedback into the latest commit. 80e818d Could you please let me know if there are any further changes required, or if it's ready for merge? |
Hi @jpotw — thanks for your work!
Thanks again — looking forward to the update! 🙂 |
- Adds --include-submodules CLI flag to control submodule analysis - Propagates include_submodules through ingestion, schemas, and clone logic - Updates tests to cover submodule inclusion - Adds a helper function (_checkout_partial_clone) to avoid repetition - Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
- Adds --include-submodules CLI flag to control submodule analysis - Propagates include_submodules through ingestion, schemas, and clone logic - Updates tests to cover submodule inclusion - Adds a helper function (_checkout_partial_clone) to avoid repetition - Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
- Adds --include-submodules CLI flag to control submodule analysis - Propagates include_submodules through ingestion, schemas, and clone logic - Updates tests to cover submodule inclusion - Adds a helper function (_checkout_partial_clone) to avoid repetition - Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
Hi @filipchristiansen. I've rebased all the necessary changes into a single commit here b89aa0d Please let me know if you need anything else from my side. |
Thank you for your patience @jpotw! Please see my comment above. After that I will just look at it one last time, do some testing and then we should be good to go! Could you see if you can provide a link to a sample repo (you can just put it here in the comments) which includes submodules – will help me with testing. Thanks again! |
- Adds --include-submodules CLI flag to control submodule analysis - Propagates include_submodules through ingestion, schemas, and clone logic - Updates tests to cover submodule inclusion - Adds a helper function (_checkout_partial_clone) to avoid repetition - Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
I have checked and implied to this comit 78e7755! The force push was just to make the commit history cleaner. |
Here are some sample repos I used that worked as intended: https://github.com/unittest-cpp/unittest-cpp ❯ gitingest https://github.com/unittest-cpp/unittest-cpp -o - | grep "Files analyzed"
gitingest https://github.com/unittest-cpp/unittest-cpp --include-submodules -o - | grep "Files analyzed"
Analyzing source, preparing output for stdout...
--- Summary ---
Repository: unittest-cpp/unittest-cpp
Files analyzed: 99
Estimated tokens: 72.8k
--- End Summary ---
Analysis complete! Output sent to stdout.
Analyzing source, preparing output for stdout...
--- Summary ---
Repository: unittest-cpp/unittest-cpp
Files analyzed: 107
Estimated tokens: 77.9k
--- End Summary ---
Analysis complete! Output sent to stdout. https://github.com/ptrks/cpp_cmake_submodule_example ❯ gitingest https://github.com/ptrks/cpp_cmake_submodule_example -o - | grep "Files analyzed"
gitingest https://github.com/ptrks/cpp_cmake_submodule_example --include-submodules -o - | grep "Files analyzed"
Analyzing source, preparing output for stdout...
--- Summary ---
Repository: ptrks/cpp_cmake_submodule_example
Files analyzed: 3
Estimated tokens: 560
--- End Summary ---
Analysis complete! Output sent to stdout.
Analyzing source, preparing output for stdout...
--- Summary ---
Repository: ptrks/cpp_cmake_submodule_example
Files analyzed: 33
Estimated tokens: 662.0k
--- End Summary ---
Analysis complete! Output sent to stdout. https://github.com/robocorp/example-use-git-submodule-for-shared-code ❯ gitingest https://github.com/robocorp/example-use-git-submodule-for-shared-code -o - | grep "Files analyzed"
gitingest https://github.com/robocorp/example-use-git-submodule-for-shared-code --include-submodules -o - | grep "Files analyzed"
Analyzing source, preparing output for stdout...
--- Summary ---
Repository: robocorp/example-use-git-submodule-for-shared-code
Files analyzed: 4
Estimated tokens: 915
--- End Summary ---
Analysis complete! Output sent to stdout.
Analyzing source, preparing output for stdout...
--- Summary ---
Repository: robocorp/example-use-git-submodule-for-shared-code
Files analyzed: 15
Estimated tokens: 1.6k
--- End Summary ---
Analysis complete! Output sent to stdout. or we can use ❯ diff <(gitingest https://github.com/unittest-cpp/unittest-cpp -o - 2>&1 | grep -E "Files analyzed|Estimated tokens") <(gitingest https://github.com/unittest-cpp/unittest-cpp --include-submodules -o - 2>&1 | grep -E "Files analyzed|Estimated tokens")
1,2c1,2
< Files analyzed: 99
< Estimated tokens: 72.8k
---
> Files analyzed: 107
> Estimated tokens: 77.9k |
- Adds --include-submodules CLI flag to control submodule analysis - Propagates include_submodules through ingestion, schemas, and clone logic - Updates tests to cover submodule inclusion - Adds a helper function (_checkout_partial_clone) to avoid repetition - Web UI for this option is not implemented for now (coderamp-labs#313 (comment))
…idy tests * **CLI** * Add example using `--include-submodules` in help text * Shorten `include_submodules` param description in `_async_main` * Re-order args in `ingest_async` call for readability * **Clone** * Convert `_checkout_partial_clone` to **async** and update docstring accordingly * **Schemas / Entrypoint** * Tighten wording in docstrings for `include_submodules` * **Tests** * Drop redundant `test_cli` case for options mix * Rename `test_clone_commit_without_branch` → `test_clone_commit` * Strengthen `test_clone_with_include_submodules` assertions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @jpotw !
* feat: add optional --include-submodules flag to CLI and ingestion - Adds --include-submodules CLI flag to control submodule analysis - Propagates include_submodules through ingestion, schemas, and clone logic - Updates tests to cover submodule inclusion - Adds a helper function (_checkout_partial_clone) to avoid repetition - Adds include_submodules example in README.md - Web UI for this option is not implemented for now (coderamp-labs#313 (comment)) --------- Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
* feat: add optional --include-submodules flag to CLI and ingestion - Adds --include-submodules CLI flag to control submodule analysis - Propagates include_submodules through ingestion, schemas, and clone logic - Updates tests to cover submodule inclusion - Adds a helper function (_checkout_partial_clone) to avoid repetition - Adds include_submodules example in README.md - Web UI for this option is not implemented for now (coderamp-labs#313 (comment)) --------- Co-authored-by: Filip Christiansen <22807962+filipchristiansen@users.noreply.github.com>
history: #221
Add support for Git submodules
This PR adds the ability to include and analyze git submodules when ingesting repositories, available through CLI.
Changes
CLI Interface
--include-submodules
flag that recursively clones all submodulesfalse
for backward compatibilityWeb UI for this option is not implemented for now (#313 (comment))
Web Interface- Added "Include submodules" checkbox in repository form- JavaScript handles state synchronization between form elementsImplementation
The
--include-submodules
CLI flag's value is propagated through the ingestion pipeline, affectingCloneConfig
andIngestionQuery
schemas. The core cloning logic has been updated to append--recurse-submodules
to thegit clone
command when this flag is enabled.Added
# pylint: disable=too-many-instance-attributes
to theCloneConfig
class. This was done to address theR0902: Too many instance attributes
warning, as all attributes are necessary and refactoring for fewer would increase unnecessary complexity.Additionally introduced a helper function (
_checkout_partial_clone
) to reduce cyclomatic complexity in clone_repo function.This change was made to resolve a
ruff C901
complexity warning, after adding theinclude_submodules
feature.Tried to keep the code changes minimal and non-intrusive.
This solution is intended as a minimal yet temporary fix and can be revisited for further refactoring in the future if needed.
Testing
Tests have been updated to ensure proper handling and inclusion of submodules.