
Initialize CodeM8 project #1

Open
b4prog wants to merge 14 commits into main from codem8-init

Conversation

@b4prog

b4prog commented Jun 25, 2026

Owner

Summary by CodeRabbit

  • New Features
    • Added the CodeM8 Rust CLI for duplicate-block detection with file extension filtering, optional explicit file lists, and deterministic plain-text reporting.
  • Bug Fixes
    • Improved duplicate detection with correct “block-only” line handling, enhanced line normalization, and prevented overlapping matches within the same file.
  • Documentation
    • Expanded the README with end-to-end CLI usage, heuristics details, and reporting behavior.
  • Chores / Tests
    • Added Rust CI and a review-gate workflow, tightened Clippy thresholds, and introduced unit/integration tests plus repo agent verification guidance.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026



Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1512d8ab-b55c-4b81-93c6-6077d94755a3

📥 Commits

Reviewing files that changed from the base of the PR and between 1ea1269 and b9ecef2.

📒 Files selected for processing (1)
  • src/duplicate.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/duplicate.rs

📝 Walkthrough

Adds a Rust crate that parses CLI flags, discovers files, normalizes and classifies lines, detects duplicate blocks, renders a report, and wires the binary, workflows, and project guidance.

Changes

Rust duplicate-code reporting

| Layer | File(s) | Summary |
| --- | --- | --- |
| Shared types and helpers | Cargo.toml, src/error.rs, src/paths.rs, src/model.rs | Crate metadata, the shared error type, path formatting helpers, and the core duplicate-analysis data model are introduced together. |
| Language registry and line processing | src/language.rs, src/line.rs | Language-specific block-only classification and per-file line processing turn source files into normalized, hashed line entries. |
| CLI parsing and source discovery | src/cli.rs, src/discovery.rs | CLI flags select report mode, file extensions, and explicit files before source files are discovered from paths or recursive walks. |
| Duplicate grouping and report rendering | src/duplicate.rs, src/report.rs | Matching blocks are expanded, weighted, and rendered into the duplicate report with occurrence metadata and normalized code text. |
| Library run path and repo automation | src/lib.rs, src/main.rs, .github/workflows/*.yml, README.md, AGENTS.md, clippy.toml, .coderabbit.yaml | The top-level run path, binary entrypoint, workflows, and project guidance cover the end-to-end duplicate-report flow and its local and CI checks. |
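To make the "normalized, hashed line entries" row above more concrete, here is a minimal sketch of what such line processing could look like. Every name in it (`LineEntry`, `normalize`, `is_block_only`, `hash_line`, `process_lines`) and both rules (whitespace collapsing, bracket-only lines counting as block-only) are assumptions for illustration, not the contents of src/line.rs or src/language.rs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-line record; the real types in the PR may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LineEntry {
    line_number: usize,
    normalized: String,
    hash: u64,
    block_only: bool,
}

/// Assumed normalization rule: trim the line and collapse whitespace runs.
fn normalize(line: &str) -> String {
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Assumed classification rule: a line is "block-only" when it consists
/// solely of brackets and separators, such as `}` or `});`.
fn is_block_only(normalized: &str) -> bool {
    !normalized.is_empty()
        && normalized
            .chars()
            .all(|c| matches!(c, '{' | '}' | '(' | ')' | '[' | ']' | ';' | ',' | ' '))
}

/// Hash the normalized text so later stages can compare lines cheaply.
fn hash_line(normalized: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    normalized.hash(&mut hasher);
    hasher.finish()
}

/// Turn raw source text into entries, skipping lines that are empty
/// after normalization while keeping 1-based line numbers.
fn process_lines(source: &str) -> Vec<LineEntry> {
    source
        .lines()
        .enumerate()
        .map(|(index, raw)| (index + 1, normalize(raw)))
        .filter(|(_, normalized)| !normalized.is_empty())
        .map(|(line_number, normalized)| LineEntry {
            line_number,
            hash: hash_line(&normalized),
            block_only: is_block_only(&normalized),
            normalized,
        })
        .collect()
}
```

Marking block-only lines separately, rather than dropping them, would let a detector ignore matches made up purely of closing braces while still reporting accurate line ranges.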

Sequence Diagram(s)

sequenceDiagram
  participant CLI as src/main.rs
  participant Library as src/lib.rs
  participant Discovery as src/discovery.rs
  participant Line as src/line.rs
  participant Duplicate as src/duplicate.rs
  participant Report as src/report.rs
  participant Writer as stdout

  CLI->>Library: run(args, current_dir, writer)
  Library->>Discovery: discover_source_files(...)
  Discovery-->>Library: Vec<SourceFile>
  Library->>Line: process_source_files(...)
  Line-->>Library: Vec<ProcessedFile>
  Library->>Duplicate: detect_duplicate_blocks(...)
  Duplicate-->>Library: Vec<DuplicateBlock>
  Library->>Report: render_duplicate_report(...)
  Report-->>Library: String
  Library->>Writer: write_all(rendered bytes)
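
The call chain in the diagram can be sketched in Rust roughly as follows. This is a simplified stand-in, not the code in src/lib.rs: the stage names follow the diagram, but the signatures, the struct fields, the in-memory `inputs` slice (used in place of `args` and `current_dir`), and the single-line duplicate detection are all assumptions made so the example runs without touching the filesystem.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

// Hypothetical stand-ins for the types named in the diagram.
struct SourceFile { path: String, contents: String }
struct ProcessedFile { path: String, lines: Vec<(usize, String)> }
struct DuplicateLine { text: String, locations: Vec<(String, usize)> }

/// Stand-in for discovery: takes (path, contents) pairs instead of walking a directory.
fn discover_source_files(inputs: &[(&str, &str)]) -> Vec<SourceFile> {
    inputs
        .iter()
        .map(|(path, contents)| SourceFile { path: path.to_string(), contents: contents.to_string() })
        .collect()
}

/// Trim each line, drop empty ones, and keep 1-based line numbers.
fn process_source_files(files: &[SourceFile]) -> Vec<ProcessedFile> {
    files
        .iter()
        .map(|file| ProcessedFile {
            path: file.path.clone(),
            lines: file
                .contents
                .lines()
                .enumerate()
                .map(|(index, line)| (index + 1, line.trim().to_string()))
                .filter(|(_, line)| !line.is_empty())
                .collect(),
        })
        .collect()
}

/// Simplified detection: report single lines seen more than once
/// (the PR detects multi-line blocks; that logic is not reproduced here).
fn detect_duplicates(files: &[ProcessedFile]) -> Vec<DuplicateLine> {
    let mut seen: BTreeMap<&str, Vec<(String, usize)>> = BTreeMap::new();
    for file in files {
        for (number, text) in &file.lines {
            seen.entry(text.as_str()).or_default().push((file.path.clone(), *number));
        }
    }
    seen.into_iter()
        .filter(|(_, locations)| locations.len() > 1)
        .map(|(text, locations)| DuplicateLine { text: text.to_string(), locations })
        .collect()
}

/// Render a deterministic plain-text report (BTreeMap iteration keeps it sorted).
fn render_duplicate_report(duplicates: &[DuplicateLine]) -> String {
    let mut out = String::new();
    for duplicate in duplicates {
        out.push_str(&format!("{}\n", duplicate.text));
        for (path, line) in &duplicate.locations {
            out.push_str(&format!("  {}:{}\n", path, line));
        }
    }
    out
}

/// Wire the stages together and write the rendered bytes to any `Write` sink.
fn run<W: Write>(inputs: &[(&str, &str)], writer: &mut W) -> io::Result<()> {
    let sources = discover_source_files(inputs);
    let processed = process_source_files(&sources);
    let duplicates = detect_duplicates(&processed);
    let report = render_duplicate_report(&duplicates);
    writer.write_all(report.as_bytes())
}
```

Passing the writer in, as the diagram's `run(args, current_dir, writer)` suggests, means a test can capture the report in a `Vec<u8>` instead of reading stdout.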

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hopped through crates and sniffed each line,
then found the dupes and made them shine.
🐰 CI thumps, the report rolls through,
with fuzzy checks and hashes too.
Hop-hop—CodeM8, you're tidy now!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes this as the initial setup of the CodeM8 project. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codem8-init


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)

18-19: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Optional: harden the checkout step.

Static analysis flags two supply-chain hygiene items here: the checkout token is persisted by default, and the action is referenced by tag rather than a commit SHA. Since contents: read already limits the token scope, this is low risk, but pinning to a hash and disabling credential persistence is good practice.

🔒 Suggested hardening
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@<pinned-commit-sha> # v4
+        with:
+          persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 18 - 19, Harden the checkout step in
the workflow by updating the actions/checkout reference in the Checkout
repository step to a pinned commit SHA instead of a tag, and disable persisted
credentials for that step. Use the existing Checkout repository action entry in
the CI workflow as the location to apply these supply-chain safeguards.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/coderabbit-review.yml:
- Around line 28-74: The stdin-piped Node script in the workflow uses top-level
await, which breaks on older Node runtimes because heredoc input is treated as
CommonJS by default. Update the script block that contains fetchReviews,
latestReview, and the review checks by wrapping the whole logic in an async IIFE
(or another async entrypoint) so it runs without requiring --input-type=module.
Keep the existing fetchReviews recursion and CodeRabbit review-state checks
unchanged, just move them inside the async wrapper.

In `@src/discovery.rs`:
- Line 1: The explicit-file handling in discovery should deduplicate resolved
paths before creating SourceFile entries, since the same file can currently be
added twice and processed multiple times. Update the logic in the discovery flow
that resolves explicit files and builds SourceFile values so each unique
resolved file is emitted only once, preserving existing behavior for
non-duplicates while preventing repeated processing in src/lib.rs and related
callers.
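
A minimal sketch of the de-duplication described for src/discovery.rs, assuming explicit files arrive as strings and are resolved against the current directory. `resolve` and `dedupe_resolved` are hypothetical names, and the lexical resolution shown here is an assumption; the real code may canonicalize paths or build `SourceFile` values directly.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical resolution step: join onto `current_dir` and rebuild the path
/// from its components, which drops interior `.` segments. This is purely
/// lexical; it does not touch the filesystem or follow symlinks.
fn resolve(current_dir: &Path, raw: &str) -> PathBuf {
    current_dir.join(raw).components().collect()
}

/// Keep the first occurrence of each resolved path and preserve input order,
/// so non-duplicate inputs behave exactly as before.
fn dedupe_resolved(current_dir: &Path, explicit_files: &[&str]) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for raw in explicit_files {
        let resolved = resolve(current_dir, raw);
        // `insert` returns false when the path was already present.
        if seen.insert(resolved.clone()) {
            unique.push(resolved);
        }
    }
    unique
}
```

Tracking seen paths in a `HashSet` while pushing to a `Vec` keeps the output order stable, which matters if the report is meant to be deterministic.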

In `@src/duplicate.rs`:
- Around line 22-28: `OccurrenceKey::cmp` is missing `file_path`, so `Ord` is
not aligned with the derived `Eq` and `BTreeSet` can incorrectly merge distinct
occurrences. Update the ordering in `OccurrenceKey` to compare `file_path` as
well as `file_path_key`, `start_line`, and `end_line`, using the same fields
that define equality so de-duplication stays consistent.
- Around line 121-153: The overlap check in expand_pair is too weak because it
only rejects identical LineRef values, so same-file expansions can still produce
overlapping candidate ranges and false positives. Update expand_pair to detect
and reject overlapping ranges when left.file_index equals right.file_index,
using the left_start/left_end and right_start/right_end bounds before building
CandidateBlock, while keeping the existing exact-same-line guard.
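
Both src/duplicate.rs findings can be illustrated with a short sketch. The field names come from the comments above, but the field types, the `unique_occurrences` helper, the `ranges_overlap` helper, and the assumption that line ranges are inclusive are all hypothetical; the actual `OccurrenceKey` and `expand_pair` code is not shown in this thread.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// Hypothetical shape of the key; `Eq` is derived over all four fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct OccurrenceKey {
    file_path_key: String,
    file_path: String,
    start_line: usize,
    end_line: usize,
}

impl Ord for OccurrenceKey {
    /// Compare every field that the derived `Eq` looks at, so that
    /// `cmp == Equal` holds exactly when the two keys are equal.
    fn cmp(&self, other: &Self) -> Ordering {
        self.file_path_key
            .cmp(&other.file_path_key)
            .then_with(|| self.file_path.cmp(&other.file_path))
            .then_with(|| self.start_line.cmp(&other.start_line))
            .then_with(|| self.end_line.cmp(&other.end_line))
    }
}

impl PartialOrd for OccurrenceKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// A `BTreeSet` de-duplicates by `Ord`, which is why the ordering must agree with `Eq`.
fn unique_occurrences(keys: Vec<OccurrenceKey>) -> BTreeSet<OccurrenceKey> {
    keys.into_iter().collect()
}

/// Two inclusive line ranges overlap when each one starts
/// no later than the other one ends.
fn ranges_overlap(left_start: usize, left_end: usize, right_start: usize, right_end: usize) -> bool {
    left_start <= right_end && right_start <= left_end
}
```

With an ordering that skipped `file_path`, two keys differing only in that field would compare as `Equal` and the set would silently keep one of them. For the overlap guard, the check would apply only when both sides of the pair refer to the same file.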

---

Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 18-19: Harden the checkout step in the workflow by updating the
actions/checkout reference in the Checkout repository step to a pinned commit
SHA instead of a tag, and disable persisted credentials for that step. Use the
existing Checkout repository action entry in the CI workflow as the location to
apply these supply-chain safeguards.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c3df5e3b-e828-474d-9ada-941a5385592d

📥 Commits

Reviewing files that changed from the base of the PR and between d7c9da3 and 32ece2e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .github/workflows/ci.yml
  • .github/workflows/coderabbit-review.yml
  • Cargo.toml
  • README.md
  • src/cli.rs
  • src/discovery.rs
  • src/duplicate.rs
  • src/error.rs
  • src/language.rs
  • src/lib.rs
  • src/line.rs
  • src/main.rs
  • src/model.rs
  • src/paths.rs
  • src/report.rs

Comment thread .github/workflows/coderabbit-review.yml
Comment thread src/discovery.rs
Comment thread src/duplicate.rs
Comment thread src/duplicate.rs
@b4prog

b4prog commented Jun 25, 2026

Owner Author

@CodeRabbit resume

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

✅ Action performed

Reviews resumed.

@b4prog

b4prog commented Jun 25, 2026

Owner Author

@CodeRabbit resume

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

✅ Action performed

Reviews resumed.

@b4prog

b4prog commented Jun 25, 2026

Owner Author

@CodeRabbit do you approve this pull request?

@coderabbitai

coderabbitai Bot commented Jun 25, 2026


Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== AGENTS.md =="
sed -n '1,220p' AGENTS.md

echo
echo "== Cargo.lock presence =="
fd '^Cargo\.lock$' .

echo
echo "== Outline: src/lib.rs =="
ast-grep outline src/lib.rs --view expanded || true

echo
echo "== src/lib.rs =="
sed -n '1,240p' src/lib.rs

echo
echo "== Outline: src/discovery.rs =="
ast-grep outline src/discovery.rs --view expanded || true

echo
echo "== src/discovery.rs (1-260) =="
sed -n '1,260p' src/discovery.rs

echo
echo "== Outline: src/duplicate.rs =="
ast-grep outline src/duplicate.rs --view expanded || true

echo
echo "== src/duplicate.rs (1-420) =="
sed -n '1,420p' src/duplicate.rs

echo
echo "== .github/workflows/coderabbit-review.yml =="
sed -n '1,220p' .github/workflows/coderabbit-review.yml

echo
echo "== .github/workflows/ci.yml =="
sed -n '1,220p' .github/workflows/ci.yml

Length of output: 33967



@b4prog Yes — based on a static review of the higher-risk areas, I’m comfortable approving this PR.

I checked the CLI/report wiring in src/lib.rs, file discovery in src/discovery.rs, duplicate detection in src/duplicate.rs, and both workflows. I didn’t spot any blocking issues, and the CI setup is consistent with AGENTS.md and the committed Cargo.lock.

I wasn’t able to execute the Rust verification commands in this sandbox, so this approval is based on code inspection rather than a local run of cargo fmt, cargo clippy, and tests.

[approve]

@b4prog

b4prog commented Jun 25, 2026

Owner Author

@CodeRabbit approve

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@b4prog

b4prog commented Jun 25, 2026

Owner Author

@CodeRabbit resume

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

✅ Action performed

Reviews resumed.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/duplicate.rs`:
- Around line 313-344: The test in detect_duplicate_blocks should verify sorting
by weight rather than encounter order. In sorts_duplicate_blocks_by_weight,
adjust the fixture so the lighter duplicate block appears first and is separated
by unique lines, while the heavier duplicate block is encountered later; then
keep asserting that detect_duplicate_blocks returns the heavier block at
blocks[0] and that its weight is at least blocks[1].weight.
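
The reasoning behind this finding: if the heavier block is also the first one encountered, the assertion passes even when no sorting happens. A minimal sketch of a fixture that avoids this is below. `Block`, `sort_by_weight`, and `fixture_in_encounter_order` are hypothetical stand-ins, and the tie-break on `first_line` is an assumption; the real test would go through `detect_duplicate_blocks` with source text instead.

```rust
/// Hypothetical, reduced view of a detected block.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Block {
    first_line: usize,
    weight: usize,
}

/// Heaviest block first; ties fall back to the earliest line
/// (an assumed tie-break, chosen here to keep the order deterministic).
fn sort_by_weight(mut blocks: Vec<Block>) -> Vec<Block> {
    blocks.sort_by(|left, right| {
        right
            .weight
            .cmp(&left.weight)
            .then_with(|| left.first_line.cmp(&right.first_line))
    });
    blocks
}

/// Fixture in encounter order: the lighter block comes first, so an
/// implementation that skips sorting would return it at index 0.
fn fixture_in_encounter_order() -> Vec<Block> {
    vec![
        Block { first_line: 1, weight: 2 },
        Block { first_line: 10, weight: 5 },
    ]
}
```

A test built on this fixture would assert that index 0 holds the block starting at line 10, which only holds when the sort actually reorders the input.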
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4fdea58c-a683-4da5-80b0-2328cffb6236

📥 Commits

Reviewing files that changed from the base of the PR and between 5e77570 and 1ea1269.

📒 Files selected for processing (6)
  • .coderabbit.yaml
  • src/cli.rs
  • src/discovery.rs
  • src/duplicate.rs
  • src/line.rs
  • src/paths.rs
✅ Files skipped from review due to trivial changes (1)
  • .coderabbit.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/paths.rs
  • src/line.rs
  • src/discovery.rs
  • src/cli.rs

Comment thread src/duplicate.rs