Initialize CodeM8 project by b4prog · Pull Request #1 · b4prog/CodeM8

b4prog · 2026-06-25T10:38:15Z

Summary by CodeRabbit

New Features
- Added the CodeM8 Rust CLI for duplicate-block detection with file extension filtering, optional explicit file lists, and deterministic plain-text reporting.
Bug Fixes
- Improved duplicate detection with correct “block-only” line handling, enhanced line normalization, and prevented overlapping matches within the same file.
Documentation
- Expanded the README with end-to-end CLI usage, heuristics details, and reporting behavior.
Chores / Tests
- Added Rust CI and a review-gate workflow, tightened Clippy thresholds, and introduced unit/integration tests plus repo agent verification guidance.

coderabbitai · 2026-06-25T10:38:30Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1512d8ab-b55c-4b81-93c6-6077d94755a3

📥 Commits

Reviewing files that changed from the base of the PR and between 1ea1269 and b9ecef2.

📒 Files selected for processing (1)

src/duplicate.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/duplicate.rs

📝 Walkthrough

Adds a Rust crate that parses CLI flags, discovers files, normalizes and classifies lines, detects duplicate blocks, renders a report, and wires the binary, workflows, and project guidance.

Changes

Rust duplicate-code reporting

| Layer / File(s) | Summary |
| --- | --- |
| Shared types and helpers: `Cargo.toml`, `src/error.rs`, `src/paths.rs`, `src/model.rs` | Crate metadata, the shared error type, path formatting helpers, and the core duplicate-analysis data model are introduced together. |
| Language registry and line processing: `src/language.rs`, `src/line.rs` | Language-specific block-only classification and per-file line processing turn source files into normalized, hashed line entries. |
| CLI parsing and source discovery: `src/cli.rs`, `src/discovery.rs` | CLI flags select report mode, file extensions, and explicit files before source files are discovered from paths or recursive walks. |
| Duplicate grouping and report rendering: `src/duplicate.rs`, `src/report.rs` | Matching blocks are expanded, weighted, and rendered into the duplicate report with occurrence metadata and normalized code text. |
| Library run path and repo automation: `src/lib.rs`, `src/main.rs`, `.github/workflows/*.yml`, `README.md`, `AGENTS.md`, `clippy.toml`, `.coderabbit.yaml` | The top-level run path, binary entrypoint, workflows, and project guidance cover the end-to-end duplicate-report flow and its local and CI checks. |

Sequence Diagram(s)

sequenceDiagram
  participant CLI as src/main.rs
  participant Library as src/lib.rs
  participant Discovery as src/discovery.rs
  participant Line as src/line.rs
  participant Duplicate as src/duplicate.rs
  participant Report as src/report.rs
  participant Writer as stdout

  CLI->>Library: run(args, current_dir, writer)
  Library->>Discovery: discover_source_files(...)
  Discovery-->>Library: Vec<SourceFile>
  Library->>Line: process_source_files(...)
  Line-->>Library: Vec<ProcessedFile>
  Library->>Duplicate: detect_duplicate_blocks(...)
  Duplicate-->>Library: Vec<DuplicateBlock>
  Library->>Report: render_duplicate_report(...)
  Report-->>Library: String
  Library->>Writer: write_all(rendered bytes)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hopped through crates and sniffed each line,
then found the dupes and made them shine.
🐰 CI thumps, the report rolls through,
with fuzzy checks and hashes too.
Hop-hop—CodeM8, you're tidy now!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes this as the initial setup of the CodeM8 project. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

.github/workflows/ci.yml (1)
18-19: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Optional: harden the checkout step.

Static analysis flags two supply-chain hygiene items here: the checkout token is persisted by default, and the action is referenced by tag rather than a commit SHA. Since contents: read already limits the token scope, this is low risk, but pinning to a hash and disabling credential persistence is good practice.
🔒 Suggested hardening
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@<pinned-commit-sha> # v4
+        with:
+          persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 18 - 19, Harden the checkout step in
the workflow by updating the actions/checkout reference in the Checkout
repository step to a pinned commit SHA instead of a tag, and disable persisted
credentials for that step. Use the existing Checkout repository action entry in
the CI workflow as the location to apply these supply-chain safeguards.
Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/coderabbit-review.yml:
- Around line 28-74: The stdin-piped Node script in the workflow uses top-level
await, which breaks on older Node runtimes because heredoc input is treated as
CommonJS by default. Update the script block that contains fetchReviews,
latestReview, and the review checks by wrapping the whole logic in an async IIFE
(or another async entrypoint) so it runs without requiring --input-type=module.
Keep the existing fetchReviews recursion and CodeRabbit review-state checks
unchanged, just move them inside the async wrapper.

In `@src/discovery.rs`:
- Line 1: The explicit-file handling in discovery should deduplicate resolved
paths before creating SourceFile entries, since the same file can currently be
added twice and processed multiple times. Update the logic in the discovery flow
that resolves explicit files and builds SourceFile values so each unique
resolved file is emitted only once, preserving existing behavior for
non-duplicates while preventing repeated processing in src/lib.rs and related
callers.
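
The deduplication requested for `src/discovery.rs` could look roughly like the sketch below. The helper name `dedupe_resolved` is hypothetical, and the real `SourceFile` construction is not shown in this thread. The sketch assumes the paths have already been resolved (for example with `std::fs::canonicalize`) and keeps only the first occurrence of each one, so input order is preserved.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical helper (not taken from the PR): keep the first occurrence
/// of each already-resolved path, preserving the original order.
fn dedupe_resolved(paths: Vec<PathBuf>) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    paths
        .into_iter()
        // `HashSet::insert` returns false when the path was already present,
        // so repeated entries are filtered out.
        .filter(|path| seen.insert(path.clone()))
        .collect()
}
```

Each unique path would then be turned into a single `SourceFile`, so later stages never process the same file twice.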

In `@src/duplicate.rs`:
- Around line 22-28: `OccurrenceKey::cmp` is missing `file_path`, so `Ord` is
not aligned with the derived `Eq` and `BTreeSet` can incorrectly merge distinct
occurrences. Update the ordering in `OccurrenceKey` to compare `file_path` as
well as `file_path_key`, `start_line`, and `end_line`, using the same fields
that define equality so de-duplication stays consistent.
- Around line 121-153: The overlap check in expand_pair is too weak because it
only rejects identical LineRef values, so same-file expansions can still produce
overlapping candidate ranges and false positives. Update expand_pair to detect
and reject overlapping ranges when left.file_index equals right.file_index,
using the left_start/left_end and right_start/right_end bounds before building
CandidateBlock, while keeping the existing exact-same-line guard.
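
The two `src/duplicate.rs` findings above can be illustrated with a small sketch. The field names of `OccurrenceKey` follow the review comment, but the field types, the comparison order, and the helpers `count_distinct` and `ranges_overlap` are assumptions made for illustration. The real `expand_pair` and `CandidateBlock` code is not reproduced here, and line ranges are assumed to be inclusive.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

// Field names come from the review comment; the types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
struct OccurrenceKey {
    file_path: String,
    file_path_key: String,
    start_line: usize,
    end_line: usize,
}

impl Ord for OccurrenceKey {
    fn cmp(&self, other: &Self) -> Ordering {
        // Compare every field that the derived `Eq` inspects, so that
        // `cmp` returns `Equal` exactly when `==` holds.
        self.file_path_key
            .cmp(&other.file_path_key)
            .then_with(|| self.file_path.cmp(&other.file_path))
            .then_with(|| self.start_line.cmp(&other.start_line))
            .then_with(|| self.end_line.cmp(&other.end_line))
    }
}

impl PartialOrd for OccurrenceKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Hypothetical helper: number of keys a `BTreeSet` keeps after de-duplication.
fn count_distinct(keys: Vec<OccurrenceKey>) -> usize {
    keys.into_iter().collect::<BTreeSet<_>>().len()
}

/// Hypothetical helper: true when two inclusive `(start, end)` line ranges
/// share at least one line. Intended only for ranges in the same file.
fn ranges_overlap(left: (usize, usize), right: (usize, usize)) -> bool {
    left.0 <= right.1 && right.0 <= left.1
}
```

A `BTreeSet` relies only on `Ord`, so a `cmp` that skipped `file_path` would collapse two keys that differ only in that field into one entry. For the overlap check, `ranges_overlap((1, 5), (4, 8))` is true and `ranges_overlap((1, 3), (4, 8))` is false under the inclusive-range assumption.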

---

Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 18-19: Harden the checkout step in the workflow by updating the
actions/checkout reference in the Checkout repository step to a pinned commit
SHA instead of a tag, and disable persisted credentials for that step. Use the
existing Checkout repository action entry in the CI workflow as the location to
apply these supply-chain safeguards.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c3df5e3b-e828-474d-9ada-941a5385592d

📥 Commits

Reviewing files that changed from the base of the PR and between d7c9da3 and 32ece2e.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (15)

.github/workflows/ci.yml
.github/workflows/coderabbit-review.yml
Cargo.toml
README.md
src/cli.rs
src/discovery.rs
src/duplicate.rs
src/error.rs
src/language.rs
src/lib.rs
src/line.rs
src/main.rs
src/model.rs
src/paths.rs
src/report.rs

b4prog · 2026-06-25T12:58:07Z

@CodeRabbit resume

coderabbitai · 2026-06-25T12:58:13Z

✅ Action performed

Reviews resumed.

b4prog · 2026-06-25T13:40:10Z

@CodeRabbit resume

coderabbitai · 2026-06-25T13:40:17Z

✅ Action performed

Reviews resumed.

b4prog · 2026-06-25T13:43:28Z

@CodeRabbit do you approve this pull request?

coderabbitai · 2026-06-25T13:44:10Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== AGENTS.md =="
sed -n '1,220p' AGENTS.md

echo
echo "== Cargo.lock presence =="
fd '^Cargo\.lock$' .

echo
echo "== Outline: src/lib.rs =="
ast-grep outline src/lib.rs --view expanded || true

echo
echo "== src/lib.rs =="
sed -n '1,240p' src/lib.rs

echo
echo "== Outline: src/discovery.rs =="
ast-grep outline src/discovery.rs --view expanded || true

echo
echo "== src/discovery.rs (1-260) =="
sed -n '1,260p' src/discovery.rs

echo
echo "== Outline: src/duplicate.rs =="
ast-grep outline src/duplicate.rs --view expanded || true

echo
echo "== src/duplicate.rs (1-420) =="
sed -n '1,420p' src/duplicate.rs

echo
echo "== .github/workflows/coderabbit-review.yml =="
sed -n '1,220p' .github/workflows/coderabbit-review.yml

echo
echo "== .github/workflows/ci.yml =="
sed -n '1,220p' .github/workflows/ci.yml

Length of output: 33967

@b4prog Yes — based on a static review of the higher-risk areas, I’m comfortable approving this PR.

I checked the CLI/report wiring in src/lib.rs, file discovery in src/discovery.rs, duplicate detection in src/duplicate.rs, and both workflows. I didn’t spot any blocking issues, and the CI setup is consistent with AGENTS.md and the committed Cargo.lock.

I wasn’t able to execute the Rust verification commands in this sandbox, so this approval is based on code inspection rather than a local run of cargo fmt, cargo clippy, and tests.

[approve]

b4prog · 2026-06-25T13:45:06Z

@CodeRabbit approve

coderabbitai · 2026-06-25T13:45:13Z

✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

b4prog · 2026-06-25T14:35:51Z

@CodeRabbit resume

coderabbitai · 2026-06-25T14:35:57Z

✅ Action performed

Reviews resumed.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/duplicate.rs`:
- Around line 313-344: The test in detect_duplicate_blocks should verify sorting
by weight rather than encounter order. In sorts_duplicate_blocks_by_weight,
adjust the fixture so the lighter duplicate block appears first and is separated
by unique lines, while the heavier duplicate block is encountered later; then
keep asserting that detect_duplicate_blocks returns the heavier block at
blocks[0] and that its weight is at least blocks[1].weight.
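
The point of the fixture change can be sketched as follows. `Block`, `sort_by_weight`, and `fixture_in_encounter_order` are hypothetical stand-ins, not the crate's actual `DuplicateBlock` type or `detect_duplicate_blocks` function, and the tie-break on `first_line` is an assumption added to keep the ordering deterministic.

```rust
/// Hypothetical stand-in for a detected duplicate block.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Block {
    first_line: usize,
    weight: usize,
}

/// Sorts the heaviest block first; equal weights fall back to position
/// (an assumed tie-break, used here only to keep the result deterministic).
fn sort_by_weight(mut blocks: Vec<Block>) -> Vec<Block> {
    blocks.sort_by(|a, b| {
        b.weight
            .cmp(&a.weight)
            .then_with(|| a.first_line.cmp(&b.first_line))
    });
    blocks
}

/// Fixture in encounter order: the lighter block appears first, so a test
/// that passes without sorting would be caught.
fn fixture_in_encounter_order() -> Vec<Block> {
    vec![
        Block { first_line: 1, weight: 2 },
        Block { first_line: 10, weight: 5 },
    ]
}
```

With this layout, asserting that the first returned block is the heavier one only succeeds if sorting actually happens. A fixture where the heavy block is already encountered first would pass even with the sort removed.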

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4fdea58c-a683-4da5-80b0-2328cffb6236

📥 Commits

Reviewing files that changed from the base of the PR and between 5e77570 and 1ea1269.

📒 Files selected for processing (6)

.coderabbit.yaml
src/cli.rs
src/discovery.rs
src/duplicate.rs
src/line.rs
src/paths.rs

✅ Files skipped from review due to trivial changes (1)

.coderabbit.yaml

🚧 Files skipped from review as they are similar to previous changes (4)

src/paths.rs
src/line.rs
src/discovery.rs
src/cli.rs

[feat] add deterministic duplicate code report CLI

32ece2e

coderabbitai Bot reviewed Jun 25, 2026

Comment thread .github/workflows/coderabbit-review.yml

Comment thread src/discovery.rs

Comment thread src/duplicate.rs

Comment thread src/duplicate.rs

b4prog added 9 commits June 25, 2026 12:54

[ci] wrap CodeRabbit workflow script in async IIFE

a4e1b14

[fix] deduplicate resolved explicit source files

cb9c7e8

[fix] reject overlapping duplicate ranges in the same file

d998211

[docs] document cargo installation from GitHub and local source

d886d25

[chore] add clippy lint threshold configuration

a61844b

[docs] document agent verification requirements and local checks

cbe018a

[refactor] reduce nesting in duplicate and language helpers

945d07c

[chore] satisfy stricter clippy lint requirements

9417bae

[ci] add Clippy validation to the Rust CI workflow

5e77570

b4prog added 2 commits June 25, 2026 15:57

[ci] enable CodeRabbit request changes approval workflow

436936f

[ci] enable detailed CodeRabbit reviews and disable poems

f9054a0

[test] add coverage for parser, discovery, duplicate, and path edge cases

1ea1269

coderabbitai Bot reviewed Jun 25, 2026

Comment thread src/duplicate.rs

[test] make duplicate sort fixture fail without sorting

b9ecef2
