Conversation
---
This PR will trigger no release when merged.
---
Codecov Report: ✅ All modified and coverable lines are covered by tests.
---
Hi @solaris007
---
solaris007
left a comment
@tkotthakota-adobe - good progress on the refactoring. The overall architecture is sound: database-based bot detection via bot-detection.js, clean separation from CloudWatch utils, and well-structured Slack messages. A few items to address:
Important
1. Test mock path mismatch - esmock won't intercept checkAndAlertBotProtection
The handler imports:

```js
import { checkAndAlertBotProtection } from '../../utils/bot-detection.js';
```

But 15+ test locations mock it at the wrong path:

```js
'../../../src/utils/cloudwatch-utils.js': {
  checkAndAlertBotProtection: sinon.stub().resolves(null), // <-- ineffective
  getAuditStatus: sinon.stub().resolves({ executed: true, failureReason: null }),
},
```

esmock resolves mocks by module path, so this mock doesn't intercept the handler's import from bot-detection.js. Tests pass because the bot-detection code path either isn't reached (no jobId in the scraping mock data) or the unmocked function returns null. Fix by mocking at the correct path:

```js
'../../../src/utils/bot-detection.js': {
  checkAndAlertBotProtection: sinon.stub().resolves(null),
},
'../../../src/utils/cloudwatch-utils.js': {
  getAuditStatus: sinon.stub().resolves({ executed: true, failureReason: null }),
},
```

2. Dead re-export in cloudwatch-utils.js
```js
// Bot detection functions have been moved to bot-detection.js
// Re-export for backward compatibility
export {
  convertAbortInfoToStats,
  checkAndAlertBotProtection,
} from './bot-detection.js';
```

This is a new file - there is no backward-compatibility concern, and nothing uses this re-export (the handler and tests both import from bot-detection.js directly). Remove it.
3. sortJobsByDate - confusing variable names
```js
function sortJobsByDate(jobs) {
  return jobs.sort((a, b) => {
    const dateA = new Date(b.startedAt || b.createdAt || 0); // dateA from b?
    const dateB = new Date(a.startedAt || a.createdAt || 0); // dateB from a?
    return dateA - dateB;
  });
}
```

Functionally correct (sorts latest first), but the swapped names are confusing. Consider:

```js
const dateA = new Date(a.startedAt || a.createdAt || 0);
const dateB = new Date(b.startedAt || b.createdAt || 0);
return dateB - dateA;
```

4. Test file naming / missing coverage
- `cloudwatch-utils.test.js` actually tests `bot-detection.js` functions - consider renaming it to `bot-detection.test.js`
- No tests for `getAuditStatus` (the only function actually exported by `cloudwatch-utils.js`)
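The comparator fix suggested in item 3 can be sanity-checked standalone; the sample job records below are invented for illustration:

```javascript
// Sketch of the suggested sortJobsByDate fix, with the clearer names.
// The job objects are invented sample data, not records from the PR.
function sortJobsByDate(jobs) {
  return jobs.sort((a, b) => {
    const dateA = new Date(a.startedAt || a.createdAt || 0);
    const dateB = new Date(b.startedAt || b.createdAt || 0);
    return dateB - dateA; // latest first
  });
}

const jobs = [
  { id: 'old', startedAt: '2026-02-01T00:00:00Z' },
  { id: 'new', startedAt: '2026-02-06T00:00:00Z' },
  { id: 'fallback', createdAt: '2026-02-03T00:00:00Z' }, // falls back to createdAt
];

console.log(sortJobsByDate(jobs).map((j) => j.id)); // [ 'new', 'fallback', 'old' ]
```

Note that `Array.prototype.sort` still mutates the input array here; returning a copy (`[...jobs].sort(...)`) would be a further improvement, but that is beyond the rename the review asks for.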
Minor
- `@slack/web-api` added as a direct dependency in `package.json` but not imported in any source file - is this needed?
- Skipped test (`should use fallback stats when CloudWatch returns no logs`) with no tracking ticket
What looks good
- `bot-detection.js` - clean, focused, proper error handling, good separation
- `isScrapingAvailable` correctly returns `jobId` from the scrape job
- `formatBotProtectionSlackMessage` - well-structured with impact percentage, sample URLs, and resolution instructions
- Defensive error handling throughout (try-catch around Slack alerts, null checks)
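The "defensive error handling" point can be illustrated with a minimal wrapper; `alertSafely` and the failing sender below are hypothetical stand-ins, not code from the PR:

```javascript
// Illustrative pattern only: a failed Slack alert is logged and swallowed
// so it can never break the main task-processing flow.
async function alertSafely(sendSlackAlert, message, log = console) {
  try {
    await sendSlackAlert(message);
    return true;
  } catch (e) {
    log.error(`Slack alert failed: ${e.message}`);
    return false;
  }
}

// A sender that always throws, to show the wrapper absorbing the failure.
const failingSender = async () => { throw new Error('channel_not_found'); };

alertSafely(failingSender, { text: 'bot protection detected' }, { error: () => {} })
  .then((delivered) => console.log('alert delivered:', delivered)); // alert delivered: false
```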
Overall this is in good shape. The mock path mismatch (#1) is the most important fix since it means bot detection integration isn't actually being tested.
---
@tkotthakota-adobe - regarding
The IPs you proposed ( For And for the Slack message to be useful per-environment, you'd want to set the right IPs for each:
---
@tkotthakota-adobe - on your Coralogix questions:

Error log filtering
The filter or:

Debug log visibility

Heads up - prod error in task-processor
I spotted this error from yesterday (Feb 6) in Coralogix: The
---
@tkotthakota-adobe - both secrets are now set for all three environments:
Note on the IPs: The values you originally proposed (

The Lambda will pick up the new secrets on next cold start - no redeployment needed.
---
@solaris007 Appreciate you checking the IPs and adding both variables to Secrets Manager in all three environments. Review comments are addressed.
---
@solaris007 what could be causing this deploy error? The changes made should not affect deployment, in my view.
---
@solaris007 I noticed in the AWS console that the Lambda is being deployed. For some reason the Jenkins deployer is showing the error I shared earlier.
solaris007
left a comment
Re-review - All requested changes addressed
The 4 issues from the previous review have all been fixed:
- Mock paths now correctly target `bot-detection.js` (verified all esmock calls)
- Dead re-export removed - `cloudwatch-utils.js` only exports `getAuditStatus`
- `sortJobsByDate` function vs `sortedJobs` result - no more name collision
- Single `bot-detection.test.js`, no duplicate test file
Bot detection end-to-end flow is clean: handler gets jobId from scrape jobs, checkAndAlertBotProtection reads abortInfo via ScrapeClient, converts to stats, sends Slack alert. LGTM.
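For illustration, the "converts to stats" step in that flow might look roughly like this; the `abortInfo` shape and field names here are assumptions, not the PR's actual implementation:

```javascript
// Hypothetical sketch of a convertAbortInfoToStats-style transformation:
// abortInfo from the scrape job -> stats for the Slack message.
// Field names (totalUrls, abortedUrls) are assumed for illustration.
function convertAbortInfoToStats(abortInfo = {}) {
  const total = abortInfo.totalUrls || 0;
  const aborted = abortInfo.abortedUrls || [];
  return {
    totalUrls: total,
    blockedUrls: aborted.length,
    impactPercentage: total ? Math.round((aborted.length / total) * 100) : 0,
    sampleUrls: aborted.slice(0, 3), // a few example URLs for the alert
  };
}

const stats = convertAbortInfoToStats({
  totalUrls: 20,
  abortedUrls: ['https://example.com/a', 'https://example.com/b'],
});
console.log(stats.impactPercentage); // 10
```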
Minor cleanup items
- Duplicate JSDoc on `filterJobsByTimestamp` - two JSDoc blocks stacked on the same function. Remove the shorter one.
- Unnecessary esmock mock - `bot-detection.test.js` mocks `fetchRecentThreadMessages` from `slack-utils.js`, but `bot-detection.js` doesn't import it. Harmless but confusing.
- Gist tarball dependencies - 4 packages on gist tarballs. Must be replaced with published npm versions once #1308 merges.
- `getScrapeJobsByBaseURL` no longer filters by the `'default'` processing type - intentional? This now returns jobs of all processing types. Just confirming this is desired behavior.
---
@solaris007 addressed your comments. For some reason this PR is not getting deployed. When I created a fresh test PR, it deployed fine. When I created a new PR out of this PR, it failed. That means some state in this PR is causing the issue. I am running out of ideas. One other option is to just cherry-pick the code to a new PR. This issue started happening only yesterday. Update: Upgrading to the latest @adobe/helix-deploy (same as in the audit worker) solved the issue.

TODO:
Add SCRAPE_JOB_CONFIGURATION to the spacecat-services--task-manager secrets manager on prod. The value can be obtained from the spacecat-services--audit-worker secrets manager.

Implemented comprehensive bot protection detection and alerting in the Task Processor to identify when sites are blocked by bot protection services (Cloudflare, Akamai, Imperva, etc.) and prevent unnecessary processing.
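As a sketch of what "identify when sites are blocked" can mean in practice, here is a hypothetical signature check; the headers used are commonly cited fingerprints for these services, not the detection logic actually implemented in this PR (which works from scrape-job abort info):

```javascript
// Hypothetical bot-protection fingerprinting, for illustration only.
function guessBotProtection(status, headers = {}) {
  // Normalize header names to lowercase for lookup.
  const h = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), String(v)]),
  );
  if (/cloudflare/i.test(h.server || '')) return 'Cloudflare';
  if ('x-akamai-request-id' in h) return 'Akamai';
  if ('x-iinfo' in h || /imperva|incapsula/i.test(h['x-cdn'] || '')) return 'Imperva';
  if (status === 403 || status === 429) return 'unknown (blocked)';
  return null; // no evidence of bot protection
}

console.log(guessBotProtection(403, { Server: 'cloudflare' })); // Cloudflare
console.log(guessBotProtection(200, {})); // null
```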
Tests:
DynamoDB records created and fetched using scrape client:
Sample record
Debug Log:
Tests:
https://cq-dev.slack.com/archives/C060T2PPF8V/p1770426770385289?thread_ts=1770424775.658269&cid=C060T2PPF8V
https://jira.corp.adobe.com/browse/SITES-37727