Duplicate file across the tenant
Updated: Galileo File Reporter
25.2+
Summary
This report uses the Microsoft 365 scanning features. A tenant scan prompts Agent365 to collect a hash of each file's content and store it in the database, so that files with identical content can be found by matching their hashes.
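As a rough illustration of the idea, the Python sketch below hashes file contents and groups files whose hashes match. It keeps only groups of two or more, the same way the query filters on hash_count >= 2. SHA-256 and the in-memory name-to-bytes mapping are assumptions for illustration. This page does not say which hash algorithm Agent365 uses.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a hex digest of the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by content hash; keep hashes shared by 2+ files."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_hash[content_hash(data)].append(name)
    # Mirrors the report's "hash_count >= 2" filter.
    return {h: names for h, names in by_hash.items() if len(names) >= 2}


if __name__ == "__main__":
    # Hypothetical sample files: two share identical content.
    files = {
        "Finance/budget.xlsx": b"quarterly numbers",
        "HR/budget-copy.xlsx": b"quarterly numbers",
        "Sales/plan.docx": b"different content",
    }
    for digest, names in find_duplicates(files).items():
        print(digest[:12], sorted(names))
```

The file names do not affect the result. Only identical bytes produce the same hash, so renamed copies in different locations are still grouped together.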
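The query joins each drive item to several related tables: the users who created and last modified it, its drive, and the drive scan that recorded it. It keeps only items that have a content hash, an item_type of 1, and a scan_state of 1. A window function then counts how many of those items share each hash, and the query returns only items whose hash appears at least twice.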
Code
```sql
WITH
q AS (
    SELECT di.name,
           di.web_url,
           di.size,
           srs.byte_string(di.size) AS size_string,   -- size formatted as a string
           uc.display_name AS created_by,
           um.display_name AS modified_by,
           -- number of rows (after the WHERE filter) that share this content hash
           count(*) OVER (PARTITION BY di.file_hash) AS hash_count,
           srs.bytes_to_hex_string(di.file_hash) AS file_hash
    FROM ms365.drive_items AS di
    -- display names of the creator and the last modifier
    LEFT OUTER JOIN ms365.users AS uc ON uc.ms365_id = di.created_by
    LEFT OUTER JOIN ms365.users AS um ON um.ms365_id = di.modified_by
    -- joined, but not referenced in the SELECT list
    LEFT OUTER JOIN ms365.drives AS d ON d.ms365_id = di.ms365_drive_id
    LEFT OUTER JOIN ms365.drive_scans AS ds ON ds.id = di.scan_id
    WHERE (di.file_hash IS NOT NULL)   -- only items that have a content hash
      AND (di.item_type = 1)
      AND (ds.scan_state = 1)
)
SELECT q.*
FROM q
-- keep only content that appears in two or more files
WHERE q.hash_count >= 2
```
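The sketch below shows how the window count and the filters interact without a Galileo database. It runs the same CTE shape against an in-memory SQLite database from Python. Several parts are illustrative assumptions:

- The two tables and their columns are hypothetical, heavily simplified stand-ins for ms365.drive_items and ms365.drive_scans.
- The sample rows are invented.
- SQLite's built-in hex() replaces srs.bytes_to_hex_string.
- The SQLite build must support window functions (version 3.25 or later).

```python
import sqlite3

# Hypothetical, simplified stand-ins for ms365.drive_scans and
# ms365.drive_items; the real schema has many more columns.
SCHEMA = """
CREATE TABLE drive_scans (id INTEGER PRIMARY KEY, scan_state INTEGER);
CREATE TABLE drive_items (
    name TEXT, item_type INTEGER, scan_id INTEGER, file_hash BLOB
);
"""

# Same shape as the report: filter inside the CTE, count rows per hash with
# a window function, then keep only hashes that occur at least twice.
DUPLICATES_SQL = """
WITH q AS (
    SELECT di.name,
           hex(di.file_hash) AS file_hash,
           count(*) OVER (PARTITION BY di.file_hash) AS hash_count
    FROM drive_items AS di
    LEFT OUTER JOIN drive_scans AS ds ON ds.id = di.scan_id
    WHERE di.file_hash IS NOT NULL
      AND di.item_type = 1
      AND ds.scan_state = 1
)
SELECT q.* FROM q WHERE q.hash_count >= 2 ORDER BY q.file_hash, q.name
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO drive_scans VALUES (?, ?)", [(1, 1), (2, 0)])
    conn.executemany(
        "INSERT INTO drive_items VALUES (?, ?, ?, ?)",
        [
            ("a.docx", 1, 1, b"\x01\x02"),
            ("b.docx", 1, 1, b"\x01\x02"),  # same hash as a.docx
            ("c.docx", 1, 1, b"\x03"),      # unique hash
            ("d.docx", 1, 2, b"\x01\x02"),  # scan_state 0: filtered out
            ("folder", 2, 1, b"\x01\x02"),  # item_type 2: filtered out
        ],
    )
    return conn


def find_duplicates(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Return (name, hex hash, hash_count) for every duplicated item."""
    return conn.execute(DUPLICATES_SQL).fetchall()


if __name__ == "__main__":
    for row in find_duplicates(build_demo_db()):
        print(row)
```

Running the script prints two rows, ('a.docx', '0102', 2) and ('b.docx', '0102', 2). The item c.docx has a unique hash. The other two items are removed by the scan_state and item_type filters before the window count is taken, so they do not inflate hash_count.

The WHERE clause tests ds.scan_state, which is NULL for any item without a matching scan row. Such items are therefore also dropped, so the LEFT OUTER JOIN to drive_scans effectively behaves like an inner join in this query.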
Downloads
| Attachment | Size |
|---|---|
| duplicate file across the tenant.zip | 3.94 KB |
Sample Report
| Attachment | Size |
|---|---|
| duplicate file across the tenant.pdf | 161.04 KB |