Duplicate file across the tenant

Duplicate file across the tenant

Details

Summary

This report utilizes the new added Microsoft 365 scanning features of version 4.0.  The 4.0 tenant scan prompts the new Agent365 to collect a hash of the file content and store the hash in the database where it can be use to compare against other files that match in content.

This query is designed to show show the different joins work together to get the desired data.

 

Code
 WITH
    q AS (SELECT di.name,
          di.web_url,
          di.size,
          srs.byte_string(di.size) AS size_string,
          uc.display_name AS created_by,
          um.display_name AS modified_by,
          count(*) OVER (PARTITION BY di.file_hash) AS hash_count,
          srs.bytes_to_hex_string(di.file_hash) as file_hash
        FROM ms365.drive_items AS di
          LEFT OUTER JOIN ms365.users AS uc ON uc.ms365_id = di.created_by
          LEFT OUTER JOIN ms365.users AS um ON um.ms365_id = di.modified_by
          LEFT OUTER JOIN ms365.drives AS d ON d.ms365_id = di.ms365_drive_id
          LEFT OUTER JOIN ms365.drive_scans AS ds ON ds.id = di.scan_id
        WHERE (di.file_hash IS NOT NULL) AND
          (di.item_type = 1) AND
          (ds.scan_state = 1))
SELECT
    q.*
FROM
    q
WHERE
    q.hash_count >= 2 
Author
rlagger
Last modified
Friday, April 26, 2024 - 12:48
Properties
Supported Version
Report Category
Includes a Report Layout
Yes
Downloads
AttachmentSize
duplicate file across the tenant.zip3.94 KB
Sample Report
AttachmentSize
duplicate file across the tenant.pdf161.04 KB
Preview Images