iView Multimedia Forum Index


Script to find duplicate files / assign MD5 hashes

 
algal



Joined: 07 Jan 2007
Posts: 2

Posted: Sun Jan 07, 2007 11:09 am    Post subject: Script to find duplicate files / assign MD5 hashes

Are you ever plagued by duplicate files on disk (not just duplicate catalog items for the same file)?

The following two scripts help you find and remove such duplicates. Details on what they do and how to use them are in the comments at the top of each script. I hope this is helpful to someone. It works for me. No warranty, express or implied, etc.

The first script sets each item's Job Identifier annotation to the MD5 digest of its file. It's called "JobId from Digest":

Code:
-- WHAT PROBLEM IT SOLVES:

-- iView includes a command for removing duplicate media items. However, this command only finds media items which point to the same file. It does not solve the problem of removing media items which point to different but identical files.

-- WHAT IT DOES:

-- This command takes one or more selected items, and sets their Job Identifiers to the MD5 digests of their files. The MD5 digest is a string which is generated from the contents of a given file, and acts as a sort of ASCII fingerprint of that file, in that it is designed to have an extremely good chance of being unique to that file's contents. Instead of comparing images by inspecting them manually, you can just compare their digests.

-- HOW TO USE:

-- If you have duplicate media items (duplicate files on disk, not just duplicate items in your catalog pointing to the same file), then run this command on all your media items and afterwards sort by Job Identifier. Because the Job Identifier now holds the MD5 digest, sorting groups identical items together, making it much easier to remove the unwanted duplicates. To automate that last step, use the other script, "Label Duplicate JobIds".

-- by algal, on the iView forum. No warranty, express or implied, etc.

on run
   -- get the list of selected items in the front catalog
   set itemsToProcess to GetSelection()
   
   -- show about (choosing Cancel stops the script here)
   AboutScript()
   
   tell window 1 of application "iView MediaPro"
      repeat with theItem in itemsToProcess
         -- get the item's MD5 digest
         set thePathname to the path of theItem
         
         try
            set theDigest to do shell script "/sbin/md5 -q " & quoted form of the POSIX path of thePathname
         on error errMsg number errNo
            -- stop and report the failure (for example, a missing or unreadable file)
            error errMsg number errNo
         end try
         
         -- set the job identifier
         set the transmission of theItem to theDigest
      end repeat
   end tell
end run

-- about this script
on AboutScript()
   display dialog "This command sets items' Job Identifiers to their MD5 digests." buttons {"Cancel", "OK"} default button 2 with icon note
   
   set theAnswer to the button returned of the result
   return theAnswer
end AboutScript

-- get the selected media items as a list
on GetSelection()
   set selectedItems to {}
   tell application "iView MediaPro"
      if catalog 1 exists then set selectedItems to the selection of catalog 1
   end tell
   if number of items in selectedItems is 0 then
      display dialog "You need to select at least one media item in the front catalog in order to use this script." buttons {"OK"} default button "OK" with icon note giving up after 10
      -- error -128 is "user canceled"; it stops the script quietly
      error number -128
   end if
   return selectedItems
end GetSelection
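
For anyone who wants to see the fingerprint idea on its own, without a catalog open, here is a minimal sketch. It is not part of the script above: md5OfFile is a hypothetical helper name, and the /tmp paths are only illustrative. It relies on the same /sbin/md5 -q command the script uses.

```applescript
-- Hypothetical helper (not in the original script): returns the MD5 digest
-- of the file at the given POSIX path, using the same command as above.
on md5OfFile(posixPath)
   return do shell script "/sbin/md5 -q " & quoted form of posixPath
end md5OfFile

-- Make two separate files with identical contents (illustrative paths).
do shell script "printf 'same bytes' > /tmp/dupA.txt && cp /tmp/dupA.txt /tmp/dupB.txt"

set digestA to md5OfFile("/tmp/dupA.txt")
set digestB to md5OfFile("/tmp/dupB.txt")

-- Different files, same contents: the digests should match.
set filesLookIdentical to (digestA is equal to digestB)
```

The two files have different names and paths, but filesLookIdentical should come out true because the digest depends only on the file contents. That is the property the Job Identifier sort relies on.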


This second (optional) script labels items whose predecessor has the same JobId. It's called "Label Duplicate JobIds":
Code:
-- WHAT PROBLEM IT SOLVES:

-- This command can be used in conjunction with "JobId from Digest" in order to remove duplicate media items.

-- WHAT IT DOES:

-- This command takes one or more selected items, and labels the items whose Job Identifier matches the Job Identifier of the preceding item. What good is this? On its own, not much. However, when used intelligently with the other command "JobId from Digest", it provides a convenient way to label all the media items pointing to duplicate files.

-- HOW TO USE:

-- 1) Select all items, and run "JobId from Digest". This will populate the Job Identifier annotation field with an md5 digest, a "hash" string that acts as a sort of ASCII fingerprint of the contents of the media item file itself.

-- 2) Sort all your items by Job Identifier. This will cause all media items with the same md5 hash to be grouped together. Each group will include different catalog items pointing to the same file, as well as different catalog items pointing to different but identical files. If you want to remove all such duplicates, then your goal is to keep only one item in each group (for instance, the first item) and remove the rest. You can do this manually, or else ...

-- 3) Select all items, and run "Label Duplicate JobIds". This will label red all the items in each group except the first item.

-- 4) Dispose of the red items however you like.

-- by algal, on the iView forum. No warranty, express or implied, etc.

on run
   -- get the list of selected items in the front catalog
   set itemsToProcess to GetSelection()
   
   -- show about (choosing Cancel stops the script here)
   AboutScript()
   
   tell window 1 of application "iView MediaPro"
      set lastDigest to "invalid"
      -- set the label index to 1 for each item whose Job Identifier matches the previous item's
      repeat with theItem in itemsToProcess
         set theDigest to the transmission of theItem
         if theDigest = lastDigest then
            set the label index of theItem to 1
         end if
         set lastDigest to theDigest
      end repeat
   end tell
end run

-- get the selected media items as a list
on GetSelection()
   set selectedItems to {}
   tell application "iView MediaPro"
      if catalog 1 exists then set selectedItems to the selection of catalog 1
   end tell
   if number of items in selectedItems is 0 then
      display dialog "You need to select at least one media item in the front catalog in order to use this script." buttons {"OK"} default button "OK" with icon note giving up after 10
      -- error -128 is "user canceled"; it stops the script quietly
      error number -128
   end if
   return selectedItems
end GetSelection

-- about this script
on AboutScript()
   display dialog "Labels items with the same JobId as their predecessor. If used after assigning digests to and sorting on JobIds, this labels duplicate items." buttons {"Cancel", "OK"} default button 2 with icon note
   set theAnswer to the button returned of the result
   return theAnswer
end AboutScript
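
The "same as predecessor" test is easy to try on a plain list, outside iView. The following is only a sketch of that logic, assuming the list is already sorted: duplicateFlags is a hypothetical handler name and the short strings stand in for real digests.

```applescript
-- Hypothetical handler (not in the original script): given a list of digests
-- that is already sorted, returns a list of booleans that is true wherever
-- an entry equals the entry just before it.
on duplicateFlags(sortedDigests)
   set flags to {}
   set lastDigest to missing value
   repeat with d in sortedDigests
      -- "contents of" turns the loop reference into the actual string
      set currentDigest to contents of d
      set end of flags to (currentDigest is equal to lastDigest)
      set lastDigest to currentDigest
   end repeat
   return flags
end duplicateFlags

-- Stand-in digests, grouped as they would be after sorting by Job Identifier.
set flags to duplicateFlags({"aaa", "aaa", "bbb", "ccc", "ccc", "ccc"})
-- flags is {false, true, false, false, true, true}
```

The first entry of each group stays unflagged and every later entry in the group is flagged, which is why the items need to be sorted by Job Identifier before running "Label Duplicate JobIds". On an unsorted list, identical digests that are not adjacent would be missed.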


All times are GMT