Hi guys,
The RDKit Diversity Picker node uses the MaxMin method to select a diversity set of the desired size. This method is computationally intensive when both the dataset to pick from and the subset to pick are large. I want to build a diversity set from a large dataset (let's say 1M compounds), reducing it tenfold to a diversity set of 100K compounds. When I tried to do this with the RDKit Diversity Picker, I ended up filling the RAM and swap of my system and couldn't complete the task.
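To put rough numbers on the memory problem, here is a minimal Python sketch (no RDKit required). It assumes fingerprints are stored as plain integer bitsets compared by Tanimoto distance. The helper names (`full_matrix_bytes`, `tanimoto_distance`, `lazy_maxmin_pick`) are hypothetical and this is not the node's actual code. If an implementation materialized the full pairwise distance matrix, 1M compounds at 8 bytes per entry would need about 4 TB. A "lazy" MaxMin variant keeps only one nearest-pick distance per compound (O(N) memory), but it still performs on the order of N × k distance evaluations, roughly 10^11 for 1M × 100K, so runtime stays the bottleneck. As far as I know, RDKit's Python `MaxMinPicker` offers a `LazyBitVectorPick` method built on a similar on-demand idea, but check the RDKit docs for its exact behavior.

```python
import random
from typing import List


def full_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Memory for a condensed (upper-triangle) pairwise distance matrix of n items."""
    return n * (n - 1) // 2 * bytes_per_entry


def tanimoto_distance(a: int, b: int) -> float:
    """Tanimoto (Jaccard) distance between two fingerprints stored as int bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def lazy_maxmin_pick(fps: List[int], k: int, seed: int = 42) -> List[int]:
    """Hypothetical lazy MaxMin picker: stores one 'distance to nearest pick'
    value per compound instead of an N x N distance matrix."""
    n = len(fps)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    rng = random.Random(seed)
    first = rng.randrange(n)  # random starting compound
    picks = [first]
    # min_dist[i] = distance from compound i to its closest already-picked compound
    min_dist = [tanimoto_distance(fps[i], fps[first]) for i in range(n)]
    min_dist[first] = -1.0  # mark as picked so it is never chosen again
    while len(picks) < k:
        # pick the compound that is farthest from all current picks
        nxt = max(range(n), key=min_dist.__getitem__)
        picks.append(nxt)
        min_dist[nxt] = -1.0
        # only distances to the new pick are needed to update nearest-pick values
        for i in range(n):
            if min_dist[i] >= 0.0:
                d = tanimoto_distance(fps[i], fps[nxt])
                if d < min_dist[i]:
                    min_dist[i] = d
    return picks


if __name__ == "__main__":
    print(full_matrix_bytes(1_000_000) / 1e12, "TB for a full 1M x 1M matrix")
    fps = [0b1111, 0b1110, 0b11110000, 0b1111 << 20]
    print(lazy_maxmin_pick(fps, 3))
```

In the toy run, compounds 0 and 1 share most of their bits, so MaxMin should not pick both of them.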
Could anybody suggest the largest reasonable sizes, both for the input dataset (to pick compounds from) and for the diversity subset, that this node can handle?
Thank you
Gio