From owner-chemistry@ccl.net Wed Sep 24 10:35:00 2025
From: "Andrew Dalke dalke*|*dalkescientific.com"
To: CCL
Subject: CCL: ANN: chemfp 5.0
Message-Id: <-55416-250924102304-23060-j6PmGBbp8XVVIT09Z0ATZA:+:server.ccl.net>
X-Original-From: Andrew Dalke
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8
Date: Wed, 24 Sep 2025 16:22:42 +0200
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3776.700.51\))

Sent to CCL by: Andrew Dalke [dalke[*]dalkescientific.com]
Hello CCL subscribers,

  14 years and 4 days ago I announced chemfp 1.0, my package for
cheminformatics fingerprint generation and search, to CCL.

Since then I've added many new features, including clustering and
diversity selection.

Given CCL's upcoming shutdown, it is with bittersweet pleasure that I
announce the release of chemfp 5.0.

For a description of the new features in this release, see
https://chemfp.com/docs/whats_new_in_50.html .

The highlights are:

• Update the FPB format to handle over 1 billion fingerprints.

• New chemfp shardsearch command-line tool which does similarity
  search across multiple target files and merges the results.
    - Tested with the 977 million structures in GDB-13

• New chemfp simhistogram / chemfp simhist command-line tool and
  corresponding chemfp.simhistogram() high-level API function to
  create a histogram of similarity scores.

• Initial support for count fingerprints:
    - new text-based FPC format based on the FPS format
    - rdkit2fpc tool which uses RDKit's sparse fingerprint generators
    - fpc2fps tool with various methods to convert sparse count
      fingerprints to binary fingerprints

• Fast implementations of the 4860-bit Klekota-Roth fingerprint for
  the OpenEye and RDKit toolkits.

Chemfp is available at no cost to academic users.

The chemfp home page is https://chemfp.com/

Cheers,

				Andrew Dalke
				dalke * dalkescientific.com

-- 
Have useful but old in-house cheminformatics software in need of
refurbishment? No one now knows how it works or has the time?
Perhaps I can help. Contact me.