Class CIPChirality

java.lang.Object
org.jmol.symmetry.CIPChirality

public class CIPChirality extends Object
A fully validated, relatively efficient implementation of Cahn-Ingold-Prelog rules for assigning R/S, M/P, and E/Z stereochemical descriptors. Based on IUPAC Blue Book rules of 2013 and assorted corrections.

IUPAC Project: Corrections, Revisions and Extension for the Nomenclature of Organic Chemistry - IUPAC Recommendations and Preferred Names 2013 (the IUPAC Blue Book)
https://iupac.org/projects/project-details/?project_nr=2001-043-1-800
http://www.sbcs.qmul.ac.uk/iupac/bibliog/BBerrors.html

Settable options:

set testflag1  use advanced in/out-sensitive Rule 6 (r,r-bicyclo[2.2.2]octane)
set testflag2  turn off tracking (saving of _M.CIPInfo) for speed

Features include:

- deeply validated
- includes revised Rules 1b and 2
- includes a proposed Rule 6
- implemented in Java (Jmol) and JavaScript (JSmol)
- only a few Java classes; < 1000 lines
- efficient, one-pass process for each center using a single finite digraph for all auxiliary descriptors
- exhaustive processing of all 9 sequence rules (1a, 1b, 2, 3, 4a, 4b, 4c, 5, 6)
- includes R/S, r/s, M/P (axial, not planar), E/Z
- covers any-length odd and even cumulenes
- uses Jmol conformational SMARTS to detect atropisomers and helicenes
- covers chiral phosphorus and sulfur, including trigonal pyramidal and tetrahedral
- properly treats complex combinations of R/S, M/P, and seqCis/seqTrans centers (Rule 4b)
- properly treats neutral-species resonance structures using fractional atomic mass and a modified Rule 1b
- implements CIP spiro rule (BB P-93.5.3.1) as part of Rule 6
- detects small rings (fewer than 8 members) and removes E/Z specifications for such
- detects chiral bridgehead nitrogens and E/Z imines and diazines
- reports atom descriptor along with the rule that ultimately decided it
- fills _M.CIPInfo with detailed information about how each ligand was decided (feature turned off by set testflag2)
- generates advanced Rule 6 descriptors for cubane and the like (generally 'r') using set testflag1

Primary 236-compound Chapter-9 validation set (AY-236) provided by Andrey Yerin, ACD/Labs (Moscow). Mikko Vainio also supplied a 64-compound testing suite (MV-64), which is available on SourceForge in the Jmol-datafiles directory (https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol-datafiles/cip). Additional test structures provided by John Mayfield.

Additional thanks to the IUPAC Blue Book Revision project, specifically Karl-Heinz Hellwich for alerting me to the errata page for the 2013 IUPAC specs (http://www.chem.qmul.ac.uk/iupac/bibliog/BBerrors.html), Gerry Moss for discussions, Andrey Yerin for discussion and digraph checking.

Many thanks to the members of the BlueObelisk-Discuss group, particularly Mikko Vainio, John Mayfield (aka John May), Wolf Ihlenfeldt, and Egon Willighagen, for encouragement, examples, serious skepticism, and extremely helpful advice.

References:

CIP(1966) R.S. Cahn, C. Ingold, V. Prelog, Specification of Molecular Chirality, Angew. Chem. Internat. Edit. 5, 385ff

Custer(1986) Roland H. Custer, Mathematical Statements About the Revised CIP-System, MATCH, 21, 1986, 3-31 http://match.pmf.kg.ac.rs/electronic_versions/Match21/match21_3-31.pdf

Mata(1993) Paulina Mata, Ana M. Lobo, Chris Marshall, A. Peter Johnson, The CIP sequence rules: Analysis and proposal for a revision, Tetrahedron: Asymmetry, Volume 4, Issue 4, April 1993, Pages 657-668

Mata(1994) Paulina Mata, Ana M. Lobo, Chris Marshall, and A. Peter Johnson, Implementation of the Cahn-Ingold-Prelog System for Stereochemical Perception in the LHASA Program, J. Chem. Inf. Comput. Sci. 1994, 34, 491-504 http://pubs.acs.org/doi/abs/10.1021/ci00019a004

Mata(2005) Paulina Mata, Ana M. Lobo, The Cahn, Ingold and Prelog System: eliminating ambiguity in the comparison of diastereomorphic and enantiomorphic ligands, Tetrahedron: Asymmetry, Volume 16, Issue 13, 4 July 2005, Pages 2215-2223

Favre(2013) Henri A Favre, Warren H Powell, Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013 DOI:10.1039/9781849733069 http://pubs.rsc.org/en/content/ebook/9780854041824#!divbookcontent

Code history:

5/12/18 Jmol 14.29.14 fixes minor Rule 5 bug and adds advanced Rule 6 in/out testflag1 option (857 lines)
5/1/18 Jmol 14.29.14 fixes enantiomorphic Rule 5 R/S check for BH64_85 and BH64_86
4/25/18 Jmol 14.29.14 fixes spiroallene Rule 6 issue for BH64_84
4/23/18 Jmol 14.29.14 fixes Rule 2 for JM_008, involving mass and duplicates (824 lines)
4/11/18 Jmol 14.29.13 adds optional CIPDataTracker class (822 lines)
4/2/18 Jmol 14.29.13 adds optional CIPDataSmiles class
4/2/18 Jmol 14.29.13 adds John's "mancude-like" cyclic conjugated ene Kekule averaging
12/10/17 Jmol 14.29.9 adds CIPData, mancude Kekule averaging
11/11/17 Jmol 14.25.1 adds "duplicate over terminal" in Rule 1b; streamlined (777 lines)
11/05/17 Jmol 14.24.1 fixes a problem with seqCis/seqTrans and also with Rule 2 (799 lines)
10/17/17 Jmol 14.20.10 adds S4 check in Rule 6 and also fixes bug in aux descriptors being skipped when two ligands are equivalent for the root (798 lines)
9/19/17 CIPChirality code simplification (778 lines)
9/14/17 Jmol 14.20.6 switching to Mikko's idea for Rule 4b and 5. Abandons "thread" idea. Uses breadth-first algorithm for generating bitsets for R and S. Processing time reduced by 50%. Still could be optimized some. (820 lines)
7/25/17 Jmol 14.20.4 consolidates all ene determinations; moves auxiliary descriptor generation to prior to Rule 3 (850 lines)
7/23/17 Jmol 14.20.4 adds Rule 6; rewrite/consolidate spiro, C3, double spiran code (853 lines)
7/19/17 Jmol 14.20.3 fixing Rule 2 (880 lines)
7/13/17 Jmol 14.20.3 more thorough spiro testing (858 lines)
7/10/17 Jmol 14.20.2 adding check for C3 and double spiran (CIP 1966 #32 and #33)
7/8/17 Jmol 14.20.2 adding presort for Rules 4a and 4c (test12.mol; 828 lines)
7/7/17 Jmol 14.20.1 minor coding efficiencies (833 lines)
7/6/17 Jmol 14.20.1 major rewrite to correct and simplify logic; full validation for 433 structures (many duplicates) in AY236, BH64, MV64, MV116, JM, and L (836 lines)
6/30/17 Jmol 14.20.1 major rewrite of Rule 4b (999 lines)
6/25/17 Jmol 14.19.1 minor fixes for Rule 4b and 5 for BH64_012-015; better atropisomer check
6/12/2017 Jmol 14.18.2 tested for Rule 1b sphere (AY236.53, 163, 173, 192); 957 lines
6/8/2017 Jmol 14.18.2 removed unnecessary presort for Rule 1b
5/27/17 Jmol 14.17.2 fully interfaced using SimpleNode and SimpleEdge
5/27/17 Jmol 14.17.1 fully validated; simplified code; 978 lines
5/17/17 Jmol 14.16.1. adds helicene M/P chirality; 959 lines; validated using CCDC structures HEXHEL02 HEXHEL03 HEXHEL04 ODAGOS ODAHAF http://pubs.rsc.org/en/content/articlehtml/2017/CP/C6CP07552E
5/14/17 Jmol 14.15.5. trimmed up and documented; no need for lone pairs; 948 lines
5/13/17 Jmol 14.15.4. algorithm simplified; validated for mixed Rule 4b systems involving auxiliary R/S, M/P, and seqCis/seqTrans; 959 lines
5/06/17 validated for 236 compound set AY-236.
5/02/17 validated for 161 compounds, including M/P, m/p (axial chirality for biaryls and odd-number cumulenes)
4/29/17 validated for 160 compounds, including M/P, m/p (axial chirality for biaryls and odd-number cumulenes)
4/28/17 Validated for 146 compounds, including imines and diazines, sulfur, phosphorus
4/27/17 Rules 3-5 preliminary version 14.15.1
4/6/17 Introduced in Jmol 14.12.0; validated for Rules 1 and 2 in Jmol 14.13.2; 100 lines

NOTE!

Added logic to Rule 1b:

Rule 1b: In comparing duplicate atoms, the one with lower root distance has precedence, where root distance is defined as: (a) in the case of ring-closure duplicates, the sphere of the duplicated atom; and (b) in the case of multiple-bond duplicates, the sphere of the atom to which the duplicate atom is attached.

Rationale: Using only the distance of the duplicated atom (current definition) introduces a Kekule bias, which can be illustrated with various simple models. By moving that distance to be the sphere of the parent atom of the duplicate, the problem is resolved.

Added clarification to Rule 2:

Rule 2: Higher mass precedes lower mass, where mass is defined in the case of nonduplicate atoms with identified isotopes for elements as their exact isotopic mass and, in all other cases, as their element's atomic weight.

Rationale: BB is not self-consistent, including both "mass number" (in the rule) and "atomic mass" in the description, where "79Br < Br < 81Br". And again we have the same Kekule-ambiguous issue as in Rule 1b. The added clarification fixes the Kekule issue (not using isotope mass number for duplicate atoms), solves the problem that F < 19F (though 100% nat. abundance), and is easily programmable.

In Jmol the logic is very simple, actually using the isotope mass number, but doing two checks: a) if one of four specific isotopes (16O, 52Cr, 96Mo, 175Lu), reverse the test, and b) if on the list of 100% natural isotopes or one of the non-natural elements, use the element's accepted atomic weight. See CIPAtom.getMass().

PROPOSED Rule 6: An undifferentiated reference node has priority over any other undifferentiated node.

Rationale: This rule is stated in CIP(1966) p. 357.
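
The root-distance definition in the modified Rule 1b can be illustrated with a minimal sketch. The Node class and its fields below are hypothetical stand-ins introduced only for this example; they are not Jmol's CIPAtom, and the real implementation may organize this differently.

```java
// Sketch only: Node and its fields are hypothetical, not Jmol's CIPAtom.
final class Rule1bSketch {

    /** A duplicate node in the digraph, reduced to what Rule 1b needs. */
    static final class Node {
        final boolean isRingClosure;    // true: ring-closure duplicate; false: multiple-bond duplicate
        final int duplicatedAtomSphere; // sphere of the real atom that this node duplicates
        final int parentSphere;         // sphere of the atom to which this duplicate is attached

        Node(boolean isRingClosure, int duplicatedAtomSphere, int parentSphere) {
            this.isRingClosure = isRingClosure;
            this.duplicatedAtomSphere = duplicatedAtomSphere;
            this.parentSphere = parentSphere;
        }
    }

    /** Root distance as defined in the modified Rule 1b, cases (a) and (b). */
    static int rootDistance(Node dup) {
        return dup.isRingClosure ? dup.duplicatedAtomSphere : dup.parentSphere;
    }

    /** Negative if a has precedence, positive if b has precedence, 0 if tied. */
    static int compareDuplicates(Node a, Node b) {
        return Integer.compare(rootDistance(a), rootDistance(b));
    }

    public static void main(String[] args) {
        Node ringClosure = new Node(true, 0, 3);   // duplicates the root itself (sphere 0)
        Node multipleBond = new Node(false, 2, 1); // duplicate attached to a sphere-1 atom
        // Root distances are 0 and 1, so the ring-closure duplicate has precedence.
        System.out.println(compareDuplicates(ringClosure, multipleBond) < 0);
    }
}
```

For the multiple-bond case the comparison uses the parent's sphere rather than the duplicated atom's sphere, which is the change the rationale above describes.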
Author:
Bob Hanson hansonr@stolaf.edu
  • Field Details

    • RULE_2_nXX_EQ_XX

      static final String RULE_2_nXX_EQ_XX
      These elements have 100% natural abundance; we will use their isotope mass number instead of their actual average mass, since there is no difference
    • RULE_2_REDUCE_ISOTOPE_MASS_NUMBER

      static final String RULE_2_REDUCE_ISOTOPE_MASS_NUMBER
      These elements have an isotope number that is a bit higher than the average mass, even though their actual isotope mass is a bit lower. We will change 16 to 15.9, 52 to 51.9, 96 to 95.9, 175 to 174.9 so as to force the unspecified mass atom to be higher priority than the specified one. All other isotopes can use their integer isotope mass number instead of looking up their exact isotope mass.
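
The Rule 2 mass logic described for these two fields might be sketched as follows. This is not CIPAtom.getMass(); the method name, its parameters, the string keys, and the short list of 100%-abundance isotopes are assumptions made for illustration, and atomic weights are passed in by the caller rather than looked up.

```java
import java.util.Set;

// Sketch only: names and data layout are hypothetical, not Jmol's.
final class Rule2MassSketch {
    // The four isotopes named in the documentation; their mass number is reduced by 0.1.
    static final Set<String> REDUCE = Set.of("16O", "52Cr", "96Mo", "175Lu");
    // Illustrative subset of isotopes with 100% natural abundance (assumed list, not Jmol's).
    static final Set<String> MONOISOTOPIC = Set.of("19F", "23Na", "27Al", "31P", "127I");

    /**
     * @param symbol            element symbol, e.g. "Br"
     * @param isotopeMassNumber specified isotope mass number, or 0 if none was given
     * @param atomicWeight      the element's accepted atomic weight
     * @param isDuplicate       duplicate atoms never use an isotope mass
     */
    static double mass(String symbol, int isotopeMassNumber, double atomicWeight, boolean isDuplicate) {
        if (isDuplicate || isotopeMassNumber <= 0) {
            return atomicWeight;
        }
        String key = isotopeMassNumber + symbol;
        if (MONOISOTOPIC.contains(key)) {
            return atomicWeight;            // so that 19F ties with unspecified F
        }
        if (REDUCE.contains(key)) {
            return isotopeMassNumber - 0.1; // e.g. 16 becomes 15.9, below O's atomic weight
        }
        return isotopeMassNumber;           // integer mass number is good enough elsewhere
    }

    public static void main(String[] args) {
        double br = mass("Br", 0, 79.904, false);
        // 79Br < Br < 81Br, as in the Blue Book description
        System.out.println(mass("Br", 79, 79.904, false) < br && br < mass("Br", 81, 79.904, false));
    }
}
```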
    • NO_CHIRALITY

      static final int NO_CHIRALITY
    • TIED

      static final int TIED
    • A_WINS

      static final int A_WINS
    • B_WINS

      static final int B_WINS
    • IGNORE

      static final int IGNORE
    • UNDETERMINED

      static final int UNDETERMINED
    • STEREO_R

      static final int STEREO_R
    • STEREO_S

      static final int STEREO_S
    • STEREO_M

      static final int STEREO_M
    • STEREO_P

      static final int STEREO_P
    • STEREO_Z

      static final int STEREO_Z
    • STEREO_E

      static final int STEREO_E
    • STEREO_BOTH_RS

      static final int STEREO_BOTH_RS
    • STEREO_BOTH_EZ

      static final int STEREO_BOTH_EZ
    • RULE_1a

      static final int RULE_1a
    • RULE_1b

      static final int RULE_1b
    • RULE_2

      static final int RULE_2
    • RULE_3

      static final int RULE_3
    • RULE_4a

      static final int RULE_4a
    • RULE_4b

      static final int RULE_4b
    • RULE_4c

      static final int RULE_4c
    • RULE_5

      static final int RULE_5
    • RULE_6

      static final int RULE_6
    • ruleNames

      static final String[] ruleNames
    • MAX_PATH

      static final int MAX_PATH
      maximum path length to display; for debugging only, using SET DEBUG in Jmol
    • SMALL_RING_MAX

      static final int SMALL_RING_MAX
      maximum ring size that can have a double bond with no E/Z designation; also used for identifying aromatic rings and bridgehead nitrogens
    • currentRule

      int currentRule
      the current rule being applied exhaustively
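
As a rough illustration of applying the sequence rules one after another until a decision is reached, the sketch below walks an ordered list of comparison functions and reports the index of the rule that broke the tie. The numeric values of TIED, A_WINS, and B_WINS, the method names, and the pairwise comparison are all assumptions; the real class applies each rule across the whole digraph, which this simplified example does not attempt.

```java
import java.util.List;
import java.util.function.ToIntBiFunction;

// Sketch only: constant values and method names are assumed, not taken from Jmol.
final class RuleLoopSketch {
    static final int TIED = 0, A_WINS = -1, B_WINS = 1;

    /** Returns the index of the first rule that separates a and b, or -1 if none does. */
    static <T> int decidingRule(T a, T b, List<ToIntBiFunction<T, T>> rules) {
        for (int rule = 0; rule < rules.size(); rule++) {
            if (rules.get(rule).applyAsInt(a, b) != TIED) {
                return rule;
            }
        }
        return -1;
    }

    /** Higher value has precedence. */
    static int higherWins(int x, int y) {
        return x > y ? A_WINS : x < y ? B_WINS : TIED;
    }

    public static void main(String[] args) {
        String[] names = {"1a", "1b", "2"};
        // Each ligand is {atomicNumber, massNumber}; Rule 1b is a placeholder here.
        List<ToIntBiFunction<int[], int[]>> rules = List.of(
            (a, b) -> higherWins(a[0], b[0]), // Rule 1a: atomic number
            (a, b) -> TIED,                   // Rule 1b: not modelled in this sketch
            (a, b) -> higherWins(a[1], b[1])  // Rule 2: mass
        );
        int rule = decidingRule(new int[] {6, 12}, new int[] {6, 13}, rules);
        System.out.println("decided by Rule " + names[rule]); // prints "decided by Rule 2"
    }
}
```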
    • root

      The atom for which we are determining the stereochemistry
    • data

      CIPData data
      collected bitsets, more specialized SMILES/SMARTS searches, and viewer (vwr) references
    • doTrack

      boolean doTrack
      are we tracking pathways for _M.CIPInfo?
    • isAux

      boolean isAux
      are we in the midst of auxiliary center creation?
    • bsNeedRule

      javajs.util.BS bsNeedRule
      set bits RULE_1a - RULE_6 to indicate a need for that rule based on what is in the model
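
A bit set of this kind lets later code skip rules that cannot matter for a given model. The sketch below uses java.util.BitSet in place of javajs.util.BS; the rule numbering and the model features that trigger each bit are assumptions chosen for illustration, not the conditions Jmol actually tests.

```java
import java.util.BitSet;

// Sketch only: rule numbering and trigger conditions are assumed.
final class NeedRuleSketch {
    static final int RULE_1a = 1, RULE_1b = 2, RULE_2 = 3, RULE_3 = 4;

    /** Flags the rules worth running, based on simple facts about the model. */
    static BitSet needRules(boolean hasDuplicates, boolean hasIsotopes, boolean hasEZ) {
        BitSet bs = new BitSet();
        bs.set(RULE_1a);                    // atomic number is always compared
        if (hasDuplicates) bs.set(RULE_1b); // rings or multiple bonds produce duplicate atoms
        if (hasIsotopes) bs.set(RULE_2);    // mass only matters when isotopes are specified
        if (hasEZ) bs.set(RULE_3);          // seqCis/seqTrans needs stereogenic double bonds
        return bs;
    }

    public static void main(String[] args) {
        BitSet bs = needRules(false, true, false);
        // Visit only the rules that were flagged.
        for (int rule = bs.nextSetBit(0); rule >= 0; rule = bs.nextSetBit(rule + 1)) {
            System.out.println("apply rule index " + rule);
        }
    }
}
```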
    • havePseudoAuxiliary

      boolean havePseudoAuxiliary
      do we have r or s and so will need to recalculate Mata like/unlike lists in Rule 5?
    • ptIDLogger

      int ptIDLogger
      incremental pointer providing a unique ID to every CIPAtom for debugging
  • Constructor Details

    • CIPChirality

      public CIPChirality()
  • Method Details

    • getRuleName

      public String getRuleName(int rule)
    • getChiralityForAtoms

      public void getChiralityForAtoms(CIPData data)
      A general determination of chirality that involves ultimately all of Rules 1-6.
      Parameters:
      data -
    • setStereoFromSmiles

      private void setStereoFromSmiles(javajs.util.BS bsHelix, int stereo, SimpleNode[] atoms)
    • preFilterAtomList

      private boolean preFilterAtomList(SimpleNode[] atoms, javajs.util.BS bsToDo, javajs.util.BS bsEnes)
      Remove unnecessary atoms from the list and let us know if we have alkenes to consider.
      Parameters:
      atoms -
      bsToDo -
      bsEnes -
      Returns:
      whether we have any alkenes that could be E/Z
    • isFirstRow

      static boolean isFirstRow(SimpleNode a)
      Check if an atom is 1st row.
      Parameters:
      a -
      Returns:
      elemno > 2 && elemno <= 10
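
The documented return expression can be checked with a tiny stand-alone version. This sketch takes the element number directly as an int, which is an assumption made to keep it self-contained; the real method takes a SimpleNode.

```java
// Sketch only: takes an atomic number instead of a SimpleNode.
final class FirstRowSketch {
    /** True for Li (3) through Ne (10), matching the documented expression. */
    static boolean isFirstRow(int elemno) {
        return elemno > 2 && elemno <= 10;
    }

    public static void main(String[] args) {
        System.out.println(isFirstRow(6));  // carbon: true
        System.out.println(isFirstRow(16)); // sulfur: false
    }
}
```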
    • clearSmallRingEZ

      private void clearSmallRingEZ(SimpleNode[] atoms, javajs.util.Lst<int[]> lstEZ)
      Remove E/Z designations for small-ring double bonds (IUPAC 2013.P-93.5.1.4.1).
      Parameters:
      atoms -
      lstEZ -
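
One way to picture the small-ring filter is a breadth-first search for the smallest ring containing each double bond, dropping the E/Z entry when that ring has fewer than 8 members. Everything below is a hypothetical sketch: the adjacency-array representation, the assumption that each int[] holds the two atom indices of a bond, and the value 7 for SMALL_RING_MAX are not taken from the Jmol source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: data layout and the constant's value are assumptions.
final class SmallRingEZSketch {
    static final int SMALL_RING_MAX = 7; // assumed from "fewer than 8 members"

    /** Size of the smallest ring containing bond a-b, or 0 if the bond is not in a ring. */
    static int smallestRing(int[][] adj, int a, int b) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);
        dist[a] = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(a);
        while (!queue.isEmpty()) {
            int cur = queue.poll();
            for (int next : adj[cur]) {
                if (cur == a && next == b) continue; // do not walk the bond itself
                if (dist[next] >= 0) continue;       // already visited
                dist[next] = dist[cur] + 1;
                if (next == b) return dist[next] + 1; // path edges plus the a-b bond
                queue.add(next);
            }
        }
        return 0;
    }

    /** Returns the E/Z bonds that survive, i.e. those not in a small ring. */
    static List<int[]> clearSmallRingEZ(int[][] adj, List<int[]> lstEZ) {
        List<int[]> kept = new ArrayList<>();
        for (int[] bond : lstEZ) {
            int size = smallestRing(adj, bond[0], bond[1]);
            if (size == 0 || size > SMALL_RING_MAX) kept.add(bond);
        }
        return kept;
    }
}
```

With this layout, the double bond of cyclohexene is dropped, while the double bonds of cyclooctene and of an open-chain alkene are kept.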
    • getAtomBondChirality

      private void getAtomBondChirality(SimpleNode atom, javajs.util.Lst<int[]> lstEZ, javajs.util.BS bsToDo)
      Get E/Z characteristics for specific atoms. Also check here for atropisomeric M/P designations
      Parameters:
      atom -
      lstEZ -
      bsToDo -
    • getLastCumuleneAtom

      private SimpleNode getLastCumuleneAtom(SimpleEdge bond, SimpleNode atom, int[] nSP2, SimpleNode[] parents)
      Parameters:
      bond -
      atom -
      nSP2 - returns the number of sp2 carbons in this alkene or cumulene
      parents -
      Returns:
      the terminal atom of this alkene or cumulene
    • getAtomChiralityLimited

      int getAtomChiralityLimited(SimpleNode atom, CIPChirality.CIPAtom cipAtom, SimpleNode parentAtom)
      Determine R/S or one half of E/Z determination
      Parameters:
      atom - ignored if cipAtom is not null (just checking ene end top priority)
      cipAtom - ignored if atom is not null
      parentAtom - null for tetrahedral, other alkene carbon for E/Z
      Returns:
      if an E/Z test, [0:none, 1: atoms[0] is higher, 2: atoms[1] is higher]; otherwise [0:none, 1:R, 2:S]
    • getBondChiralityLimited

      private int getBondChiralityLimited(SimpleEdge bond, SimpleNode a)
      Determine the axial or E/Z chirality for this bond, with the given starting atom a
      Parameters:
      bond -
      a - first atom to consider, or null
      Returns:
      one of: {NO_CHIRALITY | STEREO_Z | STEREO_E | STEREO_Ra | STEREO_Sa | STEREO_ra | STEREO_sa}
    • setBondChirality

      private int setBondChirality(SimpleNode a, SimpleNode pa, SimpleNode pb, SimpleNode b, boolean isAxial)
      Determine the axial or E/Z chirality for the a-b bond.
      Parameters:
      a -
      pa -
      pb -
      b -
      isAxial -
      Returns:
      one of: {NO_CHIRALITY | STEREO_Z | STEREO_E | STEREO_M | STEREO_P | STEREO_m | STEREO_p}
    • getEneChirality

      int getEneChirality(CIPChirality.CIPAtom winner1, CIPChirality.CIPAtom end1, CIPChirality.CIPAtom end2, CIPChirality.CIPAtom winner2, boolean isAxial, boolean allowPseudo)
      Determine the stereochemistry of a bond
      Parameters:
      winner1 -
      end1 -
      end2 -
      winner2 -
      isAxial - if an odd-cumulene
      allowPseudo - if we are working from a high-level bond stereochemistry method
      Returns:
      STEREO_M, STEREO_P, STEREO_Z, STEREO_E, or NO_CHIRALITY
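
Once the highest-priority ligand on each end is known, the E/Z part of the decision reduces to asking whether the two winners lie on the same side of the double bond. The sketch below does this with the sign of the cosine of the winner1-end1-end2-winner2 torsion angle. It is a generic geometric illustration, not the Jmol method: coordinates are plain double arrays, the constant values are assumed, and the axial M/P case, which needs the sign of the torsion itself, is left out.

```java
// Sketch only: generic torsion test; constant values are assumed.
final class EneTorsionSketch {
    static final int NO_CHIRALITY = 0, STEREO_Z = 1, STEREO_E = 2;

    static double[] sub(double[] p, double[] q) {
        return new double[] {p[0] - q[0], p[1] - q[1], p[2] - q[2]};
    }

    static double[] cross(double[] u, double[] v) {
        return new double[] {
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]};
    }

    static double dot(double[] u, double[] v) {
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
    }

    /** Z if the winners are on the same side of the end1-end2 bond, E if on opposite sides. */
    static int ezFromGeometry(double[] winner1, double[] end1, double[] end2, double[] winner2) {
        double[] axis = sub(end2, end1);
        double[] n1 = cross(sub(end1, winner1), axis); // normal of plane winner1-end1-end2
        double[] n2 = cross(axis, sub(winner2, end2)); // normal of plane end1-end2-winner2
        double d = dot(n1, n2);                        // proportional to cos(torsion)
        if (Math.abs(d) < 1e-9) return NO_CHIRALITY;   // collinear or perpendicular: undecidable
        return d > 0 ? STEREO_Z : STEREO_E;
    }
}
```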