BioMike committed on
Commit
b84d2d1
·
verified ·
1 Parent(s): 577e72b

Update README.md

  1. README.md +94 -20
README.md CHANGED
@@ -37,27 +37,101 @@ SMILES2IUPAC-small is based on the MT5 model with optimizations in implementing
37
  - **Paper [optional]:** coming soon
38
  - **Demo [optional]:** huggingface.co/spaces/knowledgator/ChemicalConverters
39
 
40
- ## Uses
41
 
42
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
43
-
44
- ### Direct Use
45
-
46
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
47
-
48
- [More Information Needed]
49
-
50
- ### Downstream Use [optional]
51
-
52
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
53
-
54
- [More Information Needed]
55
-
56
- ### Out-of-Scope Use
57
-
58
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
59
-
60
- [More Information Needed]
61
 
62
  ## Bias, Risks, and Limitations
63
 
 
37
  - **Paper [optional]:** coming soon
38
  - **Demo [optional]:** huggingface.co/spaces/knowledgator/ChemicalConverters
39
 
40
+ ## Quickstart
41
+ First, install the library:
42
+ ```commandline
43
+ pip install chemical-converters
44
+ ```
45
+ ### SMILES to IUPAC
46
+ You can choose a pretrained model from the table in the "Models" section,
47
+ but we recommend the "smiles2iupac-canonical-base" model.
48
+ #### Preferred IUPAC style
49
+ To choose the preferred IUPAC style, place a style token before
50
+ your SMILES sequence.
51
+
52
+ | Style Token | Description |
53
+ |-------------|-------------|
54
+ | `<BASE>` | The most widely known name of the substance; sometimes a mixture of traditional and systematic styles |
55
+ | `<SYST>` | A fully systematic style without trivial names |
56
+ | `<TRAD>` | A style based on the trivial names of parts of the substance |
57
+
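As a minimal sketch of how a style token is attached to an input, the helper below prefixes a SMILES string with one of the three tokens from the table. The names `STYLE_TOKENS` and `with_style` are hypothetical and are not part of `chemical-converters`; the library only needs the resulting string.

```python
# Hypothetical helper for illustration; not part of chemical-converters.
STYLE_TOKENS = ("<BASE>", "<SYST>", "<TRAD>")


def with_style(smiles: str, style: str = "<BASE>") -> str:
    """Return the SMILES string prefixed with a style token."""
    if style not in STYLE_TOKENS:
        raise ValueError(f"unknown style token: {style!r}")
    if smiles.startswith(STYLE_TOKENS):
        raise ValueError("SMILES already carries a style token")
    return style + smiles


# One input per style, ready to pass to converter.smiles_to_iupac(...)
inputs = [with_style("CCO", token) for token in STYLE_TOKENS]
print(inputs)  # ['<BASE>CCO', '<SYST>CCO', '<TRAD>CCO']
```

Rejecting unknown or repeated tokens up front keeps malformed inputs from reaching the model.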
58
+ #### To perform a simple translation, follow this example:
59
+ ```python
60
+ from chemicalconverters import NamesConverter
61
+ converter = NamesConverter(model_name="smiles2iupac-canonical-base")
62
+ print(converter.smiles_to_iupac('CCO'))
63
+ print(converter.smiles_to_iupac(['<SYST>CCO', '<TRAD>CCO', '<BASE>CCO']))
64
+ ```
65
+ ```text
66
+ ['ethanol']
67
+ ['ethanol', 'ethanol', 'ethanol']
68
+ ```
69
+ #### Processing in batches:
70
+ ```python
71
+ from chemicalconverters import NamesConverter
72
+ converter = NamesConverter(model_name="smiles2iupac-canonical-base")
73
+ print(converter.smiles_to_iupac(["<BASE>C=CC=C" for _ in range(10)], num_beams=1,
74
+ process_in_batch=True, batch_size=1000))
75
+ ```
76
+ ```text
77
+ ['buta-1,3-diene', 'buta-1,3-diene'...]
78
+ ```
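To illustrate what a `batch_size` conceptually controls, here is a rough sketch that splits a list of inputs into fixed-size chunks. It assumes nothing about how the library batches internally; `chunked` is a hypothetical helper written only for this example.

```python
from typing import Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


smiles = ["<BASE>C=CC=C"] * 10
batches = list(chunked(smiles, batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

With `batch_size=1000` and only 10 inputs, as in the example above, everything would fall into a single batch.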
79
+ #### Validating SMILES to IUPAC translations
80
+ You can validate a translation by translating the predicted name back into SMILES
81
+ and calculating the Tanimoto similarity between the fingerprints of the two molecules.
82
+ ````python
83
+ from chemicalconverters import NamesConverter
84
+ converter = NamesConverter(model_name="smiles2iupac-canonical-base")
85
+ print(converter.smiles_to_iupac('CCO', validate=True))
86
+ ````
87
+ ````text
88
+ ['ethanol'] 1.0
89
+ ````
90
+ The higher the Tanimoto similarity, the more likely it is that the prediction is correct.
91
+
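For reference, the Tanimoto similarity of two fingerprints A and B is |A ∩ B| / |A ∪ B|. The sketch below computes it on plain Python sets of "on" bit indices. The fingerprints are made-up numbers for illustration, and `tanimoto` is a hypothetical helper, not the library's implementation; real fingerprints would come from a cheminformatics toolkit.

```python
from typing import Iterable


def tanimoto(fp_a: Iterable[int], fp_b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Made-up fingerprints: the round-trip molecule matches, the other does not.
fp_input = {1, 5, 9, 12}
fp_round_trip = {1, 5, 9, 12}
fp_other = {1, 5, 20}

print(tanimoto(fp_input, fp_round_trip))  # 1.0
print(tanimoto(fp_input, fp_other))       # 0.4
```

A score of 1.0 means the two fingerprints are identical, which is what a correct round trip is expected to produce.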
92
+ You can also run the validation manually:
93
+ ```python
94
+ from chemicalconverters import NamesConverter
95
+ validation_model = NamesConverter(model_name="iupac2smiles-canonical-base")
96
+ print(NamesConverter.validate_iupac(input_sequence='CCO', predicted_sequence='ethanol', validation_model=validation_model))
97
+ ```
98
+ ```text
99
+ 1.0
100
+ ```
101
+ Note: validation is not implemented for batch processing.
102
+
103
+ ### IUPAC to SMILES
104
+ You can choose a pretrained model from the table in the "Models" section,
105
+ but we recommend the "iupac2smiles-canonical-base" model.
106
+ #### To perform a simple translation, follow this example:
107
+ ```python
108
+ from chemicalconverters import NamesConverter
109
+ converter = NamesConverter(model_name="iupac2smiles-canonical-base")
110
+ print(converter.iupac_to_smiles('ethanol'))
111
+ print(converter.iupac_to_smiles(['ethanol', 'ethanol', 'ethanol']))
112
+ ```
113
+ ```text
114
+ ['CCO']
115
+ ['CCO', 'CCO', 'CCO']
116
+ ```
117
+ #### Processing in batches:
118
+ ```python
119
+ from chemicalconverters import NamesConverter
120
+ converter = NamesConverter(model_name="iupac2smiles-canonical-base")
121
+ print(converter.iupac_to_smiles(["buta-1,3-diene" for _ in range(10)], num_beams=1,
122
+ process_in_batch=True, batch_size=1000))
123
+ ```
124
+ ```text
125
+ ['<SYST>C=CC=C', '<SYST>C=CC=C'...]
126
+ ```
127
+ Our models also predict the IUPAC style of the input name, reported as one of the tokens from the table:
128
+
129
+ | Style Token | Description |
130
+ |-------------|-------------|
131
+ | `<BASE>` | The most widely known name of the substance; sometimes a mixture of traditional and systematic styles |
132
+ | `<SYST>` | A fully systematic style without trivial names |
133
+ | `<TRAD>` | A style based on the trivial names of parts of the substance |
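Because the predicted SMILES strings shown above carry a style token prefix (for example `'<SYST>C=CC=C'`), downstream code may want to separate the token from the structure. The following is a small sketch of one way to do that; `split_style` is a hypothetical helper and not part of `chemical-converters`.

```python
import re
from typing import Optional, Tuple

# Matches one of the three documented style tokens at the start of a string.
_STYLE_RE = re.compile(r"^<(BASE|SYST|TRAD)>")


def split_style(prediction: str) -> Tuple[Optional[str], str]:
    """Split a prediction into (style token or None, bare SMILES)."""
    match = _STYLE_RE.match(prediction)
    if match is None:
        return None, prediction
    return match.group(0), prediction[match.end():]


print(split_style("<SYST>C=CC=C"))  # ('<SYST>', 'C=CC=C')
print(split_style("CCO"))           # (None, 'CCO')
```

The bare SMILES can then be handed to other tools that do not understand the style tokens.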
134
 
135
 
136
  ## Bias, Risks, and Limitations
137