NR Y-complex modelling
The NR Y-complex is modelled with Assembline in three stages: calculation of fit libraries, global optimization, and refinement of the top “globally optimized” model. The dir scnpc_tutorial/NR_Y_complex includes source code files, input files and precalculated modelling results for the NR Y-complex:
- parameters file & configuration file for 'refinement' integrative modelling of wt NR Y-complex
- sequence fasta file, NR_Y_complex_de_novo_model_PDBs, EM (input data)
- NR_Y_complex_final_refined_model.pdb, out (output modelling results)
- NR_Y_complex_de_novo_run (directory with global optimization data):
  - parameters file & configuration file for 'global optimization' integrative modelling of wt NR Y-complex
  - sequence fasta file, input_PDB, EM (input data)
  - NR_Y_complex_de_novo_model.pdb, out (output modelling results)
  - systematic_fits (directory):
    - parameters file for systematic fitting
    - em_maps, PDB (input data)
    - result_fits_chimera (output fitting results)
First activate your virtual environment and enter the NR_Y_complex/NR_Y_complex_de_novo_run/systematic_fits dir:
    source activate Assembline
    cd NR_Y_complex/NR_Y_complex_de_novo_run/systematic_fits
or, depending on your computer setup:
    conda activate Assembline
    cd NR_Y_complex/NR_Y_complex_de_novo_run/systematic_fits
Run the generation of fit libraries for the NR Y-complex rigid bodies with Assembline (results are written to the dir systematic_fits/result_fits_chimera/NR_merged_unerased_tail_relative_clean_v1.3.mrc/):
    fit.py systematic_fitting_parameters.py
The fit libraries have been precalculated and analysed in the dir systematic_fits/result_fits_chimera/NR_merged_unerased_tail_relative_clean_v1.3.mrc/. To analyse the fit results on your own, run the following:
    # while in the systematic_fits/ dir
    genpval.py result_fits_chimera
To generate the top five fits of each input rigid body (i.e. the top five fits from each fit library), run the following:
    # upon successful run of fit.py and genpval.py
    cd result_fits_chimera/search100000_metric_cam_rad_600_inside0.3_res_40
    genPDBs_many.py -n5 top5 */*/solutions.csv
    # repeat the procedure for the other results in the result_fits_chimera/ dir
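Before running genPDBs_many.py it can help to confirm which fit libraries the */*/solutions.csv pattern will pick up. The following is a minimal sketch, not part of Assembline: the helper name find_solution_files is hypothetical, and it only lists files two directory levels below the given dir, mirroring the shell pattern used above.

```python
import glob
import os


def find_solution_files(search_dir="."):
    """List files matching the shell pattern */*/solutions.csv under search_dir.

    Hypothetical helper for illustration only; it does not read or
    interpret the contents of the solutions.csv files.
    """
    pattern = os.path.join(search_dir, "*", "*", "solutions.csv")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Run from inside a search directory such as
    # result_fits_chimera/search100000_metric_cam_rad_600_inside0.3_res_40
    for path in find_solution_files():
        print(path)
```

If the list is empty, the fitting step has most likely not finished or the command is being run from the wrong dir.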
After completing the calculation of fit libraries (or using the precalculated results), enter the global optimization project dir (i.e. NR_Y_complex_de_novo_run) and run the global optimization:
    cd NR_Y_complex/NR_Y_complex_de_novo_run
    # this will run 20000 global optimization modelling runs on a slurm cluster
    # (for options/parameters or local runs inspect the Assembline manual)
    assembline.py --traj --models -o out --multi --start_idx 0 --njobs 20000 config.json params.py
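Because this command writes to out/ and the tutorial ships a precalculated out/ dir, the existing results can be moved aside before a new run. The following is a minimal sketch under stated assumptions: the helper name move_aside and the _backup_<timestamp> naming are hypothetical choices for illustration and are not Assembline features.

```python
import os
import time


def move_aside(out_dir="out"):
    """Rename an existing output dir to <out_dir>_backup_<timestamp>.

    Returns the new path, or None if out_dir does not exist.
    Hypothetical helper; the backup naming scheme is an assumption.
    """
    if not os.path.isdir(out_dir):
        return None
    stamp = time.strftime("%Y%m%d_%H%M%S")
    backup = f"{out_dir.rstrip(os.sep)}_backup_{stamp}"
    if os.path.exists(backup):
        # Refuse to clobber an earlier backup made in the same second
        raise FileExistsError(backup)
    os.rename(out_dir, backup)
    return backup

# Example (run from the project dir before starting assembline.py):
#   move_aside("out")
```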
Note
There is already an output dir NR_Y_complex_de_novo_run/out, so if you want to run the modelling yourself, rename the out/ dir first, as it will be overwritten by the run above.
Enter the out/ dir, generate the output scoring lists and rebuild the atomic structures of the models:
    cd out
    extract_scores.py  # this should create a couple of files including all_scores_sorted_uniq.csv
    rebuild_atomic.py --top 10 --project_dir <full path to the original project directory NR_Y_complex_de_novo_run> config.json all_scores_sorted_uniq.csv
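A quick check that the score files exist can save a failed analysis step later. The sketch below is illustrative only: the helper name missing_score_files is hypothetical, and it assumes that all_scores.csv (used by setup_analysis.py in the next step) is produced by extract_scores.py alongside all_scores_sorted_uniq.csv.

```python
import os

# all_scores_sorted_uniq.csv is named in the tutorial as an output of
# extract_scores.py; all_scores.csv is assumed to come from the same step.
EXPECTED = ("all_scores.csv", "all_scores_sorted_uniq.csv")


def missing_score_files(out_dir=".", expected=EXPECTED):
    """Return the expected score files that are not present in out_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(out_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_score_files()
    if missing:
        print("Missing score files:", ", ".join(missing))
    else:
        print("All expected score files are present.")
```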
While in the out/ dir, run the following command to prepare your output models for analysis with the imp_sampcon tool from IMP:
    setup_analysis.py -s all_scores.csv -o analysis -d density.txt
Note
The density.txt file is not provided in this dir; it is available only in CR_Y_complex/out, so visit that dir to inspect it. To generate it yourself, see the Assembline analysis section.
Run the imp_sampcon exhaust tool (a command-line tool provided with IMP) to perform the sampling analysis:
    cd analysis
    imp_sampcon exhaust -n <prefix for output files> \
        --rmfA sample_A/sample_A_models.rmf3 \
        --rmfB sample_B/sample_B_models.rmf3 \
        --scoreA scoresA.txt --scoreB scoresB.txt \
        -d ../density.txt \
        -m cpu_omp \
        -c <int for cores to process> \
        -gp \
        -g <float with clustering threshold step>
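When the analysis is scripted, the three placeholders in the command above (output prefix, number of cores, clustering threshold step) can be filled in programmatically. The following is a minimal sketch that only assembles the argument list from the flags shown in this tutorial; the function names and the example values are hypothetical, and the sketch does not execute imp_sampcon itself.

```python
import shlex


def build_sampcon_command(prefix, cores, grid_step, density="../density.txt"):
    """Assemble the imp_sampcon exhaust call using the flags from this tutorial.

    prefix, cores and grid_step replace the <...> placeholders; the
    values passed in are the caller's choice, not recommended settings.
    """
    return [
        "imp_sampcon", "exhaust",
        "-n", prefix,
        "--rmfA", "sample_A/sample_A_models.rmf3",
        "--rmfB", "sample_B/sample_B_models.rmf3",
        "--scoreA", "scoresA.txt",
        "--scoreB", "scoresB.txt",
        "-d", density,
        "-m", "cpu_omp",
        "-c", str(cores),
        "-gp",
        "-g", str(grid_step),
    ]


def as_shell(args):
    """Render the argument list as a single shell-quoted command line."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Example values are placeholders chosen for illustration only
    print(as_shell(build_sampcon_command("NR_Y_complex", cores=4, grid_step=5.0)))
```

The resulting list can be passed to subprocess.run from inside the analysis dir once the placeholder values have been chosen.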
Note
For further descriptions of the settings for imp_sampcon, please see Sampling exhaustiveness and precision with Assembline.
Note
The four plots from the sampling exhaustiveness analysis are provided only in CR_Y_complex/out/analysis. Therefore, visit that dir to inspect them, and follow the commands above to run the analysis on your own.
In order to refine the best globally optimized model of the NR Y-complex produced previously, enter the main project dir (i.e. scnpc_tutorial/NR_Y_complex/) and run the refinement:
    # this will run 20000 refinement modelling runs on a slurm cluster
    # (for options/parameters or local runs inspect the Assembline manual)
    assembline.py --traj --models -o out --multi --start_idx 0 --njobs 20000 config.json params.py
Note
There is already an output dir NR_Y_complex/out, so if you want to run the modelling yourself, rename the out/ dir first, as it will be overwritten by the run above. Also, as you may have noticed, the input PDBs used for refinement are stored in NR_Y_complex_de_novo_model_PDBs for convenience. If you want to run refinement on multiple models in parallel, inspect the Assembline manual.
Enter the out/ dir, generate the output scoring lists and rebuild the atomic structures of the models:
    cd out
    extract_scores.py  # this should create a couple of files including all_scores_sorted_uniq.csv
    rebuild_atomic.py --top 10 --project_dir <full path to the original project directory NR_Y_complex> config.json all_scores_sorted_uniq.csv
As after the global optimization run, while in the out/ dir, run the following command to prepare your output models for analysis with the imp_sampcon tool from IMP:
    setup_analysis.py -s all_scores.csv -o analysis -d density.txt
Note
The density.txt file is not provided in this dir; it is available only in CR_Y_complex/out, so visit that dir to inspect it. To generate it yourself, see the Assembline analysis section.
Run the imp_sampcon exhaust tool (a command-line tool provided with IMP) to perform the sampling analysis:
    cd analysis
    imp_sampcon exhaust -n <prefix for output files> \
        --rmfA sample_A/sample_A_models.rmf3 \
        --rmfB sample_B/sample_B_models.rmf3 \
        --scoreA scoresA.txt --scoreB scoresB.txt \
        -d ../density.txt \
        -m cpu_omp \
        -c <int for cores to process> \
        -gp \
        -g <float with clustering threshold step>
Note
For further descriptions of the settings for imp_sampcon, please see Sampling exhaustiveness and precision with Assembline.