NR Y-complex modelling
======================

To model the NR Y-complex, we will use `Assembline <https://assembline.readthedocs.io/en/latest/#>`__ to calculate fit libraries, run the global optimization and refine the top globally optimized model. The dir ``scnpc_tutorial/NR_Y_complex`` includes source code files, input files and precalculated modelling results for the NR Y-complex: ::

    - parameters file & configuration file for 'refinement' integrative modelling of wt NR Y-complex
    - sequence fasta file, NR_Y_complex_de_novo_model_PDBs, EM (input data)
    - NR_Y_complex_final_refined_model.pdb, out (output modelling results)
    - NR_Y_complex_de_novo_run (directory with global optimization data):

        - parameters file & configuration file for 'global optimization' integrative modelling of wt NR Y-complex
        - sequence fasta file, input_PDB, EM (input data)
        - NR_Y_complex_de_novo_model.pdb,  out (output modelling results)

        - systematic_fits (directory):

            - parameters file for systematic fitting
            - em_maps, PDB (input data)
            - result_fits_chimera (output fitting results)

#. First activate your virtual environment and enter the ``NR_Y_complex/NR_Y_complex_de_novo_run/systematic_fits`` dir

    .. code-block:: bash

        source activate Assembline

        cd NR_Y_complex/NR_Y_complex_de_novo_run/systematic_fits

    or depending on your computer setup:

    .. code-block:: bash

        conda activate Assembline

        cd NR_Y_complex/NR_Y_complex_de_novo_run/systematic_fits

#. Generate the fit libraries for the NR Y-complex rigid bodies with Assembline (results are written to dir ``systematic_fits/result_fits_chimera/NR_merged_unerased_tail_relative_clean_v1.3.mrc/``)

    .. code-block:: bash

        fit.py systematic_fitting_parameters.py

#. The fit libraries have been precalculated and analysed in dir ``systematic_fits/result_fits_chimera/NR_merged_unerased_tail_relative_clean_v1.3.mrc/``. To analyse the fit results yourself, run the following:

    .. code-block:: bash

            # while in the systematic_fits/ dir
            genpval.py result_fits_chimera

#. To generate the top five fits of each input rigid body (i.e. the top five fits from each fit library), run the following:

    .. code-block:: bash

            # after a successful run of fit.py and genpval.py
            cd result_fits_chimera/search100000_metric_cam_rad_600_inside0.3_res_40

            genPDBs_many.py -n5 top5 */*/solutions.csv

            # repeat the procedure for the other results in the result_fits_chimera/ dir

#. After calculating the fit libraries (or when using the precalculated results), enter the global optimization project dir (i.e. ``NR_Y_complex_de_novo_run``) and run the global optimization

    .. code-block:: bash

            cd NR_Y_complex/NR_Y_complex_de_novo_run

            # this will run 20000 global optimization modelling runs on a slurm cluster (for options/parameters or local runs inspect the Assembline manual)
            assembline.py --traj --models -o out --multi --start_idx 0 --njobs 20000 config.json params.py

    .. note:: There is already an output dir ``NR_Y_complex_de_novo_run/out``, so if you want to run the modelling yourself, rename the ``out/`` dir first, as it will otherwise be overwritten by the run above.
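
    The renaming mentioned in the note can also be scripted. The following is a minimal sketch in Python, not part of Assembline; the function name ``backup_output_dir`` and the ``_precalculated_<timestamp>`` suffix are hypothetical choices made for illustration only.

    .. code-block:: python

        from datetime import datetime
        from pathlib import Path


        def backup_output_dir(out_dir="out"):
            """Rename an existing output dir so a new run cannot overwrite it.

            Returns the new path, or None if there was nothing to rename.
            """
            out_path = Path(out_dir)
            if not out_path.exists():
                return None
            # Timestamped suffix: a hypothetical naming scheme, not an Assembline convention
            stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            backup_path = out_path.with_name(f"{out_path.name}_precalculated_{stamp}")
            out_path.rename(backup_path)
            return backup_path

    Calling ``backup_output_dir("out")`` from inside ``NR_Y_complex_de_novo_run`` before starting the run would move the precalculated results aside and leave the name ``out`` free for the new output.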

#. Enter the ``out/`` dir, generate the output scoring lists and rebuild the atomic structures of the models

    .. code-block:: bash

        cd out

        extract_scores.py #this should create a couple of files including all_scores_sorted_uniq.csv

        rebuild_atomic.py --top 10 --project_dir <full path to the original project directory NR_Y_complex_de_novo_run> config.json all_scores_sorted_uniq.csv
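
    To confirm that ``extract_scores.py`` wrote a non-empty score list, the first rows of ``all_scores_sorted_uniq.csv`` can be printed. This is a minimal sketch that assumes only that the file is a plain comma-separated table; it makes no assumption about column names, and the helper ``first_rows`` is hypothetical rather than part of Assembline.

    .. code-block:: python

        import csv
        from itertools import islice


        def first_rows(csv_path, n=10):
            """Return the first n rows of a CSV file as lists of strings."""
            with open(csv_path, newline="") as handle:
                return list(islice(csv.reader(handle), n))

    For example, ``for row in first_rows("all_scores_sorted_uniq.csv"): print(row)`` run inside ``out/`` lists the first ten rows of the file.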

#. While in the ``out/`` dir, run the following command to prepare your output models for analysis with the ``imp_sampcon`` tool from ``IMP``
    
    .. code-block:: bash

        setup_analysis.py -s all_scores.csv -o analysis -d density.txt

    .. note:: The ``density.txt`` file is not provided here; it is available only in ``CR_Y_complex/out``, so visit that dir to inspect it. To generate it yourself, please inspect the `Assembline analysis section <https://assembline.readthedocs.io/en/latest/sampling_exhaustiveness.html>`__.

#. Run the ``imp_sampcon exhaust`` tool (a command-line tool provided with IMP) to perform the sampling analysis:

    .. code-block:: bash

        cd analysis

        imp_sampcon exhaust -n <prefix for output files> \
        --rmfA sample_A/sample_A_models.rmf3 \
        --rmfB sample_B/sample_B_models.rmf3 \
        --scoreA scoresA.txt --scoreB scoresB.txt \
        -d ../density.txt \
        -m cpu_omp \
        -c <int for cores to process> \
        -gp \
        -g <float with clustering threshold step>

    .. note:: For further descriptions of settings for ``imp_sampcon`` please see `Sampling exhaustiveness and precision with Assembline <https://assembline.readthedocs.io/en/latest/sampling_exhaustiveness.html#>`_

    .. note:: The four plots from the sampling exhaustiveness analysis are provided only in ``CR_Y_complex/out/analysis``. Visit that dir to inspect them, and follow the commands above to run the analysis on your own.
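
    When the analysis is repeated with different settings, the command above can be assembled in a small script instead of being retyped. The sketch below only builds the argument list from the flags already shown in this step; the function name ``build_sampcon_command`` is hypothetical, and the prefix, core count and threshold step remain values you have to choose yourself.

    .. code-block:: python

        def build_sampcon_command(prefix, cores, grid_step, density="../density.txt"):
            """Assemble the imp_sampcon exhaust call from this step as an argument list."""
            return [
                "imp_sampcon", "exhaust",
                "-n", str(prefix),                         # prefix for output files
                "--rmfA", "sample_A/sample_A_models.rmf3",
                "--rmfB", "sample_B/sample_B_models.rmf3",
                "--scoreA", "scoresA.txt",
                "--scoreB", "scoresB.txt",
                "-d", density,
                "-m", "cpu_omp",
                "-c", str(int(cores)),                     # number of cores to use
                "-gp",
                "-g", str(float(grid_step)),               # clustering threshold step
            ]

    The returned list can be passed to ``subprocess.run(cmd, check=True)`` from inside the ``analysis`` dir, which avoids shell quoting and line-continuation mistakes.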

#. To refine the best globally optimized model of the NR Y-complex produced previously, enter the main project dir (i.e. ``scnpc_tutorial/NR_Y_complex/``) and run the refinement

    .. code-block:: bash

            # this will run 20000 refinement modelling runs on a slurm cluster (for options/parameters or local runs inspect the Assembline manual)
            assembline.py --traj --models -o out --multi --start_idx 0 --njobs 20000 config.json params.py

    .. note:: There is already an output dir ``NR_Y_complex/out``, so if you want to run the modelling yourself, rename the ``out/`` dir first, as it will otherwise be overwritten by the run above. As you may have noticed, the input PDBs used for refinement are stored in ``NR_Y_complex_de_novo_model_PDBs`` for convenience. If you want to run refinement on multiple models in parallel, inspect the Assembline manual.

#. Enter the ``out/`` dir, generate the output scoring lists and rebuild the atomic structures of the models

    .. code-block:: bash

        cd out

        extract_scores.py #this should create a couple of files including all_scores_sorted_uniq.csv

        rebuild_atomic.py --top 10 --project_dir <full path to the original project directory NR_Y_complex> config.json all_scores_sorted_uniq.csv

#. As after the ``global optimization`` run, while in the ``out/`` dir, run the following command to prepare your output models for analysis with the ``imp_sampcon`` tool from ``IMP``
    
    .. code-block:: bash

        setup_analysis.py -s all_scores.csv -o analysis -d density.txt

    .. note:: The ``density.txt`` file is not provided here; it is available only in ``CR_Y_complex/out``, so visit that dir to inspect it. To generate it yourself, please inspect the `Assembline analysis section <https://assembline.readthedocs.io/en/latest/sampling_exhaustiveness.html>`__.

#. Run the ``imp_sampcon exhaust`` tool (a command-line tool provided with IMP) to perform the sampling analysis:

    .. code-block:: bash

        cd analysis

        imp_sampcon exhaust -n <prefix for output files> \
        --rmfA sample_A/sample_A_models.rmf3 \
        --rmfB sample_B/sample_B_models.rmf3 \
        --scoreA scoresA.txt --scoreB scoresB.txt \
        -d ../density.txt \
        -m cpu_omp \
        -c <int for cores to process> \
        -gp \
        -g <float with clustering threshold step>

    .. note:: For further descriptions of settings for ``imp_sampcon`` please see `Sampling exhaustiveness and precision with Assembline <https://assembline.readthedocs.io/en/latest/sampling_exhaustiveness.html#>`_