Analyze
=======

Enter the output directory:

.. code-block:: bash

    cd out

Extract scores
--------------

.. code-block:: bash

    extract_scores.py

This should create several files, including ``all_scores_sorted_uniq.csv``.

Create a CIF file of the top N models
--------------------------------------

.. code-block:: bash

    rebuild_atomic.py --top 10 --project_dir config.json all_scores_sorted_uniq.csv

Assess sampling exhaustiveness
------------------------------

Run the sampling performance analysis with the imp-sampcon tool (described by `Viswanath et al. 2017 `_).

.. warning::

    For global optimization, the sampling exhaustiveness test is not always applicable. In some cases, the optimization at this stage works so well that all or most models are the same, resulting in very few clusters. The sampling is then exhaustive under the assumptions encoded in the JSON configuration, but the sampling precision cannot be estimated. In such cases, we recommend intensively refining the top models (or all models), e.g. with high initial temperatures in simulated annealing, to create a diverse set of models for analysis.

#. Prepare the ``density.txt`` file:

   .. code-block:: bash

       create_density_file.py --project_dir ../ config.json --by_rigid_body

   .. note::

       An example ``density.txt`` file is provided in ``CR_Y_complex/``.

#. Run the ``setup_analysis.py`` script to prepare the input files for the sampling exhaustiveness analysis:

   .. code-block:: bash

       setup_analysis.py -s all_scores.csv -o analysis -d density.txt --score_thresh

   ``--score_thresh`` is optional and filters out rare, very poorly scoring models (the threshold can be chosen based on the ``scores.pdf`` plot generated above).

   .. note::

       For further descriptions of the ``setup_analysis`` settings, please see `Sampling exhaustiveness and precision with Assembline `_

#. Run the ``imp_sampcon exhaust`` tool (a command-line tool provided with IMP) to perform the actual analysis:

   .. code-block:: bash

       cd analysis
       imp_sampcon exhaust -n \
           --rmfA sample_A/sample_A_models.rmf3 \
           --rmfB sample_B/sample_B_models.rmf3 \
           --scoreA scoresA.txt --scoreB scoresB.txt \
           -d ../density.txt \
           -m cpu_omp \
           -c \
           -gp \
           -g \

   .. note::

       For further descriptions of the ``imp_sampcon`` settings, please see `Sampling exhaustiveness and precision with Assembline `_

#. Among other files, the output includes:

   * ``.Sampling_Precision_Stats.txt``: the estimated sampling precision.
   * The clusters obtained by clustering at the sampling precision determined by imp-sampcon, stored in directories and files whose names start with ``cluster``. These contain information about the models in each cluster and the cluster localization densities.
   * ``.Cluster_Precision.txt``: the precision of each cluster.
   * PDF files with plots of the exhaustiveness test results.

   See `Viswanath et al. 2017 `_ for a detailed explanation of these concepts.

#. Optimize the plots

   The fonts and the X and Y axis value ranges in the default plots from ``imp_sampcon exhaust`` are often not optimal and have to be adjusted manually:

   #. Copy the original ``gnuplot`` scripts to the current ``analysis`` directory by executing:

      .. code-block:: bash

          copy_sampcon_gnuplot_scripts.py

      This copies four scripts to the current directory:

      * ``Plot_Cluster_Population.plt`` for the ``.Cluster_Population.pdf`` plot
      * ``Plot_Convergence_NM.plt`` for the ``.ChiSquare.pdf`` plot
      * ``Plot_Convergence_SD.plt`` for the ``.Score_Dist.pdf`` plot
      * ``Plot_Convergence_TS.plt`` for the ``.Top_Score_Conv.pdf`` plot

   #. Edit the scripts to adjust the plots to your liking.

   #. Run the scripts again::

          gnuplot -e "sysname=''" Plot_Cluster_Population.plt
          gnuplot -e "sysname=''" Plot_Convergence_NM.plt
          gnuplot -e "sysname=''" Plot_Convergence_SD.plt
          gnuplot -e "sysname=''" Plot_Convergence_TS.plt

#. Extract cluster models for visualization:

   .. code-block:: bash

       extract_cluster_models.py \
           --project_dir \
           --outdir \
           --ntop \
           --scores \
           Identities_A.txt Identities_B.txt

   For example, to extract the 5 top-scoring models from cluster 0:

   .. code-block:: bash

       extract_cluster_models.py \
           --project_dir ../../ \
           --outdir cluster.0/ \
           --ntop 5 \
           --scores ../all_scores.csv \
           Identities_A.txt Identities_B.txt cluster.0.all.txt ../config.json

   The models are saved in CIF format to the ``cluster.0`` directory.

#. To re-cluster at a specific threshold (e.g. to get bigger clusters), run:

   .. code-block:: bash

       mkdir recluster
       cd recluster/
       cp ../Distances_Matrix.data.npy .
       cp ../*ChiSquare_Grid_Stats.txt .
       cp ../*Sampling_Precision_Stats.txt .
       imp_sampcon exhaust -n \
           --rmfA ../sample_A/sample_A_models.rmf3 \
           --rmfB ../sample_B/sample_B_models.rmf3 \
           --scoreA ../scoresA.txt --scoreB ../scoresB.txt \
           -d ../density.txt \
           -m cpu_omp \
           -c 4 \
           -gp \
           --skip \
           --cluster_threshold
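
The cluster extraction step above processes one cluster at a time. The sketch below loops over every cluster instead. It is a minimal sketch under stated assumptions: the cluster member lists are assumed to follow the ``cluster.<N>.all.txt`` naming from the cluster 0 example, the relative paths are copied from that example (so it is meant to be run from the ``analysis`` directory), and the function name ``extract_all_clusters`` is hypothetical, not part of Assembline.

.. code-block:: bash

    # Hypothetical helper (not part of Assembline): extract the top-scoring
    # models from every cluster written by imp_sampcon exhaust.
    # Assumes cluster member lists named cluster.<N>.all.txt in the current
    # directory; the relative paths are copied from the cluster 0 example.
    extract_all_clusters() {
        local ntop="${1:-5}"   # models to extract per cluster (default: 5)
        local members cluster
        for members in cluster.*.all.txt; do
            # With no matching files, bash keeps the literal pattern; skip it.
            [ -e "$members" ] || continue
            cluster="${members%.all.txt}"   # cluster.0.all.txt -> cluster.0
            mkdir -p "$cluster"             # no-op if the directory already exists
            extract_cluster_models.py \
                --project_dir ../../ \
                --outdir "$cluster/" \
                --ntop "$ntop" \
                --scores ../all_scores.csv \
                Identities_A.txt Identities_B.txt "$members" ../config.json \
                || return 1                 # stop at the first failing cluster
        done
    }

After pasting or sourcing the function in your shell, run ``extract_all_clusters 5`` from the ``analysis`` directory. Defining it as a function rather than a script avoids changing the options of your interactive shell; if your layout differs, adjust the paths inside the function.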