Frequently asked questions¶
When starting to use hiphive it is a good idea to run all the tests to check that everything works as expected. A few common reasons why tests may fail include:

- Out-of-sync tests and source code: the tests have to match the installed version of the package.
- Outdated packages: if the installed numpy/sympy/spglib version does not satisfy the version requirements, some tests can fail.
- Unsupported platform: running hiphive on Windows is currently not supported; if attempted, strange errors and test failures can occur.
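If tests fail, a quick first check is to confirm which versions of the key dependencies are actually installed and compare them against hiphive's stated version requirements. A stdlib-only sketch:

```python
from importlib.metadata import version, PackageNotFoundError

# Print the installed versions of hiphive's key dependencies so they
# can be compared against the package's version requirements.
versions = {}
for pkg in ("numpy", "sympy", "spglib"):
    try:
        versions[pkg] = version(pkg)
    except PackageNotFoundError:
        versions[pkg] = None  # package missing entirely
    print(pkg, versions[pkg])
```

A `None` entry means the dependency is not installed at all, which by itself explains many test failures.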
How should I select cutoffs for the ClusterSpace?¶
When selecting cutoffs for your model, it is often a good idea to try out a few different choices and check whether you can achieve convergence. The easiest way to do this is to study the cross validation score as a function of the cutoffs. This will help you choose optimal cutoffs and allow you to detect potential overfitting.
Often it is also possible (and advisable) to study directly the convergence of the thermodynamic property of interest, e.g., the frequency spectrum or thermal conductivity, as a function of the cutoffs in order to ensure the results are converged.
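The role of the cross validation score in detecting overfitting can be illustrated with a generic least-squares sketch (plain numpy, not the hiphive API): increasing the cutoff adds columns (parameters) to the sensing matrix, and while the training error always decreases, the validation error rises once the model starts fitting noise.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 60                      # total number of force components available
x_true = rng.normal(size=5)      # only 5 parameters are actually relevant

A_full = rng.normal(size=(n_rows, 40))
y = A_full[:, :5] @ x_true + 0.1 * rng.normal(size=n_rows)

train, val = slice(0, 40), slice(40, None)
results = {}
for n_cols in (5, 20, 40):       # stand-ins for increasing cutoffs
    A = A_full[:, :n_cols]
    x, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    rmse_train = float(np.sqrt(np.mean((A[train] @ x - y[train]) ** 2)))
    rmse_val = float(np.sqrt(np.mean((A[val] @ x - y[val]) ** 2)))
    results[n_cols] = (rmse_train, rmse_val)
    print(n_cols, rmse_train, rmse_val)
```

The over-parameterized model (40 columns fitted to 40 rows) reproduces the training data almost exactly yet generalizes poorly, which is exactly the signature the cross validation score exposes.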
How many training structures are required?¶
The number of training structures needed to train an accurate force constant potential depends strongly on the number of free parameters (and hence the crystal symmetry), the order of the expansion, and the desired accuracy.
The ClusterSpace you are working with contains the number of degrees of freedom of the force constant potential (accessible via cs.n_dofs). This corresponds to the number of columns in the sensing matrix when optimizing the parameters. Each training structure contains \(3N\) force components, i.e. each structure gives rise to \(3N\) rows in the sensing matrix. In general it is a good idea for the linear problem to be solved to be overdetermined, i.e. the sensing matrix should contain more rows than columns. This provides an initial indication of the number of training structures required.
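The row-versus-column count above translates into a simple lower bound on the number of structures. A small sketch with hypothetical numbers (a cs.n_dofs of 250 and a 64-atom supercell are illustrative assumptions, not values from any particular system):

```python
import math

n_dofs = 250                       # hypothetical value of cs.n_dofs
n_atoms = 64                       # hypothetical supercell size N
rows_per_structure = 3 * n_atoms   # each structure contributes 3N rows

# smallest number of structures for which rows exceed columns
n_min = math.ceil(n_dofs / rows_per_structure)
print(n_min)   # 2 structures give 384 rows > 250 columns
```

This is only the bare minimum for an overdetermined problem; in practice several times more structures are typically needed for a converged fit.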
Furthermore, it is a good idea to check the convergence of the force constant potential with respect to the number of training structures used. For example, the RMSE score from the cross validation analysis, plotted as a function of the number of training structures (a learning curve), provides a good means to check convergence.
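A learning curve of this kind can be sketched with generic least squares (plain numpy, not the hiphive API; the parameter counts and noise level are illustrative assumptions): the validation RMSE is recorded as the number of training structures grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cols = 30                          # number of parameters (cf. cs.n_dofs)
rows_per_structure = 24              # 3N rows per structure, here N = 8
x_true = rng.normal(size=n_cols)

def make_structures(n):
    """Return a sensing matrix and noisy target for n structures."""
    A = rng.normal(size=(n * rows_per_structure, n_cols))
    return A, A @ x_true + 0.05 * rng.normal(size=n * rows_per_structure)

A_val, y_val = make_structures(10)   # fixed validation set
curve = {}
for n_train in (2, 4, 8, 16):
    A, y = make_structures(n_train)
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    curve[n_train] = float(np.sqrt(np.mean((A_val @ x - y_val) ** 2)))
    print(n_train, curve[n_train])
```

Once the curve flattens out, adding further training structures yields diminishing returns, which is the convergence signal referred to above.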
Optimizer fails with memory error¶
When running on multi-core systems you might encounter errors such as "RFE fails with 'OSError: [Errno 12] Cannot allocate memory'". This can occur since scikit-learn, which is used for the optimization, by default attempts to parallelize the computation over multiple CPUs, which increases the memory requirement as well. This behaviour can be controlled via the n_jobs parameter. The default value (n_jobs=-1) attempts to use all available CPUs. To reduce the memory consumption, the maximum number of concurrently running jobs should be set explicitly, e.g., by setting n_jobs to a small positive integer.
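As a generic scikit-learn illustration (not hiphive's own optimizer interface, and the data here is synthetic), the n_jobs argument of a recursive-feature-elimination estimator can be capped explicitly:

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
A = rng.normal(size=(80, 12))
y = A[:, :4] @ np.array([1.0, -2.0, 0.5, 3.0])  # only 4 relevant features

# n_jobs=2 limits scikit-learn to two concurrent workers instead of
# all available CPUs, bounding the memory overhead of parallelization
selector = RFECV(LinearRegression(), n_jobs=2)
selector.fit(A, y)
print(selector.n_features_)
```

Reducing n_jobs trades wall-clock time for memory: each concurrent worker holds its own copy of the fit data, so fewer workers means a smaller peak footprint.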