Frequently asked questions

Failing tests

When starting to use hiphive, it is a good idea to run the test suite to check that everything works as expected. Common reasons for test failures include:

  • Out-of-sync tests and source code: the tests must match the installed version of the package.

  • Outdated dependencies: if the installed numpy/sympy/spglib version does not satisfy the version requirements, some tests can fail.

  • Unsupported platform: running hiphive on Windows is currently not supported; attempting to do so can lead to strange errors and test failures.

How should I select cutoffs for the ClusterSpace?

When selecting cutoffs for your model, it is often a good idea to try several different choices and check for convergence. The easiest way to do this is to study the cross-validation score as a function of the cutoffs. This will help you choose optimal cutoffs and allow you to detect potential overfitting.

Often it is also possible (and advisable) to directly study the convergence of the thermodynamic property of interest, e.g., the frequency spectrum or the thermal conductivity, as a function of the cutoffs in order to ensure that the results are converged.
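The cutoff scan described above can be sketched with a plain least-squares problem. To keep the example self-contained, no hiphive calls are used: the number of fitted parameters stands in for the number of degrees of freedom a larger cutoff would add, and all sizes and noise levels are illustrative only.

```python
import numpy as np

# Synthetic "sensing matrix" problem: only the first 10 parameters matter,
# so models with too few parameters underfit and models with many
# parameters (relative to the data) overfit.
rng = np.random.default_rng(42)
n_train, n_val, n_features = 60, 40, 50
X = rng.normal(size=(n_train + n_val, n_features))
true_w = np.zeros(n_features)
true_w[:10] = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=n_train + n_val)

X_tr, y_tr = X[:n_train], y[:n_train]
X_va, y_va = X[n_train:], y[n_train:]

# "Cutoff" scan: larger k mimics a larger cutoff (more orbits, more dofs).
rmse = {}
for k in (5, 10, 25, 50):
    w, *_ = np.linalg.lstsq(X_tr[:, :k], y_tr, rcond=None)
    rmse[k] = np.sqrt(np.mean((X_va[:, :k] @ w - y_va) ** 2))
    print(f"{k:3d} parameters: validation RMSE = {rmse[k]:.3f}")
```

The validation RMSE typically drops as the model gains the parameters it needs and rises again once the model becomes too flexible for the available data, which is the overfitting signature the cross-validation scan is meant to expose.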

How many training structures are required?

The number of training structures needed to train an accurate force constant potential depends strongly on the number of free parameters (and hence the crystal symmetry), the order of the expansion, and the desired accuracy.

The ClusterSpace you are working with contains the number of degrees of freedom of the force constant potential (accessible via cs.n_dofs). This corresponds to the number of columns in the sensing matrix used when optimizing the parameters. Each training structure contains \(3N\) force components, i.e., each structure gives rise to \(3N\) rows in the sensing matrix. In general, it is a good idea for the linear problem to be overdetermined, i.e., the sensing matrix should contain more rows than columns. This provides an initial indication of the number of training structures required.
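As a back-of-the-envelope estimate, the row/column counting above can be done directly; the numbers below (250 degrees of freedom, 64-atom supercells) are hypothetical placeholders, not values from any particular system.

```python
import math

n_dofs = 250                      # e.g. cs.n_dofs for some ClusterSpace (hypothetical)
n_atoms = 64                      # atoms per training supercell (hypothetical)
rows_per_structure = 3 * n_atoms  # one row per force component

# Smallest number of structures that makes rows >= columns.
n_structures = math.ceil(n_dofs / rows_per_structure)
print(n_structures)  # → 2
```

This is only the minimum for an overdetermined problem; in practice one typically uses several times more structures and checks convergence explicitly.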

Furthermore, it is a good idea to check the convergence of the force constant potential with respect to the number of training structures used. For example, the RMSE score from the cross-validation analysis, plotted as a learning curve, provides a good means to check convergence.
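A learning curve of this kind can be sketched as follows. A synthetic linear problem stands in for the sensing matrix, so the example runs without hiphive; the sizes, noise level, and structure counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dofs, n_atoms = 40, 16
rows_per_structure = 3 * n_atoms          # 3N force components per structure
true_params = rng.normal(size=n_dofs)

def make_structures(n):
    """Generate a synthetic sensing matrix and noisy forces for n structures."""
    A = rng.normal(size=(n * rows_per_structure, n_dofs))
    f = A @ true_params + 0.05 * rng.normal(size=n * rows_per_structure)
    return A, f

# Fixed validation set; training set size grows along the learning curve.
A_val, f_val = make_structures(5)
rmses = {}
for n in (1, 2, 4, 8):
    A, f = make_structures(n)
    params, *_ = np.linalg.lstsq(A, f, rcond=None)
    rmses[n] = np.sqrt(np.mean((A_val @ params - f_val) ** 2))
    print(f"{n} structures: validation RMSE = {rmses[n]:.4f}")
```

The validation RMSE decreases with the number of training structures and eventually plateaus near the noise level; once the curve has flattened, adding further structures yields little improvement.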

Optimizer fails with memory error

When running on multi-core systems, you might encounter errors such as "OSError: [Errno 12] Cannot allocate memory", for example during recursive feature elimination (RFE). This can occur because scikit-learn, which is used for the optimization, by default attempts to parallelize the computation over multiple CPUs, which increases the memory requirement as well. This behaviour can be controlled via the n_jobs parameter. The default value (n_jobs=-1) attempts to use all available CPUs. To reduce the memory consumption, the maximum number of concurrently running jobs should be set explicitly, e.g., n_jobs=2.
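For illustration, here is how n_jobs caps parallelism in a plain scikit-learn call; the Ridge estimator and the synthetic data are stand-ins chosen for self-containedness, not the optimizer hiphive uses internally.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 20))
y = A @ rng.normal(size=20)

# n_jobs=2 caps the number of concurrent workers, bounding the total
# memory footprint; n_jobs=-1 would try to use all available CPUs.
scores = cross_val_score(Ridge(), A, y, cv=5, n_jobs=2)
print(scores)
```

The same n_jobs keyword appears throughout scikit-learn's cross-validation and feature-selection utilities, so an explicit small value can be used wherever the memory error arises.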