phik

  • Version: 0.12.4. Released: Jan 2024

  • Release notes: https://github.com/KaveIO/PhiK/blob/master/CHANGES.rst

  • Repository: https://github.com/kaveio/phik

  • Documentation: https://phik.readthedocs.io

  • Publication: [official] [arXiv pre-print]

Phi_K is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables. It is based on several refinements to Pearson’s hypothesis test of independence of two variables. Essentially, the contingency test statistic of two variables is interpreted as coming from a rotated bi-variate normal distribution, where the tilt is interpreted as Phi_K.
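
As a rough illustration of the starting point, the sketch below computes the classic Pearson chi-square statistic of a contingency table using only the standard library. It is not the Phi_K algorithm itself and not code from the library: the helper names `expected_counts` and `pearson_chi2` and the example counts are made up for this illustration, and the refinements described in the publication are left out.

```python
from typing import List, Sequence


def expected_counts(observed: Sequence[Sequence[float]]) -> List[List[float]]:
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi2(observed: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0  # skip cells with no expected entries
    )


# Hypothetical 2x2 tables of counts for two binned variables.
dependent = [[30, 10], [10, 30]]    # counts concentrate on the diagonal
independent = [[20, 20], [20, 20]]  # counts match the independence expectation

print(pearson_chi2(dependent))    # large value: evidence against independence
print(pearson_chi2(independent))  # zero: perfectly consistent with independence
```

A larger statistic means the observed table deviates more strongly from what independence would predict; Phi_K translates such a statistic into a coefficient between 0 and 1.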

The combined features of Phi_K form an advantage over existing coefficients. First, it works consistently between categorical, ordinal and interval variables. Second, it captures non-linear dependency. Third, it reverts to the Pearson correlation coefficient in case of a bi-variate normal input distribution. These are useful features when studying the correlation matrix of variables with mixed types.
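
To see why capturing non-linear dependency matters, the following standard-library sketch shows that the Pearson correlation coefficient can be close to zero even when one variable is fully determined by the other. The function `pearson_r` and the toy data are written for this illustration only; they are not part of the phik package.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = list(range(-10, 11))
linear = [2 * x + 1 for x in xs]  # linear dependency on x
quadratic = [x * x for x in xs]   # non-linear, but fully determined by x

print(pearson_r(xs, linear))     # close to 1: linear dependency is detected
print(pearson_r(xs, quadratic))  # close to 0: the dependency goes unnoticed
```

A coefficient built on a contingency test does not have this blind spot, because any pattern in the binned two-dimensional counts contributes to the test statistic.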

For details on the methodology behind the calculations, please see our publication. Particular attention is paid to the proper evaluation of the statistical significance of correlations and to the interpretation of variable relationships in a contingency table, in particular in the case of low-statistics samples. The presented algorithms are easy to use and available through this public Python library.

Example notebooks

Static link                                   Google Colab link
basic tutorial                                basic on colab
advanced tutorial (detailed configuration)    advanced on colab
spark tutorial                                no spark available

Documentation

The complete Phi_K documentation, including tutorials, can be found at read-the-docs. See the tutorials for detailed examples of how to run the code with pandas. We also have one example of how to calculate the Phi_K correlation matrix for a Spark DataFrame.

Check it out

The Phi_K library requires Python >= 3.8 and can be installed with pip. To get started, simply run:

$ pip install phik

or check out the code from our GitHub repository:

$ git clone https://github.com/KaveIO/PhiK.git
$ pip install -e PhiK/

where in this example the code is installed in editable mode (option -e), so local changes to the source take effect without reinstalling.

You can now use the package in Python with:

import phik

Congratulations, you are now ready to use the PhiK correlation analyzer library!
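
If the import fails, a quick environment check can help narrow down the cause. The snippet below is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are made up for this example and are not part of phik.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence

MIN_PYTHON = (3, 8)  # minimum version stated in the installation notes


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package: str) -> Optional[str]:
    """Version of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("phik version:", installed_version("phik") or "not installed")
```

The check looks up installed package metadata rather than importing phik, so it also works in an environment where the package is missing.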

Quick run

As a quick example, you can do:

import pandas as pd
import phik
from phik import resources, report

# open fake car insurance data
df = pd.read_csv(resources.fixture('fake_insurance_data.csv.gz'))
df.head()

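# Pearson's correlation matrix between numeric variables (pandas functionality);
# numeric_only=True is needed on pandas >= 2.0 when non-numeric columns are present
df.corr(numeric_only=True)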

# get the phi_k correlation matrix between all variables
df.phik_matrix()

# get global correlations based on phi_k correlation matrix
df.global_phik()

# get the significance matrix (expressed as one-sided Z)
# of the hypothesis test of each variable-pair dependency
df.significance_matrix()

# contingency table of two columns
cols = ['mileage','car_size']
df[cols].hist2d()

# normalized residuals of contingency test applied to cols
df[cols].outlier_significance_matrix()

# show the normalized residuals of each variable-pair
df.outlier_significance_matrices()

# generate a phik correlation report and save as test.pdf
report.correlation_report(df, pdf_file_name='test.pdf')

For all available examples, please see the tutorials at read-the-docs.

Contact and support

  • Issues and Ideas: https://github.com/kaveio/phik/issues

Please note that support is (only) provided on a best-effort basis.