
Ecological Inference

As a customer, I am happy when companies offer me special promotions. Companies know this, so they try to learn more about their customers in order to design promotions that suit them. On the other hand, as a customer I do not want to give my personal information to companies. Most people feel the same way about their personal data, which is why there are regulations such as the GDPR in the EU and KVKK in Turkey. These two customer attitudes seem contradictory, but the companies that believe the customer is always right and work to resolve this paradox will be the winners.

Ecological inference is a promising approach to this problem. The method tries to infer individual-level behavior from aggregated data, and several statistical approaches have been proposed for doing so. Because different individual behaviors can produce the same aggregate data, the problem cannot be solved exactly; the results are always estimates. In the literature, the problem is mostly framed in terms of the voting behavior of different groups. Gary King has done notable work on the topic and has published several books and papers on it.
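To make the "no exact solution" point concrete, here is a minimal sketch of the classical method of bounds (often attributed to Duncan and Davis), which is the simplest form of ecological inference. It only asks: given the share of a group in the population and the share of the population showing some outcome, what range of individual-level rates is consistent with both numbers? The function name and the example figures are my own assumptions for illustration, not real data.

```python
def method_of_bounds(group_share, outcome_share):
    """Return (lower, upper) bounds on the fraction of the group
    that shows the outcome, given only the two aggregate shares.

    group_share   : fraction of the population that belongs to the group
    outcome_share : fraction of the population that shows the outcome
    """
    if not 0.0 < group_share <= 1.0:
        raise ValueError("group_share must be in (0, 1]")
    if not 0.0 <= outcome_share <= 1.0:
        raise ValueError("outcome_share must be in [0, 1]")

    # Lowest possible rate: everyone outside the group shows the outcome first.
    lower = max(0.0, (outcome_share - (1.0 - group_share)) / group_share)
    # Highest possible rate: the outcome is concentrated inside the group.
    upper = min(1.0, outcome_share / group_share)
    return lower, upper


# Hypothetical store: 60% of customers are in segment A,
# and 50% of all customers redeemed a promotion.
lower, upper = method_of_bounds(group_share=0.6, outcome_share=0.5)
print(f"Segment A redemption rate is between {lower:.1%} and {upper:.1%}")
```

Every rate inside that interval matches the aggregate data equally well, which is why the statistical models add assumptions to narrow it down to an estimate.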

A library called PyEI is available on GitHub. It makes it easy to apply ecological inference in Python. The library includes several inference models, so you can choose the one best suited to your needs. The output of the inference is expressed as percentages. You can then compute flow values by applying those percentages to your original data. My favorite visualisation for presenting ecological inference results is the Sankey diagram, which clearly shows how much flows from each origin to each destination.
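The step from percentages to flows can be sketched as follows. This is plain Python, not PyEI's API: I assume the inference output has already been collected into a dictionary of estimated shares per origin group, and all names and numbers here are hypothetical. The result is a set of node labels and source/target/value lists, which is the general shape Sankey plotting tools expect.

```python
# Hypothetical inference output: for each origin group, the estimated share
# going to each destination (each row sums to 1). Not real data.
estimated_shares = {
    "Segment A": {"Redeemed": 0.70, "Did not redeem": 0.30},
    "Segment B": {"Redeemed": 0.20, "Did not redeem": 0.80},
}

# Group sizes taken from the original aggregate data (also hypothetical).
group_sizes = {"Segment A": 1000, "Segment B": 500}


def shares_to_flows(shares, sizes):
    """Turn estimated shares into absolute flows: share * size of the origin."""
    flows = []
    for origin, row in shares.items():
        for destination, share in row.items():
            flows.append((origin, destination, share * sizes[origin]))
    return flows


def to_sankey_links(flows):
    """Build node labels plus index-based source/target/value lists."""
    labels = []
    for origin, destination, _ in flows:
        for name in (origin, destination):
            if name not in labels:
                labels.append(name)
    index = {name: i for i, name in enumerate(labels)}
    return {
        "labels": labels,
        "source": [index[origin] for origin, _, _ in flows],
        "target": [index[destination] for _, destination, _ in flows],
        "value": [amount for _, _, amount in flows],
    }


flows = shares_to_flows(estimated_shares, group_sizes)
links = to_sankey_links(flows)
for origin, destination, amount in flows:
    print(f"{origin} -> {destination}: {amount:.0f}")
```

If you use Plotly, for example, `links["labels"]` can go into the `node` argument of `go.Sankey` and the `source`, `target` and `value` lists into its `link` argument. That choice of plotting library is my assumption; any Sankey tool that accepts origin, destination and amount should work.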

I could not find an open dataset for applying ecological inference in a marketing context. I hope to share a small study on this once I find a suitable one.