Fit Probability Distribution Objects to Grouped Data

This example shows how to fit probability distribution objects to grouped sample data, and create a plot to visually compare the pdf of each group.

Step 1. Load sample data.

Load the carsmall sample data set.

load carsmall;

The data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Model_Year), and other vehicle characteristics.

Step 2. Create a nominal array.

Transform Origin into a nominal array and remove the Italian car from the sample data. Since there is only one Italian car, fitdist cannot fit a distribution to that group. Removing the Italian car from the sample data prevents fitdist from returning an error.

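Origin = nominal(Origin);               % convert the character array to a nominal array
MPG2 = MPG(Origin~='Italy');            % MPG values for all non-Italian cars
Origin2 = Origin(Origin~='Italy');      % matching country-of-origin labels
Origin2 = droplevels(Origin2,'Italy');  % remove the now-empty 'Italy' level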
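
The reason for dropping Italy can be checked by counting the usable observations in each group before fitting. The following is a minimal sketch, not part of the original example; it loads the data into a struct (the hypothetical variable S) so that the workspace variables used in the other steps are left untouched.

```matlab
% Sketch: count non-missing MPG values per country of origin.
S = load('carsmall');                          % load into a struct to avoid overwriting Origin and MPG
labels = cellstr(S.Origin);                    % character array -> cell array of country names
[groups,~,idx] = unique(labels);               % group names and a group index for each car
counts = accumarray(idx,double(~isnan(S.MPG)));% non-missing MPG values in each group
for k = 1:numel(groups)
    fprintf('%-8s %3d\n',groups{k},counts(k));
end
singletons = groups(counts < 2);               % groups with fewer than two usable observations
disp(singletons)
```

Any group listed in singletons is a candidate for removal before calling fitdist with the 'by' option.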

Step 3. Fit kernel distributions to each group.

Use fitdist to fit a kernel distribution to each country-of-origin group in the MPG data.

[KerByOrig,Country] = fitdist(MPG2,'Kernel','by',Origin2)
KerByOrig = 

  Column 1

    [1x1 prob.KernelDistribution]

  Column 2

    [1x1 prob.KernelDistribution]

  Column 3

    [1x1 prob.KernelDistribution]

  Column 4

    [1x1 prob.KernelDistribution]

  Column 5

    [1x1 prob.KernelDistribution]


Country = 

    'France'
    'Germany'
    'Japan'
    'Sweden'
    'USA'

The cell array KerByOrig contains five kernel distribution objects, one for each country represented in the sample data. Each object contains properties that hold information about the data, the distribution, and the parameters. The array Country lists the country of origin for each group in the same order as the distribution objects are stored in KerByOrig.
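
Because each element of the cell array is a distribution object, it can be queried directly. The sketch below is an illustration rather than part of the original example: it refits the groups using a cell array of labels as the grouping variable and prints a mean and standard deviation for each fitted distribution. It assumes that the release in use supports the mean and std functions for kernel distribution objects; the variable names S, pds, and names are hypothetical.

```matlab
% Sketch: summarize each fitted kernel distribution.
S = load('carsmall');                          % load into a struct to avoid overwriting workspace variables
labels = cellstr(S.Origin);                    % country labels as a cell array
keep = ~strcmp(labels,'Italy');                % drop the single Italian car, as in Step 2
[pds,names] = fitdist(S.MPG(keep),'Kernel','by',labels(keep));
for k = 1:numel(pds)
    % mean and std operate on the fitted distribution, not on the raw data
    fprintf('%-8s mean = %5.1f, std = %4.1f\n',names{k},mean(pds{k}),std(pds{k}));
end
```

Note that the group order returned for a cell array of labels may differ from the order shown for the nominal array in Step 3, so the loop pairs each object with its label instead of relying on position.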

Step 4. Compute the pdf for each group.

Extract the probability distribution objects for Germany, Japan, and USA. The order of the labels in Country, shown in Step 3, indicates that Germany is the second group, Japan is the third, and USA is the fifth, so index into KerByOrig at those positions. Then compute the pdf for each group at integer MPG values from 0 through 50.

Germany = KerByOrig{2};
Japan = KerByOrig{3};
USA = KerByOrig{5};

x = 0:1:50;

USA_pdf = pdf(USA,x);
Japan_pdf = pdf(Japan,x);
Germany_pdf = pdf(Germany,x);
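
Hard-coded positions such as 2, 3, and 5 stop working if the set of groups changes. As a hedged alternative, the sketch below selects a distribution by its label and then checks that the evaluated pdf integrates to approximately 1. The variable names and the evaluation grid xWide are assumptions made for illustration; the grid is deliberately wider than 0:50 so that the kernel tails are included.

```matlab
% Sketch: look up a fitted distribution by group name, then sanity-check its pdf.
S = load('carsmall');                          % load into a struct to avoid overwriting workspace variables
labels = cellstr(S.Origin);                    % country labels as a cell array
keep = ~strcmp(labels,'Italy');                % drop the single Italian car, as in Step 2
[pds,names] = fitdist(S.MPG(keep),'Kernel','by',labels(keep));

germanyPd = pds{strcmp(names,'Germany')};      % select by name instead of by position

xWide = linspace(-20,90,1101);                 % hypothetical grid, wide enough to cover the tails
area = trapz(xWide,pdf(germanyPd,xWide));      % trapezoidal approximation of the integral
fprintf('Area under the Germany pdf: %.4f\n',area);
```

An area far from 1 would suggest that the grid is too narrow or too coarse for the fitted density.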

Step 5. Plot the pdf for each group.

Plot the pdf for each group on the same figure.

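figure;
plot(x,USA_pdf,'r-');        % solid red line
hold on;                     % keep the first curve while adding the others
plot(x,Japan_pdf,'b-.');     % dash-dot blue line
plot(x,Germany_pdf,'k:');    % dotted black line
legend({'USA','Japan','Germany'},'Location','northwest');
title('MPG by Country of Origin');
xlabel('MPG');
ylabel('Probability density');
hold off;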

The resulting plot shows how miles per gallon (MPG) performance differs by country of origin (Origin). In this data set, the USA has the widest distribution, and its peak is at the lowest MPG value of the three origins. Japan has the most regular distribution with a slightly heavier left tail, and its peak is at the highest MPG value of the three origins. The peak for Germany lies between those of the USA and Japan, and the second bump near 44 miles per gallon suggests that there might be multiple modes in the data.
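
The peak locations described above can also be estimated numerically instead of being read from the plot. The following sketch fits each country separately, which is assumed here to give the same result as the grouped fit, and reports the grid point where each pdf is largest. The finer grid spacing of 0.1 and the struct peakMPG are choices made for this illustration.

```matlab
% Sketch: estimate the MPG value at which each fitted pdf peaks.
S = load('carsmall');                          % load into a struct to avoid overwriting workspace variables
labels = cellstr(S.Origin);                    % country labels as a cell array
xFine = 0:0.1:50;                              % finer grid than in Step 4, for a sharper estimate
peakMPG = struct();
for name = {'USA','Japan','Germany'}
    pd = fitdist(S.MPG(strcmp(labels,name{1})),'Kernel');  % fit one country at a time
    [~,iMax] = max(pdf(pd,xFine));             % index of the highest density value on the grid
    peakMPG.(name{1}) = xFine(iMax);           % MPG value at the global peak
end
disp(peakMPG)
```

This finds only the highest peak of each curve; detecting a secondary bump, such as the one noted for Germany, would require a search for local maxima.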
