Sectoral Income Inequality Dataset

The Leiden LIS Sectoral Income Inequality Dataset, assembled by Chen Wang, Stefan Thewissen, and Olaf van Vliet (Version 1.1, March 2014), contains multiple indicators of earnings inequality and employment within 9 sectors and 12 subsectors, drawing on microdata from the Luxembourg Income Study (LIS). Combined with version 1.0, data are available for a total of 49 LIS waves, covering 12 developed countries between 1969 and 2005. Compared with version 1.0, version 1.1 presents updated data for the main part of the first version, namely 8 developed countries and 31 LIS waves between 1984 and 2005. Additional information on earnings and employment at the country level is included.

Our database is built on microdata from the Luxembourg Income Study (LIS) Database, accessed between September 2013 and March 2014 through its secured remote-execution system. The database is also available via the website of LIS: Cross-National Data Center in Luxembourg.

The Leiden LIS Sectoral Income Inequality Dataset allows researchers and public policy analysts to compare sectoral earnings inequality and employment levels across developed countries over the last three decades, based on a classification of sectors standardised across countries and periods. The data can be linked to other sectoral databases, such as the OECD Structural Analysis (STAN) database. The database extends the work of Mahler, Jesuit, and Roscoe (Mahler et al., 1999), who calculated sectoral earnings inequality in 10 countries around the years 1985 and 1990.

Sectors, countries, and time periods
Industries are classified according to the International Standard Industrial Classification (ISIC) rev. 3.0 at the two-digit level. The nine sectors are agriculture, mining, manufacturing, utilities, construction, wholesale, transport and telecommunications, financial services, and community services. The manufacturing sector and the transport and telecommunications sector are broken down further at the ISIC 3.0 three-digit level. Manufacturing is split into ten subsectors: food, textiles, wood, paper, chemicals, minerals, basic metals, machinery and equipment, transport equipment, and manufacturing n.e.c. and recycling. Transport and telecommunications is split into two subsectors: transport and storage, and post and telecommunications. This gives a total of 21 sectors and subsectors for which information is available. The classification scheme is included as a worksheet in the dataset.

Data are available for the following countries and waves:

Table 1 Country and Wave Sample

Country	Available waves
Version 1.1 (updated data)
Czech Republic	1996, 2004
Denmark	1987, 1992, 1995, 2000, 2004
Finland	1987, 1991, 1995, 2000, 2004
Germany	1984, 1989, 1994, 2000, 2004
Ireland	1994-1996, 2004
Sweden	1987, 1992, 2000, 2005
UK	1986, 1999, 2004
US	1986, 1991, 1994, 2000, 2004
Version 1.0 (not updated in version 1.1)
Austria	2004
Belgium	1995, 2000
Ireland	1994, 1995, 1996, 2000
Poland	1986, 1992, 1995, 1999, 2004
Spain	1995, 2000
Sweden	1981
UK	1969, 1979
US	1979, 1997

Labour earnings and sample definition
We calculate labour earnings at both the household and the individual level. We follow the labour earnings definition of Mahler et al. (1999), that is, we only include income from wages and salaries or self-employment. Income from other sources, such as interest and rent, is excluded. Also excluded are public benefits and income taxes. For three waves (Belgium 2000, Ireland 2000, Spain 2000) only net earnings are available. For all calculations we apply standard LIS top- and bottom-coding conventions.

We restrict our sample to ‘prime-age workers’: people aged between 25 and 54 with nonzero earnings. This group probably has the strongest labour market attachment, as their earnings are less affected by retirement and schooling decisions. Based on this sample, we calculate earnings inequality using household information (following Mahler et al., 1999) and using individual information for three sample definitions. For household earnings, we attribute the household earnings to the sector in which the household head is working. For our calculations based on individual earnings, we attribute the individual earnings to the sector in which the specific individual is working. We distinguish between three groups of individuals, in each case again including only people aged between 25 and 54 with nonzero earnings: only household heads, household heads and spouses, and all household members.
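The sample restriction and the three individual-level sample definitions can be illustrated with a minimal sketch. The `Person` record, the role labels, and the function names are hypothetical and are not LIS variable names; the age bounds are assumed to be inclusive.

```python
from dataclasses import dataclass

@dataclass
class Person:
    age: int
    earnings: float  # labour earnings (wages, salaries, self-employment)
    role: str        # "head", "spouse", or "other" (hypothetical coding)
    sector: str      # sector in which the person works

def prime_age_workers(people):
    """Keep people aged 25 to 54 (assumed inclusive) with nonzero earnings."""
    return [p for p in people if 25 <= p.age <= 54 and p.earnings != 0]

def sample(people, definition):
    """Return one of the three individual-level samples."""
    roles = {
        "heads": {"head"},
        "heads_and_spouses": {"head", "spouse"},
        "all_members": {"head", "spouse", "other"},
    }[definition]
    return [p for p in prime_age_workers(people) if p.role in roles]

people = [
    Person(40, 30000.0, "head", "manufacturing"),
    Person(38, 22000.0, "spouse", "community services"),
    Person(26, 15000.0, "other", "construction"),
    Person(60, 41000.0, "head", "agriculture"),  # dropped: older than 54
    Person(30, 0.0, "spouse", "mining"),         # dropped: zero earnings
]

for name in ("heads", "heads_and_spouses", "all_members"):
    print(name, len(sample(people, name)))
```

Each sample is a subset of the next, so the number of observations per sector can only grow when moving from heads to all household members.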

Indicators
The dataset contains information at both the country and the sectoral level. At the country level it provides the (weighted) number of households and individuals pooled across all sectors. The Gini index for household earnings, and the Gini index and mean log deviation for individual earnings, are also reported at the country level for the same sample used in the sectoral analyses. For the inequality indicators based on household earnings we correct for differences in household size using the square root equivalence scale.
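Under the square root equivalence scale, household earnings are divided by the square root of the number of household members. A minimal sketch follows; the function name is hypothetical and the example values are made up.

```python
import math

def equivalised_earnings(household_earnings, household_size):
    """Divide household earnings by the square root of household size."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return household_earnings / math.sqrt(household_size)

# A four-person household earning 60000 is treated like a
# single person earning 30000.
print(equivalised_earnings(60000.0, 4))  # 30000.0
```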

At the sectoral level, we provide multiple inequality indicators. For the calculations based on household information, the Gini index, the P90/P10 ratio, the mean log deviation, the Theil index, and the Atkinson index with inequality aversion parameter ε = 0.5 are included. We also include bootstrapped standard errors for the Gini index. To correct for possible underestimation of the level of inequality at the sectoral level in small sectors, we also provide the first-order corrected Gini index based on Deltas (2003). For the sectoral calculations based on individual data, the dataset contains the Gini index, the first-order corrected Gini index, and the mean log deviation for the three sample definitions.
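For readers who want to see how such indicators are computed, the sketch below implements textbook versions of them in plain Python. It is an illustration, not the code used to build the dataset. It makes several assumptions: observations are unweighted, earnings are strictly positive, the first-order correction is taken to be the factor n/(n − 1), and the bootstrap is a simple resampling with replacement. All function names are hypothetical.

```python
import math
import random

def gini(x):
    """Gini index via the sorted-rank formula (unweighted)."""
    xs = sorted(x)
    n = len(xs)
    ranked = sum((i + 1) * v for i, v in enumerate(xs))
    return 2.0 * ranked / (n * sum(xs)) - (n + 1.0) / n

def gini_first_order_corrected(x):
    """Small-sample adjustment G * n / (n - 1) (assumed form of the correction)."""
    n = len(x)
    return gini(x) * n / (n - 1)

def mean_log_deviation(x):
    """Mean of log(mean / x_i); requires positive values."""
    mu = sum(x) / len(x)
    return sum(math.log(mu / v) for v in x) / len(x)

def theil(x):
    """Theil index: mean of (x_i / mean) * log(x_i / mean)."""
    mu = sum(x) / len(x)
    return sum((v / mu) * math.log(v / mu) for v in x) / len(x)

def atkinson(x, epsilon=0.5):
    """Atkinson index for an inequality aversion parameter epsilon != 1."""
    mu = sum(x) / len(x)
    power = 1.0 - epsilon
    ede = (sum(v ** power for v in x) / len(x)) ** (1.0 / power)
    return 1.0 - ede / mu

def p90_p10(x):
    """Ratio of the 90th to the 10th percentile (linear interpolation)."""
    xs = sorted(x)

    def pct(q):
        pos = q * (len(xs) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return pct(0.9) / pct(0.1)

def bootstrap_se(stat, x, reps=200, seed=0):
    """Standard deviation of stat over resamples drawn with replacement."""
    rng = random.Random(seed)
    draws = [stat(rng.choices(x, k=len(x))) for _ in range(reps)]
    mean = sum(draws) / reps
    return math.sqrt(sum((d - mean) ** 2 for d in draws) / (reps - 1))

# Made-up earnings for one small sector.
earnings = [12.0, 18.0, 25.0, 31.0, 40.0, 55.0, 90.0]
print("Gini:", gini(earnings))
print("Corrected Gini:", gini_first_order_corrected(earnings))
print("MLD:", mean_log_deviation(earnings))
print("Theil:", theil(earnings))
print("Atkinson(0.5):", atkinson(earnings))
print("P90/P10:", p90_p10(earnings))
print("Bootstrap SE of Gini:", bootstrap_se(gini, earnings))
```

The corrected Gini is always at least as large as the uncorrected one, and the gap shrinks as the number of observations in a sector grows, which is why the correction matters mainly for small sectors.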

As a measure of inequality between sectors, the dataset includes median earnings in each sector as a proportion of median earnings in the country in the same wave, for both household and individual information. In addition, the dataset includes the relative employment size of each sector, defined as the number of households or individuals working in the sector relative to the total number of households or individuals.
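Both between-sector measures can be sketched for a single country wave as follows. The sketch is unweighted, whereas the dataset reports weighted counts, and the function name, record layout, and earnings values are illustrative assumptions.

```python
from statistics import median

def sector_summary(records):
    """records: list of (sector, earnings) pairs for one country wave."""
    country_median = median(e for _, e in records)
    total = len(records)
    by_sector = {}
    for sector, earnings in records:
        by_sector.setdefault(sector, []).append(earnings)
    return {
        sector: {
            # sector median as a proportion of the country median
            "relative_median": median(values) / country_median,
            # share of all observations working in this sector
            "employment_share": len(values) / total,
        }
        for sector, values in by_sector.items()
    }

records = [
    ("manufacturing", 30.0), ("manufacturing", 40.0),
    ("construction", 20.0), ("construction", 30.0),
    ("financial services", 60.0),
]

for sector, stats in sector_summary(records).items():
    print(sector, stats)
```

A relative median above 1 indicates a sector whose typical worker earns more than the typical worker in the country as a whole; the employment shares sum to 1 across sectors.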

Data file
The data file is a Microsoft Excel file. It consists of worksheets with the data, the variable list, and the classification scheme.

Comparison with version 1.0
Version 1.1 of the Leiden LIS Sectoral Income Inequality Dataset updates data for the main part of version 1.0. This main part of the data consists of the waves between 1984 and 2005 for the set of countries for which comparable earnings information is available. This part of the dataset can be used for panel data analysis; the exact countries and waves are presented in Table 1.

Version 1.1 uses a slightly broader and more consistent sample definition for all variables. In this second version the number of individuals included is higher, leading to more accurate inequality and earnings estimates. Nevertheless, correlations between the variables from old and new data are very high (well above 0.99 for the main indicators based on individual data). The new sample definition is also consistent with the definition already used for the indicators based on household information; hence, the household-based indicators are not updated.

For the variables based on individual information the new LIS data are used (version 7; October 2013 update). The variables based on household information are, as in version 1.0 of the Leiden LIS Sectoral Income Inequality Dataset, based on LIS data version 7; March 2012 update. The differences between the LIS March 2012 and October 2013 updates are very small. Thus, variables from versions 1.0 and 1.1 can be used interchangeably.

Citation
Reference to the data: Wang, C., Thewissen, S., and Van Vliet, O. (2014) ‘Leiden LIS Sectoral Income Inequality Dataset version 1.1’, Leiden University.

Contact
Any questions regarding the Leiden LIS Sectoral Income Inequality Dataset may be addressed to:

Chen Wang, Economics Department, Leiden University, PO Box 9520, 2300 RA Leiden, The Netherlands.
Stefan Thewissen, Economics Department, Leiden University, PO Box 9520, 2300 RA Leiden, The Netherlands.
Olaf van Vliet, Economics Department, Leiden University, PO Box 9520, 2300 RA Leiden, The Netherlands.

Papers and publications based on the Leiden LIS Sectoral Income Inequality Dataset

S. Thewissen, O. van Vliet and C. Wang (2017), ‘Taking the Sector Seriously: Data, Developments and Drivers of Intrasectoral Earnings Inequality’, Social Indicators Research, DOI: 10.1007/s11205-017-1677-2.
S. Thewissen and O. van Vliet (2014), ‘Competing with the Dragon: Employment and Wage Effects of Chinese Trade Competition in 17 Sectors Across 18 OECD Countries’, LIS Working Paper Series No. 623.
S. Thewissen, C. Wang, and O. van Vliet (2013), ‘Sectoral trends in earnings inequality and employment: International trade, skill-biased technological change, or labour market institutions?’, LIS Working Paper Series No. 595.
S. Thewissen, O. van Vliet, and C. Wang (2013), ‘Sectorale loonongelijkheid en werkgelegenheid in internationaal perspectief tussen 1985-2005’, TPEdigitaal 7(3), pp. 139-160.