Japan life tables
I recently watched this Vice video on Youtube which is a coverage about the Japanese sex industry. Except for the fact that it looks like it was made with a message in mind that ressembles "the Japanese are weird", it cited government surveys about the population of Japan declining. Given that I have been interested in data relating to death statistics and more generally statistics about populations, I decided to examine the Japanese age pyramid.
In this endeavour, the first step was to gather the right data. I was able to find life tables for Japan at the Ministry of Health, Labor and Welfare website.
In this article, I'm going to be working on the data for Japanese men. The linked file contains two interesting things:
- the number of surviving people in an hypothetical cohort of 100000 people starting as babies
- the probability of dying at a given age (death rate)
data = """Age,number of survivors,number of deaths,survivors rate,death rate,force of mortality,life expectancy,stationary L,stationary T
0,100000,246,0.99754,0.00246,0.09375,79.55,99808,7955005
1,99754,37,0.99963,0.00037,0.00057,78.75,99733,7855198
2,99716,26,0.99974,0.00026,0.00026,77.78,99704,7755464
3,99690,18,0.99982,0.00018,0.00022,76.80,99681,7655761
4,99672,13,0.99987,0.00013,0.00015,75.81,99665,7556080
5,99659,11,0.99989,0.00011,0.00012,74.82,99653,7456415
6,99647,10,0.99990,0.00010,0.00011,73.83,99642,7356762
7,99637,9,0.99991,0.00009,0.00010,72.84,99632,7257120
8,99628,8,0.99992,0.00008,0.00009,71.84,99623,7157488
9,99619,8,0.99992,0.00008,0.00008,70.85,99615,7057865
10,99612,8,0.99992,0.00008,0.00008,69.85,99608,6958249
11,99603,10,0.99990,0.00010,0.00009,68.86,99599,6858642
12,99594,11,0.99989,0.00011,0.00010,67.87,99588,6759043
13,99583,13,0.99987,0.00013,0.00012,66.87,99577,6659454
14,99570,15,0.99985,0.00015,0.00014,65.88,99563,6559878
15,99555,19,0.99981,0.00019,0.00017,64.89,99546,6460315
16,99536,24,0.99976,0.00024,0.00021,63.90,99525,6360769
17,99512,30,0.99970,0.00030,0.00027,62.92,99498,6261244
18,99482,37,0.99962,0.00038,0.00034,61.94,99464,6161746
19,99445,44,0.99955,0.00045,0.00041,60.96,99423,6062282
20,99401,51,0.99949,0.00051,0.00048,59.99,99376,5962859
21,99350,57,0.99943,0.00057,0.00055,59.02,99322,5863483
22,99293,61,0.99939,0.00061,0.00060,58.05,99262,5764162
23,99232,63,0.99936,0.00064,0.00063,57.09,99200,5664899
24,99169,64,0.99936,0.00064,0.00064,56.12,99137,5565699
25,99105,64,0.99936,0.00064,0.00064,55.16,99073,5466562
26,99041,64,0.99935,0.00065,0.00065,54.19,99009,5367489
27,98977,66,0.99934,0.00066,0.00066,53.23,98944,5268480
28,98911,67,0.99933,0.00067,0.00067,52.26,98878,5169536
29,98845,68,0.99932,0.00068,0.00068,51.30,98811,5070658
30,98777,68,0.99931,0.00069,0.00069,50.33,98743,4971847
31,98709,70,0.99929,0.00071,0.00070,49.37,98674,4873104
32,98638,73,0.99926,0.00074,0.00073,48.40,98602,4774431
33,98565,76,0.99923,0.00077,0.00076,47.44,98527,4675829
34,98489,80,0.99919,0.00081,0.00079,46.48,98449,4577302
35,98409,84,0.99915,0.00085,0.00083,45.51,98367,4478853
36,98325,89,0.99910,0.00090,0.00088,44.55,98281,4380486
37,98236,96,0.99902,0.00098,0.00094,43.59,98188,4282205
38,98139,106,0.99892,0.00108,0.00103,42.63,98087,4184017
39,98034,115,0.99882,0.00118,0.00113,41.68,97977,4085929
40,97918,126,0.99872,0.00128,0.00123,40.73,97857,3987952
41,97793,137,0.99860,0.00140,0.00134,39.78,97725,3890096
42,97656,149,0.99848,0.00152,0.00146,38.83,97583,3792370
43,97508,162,0.99834,0.00166,0.00159,37.89,97428,3694788
44,97346,176,0.99819,0.00181,0.00173,36.95,97259,3597360
45,97170,192,0.99802,0.00198,0.00189,36.02,97075,3500100
46,96978,210,0.99784,0.00216,0.00207,35.09,96875,3403025
47,96768,231,0.99762,0.00238,0.00227,34.17,96655,3306151
48,96538,254,0.99737,0.00263,0.00250,33.25,96413,3209496
49,96284,278,0.99711,0.00289,0.00276,32.33,96147,3113083
50,96006,304,0.99683,0.00317,0.00303,31.42,95856,3016936
51,95702,332,0.99653,0.00347,0.00332,30.52,95538,2921080
52,95370,363,0.99619,0.00381,0.00364,29.63,95191,2825542
53,95006,398,0.99581,0.00419,0.00400,28.74,94810,2730351
54,94608,436,0.99539,0.00461,0.00440,27.86,94393,2635541
55,94172,478,0.99493,0.00507,0.00485,26.98,93936,2541148
56,93694,523,0.99442,0.00558,0.00534,26.12,93436,2447212
57,93171,570,0.99388,0.00612,0.00586,25.26,92890,2353776
58,92601,620,0.99331,0.00669,0.00642,24.42,92295,2260886
59,91981,673,0.99268,0.00732,0.00701,23.58,91649,2168591
60,91308,739,0.99190,0.00810,0.00773,22.75,90944,2076941
61,90568,804,0.99112,0.00888,0.00853,21.93,90172,1985997
62,89764,862,0.99039,0.00961,0.00929,21.12,89338,1895826
63,88902,922,0.98963,0.01037,0.01003,20.32,88446,1806488
64,87980,986,0.98879,0.01121,0.01083,19.53,87492,1718042
65,86994,1056,0.98786,0.01214,0.01172,18.74,86472,1630549
66,85938,1134,0.98681,0.01319,0.01273,17.97,85378,1544077
67,84804,1216,0.98566,0.01434,0.01385,17.20,84203,1458699
68,83588,1298,0.98447,0.01553,0.01503,16.44,82946,1374496
69,82290,1386,0.98315,0.01685,0.01629,15.70,81605,1291551
70,80904,1490,0.98158,0.01842,0.01775,14.96,80168,1209946
71,79413,1607,0.97977,0.02023,0.01947,14.23,78620,1129778
72,77807,1732,0.97773,0.02227,0.02143,13.51,76952,1051158
73,76074,1876,0.97534,0.02466,0.02367,12.81,75149,974206
74,74199,2043,0.97247,0.02753,0.02636,12.12,73192,899057
75,72156,2228,0.96913,0.03087,0.02955,11.45,71058,825865
76,69928,2432,0.96522,0.03478,0.03328,10.79,68730,754807
77,67496,2645,0.96081,0.03919,0.03759,10.16,66192,686077
78,64851,2866,0.95580,0.04420,0.04249,9.56,63436,619885
79,61985,3083,0.95026,0.04974,0.04802,8.98,60461,556449
80,58902,3279,0.94432,0.05568,0.05407,8.42,57278,495988
81,55622,3453,0.93792,0.06208,0.06056,7.89,53910,438711
82,52169,3619,0.93063,0.06937,0.06780,7.38,50374,384801
83,48550,3783,0.92207,0.07793,0.07629,6.89,46672,334427
84,44767,3918,0.91248,0.08752,0.08618,6.43,42817,287756
85,40849,3997,0.90215,0.09785,0.09717,6.00,38854,244938
86,36852,3990,0.89173,0.10827,0.10871,5.59,34853,206085
87,32862,3919,0.88074,0.11926,0.12062,5.21,30894,171231
88,28943,3802,0.86865,0.13135,0.13362,4.85,27031,140337
89,25141,3646,0.85497,0.14503,0.14839,4.51,23303,113307
90,21495,3448,0.83959,0.16041,0.16615,4.19,19752,90003
91,18047,3171,0.82431,0.17569,0.18378,3.89,16436,70252
92,14876,2855,0.80805,0.19195,0.20290,3.62,13421,53816
93,12021,2515,0.79078,0.20922,0.22364,3.36,10734,40395
94,9506,2163,0.77245,0.22755,0.24614,3.12,8395,29661
95,7343,1813,0.75305,0.24695,0.27056,2.90,6407,21266
96,5529,1479,0.73256,0.26744,0.29704,2.69,4763,14859
97,4051,1171,0.71095,0.28905,0.32578,2.49,3441,10096
98,2880,898,0.68823,0.31177,0.35695,2.31,2410,6655
99,1982,665,0.66440,0.33560,0.39077,2.14,1632,4245
100,1317,475,0.63949,0.36051,0.42746,1.98,1065,2613
101,842,325,0.61351,0.38649,0.46726,1.84,669,1548
102,517,214,0.58652,0.41348,0.51044,1.70,402,879
103,303,134,0.55858,0.44142,0.55729,1.58,231,478
104,169,80,0.52977,0.47023,0.60811,1.46,126,247
105,90,45,0.50020,0.49980,0.66325,1.35,65,121
106,45,24,0.46998,0.53002,0.72307,1.25,32,56
107,21,12,0.43925,0.56075,0.78796,1.16,14,24
108,9,5,0.40818,0.59182,0.85837,1.07,6,10
109,4,2,0.37696,0.62304,0.93474,0.99,2,4
110,1,1,0.34578,0.65422,1.01761,0.92,1,1"""
First, let's set up our pylab environment.
%pylab inline
rcParams['figure.figsize'] = (10, 4)
Populating the interactive namespace from numpy and matplotlib
Now, let's read the data displayed above to a numpy array:
data = data.split('\n')[1:]
data = array([elem.split(',') for elem in data], dtype=np.double)
data.shape
(111, 9)
First, let's plot the survivors as a function of their age.
plot(data[:, 0], data[:, 1], label = "number of survivors")
legend(loc='lower left')
xlabel('age')
grid(True)
Now, let's plot the death rate.
plot(data[:, 0], data[:, 4], label = "death rate for men")
legend(loc='upper left')
grid(True)
From there, we can build a simple model of an evolving population. The state equation we're going to use is the following (adapted from http://en.wikipedia.org/wiki/Demographic_analysis): $$ P_{new} = P_{old} + B - D $$
where $P$ is the population number, $B$ is the number of births and $D$ the number of deaths. We will describe the population by a vector $P = P_i$ containing the respective population numbers for age $i$, i.e. $P_{12}$ is the population aged between 12 and 13. For each iteration, $P_0$ gets incremented by the number of births while all other age groups $i$ get decremented by $P_i * d_i$ where $d_i$ is the death probability for the age group in question.
Looking for the birth rates, I found the following figure here for the year 2011: 8.3 children per 1000 population per year. We will ignore the infant and neonatal death rates for this figure.
Looking for the current age pyramid, I found the following link. Unfortunately, it does not provide a very fine grained population pyramid, therefore I'll be interpolating it linearly on one year age categories for the following simulation.
First, let's do the data formatting.
age_data_men = """0 529000
1 543000
2 529000
3 529000
4 544000
5 2746000
10 2983000
15 3068000
20 3117000
25 3495000
30 3889000
35 4712000
40 4727000
45 4077000
50 3802000
55 3917000
60 4997000
65 3914000
70 3426000
75 2730000
80 1822000
85 899000
90 265000
95 60000
100 7000"""
age_data_men = array([elem.split('\t') for elem in age_data_men.split('\n')], dtype=np.int32)
We'll use the following approach for interpolation. First, we'll build a function that returns the number of people above a given age. Second, we'll build a function that gives the number of people in an age interval by substracting two numbers from the previous function.
def population_above(age, age_data=age_data_men):
"""returns population above age
"""
if age >= age_data[-1, 0]:
return age_data[-1, 1]
index = (age_data[:, 0] >= age).nonzero()[0][0]
if age_data[index, 0] == age:
return sum(age_data[index:, 1])
else:
return sum(age_data[index:, 1]) + (age_data[index, 0] - age) / float(age_data[index, 0] - age_data[index - 1, 0]) * age_data[index - 1, 1]
def population_between(start, end, age_data=age_data_men):
"""returns population between start and end ages
start and end are expected to be integers"""
return population_above(start, age_data=age_data) - population_above(end, age_data=age_data)
We can now test this function.
print population_between(0, 1), 529000
print population_between(1, 2), 543000
print population_between(5, 10), 2746000
print population_between(5, 15), (2746000 + 2983000)
print population_between(5, 6), (2746000 / 5.)
print population_between(10, 11), (2983000 / 5.)
print population_between(5, 11), (2746000 + 2983000 / 5.)
print population_between(9, 10), (2746000 / 5.)
529000 529000 543000 543000 2746000 2746000 5729000 5729000 549200.0 549200.0 596600.0 596600.0 3342600.0 3342600.0 549200.0 549200.0
From there, we can construct the age pyramid for men.
ages = arange(100)
pyramid = [population_between(age, age + 1) for age in ages]
plot(ages, pyramid)
[<matplotlib.lines.Line2D at 0xb1c332c>]
Just a quick check:
print sum(pyramid), sum(age_data_men[:, 1])
61320000.0 61327000
We can now iterate the simulation using the curve of death probability and the birth data. In the next cell, we're defining a function that constructs a new age pyramid from an old pyramid taking into account deaths and births.
def construct_new_pyramid(old_pyramid, death_probs, birth_rate):
births = birth_rate / 1000 * sum(old_pyramid)
new_pyramid = old_pyramid * (1 - death_probs)
new_pyramid[-2] += new_pyramid[-1]
new_pyramid[1:] = new_pyramid[0:-1]
new_pyramid[0] = births
return new_pyramid
death_probs = data[:, 4][:ages.shape[0]]
plot(ages, pyramid, label='start population')
plot(ages, construct_new_pyramid(pyramid, death_probs, 8.3), label='population after one year')
legend(loc='lower left')
<matplotlib.legend.Legend at 0xb1cffec>
As one can see, the population shifts to the right, while newborn babies are added to the left of the plot. We can iterate this behavious several times to construct estimates in the future for the years 2023, 2033 and 2043.
def iterate_pyramid(pyramid, death_probs, birth_rate, years):
for year in range(years):
pyramid = construct_new_pyramid(pyramid, death_probs, birth_rate)
return pyramid
plot(ages, pyramid, label='2013')
plot(ages, iterate_pyramid(pyramid, death_probs, 8.3, 10), label='2023')
plot(ages, iterate_pyramid(pyramid, death_probs, 8.3, 20), label='2033')
plot(ages, iterate_pyramid(pyramid, death_probs, 8.3, 30), label='2043')
legend(loc='lower left')
<matplotlib.legend.Legend at 0xb389b2c>
We can also plot population estimates according to this simulation model for one hundred years in the future:
future_years = arange(100)
total_men_population = [sum(iterate_pyramid(pyramid, death_probs, 8.3, future_year)) for future_year in future_years]
plot(future_years, array(total_men_population) / 1e6)
title('Total men population in Japan')
xlabel('Years in the future')
ylabel('Population (million)')
grid(True)
As one can see, there is no convergence, only steady decline. This is due to the fact that the birth rate is given per 1000 people. Thus, when the total population declines, the number of births declines and so on. This is better seen on a 1000 year outlook, where the population goes to zero in the long run:
future_years = arange(1000)
total_men_population = [sum(iterate_pyramid(pyramid, death_probs, 8.3, future_year)) for future_year in future_years]
plot(future_years, array(total_men_population) / 1e6)
title('Total men population in Japan')
xlabel('Years in the future')
ylabel('Population (million)')
grid(True)