-
!1
Mixture Models for Diverse Machine Translation: Tricks of the Trade
Tianxiao Shen* Myle Ott* Michael Auli Marc’Aurelio Ranzato (*: Equal contribution)
ICML 2019
-
!2
Translation Is One-To-Many
German: danke
English: thank you / thanks / thank you very much
German: sie brauchen zeit
English: you need time / they need time / it takes time
$p(y \mid x)$ is multi-modal: a sentence can have different translations
Goal: efficiently decode a diverse set of hypotheses
-
!4
Neural Machine Translation
Input: source sentence $x = x_1, \cdots, x_L$
Output: target translation $y = y_1, \cdots, y_T$
$p(y \mid x; \theta) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, x; \theta)$
(opennmt.net)
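As a rough illustration of the factorization $p(y \mid x; \theta) = \prod_t p(y_t \mid y_{1:t-1}, x; \theta)$, the sketch below sums per-token log-probabilities. The step probabilities are made-up numbers, not outputs of any real model, and `sequence_log_prob` is a hypothetical helper name.

```python
import math

def sequence_log_prob(step_probs):
    """log p(y | x) = sum_t log p(y_t | y_{1:t-1}, x)."""
    return sum(math.log(p) for p in step_probs)

# Hypothetical per-step probabilities for a 3-token translation.
step_probs = [0.5, 0.8, 0.9]
log_p = sequence_log_prob(step_probs)
prob = math.exp(log_p)  # product of the three step probabilities
```

Working in log space avoids underflow when T is large, which is why decoders usually score hypotheses by summed log-probabilities.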
-
!5
Search for Multiple Modes Is Difficult…
Source: 参与投票的成员中,58% 反对该合同交易。
[Figure: beam-search tree of partial hypotheses, with branches beginning "Of ...", "It ..." and "Fifty-eight per cent of ..."]
$\arg\max_{y_1, \cdots, y_T} \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, x; \theta)$
Beam search can effectively find one likely $y$ but cannot explore multiple modes
References:
It was rejected by 58 % of its members who voted in the ballot .
Of the members who voted , 58 % opposed the contract transaction .
Of the members who participated in the vote , 58 % opposed the contract .
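A minimal sketch of why beam search tends to stay within one mode, assuming a hand-written toy distribution (`TOY_MODEL` and its probabilities are invented for illustration, not taken from the talk). The "of ..." continuations carry 40% of the probability mass in total, yet with a beam of size 2 both surviving hypotheses start with "it".

```python
import math

# Hypothetical toy model: maps a prefix (tuple of tokens) to next-token probabilities.
TOY_MODEL = {
    (): {"it": 0.6, "of": 0.4},
    ("it",): {"was": 0.5, "is": 0.5},
    ("of",): {"the": 0.5, "those": 0.5},
}

def beam_search(model, beam_size, num_steps):
    """Keep the beam_size highest-scoring prefixes at every step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(num_steps):
        candidates = []
        for prefix, score in beams:
            for token, p in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

hypotheses = beam_search(TOY_MODEL, beam_size=2, num_steps=2)
# Collect the first token of every surviving hypothesis.
first_tokens = {prefix[0] for prefix, _ in hypotheses}
```

Each "it ..." hypothesis scores 0.3 against 0.2 for each "of ..." hypothesis, so the second mode is pruned even though it is nearly as likely overall.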
-
!6
Explicitly Model Uncertainty
Introduce a latent variable $z$ to capture different translation modes
Better explore the search space: decode different $y$ from different $z$, using $p(y \mid z, x)$
$z_1$: It was rejected by 58 % of its members who voted in the ballot .
$z_2$: Of the members who voted , 58 % opposed the contract transaction .
$z_3$: Of the members who participated in the vote , 58 % opposed the contract .
-
!7
Previous Attempt: Conditional VAE
Gaussian $z$: $p(y \mid x; \theta) = \int_z p(z \mid x; \theta)\, p(y \mid z, x; \theta)$
$\log p(y \mid x; \theta) \ge \mathbb{E}_{q(z \mid x, y; \phi)}[\log p(y \mid z, x; \theta)] - D_{KL}(q(z \mid x, y; \phi) \,\|\, p(z \mid x; \theta))$
(Kingma & Welling, 2014; Zhang et al., 2016)
“Posterior collapse”, as in language modeling: the latent variable is ignored (Bowman et al., 2016)
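To make the KL term of the bound concrete, here is a small sketch using the standard closed form for two scalar Gaussians. Treating the posterior and prior as scalar Gaussians is a simplifying assumption for illustration, and `kl_gaussian` is a hypothetical helper. When the approximate posterior equals the prior the KL term is exactly zero, which is the regime associated with posterior collapse: z then carries no information about y.

```python
import math

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalar Gaussians."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

# Posterior identical to the prior: no information is encoded in z.
kl_collapsed = kl_gaussian(0.0, 1.0, 0.0, 1.0)
# Posterior mean shifted by one prior standard deviation.
kl_informative = kl_gaussian(1.0, 1.0, 0.0, 1.0)
```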
-
!8
Our Approach: Mixture Model
$p(y \mid x; \theta) = \sum_{z=1}^{K} p(z \mid x; \theta)\, p(y \mid z, x; \theta)$
Simplest, enumerable, exact marginal
Multinomial $z$, taking values in $\{1, \cdots, K\}$
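Because z is discrete with K values, the marginal can be computed exactly by enumeration. A minimal sketch, assuming per-component log-priors and log-likelihoods are already available (the numbers below are invented, and both function names are hypothetical):

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_log_marginal(log_prior, log_lik):
    """log p(y | x) = log sum_z p(z | x) p(y | z, x)."""
    return log_sum_exp([lp + ll for lp, ll in zip(log_prior, log_lik)])

# Hypothetical values for K = 3 components.
log_prior = [math.log(0.5), math.log(0.3), math.log(0.2)]
log_lik = [math.log(0.1), math.log(0.4), math.log(0.05)]
log_marginal = mixture_log_marginal(log_prior, log_lik)
```

This costs K forward passes of the decoder per example, which is affordable for small K and needs no variational bound.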
-
!9
Our Approach: Mixture Model
Multinomial $z$, taking values in $\{1, \cdots, K\}$
Even simpler:
$p(y \mid x; \theta) = \sum_{z=1}^{K} p(z \mid x; \theta)\, p(y \mid z, x; \theta) = \frac{1}{K} \sum_{z=1}^{K} p(y \mid z, x; \theta)$
- set $p(z \mid x; \theta) = 1/K$: each component is equally likely a priori
-
!10
Our Approach: Mixture Model
Even simpler:
$p(y \mid x; \theta) = \frac{1}{K} \sum_{z=1}^{K} p(y \mid z, x; \theta) \approx \frac{1}{K} \max_z p(y \mid z, x; \theta)$
- assume $p(y \mid z, x; \theta)$ is large for one $z$, but nearly zero for others: a particular translation is only explained by a particular component
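A quick numeric check of the hard approximation under a uniform prior, with made-up likelihoods in which one component dominates (the values and function names are illustrative assumptions):

```python
def exact_uniform_mixture(liks):
    """(1/K) * sum_z p(y | z, x) with a uniform prior over K components."""
    return sum(liks) / len(liks)

def hard_approximation(liks):
    """(1/K) * max_z p(y | z, x): keep only the best component."""
    return max(liks) / len(liks)

# Hypothetical likelihoods: one component explains y, the others barely do.
liks = [0.3, 1e-4, 1e-5]
exact = exact_uniform_mixture(liks)
approx = hard_approximation(liks)
ratio = approx / exact
```

The max is always a lower bound on the sum, and the gap shrinks as the components specialize; with more evenly spread likelihoods the approximation would be noticeably looser.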
-
!11
Training Objective
$p(y \mid x; \theta) = \sum_{z=1}^{K} p(z \mid x; \theta)\, p(y \mid z, x; \theta) = \frac{1}{K} \sum_{z=1}^{K} p(y \mid z, x; \theta) \approx \frac{1}{K} \max_z p(y \mid z, x; \theta)$
$\mathcal{L}(\theta) = \mathbb{E}_{(x, y) \sim \text{data}} \left[ \min_z -\log p(y \mid z, x; \theta) \right]$
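The objective takes, for every training pair, the negative log-likelihood of the best component and averages over the data. A minimal sketch, assuming the per-component NLLs have already been computed (`nll[i][z]` and its values are hypothetical):

```python
def hard_mixture_loss(nll):
    """Mean over examples of min_z -log p(y | z, x).

    nll[i][z] is the negative log-likelihood of example i under component z.
    """
    return sum(min(row) for row in nll) / len(nll)

# Hypothetical NLLs for 2 examples and K = 2 components.
nll = [[2.0, 0.5], [1.0, 3.0]]
loss = hard_mixture_loss(nll)  # (0.5 + 1.0) / 2
```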
-
!12
EM Training
E-step (hard): estimate the responsibility of each component
M-step: update through each component with gradients✓AAAEB3icnVPNa9RAFJ8mftT4tdWjCIPLsiksS1IFBVkoeBG8VHDbwmYbJrOT3aGTD2Ym2iSbnrz4r3jxoIhX/wVv/jdONqndpK0HH4R58/u993tvHnlezKiQlvV7Q9OvXb9xc/OWcfvO3Xv3O1sP9kWUcEzGOGIRP/SQIIyGZCypZOQw5gQFHiMH3vGrkj94T7igUfhOpjGZBmgeUp9iJBXkbmmPe5nRi81seTJIXzpyQSTaHjk+RzhfoWdYbKbLbPD3WuSOSAI3z/oFVHH9VmB/LbKUT8/5UZ04soujNyo3HaxVgSPYoq9uweiZqmd1OCHyGHIrGDosmpeyR7lJt4vlSXU2q2fwdKVccYO0GXPq4FkkG6pN0UHWlr2qifazzwqfz/rSMusjMXrczapqDiO+RJxHH+A/2i/nUjOwpv5DpDrdTtcaWiuDFx27drqgtj2388uZRTgJSCgxQ0JMbCuW0xxxSTEjheEkgsQIH6M5mSg3RAER03z1Hxewp5AZ9COuvlDCFbqekaNAiDTwVGSA5EK0uRK8jJsk0n8xzWkYJ5KEuCrkJwzKCJZLAWeUEyxZqhyEOVW9QrxAagekWh1DDcFuP/mis78ztJ8Od94+6+5a9Tg2wSPwBJjABs/BLngN9sAYYO2j9ln7qn3TP+lf9O/6jypU26hzHoKG6T//ABWuTt8=
Take a mini-batch
L(✓) = E(x,y)⇠datahminz
� log p(y|z, x; ✓)i
AAACunicbVFba9swFJa9W5ddmm2PexELBQfaYGeDDrpBYRsM1ocOlrZgGSMrsi0iXyYdjziOf+T2tn8zOfFKbwcEn75zvnONSik0uO5fy753/8HDRzuPB0+ePnu+O3zx8kwXlWJ8xgpZqIuIai5FzmcgQPKLUnGaRZKfR4tPnf/8F1daFPkPqEseZDTJRSwYBUOFw997RBYJLp16vTwikHKgY0wSjklGIY0i/CVsfjqr9XK/PiJlKsatfylY7V9KAnyAP4cNAb6E5ttJ2zrXRGRddt//0ePBJjujEp84PffxSkHHCMdEi2ybb06Bti2RPAafZCIPVwd39UCUSFIIwuHInbgbw7eB14MR6u00HP4h84JVGc+BSaq177klBA1VIJjk7YBUmpeULWjCfQNzmnEdNJvVt3jPMHMcF8q8HPCGvapoaKZ1nUUmsptP3/R15F0+v4L4fdCIvKyA52xbKK4khgJ3d8RzoTgDWRtAmRKmV8xSqigDc+2BWYJ3c+Tb4Gw68d5Opt/fjY6n/Tp20Gv0BjnIQ4foGH1Fp2iGmHVoBVZsJfYHO7KFvdiG2laveYWumQ3/AFra16I=
{(x(i), y(i))}mi=1AAAC2XicbVJba9swFJa9W5fdsu1xL2IhYEMa4mzQQSkUtsFgfehgSQuWa2RFdkQl27WORxzHD3vYGHvdP9vb/sV+wpTLSm8HhD5953znIinKpdAwGPyx7Fu379y9t3W/9eDho8dP2k+fjXVWFoyPWCaz4jiimkuR8hEIkPw4LzhVkeRH0enbpf/oCy+0yNLPUOU8UDRJRSwYBUOF7b9dIrME5061mO0SmHKgLiYJx0RRmEYRfh/WZ858MetVuySfCrfxzwXz3rkkwNv4XVgT4DOoPx40jXNJRBb58vg/2m11V+kZlfjA2ZB7Fyo6RukSLdQ64YQCbRoieQw+USIN59s3NUEKkUwhaBGjP6kd02uvWu8uacJa7HnNiQrbnUF/sDJ8HXgb0EEbOwzbv8kkY6XiKTBJtfa9QQ5BTQsQTPKmRUrNc8pOacJ9A1OquA7q1cs0uGuYCY6zwqwU8Iq9qKip0rpSkYlcTq+v+pbkTT6/hPhNUIs0L4GnbF0oLiWGDC+fGU9EwRnIygDKCmF6xWxKC8rAfIaWuQTv6sjXwXjY9171h59ed/aHm+vYQi/QS+QgD+2gffQBHaIRYtbYWljfrO+2b3+1f9g/16G2tdE8R5fM/vUPfLfjPg==
Just like training mixture of Gaussians but in text space and conditioned on source
M-step gradients: $r_z^{(i)} \cdot \nabla_\theta \log p(y^{(i)} \mid z, x^{(i)}; \theta)$
E-step responsibilities: $r_z^{(i)} \leftarrow \mathbb{1}[z = \arg\max_{z'} p(y^{(i)} \mid z', x^{(i)}; \theta)]$
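The hard-EM update on this slide can be sketched on a toy problem. The block below is only an illustration under loud assumptions: each component is a plain categorical distribution over a few labels (the source x is dropped entirely), and the names `log_softmax` and `hard_em_step` are hypothetical, not taken from the talk or from any library.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def hard_em_step(logits, batch, lr=0.5):
    """One hard-EM step for toy categorical components (assumption: no source x).

    logits: one logit vector per component z.
    batch:  observed labels y (ints).
    Returns (updated logits, winning component per example, loss before the update).
    """
    num_components = len(logits)
    log_probs = [log_softmax(row) for row in logits]
    # E-step (hard): r_z = 1[z = argmax_z' log p(y | z')]
    winners = [max(range(num_components), key=lambda z: log_probs[z][y]) for y in batch]
    # M-step: gradient ascent on log p(y | z), only through the winning component
    new_logits = [list(row) for row in logits]
    for y, z in zip(batch, winners):
        for v, lp in enumerate(log_probs[z]):
            grad = (1.0 if v == y else 0.0) - math.exp(lp)  # d log-softmax / d logit
            new_logits[z][v] += lr * grad / len(batch)
    # Mini-batch estimate of E[min_z -log p(y | z)]
    loss = -sum(log_probs[z][y] for y, z in zip(batch, winners)) / len(batch)
    return new_logits, winners, loss
```

Calling `hard_em_step` repeatedly on the same batch should lower the loss while each component specializes on the labels it already explains best.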
-
!13
Parameterization
Before: $\log p(y \mid x; \theta)$
-
!14
Parameterization
After: $\log p(y \mid z, x; \theta)$, now conditioned on the latent variable $z$
-
!15
Testing
Generate $K$ hypotheses by greedily decoding $p(y \mid z, x; \theta)$, $z = 1, \cdots, K$
Solely depend on the latent variable to produce different hypotheses
Computationally efficient and parallelizable
No heuristic diverse decoding methods
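A minimal sketch of this test-time procedure, assuming a hypothetical `step_fn(z, source, prefix)` that returns next-token log-probabilities for component z. The toy scorer and its vocabulary are made up for illustration and stand in for a real decoder.

```python
def greedy_decode(step_fn, z, source, eos="</s>", max_len=20):
    """Greedy decoding conditioned on a fixed latent value z."""
    prefix = []
    for _ in range(max_len):
        scores = step_fn(z, source, prefix)   # dict: token -> log-prob
        token = max(scores, key=scores.get)   # pick the most likely next token
        if token == eos:
            break
        prefix.append(token)
    return prefix

def generate_hypotheses(step_fn, source, num_components):
    """One hypothesis per component; the K decodes are independent of each other."""
    return [greedy_decode(step_fn, z, source) for z in range(1, num_components + 1)]

# Toy stand-in for a trained model (assumption): each z prefers one fixed output.
TOY_OUTPUTS = {1: ["thank", "you"], 2: ["thanks"], 3: ["thank", "you", "very", "much"]}
TOY_VOCAB = ("thank", "you", "thanks", "very", "much", "</s>")

def toy_step(z, source, prefix):
    target = TOY_OUTPUTS[z] + ["</s>"]
    preferred = target[len(prefix)]
    return {tok: (0.0 if tok == preferred else -10.0) for tok in TOY_VOCAB}
```

Because each decode depends only on its own z, the K calls can run in parallel, for example as one batch of size K.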
-
!16
Try It Out
Source: 参与投票的成员中,58% 反对该合同交易。
Hypotheses
$z = 1$: Fifty-eight per cent of those voting opposed the contract deal .
$z = 2$: Fifty-eight per cent of those voting opposed the contract deal .
$z = 3$: Fifty-eight per cent of those voting opposed the contract deal .
$p(y \mid z, x; \theta) \rightarrow p(y \mid x; \theta)$
The latent variable is ignored (as in VAE)
Are too many parameters shared for the components to differentiate?
Next step: use independently parameterized decoders
$p(y \mid z, x; \theta)$
-
“Rich gets richer”: once a component is better than the others, it receives more gradient updates while the others starve and eventually die (Teh, 2010)
!17
Try Again with Independent Decoders
Source: 参与投票的成员中,58% 反对该合同交易。
Hypotheses ($z = 1$, $z = 2$, $z = 3$): Fifty-eight per cent of those voting opposed the contract deal . . .
Only one component gets trained: $p(y \mid z, x; \theta)$ is poor except for one $z$
-
!18
Mixture Models Are Prone to Degeneracies
D1: all components behave the same, the latent variable is ignored
D2: only one component gets trained, other components are poor
Turns out how to train mixture models is not obvious…
Let’s take a closer look
-
!19
EM Training
E-step (hard): estimate the responsibility of each component, $r_z^{(i)} \leftarrow \mathbb{1}[z = \arg\max_{z'} p(y^{(i)} \mid z', x^{(i)}; \theta)]$
M-step: update $\theta$ through each component with gradients $r_z^{(i)} \cdot \nabla_\theta \log p(y^{(i)} \mid z, x^{(i)}; \theta)$
Shared params, latent variable is ignored
-
!20
Effect of Dropout
Dropout noise here can confuse latent variable assignments
Figure: responsibility flip rate for hMup (y-axis, 0.0 to 0.5) vs. dropout probability (x-axis, 0.0 to 0.5)
E-step (hard): estimate the responsibility of each component, $r_z^{(i)} \leftarrow \mathbb{1}[z = \arg\max_{z'} p(y^{(i)} \mid z', x^{(i)}; \theta)]$
M-step: update $\theta$ through each component with gradients $r_z^{(i)} \cdot \nabla_\theta \log p(y^{(i)} \mid z, x^{(i)}; \theta)$
Shared params, latent variable is ignored
-
!21
Fix Dropout
E-step (hard), no dropout: estimate the responsibility of each component, $r_z^{(i)} \leftarrow \mathbb{1}[z = \arg\max_{z'} p(y^{(i)} \mid z', x^{(i)}; \theta)]$
M-step, dropout: update $\theta$ through each component with gradients $r_z^{(i)} \cdot \nabla_\theta \log p(y^{(i)} \mid z, x^{(i)}; \theta)$
Shared params, latent variable is ignored
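To see why dropout noise in the E-step matters, here is a toy illustration, not the measurement from the talk. It assumes each component scores an example with a dot product, applies inverted dropout to the input features, and counts how often the argmax assignment differs from the noise-free one. All names (`component_scores`, `e_step_assign`, `flip_rate`) are hypothetical.

```python
import random

def component_scores(features, weights, drop_p=0.0, rng=None):
    """Score every component; optionally apply inverted dropout to the features."""
    if drop_p > 0.0:
        keep = 1.0 - drop_p
        features = [f / keep if rng.random() < keep else 0.0 for f in features]
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def e_step_assign(features, weights, drop_p=0.0, rng=None):
    """Hard assignment: index of the best-scoring component."""
    scores = component_scores(features, weights, drop_p, rng)
    return max(range(len(scores)), key=scores.__getitem__)

def flip_rate(examples, weights, drop_p, seed=0):
    """Fraction of examples whose assignment changes once dropout noise is added."""
    rng = random.Random(seed)
    flips = 0
    for features in examples:
        clean = e_step_assign(features, weights)               # E-step without dropout
        noisy = e_step_assign(features, weights, drop_p, rng)  # E-step with dropout
        flips += int(clean != noisy)
    return flips / len(examples)
```

Under this sketch, the fix on the slide corresponds to always calling `e_step_assign` with `drop_p=0.0` and keeping dropout only when computing M-step gradients.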
-
!22
Try Our Modified Dropout Strategy
Source: 参与投票的成员中,58% 反对该合同交易。
Hypotheses
$z = 1$: Fifty-eight per cent of the members who voted opposed the contract deal .
$z = 2$: Of the members who voted , 58 % opposed the deal .
$z = 3$: Fifty-eight per cent of the voting members opposed the contract deal .
It works! : )
-
!23
Design Space
Model variants: hard mixture, $\max_z (1/K) \cdot p(y \mid z, x; \theta)$, uniform prior
Regularization: no dropout at E-step, dropout at M-step
Training schedule: online
Parameterization: shared
-
!24
Design Space
Model variants: hard mixture, $\max_z (1/K) \cdot p(y \mid z, x; \theta)$, uniform prior
Regularization: no dropout at E-step, dropout at M-step
Training schedule:
• online: interleave E-step and M-step for each mini-batch
• offline: perform E-step for all training examples before M-step
Parameterization:
• shared
• independent (D2: only one component gets trained)
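The two training schedules differ only in how the E-step and M-step calls are ordered. The sketch below assumes hypothetical `e_step` and `m_step` callbacks and records the call order; it is not the authors' training loop.

```python
def train_online(batches, e_step, m_step):
    """Online: interleave E-step and M-step for each mini-batch."""
    for batch in batches:
        responsibilities = e_step(batch)
        m_step(batch, responsibilities)

def train_offline(batches, e_step, m_step):
    """Offline: E-step over all training examples first, then the M-step."""
    stored = [e_step(batch) for batch in batches]  # responsibilities must be stored
    for batch, responsibilities in zip(batches, stored):
        m_step(batch, responsibilities)

def trace_schedule(train_fn, batches):
    """Run a schedule with dummy steps and return the order of calls."""
    events = []

    def e_step(batch):
        events.append("E:" + batch)
        return "r(" + batch + ")"  # placeholder for real responsibilities

    def m_step(batch, responsibilities):
        events.append("M:" + batch)

    train_fn(batches, e_step, m_step)
    return events
```

The `stored` list in `train_offline` is the extra responsibility storage that the online schedule avoids.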
-
!25
Design Space
Model variants:
• soft mixture, learned prior: $\sum_z p(z \mid x; \theta) \cdot p(y \mid z, x; \theta)$
• soft mixture, uniform prior: $\sum_z (1/K) \cdot p(y \mid z, x; \theta)$
• hard mixture, learned prior: $\max_z p(z \mid x; \theta) \cdot p(y \mid z, x; \theta)$
• hard mixture, uniform prior: $\max_z (1/K) \cdot p(y \mid z, x; \theta)$
Regularization: no dropout at E-step, dropout at M-step
Training schedule: online or offline
Parameterization: shared or independent
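The four model variants differ only in how the per-component likelihoods are combined. A small sketch in log space, assuming the per-component values $\log p(y \mid z, x; \theta)$ have already been computed; the function names are hypothetical.

```python
import math

def logsumexp(values):
    """Stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_objective(log_likelihoods, log_prior=None, hard=True):
    """Log of the quantity each variant maximizes for one (x, y) pair.

    log_likelihoods: log p(y | z, x) for z = 1..K.
    log_prior:       log p(z | x) for the learned prior; None means uniform 1/K.
    hard:            True -> max over z, False -> sum over z.
    """
    k = len(log_likelihoods)
    if log_prior is None:
        log_prior = [-math.log(k)] * k
    joint = [lp + ll for lp, ll in zip(log_prior, log_likelihoods)]
    return max(joint) if hard else logsumexp(joint)
```

The hard variant only needs the best component, which is why a single backward pass suffices; the soft variant sums over all K components.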
-
!26
Metrics
BLEU (Papineni et al., 2002): modified n-gram precision metric for sentence similarity
from 0 (no overlap) to 100 (same)
-
!27
Metrics
• BLEU (quality): average BLEU of each hypothesis against the references
Source: Thanks a lot!
Hypo1: Merci! Hypo2: Merci merci! Hypo3: Merci beaucoup!
Ref1: Merci beaucoup! Ref2: Merci beaucoup. Ref3: Merci!
-
!28
Metrics
• BLEU (quality): average BLEU of each hypothesis against the references
• Pairwise-BLEU (diversity): average BLEU over each pair of hypotheses
Source: Thanks a lot!
Hypo1: Merci! Hypo2: Merci merci! Hypo3: Merci beaucoup!
-
!29
Metrics
• BLEU (quality): average BLEU of each hypothesis against the references
• Pairwise-BLEU (diversity): average BLEU over each pair of hypotheses
Also compute human BLEU and Pairwise-BLEU
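The two metrics can be sketched as follows. This is a simplified stand-in, not the official metric: `sentence_bleu` here is a hypothetical, whitespace-tokenized, add-one-smoothed sentence-level score on a 0 to 100 scale, whereas a real evaluation would rely on a standard BLEU implementation. Only the averaging scheme mirrors the slide.

```python
import math
from collections import Counter
from itertools import permutations

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, references, max_n=4):
    """Simplified smoothed sentence BLEU in [0, 100] (assumption, see lead-in)."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    if not hyp or not refs:
        return 0.0
    log_precisions = []
    for n in range(1, min(max_n, len(hyp)) + 1):
        hyp_counts = ngram_counts(hyp, n)
        clip = Counter()  # max count of each n-gram over all references
        for ref in refs:
            for gram, count in ngram_counts(ref, n).items():
                clip[gram] = max(clip[gram], count)
        matches = sum(min(count, clip[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)  # add-one smoothing
        log_precisions.append(math.log(precision))
    ref_len = min((len(r) for r in refs), key=lambda length: (abs(length - len(hyp)), length))
    brevity = 1.0 if len(hyp) >= ref_len else math.exp(1.0 - ref_len / len(hyp))
    return 100.0 * brevity * math.exp(sum(log_precisions) / len(log_precisions))

def reference_bleu(hypotheses, references):
    """Quality: average BLEU of each hypothesis against the references."""
    return sum(sentence_bleu(h, references) for h in hypotheses) / len(hypotheses)

def pairwise_bleu(hypotheses):
    """Diversity: average BLEU over ordered pairs of hypotheses (needs at least two)."""
    pairs = list(permutations(hypotheses, 2))
    return sum(sentence_bleu(h, [other]) for h, other in pairs) / len(pairs)
```

A lower Pairwise-BLEU means the hypotheses overlap less with each other, so the goal is high BLEU together with low Pairwise-BLEU.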
-
!30
Datasets
Dataset: #train, #ref; #test, #ref
WMT’17 English-German: 4.5M, 1; 500, 10
WMT’14 English-French: 36M, 1; 500, 10
WMT’17 Chinese-English: 20M, 1; 2001, 3
-
!31
Goal: High Quality and Diversity
Figure: quality (BLEU) vs. diversity (Pairwise-BLEU) plot for WMT English-German (K=3). Quadrants: bad, diverse; good, diverse : ); bad, similar; good, similar. Legend: Human; Variational NMT; mixture models with shared or indep params, hard or soft mixture, online or offline training, uniform or learned prior.
-
!32
Model Exploration
Figure: quality (BLEU) vs. diversity (Pairwise-BLEU) plot for WMT English-German (K=3), with the same quadrant labels and legend as on the previous slide.
-
!33
Model Exploration
Figure: quality (BLEU) vs. diversity (Pairwise-BLEU) plot for WMT English-German (K=3), with the same quadrant labels and legend as on the previous slides.
D1: latent var is ignored
-
!34
Model Exploration
Figure: quality (BLEU) vs. diversity (Pairwise-BLEU) plot for WMT English-German (K=3), with the same quadrant labels and legend as on the previous slides.
D1: latent var is ignored
-
!35
Model Exploration
Figure: quality (BLEU) vs. diversity (Pairwise-BLEU) plot for WMT English-German (K=3), with the same quadrant labels and legend as on the previous slides.
D1: latent var is ignored
D2: only one component gets trained
-
!36
Model Exploration
Figure: quality (BLEU) vs. diversity (Pairwise-BLEU) plot for WMT English-German (K=3), with the same quadrant labels and legend as on the previous slides.
D1: latent var is ignored
D2: only one component gets trained
• online shared has higher quality than offline indep
• hard mixture is more diverse than soft mixture
-
!37
Winning Model
Figure: quality (BLEU) vs. diversity (Pairwise-BLEU) plot for WMT English-German (K=3), with the same quadrant labels and legend as on the previous slides.
• only one backward pass
• no prior predictor
• negligible extra params
• no responsibility storage
fast, simple, memory-efficient
-
Figure: BLEU vs. Pairwise-BLEU on WMT English-German (10 refs), WMT English-French (10 refs), and WMT Chinese-English (3 refs), comparing Human, Sampling, Beam, Diverse Beam, and Mixture Model, with points labeled by K (K=3, K=5, K=10).
!38
Large Scale Evaluation
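To make the Pairwise-BLEU axis concrete, here is a rough sketch that scores each hypothesis against every other hypothesis and averages the result. It uses a simplified, smoothed sentence-level BLEU written for illustration. That scorer is an assumption; it is not the scoring script behind the plotted numbers. Lower values mean the hypotheses overlap less, i.e. the set is more diverse.

```python
import math
from collections import Counter
from itertools import permutations

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Simplified smoothed sentence-level BLEU (0-100), single reference."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())  # clipped n-gram matches
        total = sum(h.values())
        # Add-one smoothing keeps short sentences from scoring exactly zero.
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return 100.0 * brevity * math.exp(log_prec)

def pairwise_bleu(hyps):
    """Average BLEU over all ordered pairs of distinct hypotheses."""
    pairs = list(permutations(hyps, 2))
    return sum(sentence_bleu(h, r) for h, r in pairs) / len(pairs)

# Three hypotheses for one source sentence, tokenised on whitespace.
hyps = [s.split() for s in [
    "He never liked to quarrel with his family .",
    "He never wants to quarrel with his family",
    "He never likes to argue with his family .",
]]
identical = [hyps[0]] * 3
print(pairwise_bleu(hyps), pairwise_bleu(identical))
```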
-
!39
Latent Variable Captures Consistent Translation Styles
Source: 他 从不 愿意 与 家人 争吵 。
Reference: He never wanted to be in any kind of altercation .
hMup:
He never liked to quarrel with his family .
He never wants to quarrel with his family
He never likes to argue with his family .
Source: 不断 的 恐怖袭击 显然 已 对 他 造成 很大 打击 。
Reference: Repeat terror attacks on Turkey have clearly shaken him too .
hMup:
The continuing terrorist attacks had apparently hit him hard .
He is clearly already being hit hard by the continuing terrorist attacks .
Repeated terrorist attacks have apparently hit him hard .
Frequency of "was", "were", "had": more than 3 times higher in z=1's outputs than in z=3's
Frequency of "has", "says": more than 2 times higher in z=3's outputs than in z=1's
"this" vs. "that", "per cent" vs. "%" …
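A frequency comparison of this kind could be computed along the following lines. This is only a sketch: the helper `marker_rate` is hypothetical, and the grouping of sentences under z=1 and z=3 assumes the hypotheses are listed in component order, which the slide does not state. Four sentences are far too few to reproduce the reported ratios.

```python
from collections import Counter

def marker_rate(sentences, markers):
    """Fraction of whitespace tokens in `sentences` that are in `markers`."""
    counts = Counter(tok.lower() for s in sentences for tok in s.split())
    total = sum(counts.values())
    return sum(counts[m] for m in markers) / total if total else 0.0

# Assumed grouping: first listed hypothesis -> z=1, third -> z=3.
by_z = {
    1: ["He never liked to quarrel with his family .",
        "The continuing terrorist attacks had apparently hit him hard ."],
    3: ["He never likes to argue with his family .",
        "Repeated terrorist attacks have apparently hit him hard ."],
}
past_markers = {"was", "were", "had"}
rates = {z: marker_rate(sents, past_markers) for z, sents in by_z.items()}
print(rates)
```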
-
!40
Conclusions
• Conditional text generation is multi-modal: $p(y \mid x)$
• Search for multiple modes is difficult: $\operatorname{argmax}_{y_1, \cdots, y_T} \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, x; \theta)$
• Explicitly model uncertainty with latent variables: $p(y \mid z, x)$
$z_1$: It was rejected by 58 % of its members who voted in the ballot .
$z_2$: Of the members who voted , 58 % opposed the contract transaction .
$z_3$: Of the members who participated in the vote , 58 % opposed the contract .
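For reference, the latent-variable formulation behind these bullets can be written out as below. This is a sketch using the slide's symbols; $K$ (the number of mixture components) and the hat notation are added here for illustration.

```latex
% Mixture over K components indexed by the latent variable z
p(y \mid x; \theta) = \sum_{z=1}^{K} p(z \mid x; \theta)\, p(y \mid z, x; \theta)

% Special case of a uniform prior, p(z | x) = 1/K
p(y \mid x; \theta) = \frac{1}{K} \sum_{z=1}^{K} p(y \mid z, x; \theta)

% Each component factorises over target positions
p(y \mid z, x; \theta) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, z, x; \theta)

% Decoding once per component yields K hypotheses
\hat{y}_z = \operatorname*{argmax}_{y} \; p(y \mid z, x; \theta), \qquad z = 1, \dots, K
```

Searching once per value of $z$ replaces the search for several modes of a single distribution with $K$ ordinary single-mode searches.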
-
!41
Conclusions
• Mixture models work pretty well but are hardly explored for text generation
• Training is not obvious; sub-optimal design choices can lead to degeneracies
• A strong baseline for work on latent variable text modeling
• More applications to dialogue, image captioning, summarization…
• Code: https://github.com/pytorch/fairseq
Poster #106 tonight!
Model variants
Training schedule: online, offline
Parameterization: shared, independent
hard mixture: uniform prior, learned prior
soft mixture: uniform prior, learned prior
Regularization: no dropout at E-step, dropout at M-step