{
    "version": "https://jsonfeed.org/version/1.1",
    "title": "Welcome on Shuyan Mei",
    "description": "Recent content in Welcome on Shuyan Mei",
    "home_page_url": "https://shuyanmei.github.io/",
    "feed_url": "https://shuyanmei.github.io/index.json",
    "language": "en-GB",
    "icon": "https://shuyanmei.github.io/apple-touch-icon.png",
    "favicon": "https://shuyanmei.github.io/apple-touch-icon.png",
    "authors": [
        {
            "name": "Shuyan Mei",
            "url": "shuyanmei.github.io",
            "avatar": "https://shuyanmei.github.io/path/to/some-image.jpg"
        }
    ],
    "items": [
        {
            "title": "A Summary of Book Trustworthy Online Controlled Experiments",
            "date_published": "2021-01-18T14:48:25-05:00",
            "date_modified": "2021-01-18T14:48:25-05:00",
            "id": "https://shuyanmei.github.io/documentation/fourth_post/",
            "url": "https://shuyanmei.github.io/documentation/fourth_post/",
            "content_html": "\u003cp\u003eRecently I finished reading the book Trustworthy Online Controlled Experiments, and here I put together my reading notes.\u003c/p\u003e\n\u003cp\u003eThe book has five parts. The first two parts are a high-level introduction of online controlled experiments. Why do we need to conduct controlled experiments? What is the evaluation metric for the experiments? The last three parts are more technically focused. The third part introduces some alternative methods when controlled experiments are not feasible. The fourth part focused on building the experimental platform. The last part mainly focused on how to analyze results from our experiments, what are the potential pitfalls and methods to improve it.\u003c/p\u003e\n\u003ch1 id=\"1-why-should-we-use-ab-testing\"\u003e1. Why should we use A/B Testing?\u003c/h1\u003e\n\u003cp\u003eImagine we introduce a new feature into our online service, and we see an increase in traffic. Can we claim that if we roll out the feature to all customers can increase user engagement? Not necessarily. The feature can be positively or negatively correlated with traffic or have nothing to do with traffic. A/B testing can help with establishing causality with high confidence, have more power to detect small/unexpected changes.\u003c/p\u003e\n\u003ch1 id=\"2-how-to-design-the-experiment\"\u003e2. How to design the experiment?\u003c/h1\u003e\n\u003ch2 id=\"21-create-metric-based-on-objective\"\u003e2.1 Create metric based on objective\u003c/h2\u003e\n\u003cp\u003eIn reality, we usually have more than one business metric to use in experimental design. These metrics must be measurable, attributable, sensitive, and timely. One way to combine multiple metrics is to normalize each metric and then create a weighted combination of them, such as credit score. To decide how many metrics to use, one rough rule of thumb is to limit the number of metrics to five. 
Because of the multiple testing problem, the more metrics we have, the higher the probability that at least one of them looks significant purely by chance.\u003c/p\u003e\n\u003ch2 id=\"22-aa-test\"\u003e2.2 A/A test\u003c/h2\u003e\n\u003ch3 id=\"why-we-need-aa-test\"\u003eWhy we need A/A test?\u003c/h3\u003e\n\u003cp\u003eAn A/A test is almost the same as an A/B test, except that the treatment and control groups receive the same experience.\nSome benefits of running A/A tests:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eEnsure there is no bias between the treatment and control groups. For example, if we reuse users from the last experiment in the current experiment, there might be residual effects: the treatment from the last experiment can influence the current one.\u003c/li\u003e\n\u003cli\u003eAssess metric variability. As we collect more and more data over time, we want to see how the data distribution changes.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch3 id=\"how-to-run-aa-test\"\u003eHow to run A/A test?\u003c/h3\u003e\n\u003cp\u003eSimulate thousands of A/A experiments and check whether the distribution of p-values is uniform. Running thousands of tests can be expensive, so one workaround is to reuse data from a previous experiment. For example, we take the data stored from last week, randomly reassign the users to the treatment and control groups many times, and calculate the p-value for each reassignment. To check whether the p-values follow a uniform distribution, we can run a goodness-of-fit test such as Anderson-Darling or Kolmogorov–Smirnov (KS).\u003c/p\u003e\n\u003ch3 id=\"aa-test-fails\"\u003eA/A test fails?\u003c/h3\u003e\n\u003cp\u003eThe reason could be outliers in the data or\na metric with a highly skewed distribution. 
In this case, we can cap the data.\u003c/p\u003e\n\u003ch2 id=\"23-choose-significance-level-power\"\u003e2.3 Choose significance level, power\u003c/h2\u003e\n\u003cp\u003eThe significance level is the threshold we compare the p-value against. If the p-value is less than the significance level, the result is statistically significant, which means that we reject the null hypothesis.\u003c/p\u003e\n\u003cp\u003ePower is the probability that the test detects a difference when a true difference actually exists.\u003c/p\u003e\n\u003ch2 id=\"24-calculate-sample-size\"\u003e2.4 Calculate sample size\u003c/h2\u003e\n\u003cp\u003eBased on the power and significance level (together with the metric's variance and the smallest effect we want to detect), we can calculate the sample size we need for the experiment.\u003c/p\u003e\n\u003cp\u003eBesides adjusting the significance level and power, we can also transform or cap the metric to change the required sample size.\u003c/p\u003e\n\u003cp\u003eFor metrics with high skewness, transforming or capping the metric lowers its variance, which reduces the required sample size.\u003c/p\u003e\n\u003ch2 id=\"25-decide-timepopulationunit-to-run-the-experiments\"\u003e2.5 Decide time/population/unit to run the experiments\u003c/h2\u003e\n\u003cp\u003eSince some users visit the website only once during an online experiment, the duration of the experiment also matters when considering sample size.\u003c/p\u003e\n\u003cp\u003eThe usual approach is to make the randomization unit and the analysis unit the same. For example, if the randomization unit is the user and the metric is clicks per user, the variance is easy to compute. But the randomization unit can differ from the analysis unit: for example, the randomization unit is the user but the metric is click-through rate per page view. In that case a bot can generate thousands of page views under one user ID, so we can limit the number of page views per user to avoid such outliers. 
When the two units differ, we need to use the bootstrap or the delta method to estimate the variance.\u003c/p\u003e\n\u003cp\u003eThe most common randomization unit is the user. We can identify users by login ID or cookie ID.\u003c/p\u003e\n\u003ch2 id=\"26-analyze-results\"\u003e2.6 Analyze results\u003c/h2\u003e\n\u003ch3 id=\"irrelevant-metric-significant-multiple-testing-problem\"\u003eIrrelevant metric significant: multiple testing problem\u003c/h3\u003e\n\u003cp\u003eWhen we run thousands of tests, or test many different metrics, we are likely to get some significant results even when they do not make sense. This is known as the multiple testing problem. One solution is to separate metrics into tiers and give each tier a different significance level.\nAnother common solution is the Bonferroni correction, which divides the significance level by the number of tests.\u003c/p\u003e\n\u003ch3 id=\"improve-powersensitivity\"\u003eImprove power/sensitivity\u003c/h3\u003e\n\u003col\u003e\n\u003cli\u003eChoose a metric that has a smaller variance\u003c/li\u003e\n\u003cli\u003eTransform the metric through capping, binarization, or log transformation\u003c/li\u003e\n\u003cli\u003eTriggered analysis\u003c/li\u003e\n\u003cli\u003eStratification\u003c/li\u003e\n\u003cli\u003eRandomization at a more granular level\u003c/li\u003e\n\u003cli\u003ePaired experiment\u003c/li\u003e\n\u003cli\u003ePool control groups\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch3 id=\"sample-ratio-mismatch-srm\"\u003eSample Ratio Mismatch (SRM)\u003c/h3\u003e\n\u003cp\u003eSRM is a guardrail metric that protects the validity of the experiment results. When we set up the experiment, we choose a ratio of users between the treatment and control groups, and the observed ratio should be close to the designed ratio. When the p-value from a t-test or chi-square test of the observed ratio against the designed ratio is low, there is a sample ratio mismatch. 
In that case, the other metrics in the experiment are likely to be invalid.\u003c/p\u003e\n\u003ch3 id=\"novelty-effects\"\u003eNovelty effects\u003c/h3\u003e\n\u003cp\u003eWhen a new feature is introduced, users might use it a lot simply because it is new, and then use it much less as time goes by. One way to detect this is to plot usage over time.\u003c/p\u003e\n\u003ch1 id=\"3-when-is-ab-testing-not-a-good-idea\"\u003e3. When is AB testing not a good idea?\u003c/h1\u003e\n\u003cp\u003eThere are cases where A/B testing does not work.\u003c/p\u003e\n\u003ch3 id=\"1-we-can-not-control-user-behavior\"\u003e1. We can not control user behavior.\u003c/h3\u003e\n\u003cp\u003eFor example, we cannot control certain user behavior, such as asking users to switch their phones.\u003c/p\u003e\n\u003ch3 id=\"2-high-opportunity-cost\"\u003e2. High opportunity cost\u003c/h3\u003e\n\u003cp\u003eIf some users do not receive the treatment, we might lose money. For example, we want to run an ad for an event that only happens once a year.\u003c/p\u003e\n\u003ch3 id=\"3-leakage-and-interference-between-variants\"\u003e3. Leakage and Interference between variants\u003c/h3\u003e\n\u003cp\u003eIf users interact with each other, also known as the network effect, then the control and treatment groups are not independent. In this case, we need to create isolation to make sure that the units in the treatment and control groups are independent. For example, we can use geography-based isolation when designing an experiment on a social network.\u003c/p\u003e\n\u003ch3 id=\"4-experiments-require-a-long-time-to-take-effects\"\u003e4. Experiments require a long time to take effects\u003c/h3\u003e\n\u003cp\u003eSome experiments require a longer time to run, because the long-term and short-term effects can be different. 
Below are several reasons.\u003c/p\u003e\n\u003cp\u003eUser-learned effect: Users might need a longer time to learn the new feature.\u003c/p\u003e\n\u003cp\u003eDelayed effect: There is a large time gap between the feature launch and the time the treatment takes effect. For example, there could be months between a customer booking a hotel and actually staying there.\u003c/p\u003e\n\u003cp\u003eNetwork effect: For example, in a two-sided marketplace, introducing a new feature can increase demand, but supply needs time to catch up, so the treatment effect takes longer to measure.\u003c/p\u003e\n\u003cp\u003eEcosystem change: Policy changes, seasonality, competitors\u0026rsquo; similar features.\u003c/p\u003e\n\u003cp\u003eKeeping the experiment running for a long time can introduce survivorship bias, and the feature can also interact with other new features as time goes on.\u003c/p\u003e\n\u003cp\u003eAlternative methods such as cohort analysis and reverse experiments can be used to measure the long-term effect.\u003c/p\u003e\n\u003ch1 id=\"4-alternative-methods-when-ab-testing-is-expensive-or-not-feasible\"\u003e4. Alternative methods when AB testing is expensive or not feasible\u003c/h1\u003e\n\u003cp\u003eAn observational causal study is one option when a controlled experiment is not feasible.\u003c/p\u003e\n\u003ch2 id=\"41-observational-causal-study\"\u003e4.1 observational causal study\u003c/h2\u003e\n\u003cp\u003eOutcome for treated - Outcome for untreated\n= (Outcome for treated - Outcome for treated if not treated) +\n(Outcome for treated if not treated - Outcome for untreated)\n= Impact of treatment on treated + Selection Bias\u003c/p\u003e\n\u003cp\u003eIn a randomized controlled experiment, the expected value of the selection bias is zero. But in the cases mentioned in section 3, it is not. That is where the causal study comes into play. In contrast to A/B testing, a causal study has no randomized assignment of units; it looks at historical data. 
Both causal studies and retrospective data analysis use historical data, but their goals are different: the goal of a causal study is to establish a causal relationship.\u003c/p\u003e\n\u003ch3 id=\"methods\"\u003eMethods\u003c/h3\u003e\n\u003ch4 id=\"1-interrupted-time-series-its\"\u003e1. Interrupted time series (ITS)\u003c/h4\u003e\n\u003cp\u003eITS is a quasi-experimental design in which we can control the change but cannot randomize the units. We compare treatment and control on the same population over time. The main confounders are time-based effects such as seasonality.\u003c/p\u003e\n\u003ch4 id=\"2-interleaved-experiment\"\u003e2. Interleaved experiment\u003c/h4\u003e\n\u003cp\u003eInterleaved experiments are commonly used to evaluate ranking algorithms. One example is to mix the results from two algorithms and then compare the click-through rates of the results from each algorithm.\u003c/p\u003e\n\u003ch4 id=\"3-regression-discontinuity-design-rdd\"\u003e3. Regression Discontinuity Design (RDD)\u003c/h4\u003e\n\u003cp\u003eRDD is commonly used when there is a clear threshold that identifies the treatment group. We take the units right above the threshold as the treatment group and the units right below the threshold as the control group. Because the two groups are otherwise similar, this reduces selection bias. One main issue with RDD is again confounding: the results can be contaminated if another important factor has the same threshold. For example, suppose we want to study the effect of alcohol consumption at the legal drinking age of 21, but the legal gambling age is also 21.\u003c/p\u003e\n\u003ch4 id=\"4-instrumental-variableiv-and-natural-experiment\"\u003e4. Instrumental variable(IV) and Natural Experiment\u003c/h4\u003e\n\u003cp\u003eAn instrumental variable approximates random assignment.  For example, to compare earnings between veterans and non-veterans, the Vietnam War draft lottery can serve as an instrument. 
Natural experiments, such as twin studies in medicine, also fall into this category.\u003c/p\u003e\n\u003ch4 id=\"5-propensity-score-matching\"\u003e5. Propensity score matching\u003c/h4\u003e\n\u003cp\u003eSimilar to stratified sampling, PSM segments users into groups by matching on a constructed propensity score.\u003c/p\u003e\n\u003ch4 id=\"6-difference-in-difference-dd\"\u003e6. Difference in difference (DD)\u003c/h4\u003e\n\u003cp\u003eWe apply the treatment to the treatment group at a certain time T, measure the change in the treatment group from before to after T, and compare it with the change in the control group over the same period. The change in the control group captures external factors such as seasonality and inflation.\u003c/p\u003e\n\u003ch3 id=\"pitfalls\"\u003ePitfalls\u003c/h3\u003e\n\u003cp\u003eConfounding effects and deceptive correlations are common pitfalls in observational causal studies.\u003c/p\u003e\n\u003ch2 id=\"42-more-methods\"\u003e4.2 More methods\u003c/h2\u003e\n\u003col\u003e\n\u003cli\u003eUser experience research\u003c/li\u003e\n\u003cli\u003eFocus group\u003c/li\u003e\n\u003cli\u003eSurvey\u003c/li\u003e\n\u003cli\u003eLog-based analysis\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe methods above can also be used when A/B testing is not feasible or is expensive to run.\u003c/p\u003e\n\u003ch1 id=\"references\"\u003eReferences:\u003c/h1\u003e\n\u003cp\u003eKohavi, R., Tang, D., \u0026amp; Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge: Cambridge University Press. doi:10.1017/9781108653985\u003c/p\u003e\n"
        },
        {
            "title": "Optimization Learning Notes",
            "date_published": "2020-12-28T16:28:27-05:00",
            "date_modified": "2020-12-28T16:28:27-05:00",
            "id": "https://shuyanmei.github.io/documentation/optimization-notes/",
            "url": "https://shuyanmei.github.io/documentation/optimization-notes/",
            "content_html": "\u003cp\u003e\u003clink rel=\"stylesheet\" href=\"https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.6.0/katex.min.css\"\u003e\n  \u003cscript src=\"https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.6.0/katex.min.js\"\u003e\u003c/script\u003e\n  \u003cscript src=\"https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.6.0/contrib/auto-render.min.js\"\u003e\u003c/script\u003e\u003c/p\u003e\n\n\u003cp\u003eIn the past year, I started to pick up some optimization algorithms in work to solve problems like finding optimal prices to maximize business' profits with constraints. While memory is still fresh, I decided to write down my learning notes here. This is not an exhaustive survey of optimization algorithms, it only serves as the learning notes of the optimization algorithms which I have exposed so far.\u003c/p\u003e\n\n\u003ch2 id=\"optimization-overview\"\u003eOptimization Overview\u003c/h2\u003e\n\n\u003cp\u003eThere are different ways to categorize the optimization algorithm. Depends on the objective function, we can have linear or non-linear optimization. Based on the input type, we can have numeric optimization and discrete optimization. There are optimizations with constraints and without any constraints. Depends on the number of objective functions, we can have single and multiple objective optimizations.\u003c/p\u003e\n\n\u003ch2 id=\"1-no-constraints-and-differentiable-objective-function\"\u003e1. 
No constraints and differentiable objective function\u003c/h2\u003e\n\n\u003cp\u003eThe first scenario that comes to my mind is when we have a differentiable objective function without any constraints.\u003c/p\u003e\n\n\u003ch2 id=\"11-gradient-descent\"\u003e1.1 Gradient Descent\u003c/h2\u003e\n\n\u003cp\u003eWhen searching for the optimal values, gradient descent moves in a direction such that the value of the cost function at the next step, \u003cspan  class=\"math\"\u003e\\(f(x+\\delta x)\\)\u003c/span\u003e, is smaller than the current value \u003cspan  class=\"math\"\u003e\\(f(x)\\)\u003c/span\u003e.\nTo find the direction of movement, we take the derivative (gradient) of the function at each step and move in the opposite direction, assuming the function is differentiable. Depending on how far we move at each step, the algorithm can take a long time to converge, or even fail to converge.\u003c/p\u003e\n\n\u003ch2 id=\"12-newton-method\"\u003e1.2 Newton Method\u003c/h2\u003e\n\n\u003cp\u003eIf the cost function is also twice differentiable, then we can use the Newton method or a quasi-Newton method, both of which are based on the Taylor expansion.\u003c/p\u003e\n\n\u003ch4 id=\"taylor-expansion\"\u003eTaylor Expansion\u003c/h4\u003e\n\n\u003cp\u003eGiven a real or complex twice differentiable function f, its value at a point \u003cspan  class=\"math\"\u003e\\(x\\)\u003c/span\u003e near \u003cspan  class=\"math\"\u003e\\(x_0\\)\u003c/span\u003e can be approximated as \u003cspan  class=\"math\"\u003e\\( f(x_0) + f'(x_0)(x-x_0) + \\frac{1}{2}f''(x_0)(x-x_0)^2 \\)\u003c/span\u003e\u003c/p\u003e\n\n\u003cp\u003eThe Newton method takes into account not only the direction of movement (first derivative) but also the curvature (second derivative). Therefore, each Newton update is usually more effective than a gradient descent update. But sometimes we don't have the second derivative.\u003c/p\u003e\n\n\u003ch2 id=\"13-quasinewton-method\"\u003e1.3 Quasi-Newton Method\u003c/h2\u003e\n\n\u003cp\u003eWhen we don't have the second derivative that the Newton method needs, a quasi-Newton method can be used. 
The main difference is that a quasi-Newton method uses an approximation of the second derivative in place of the exact second derivative.\u003c/p\u003e\n\n\u003ch2 id=\"14-why-not-use-an-analytical-solution\"\u003e1.4 Why not use an analytical solution?\u003c/h2\u003e\n\n\u003cp\u003eSince we can take the derivatives, why not just set the derivative of the objective function to zero and solve analytically? One main reason is that with a huge dataset and many variables, the analytical solution can require large matrix operations that take a long time to compute, whereas gradient descent and the Newton method are iterative and can be less expensive.\u003c/p\u003e\n\n\u003ch2 id=\"2-not-differentiable\"\u003e2. Not differentiable?\u003c/h2\u003e\n\n\u003cp\u003eIn reality, we are not always that lucky. Not every objective function is differentiable. Consider the discrete case below.\u003c/p\u003e\n\n\u003ch3 id=\"example-the-traveling-salesman\"\u003eExample, the traveling salesman\u003c/h3\u003e\n\n\u003cp\u003eThe traveling salesman is a classical discrete optimization problem. A salesman starts from city A, visits N cities exactly once each, and eventually comes back to city A. What is the shortest route?\u003c/p\u003e\n\n\u003cp\u003eIn this case, we cannot find an analytical solution.\nThe brute-force solution is to iterate over all permutations, which has a time complexity of O(N!). There are algorithms we can use here, such as simulated annealing, genetic algorithms (GA), and random hill climbing.\u003c/p\u003e\n\n\u003cp\u003eI summarize the algorithms below. These algorithms can be effective in discrete cases.\u003c/p\u003e\n\n\u003ch2 id=\"21-genetic-algorithm\"\u003e2.1 Genetic Algorithm\u003c/h2\u003e\n\n\u003cp\u003eThe genetic algorithm is one type of evolutionary algorithm. It borrows the idea of natural selection from biology.\nTake the traveling salesman as an example. 
The genetic algorithm first randomly generates a population (a set of routes) and then ranks the routes by fitness, which in this case means shortest distance. The next step is to randomly select two routes as the 'parent routes'\nand combine elements from each parent route to make a 'child'. This process is known as crossover. To explore more possibilities, the final step is mutation, which randomly selects two cities in a route and swaps them with a predefined probability (say 3%).\nThe children form the next generation, and we repeat the process until the new generation is full. Over time, the algorithm produces better (shorter-distance) generations.\u003c/p\u003e\n\n\u003cp\u003eBecause mutation and crossover are random, we do not always reach the global optimum, but we can usually reach a local optimum fairly quickly.\u003c/p\u003e\n\n\u003ch2 id=\"22-simulated-annealing\"\u003e2.2 Simulated Annealing\u003c/h2\u003e\n\n\u003cp\u003eThis algorithm's idea comes from annealing metal. If we cool the metal fast, the atoms in the metal end up randomly arranged, but if we cool it slowly, the structure becomes more ordered and more stable.\nThe algorithm works in the following way. We start with an initial temperature. At each step, we evaluate the fitness of a candidate route and decide whether to switch to it with some probability. The probability of accepting a worse route depends on the temperature. We decrease the temperature over time, so we become less and less likely to move to a worse route. Because worse routes are sometimes accepted early on, we are less likely to get stuck at a local minimum and more likely to reach the global optimum.\u003c/p\u003e\n\n\u003ch2 id=\"23--hillclimbing-with-random-restart\"\u003e2.3  Hill-Climbing with Random Restart\u003c/h2\u003e\n\n\u003cp\u003eHill climbing is as straightforward as its name suggests.  We start with a random path, look at a neighboring path, and compare it with the current path; if it is better, we move to it. The problem is that the search can get stuck at a local optimum. 
Random restarts address this: we rerun the search from several random starting paths and keep the best result, so that one bad local optimum does not decide the outcome.\u003c/p\u003e\n\n\u003ch2 id=\"3-optimization-with-constraints\"\u003e3. Optimization with constraints\u003c/h2\u003e\n\n\u003cp\u003eIn reality, we usually have constraints when doing optimization. Based on the constraint type, there are different methods to optimize.\u003c/p\u003e\n\n\u003ch3 id=\"31-lagrange-multiplier-for-equality-constraint-only\"\u003e3.1 Lagrange multiplier for Equality constraint only\u003c/h3\u003e\n\n\u003cp\u003eIf the constraint can be expressed as an equality, then we can use the Lagrange multiplier to solve the problem. For example, a retail business wants to maximize its revenue within a given budget. The costs are labor and raw material, and revenue is a function of labor and raw material. In this scenario, we want to maximize the revenue function f. Let x and y denote labor and raw material. Then both f and the cost function g are functions of x and y. We want to use up the whole budget, so g(x,y) should ideally be equal to the budget (c).\u003c/p\u003e\n\n\u003cp\u003eThe optimization problem can be formulated as follows.\u003c/p\u003e\n\n\u003cp\u003e\u003cspan  class=\"math\"\u003e\\[\n\\max f(x,y)\n\\]\u003c/span\u003e\u003c/p\u003e\n\n\u003cp\u003egiven the constraint that\n\u003cspan  class=\"math\"\u003e\\(\ng(x,y)= c\n\\)\u003c/span\u003e\u003c/p\u003e\n\n\u003cp\u003ewhere c is a constant.\u003c/p\u003e\n\n\u003cp\u003e\u003cimg src=\"/lm.png\" alt=\"drawing\" width=\"400\" height=\"400\"/\u003e\u003c/p\u003e\n\n\u003cp\u003eWe want the contour of f to barely touch the constraint curve. 
For that to happen, the gradient of f at the point of tangency (the vector perpendicular to the contour's tangent line there) must be parallel to the gradient of the constraint function g.\u003c/p\u003e\n\n\u003cp\u003eThat is to say,\u003c/p\u003e\n\n\u003cp\u003e\u003cspan  class=\"math\"\u003e\\[\n\\nabla f = \\lambda \\nabla g\n\\]\u003c/span\u003e\u003c/p\u003e\n\n\u003cp\u003ewhich is equivalent to\u003c/p\u003e\n\n\u003cp\u003e\u003cspan  class=\"math\"\u003e\\[\n\\frac{\\partial f}{\\partial x}  = \\lambda \\frac{\\partial g}{\\partial x}\n\\]\u003c/span\u003e\u003c/p\u003e\n\n\u003cp\u003e\u003cspan  class=\"math\"\u003e\\[ \n\\frac{\\partial f}{\\partial y}  = \\lambda \\frac{\\partial g}{\\partial y} \n\\]\u003c/span\u003e\u003c/p\u003e\n\n\u003cp\u003ewhere  \u003cspan  class=\"math\"\u003e\\( \\lambda \\)\u003c/span\u003e is a constant (the Lagrange multiplier).\nSolving these two equations together with the constraint g(x,y) = c gives the values of x and y.\u003c/p\u003e\n\n\u003ch3 id=\"32-interior-point-method-for-inequality-constraints\"\u003e3.2 Interior point method for inequality constraints\u003c/h3\u003e\n\n\u003cp\u003eHowever, an equality is a very strict constraint. There are times when we face an inequality constraint. 
In this case, we can use an interior point method, which uses a barrier function to convert the problem into an unconstrained one and then solves it.\u003c/p\u003e\n\n\u003ch2 id=\"references\"\u003eReferences\u003c/h2\u003e\n\n\u003cp\u003e[1] \u003ca href=\"https://en.wikipedia.org/wiki/Quasi-Newton_method\"\u003ehttps://en.wikipedia.org/wiki/Quasi-Newton_method\u003c/a\u003e\u003c/p\u003e\n\n\u003cp\u003e[2] \u003ca href=\"https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/lagrange-multipliers-and-constrained-optimization/v/lagrange-multiplier-example-part-1\"\u003ehttps://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/lagrange-multipliers-and-constrained-optimization/v/lagrange-multiplier-example-part-1\u003c/a\u003e\u003c/p\u003e\n\n\u003cp\u003e[3] \u003ca href=\"https://en.wikipedia.org/wiki/Interior-point_method\"\u003ehttps://en.wikipedia.org/wiki/Interior-point_method\u003c/a\u003e\u003c/p\u003e\n\n\u003cp\u003e\u003cscript\u003erenderMathInElement(document.body);\u003c/script\u003e\u003c/p\u003e\n"
        },
        {
            "title": "Are you satisfied with your job as a developer?",
            "date_published": "2018-09-05T00:00:00Z",
            "date_modified": "2018-09-05T00:00:00Z",
            "id": "https://shuyanmei.github.io/documentation/third-post/",
            "url": "https://shuyanmei.github.io/documentation/third-post/",
            "content_html": "\u003cp\u003e\u003cimg src=\"/stackoverflow.jpg\" alt=\"\"\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eIntroduction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTech industry has been booming for years. We hear a lot of stories of peaks such as competitive salaries, work-life balance, unlimited vacation one can get working in a tech company.  But are the employees really happy with their jobs? What drives their satisfaction and what makes them to leave?\u003c/p\u003e\n\u003cp\u003eI used data from Stackoverflow\u0026rsquo;s 2017 Annual Developer Survey to investigate this problem.\u003c/p\u003e\n\u003cp\u003eThis survey has around 64000 reviews from 213 countries. The survey\u0026rsquo;s responses are mostly collected from developers and the questions asked in the survey are related to many aspects of developer\u0026rsquo;s job and career. Some of the aspects covered:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eHow do they break into this field at the first place?\u003c/li\u003e\n\u003cli\u003eThe developer\u0026rsquo;s education, especially coding background\u003c/li\u003e\n\u003cli\u003eThe developers' job responsibility and satisfaction\u003c/li\u003e\n\u003cli\u003eWhat makes them to looking for new opportunity? What they value most when they look for the next position?\u003c/li\u003e\n\u003cli\u003eThe developers' interaction with Stackoverflow.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eHere I am interested in dig dive into data to figure out three problems.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePart I How satisfied are you as a developer ?\u003c/strong\u003e\nThere is one rating question in the survey asking about the job satisfaction. The answer is rated from 0-10 which 10 represents highly satisfied and 0 represents highly dissatisfied. I first filter out responses with NA values. 
Below is a table showing the response counts and percentage for each rating.\u003c/p\u003e\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth style=\"text-align:center\"\u003eJob Satisfaction\u003c/th\u003e\n\u003cth style=\"text-align:center\"\u003eRating Counts\u003c/th\u003e\n\u003cth style=\"text-align:center\"\u003ePercentage\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e8.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e8983\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e22.25%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e7.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e7969\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e19.74%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e9.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e5573\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e13.8%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e6.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e4726\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e11.7%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e10.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e4148\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e10.27%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e5.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e3749\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e9.29%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e4.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e1865\u003c/td\u003e\n\u003ctd 
style=\"text-align:center\"\u003e4.62%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e3.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e1635\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e4.05%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e2.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e888\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e2.2%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e0.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e467\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e1.16%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003e1.0\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e373\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e0.92%\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eHere, I used a metric called \u0026lsquo;top 3 box\u0026rsquo; to measure satisfaction. A Top 3 Box score summarizes the positive responses from a scale survey question. It combines the highest 3 responses of the scale to create one single number.\u003c/p\u003e\n\u003cp\u003eBelow plot shows the job satisfaction by country using the metric top 3 box. The countries I selected here have a response threshold of 500.\n\u003cimg src=\"/JSbyCty.png\" alt=\"\"\u003e\u003c/p\u003e\n\u003cp\u003eTop countries for satisfaction score are Netherlands, Canada,  Sweden and United States. All these four countries has over 50% top 3 box score.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePart II Does salary drive satisfaction? Is there anything else?\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThere are many factors which can drive job satisfaction, such as salary, health benefits and vacation. To figure out does salary drives job satisfaction. 
I checked the average salary of the five countries with the highest average salary.\u003c/p\u003e\n\u003cp\u003eThe table below shows the average salary in these countries.\u003c/p\u003e\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth style=\"text-align:center\"\u003eCountry\u003c/th\u003e\n\u003cth style=\"text-align:center\"\u003eAverage Salary\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eUnited States\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e86862.40\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eCanada\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e60821.54\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eUnited Kingdom\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e56086.99\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eGermany\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e44121.32\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eIndia\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e11603.47\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe countries with high average salaries also have high job satisfaction (except India). Salary does appear to have some impact on job satisfaction. 
In addition to salary, do benefits also influence employees\u0026rsquo; satisfaction?\u003c/p\u003e\n\u003cp\u003eOne of the survey\u0026rsquo;s questions is:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u003cem\u003eWhen it comes to compensation and benefits, other than base salary, which of the following are most important to you?\u003c/em\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe following table shows how many respondents chose each factor as most important to them.\u003c/p\u003e\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth style=\"text-align:center\"\u003eImportant Benefits\u003c/th\u003e\n\u003cth style=\"text-align:center\"\u003eCounts\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eVacation/days off\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e5757\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eHealth benefits\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e4455\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eExpected work hours\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e4288\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eRemote options\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e5008\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eRetirement\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e2658\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eAnnual bonus\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e2983\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eEquipment\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e4002\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd 
style=\"text-align:center\"\u003eProfessional development sponsorship\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e3615\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eStock options\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e1300\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eChild/elder care\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e694\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eLong-term leave\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e1240\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eMeals\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e1258\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eOther\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e247\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003ePrivate office\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e872\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eEducation sponsorship\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e1287\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eCharitable match\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e199\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"text-align:center\"\u003eNone of these\u003c/td\u003e\n\u003ctd style=\"text-align:center\"\u003e82\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eBy count, the top three factors are vacation/days off, remote options and health benefits.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePart III Why did people leave their jobs?\u003c/strong\u003e\nTo figure out why people left their jobs, I took a closer look at the following question in 
the survey.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u003cem\u003eYou said before that you used to code as part of your job, but no longer do. To what extent do you agree or disagree with the following statements?\u003c/em\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eThe top three reasons for people to quit coding are:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eI don\u0026rsquo;t think my coding skills are up to date\u003c/li\u003e\n\u003cli\u003eIf money weren\u0026rsquo;t an issue, I would take a coding job again\u003c/li\u003e\n\u003cli\u003eMy career is going the way I thought it would 10 years ago\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThey account for 17%, 17% and 15% of the total, respectively.\u003c/p\u003e\n\u003cp\u003eTechnical skills are the most essential for a developer, who needs to keep their coding skills up to date. Just as important as coding skills, money also factors into developers\u0026rsquo; career decisions. At the same time, some developers are looking for a career change and want to try out different things. That is another reason they left their jobs.\u003c/p\u003e\n"
        },
        {
            "title": "Hello World!",
            "date_published": "2018-08-25T00:00:00Z",
            "date_modified": "2018-08-25T00:00:00Z",
            "id": "https://shuyanmei.github.io/documentation/heyyyy/",
            "url": "https://shuyanmei.github.io/documentation/heyyyy/",
            "content_html": "\u003cp\u003eHello, I finally got my first personal webpage set up!\u003c/p\u003e\n\u003cp\u003eI will use this website to share some of my projects in statistics, machine learning or programming.\u003c/p\u003e\n\u003cp\u003eThis website is hosted on Github and I used \u003ca href=\"https://gohugo.io/getting-started/quick-start/%22Hugo%22\"\u003eHugo\u003c/a\u003e and \u003ca href=\"https://themes.gohugo.io/gohugo-theme-ananke/\"\u003eAnanke\u003c/a\u003e to set up.\u003c/p\u003e\n"
        },
        {
            "title": "About",
            "date_published": "0001-01-01T00:00:00Z",
            "date_modified": "0001-01-01T00:00:00Z",
            "id": "https://shuyanmei.github.io/about/",
            "url": "https://shuyanmei.github.io/about/",
            "content_html": "\u003cp\u003eHey, nice to meet you here!\u003c/p\u003e\n\u003cp\u003eMy name is Shuyan, and I am a data scientist working in finance, located in Toronto.\u003c/p\u003e\n\u003cp\u003eThis is the page to share my thoughts on work, life, self reflection and some random staff.\u003c/p\u003e\n"
        },
        {
            "title": "Contact",
            "date_published": "0001-01-01T00:00:00Z",
            "date_modified": "0001-01-01T00:00:00Z",
            "id": "https://shuyanmei.github.io/contact/",
            "url": "https://shuyanmei.github.io/contact/",
            "content_html": "\n\n\n\u003cform class=\"black-80 sans-serif\" accept-charset=\"UTF-8\" action=\"https://formspree.io/f/mnqoobdo\" method=\"POST\" role=\"form\"\u003e\n\n    \u003clabel class=\"f6 b db mb1 mt3 sans-serif mid-gray\"  for=\"name\"\u003e\u003c/label\u003e\n    \u003cinput type=\"text\" id=\"name\" name=\"name\" class=\"w-100 f5 pv3 ph3 bg-light-gray bn\"  required placeholder=\"Your name..\"  aria-labelledby=\"name\"/\u003e\n\n    \u003clabel class=\"f6 b db mb1 mt3 sans-serif mid-gray\" for=\"email\"\u003e\u003c/label\u003e\n    \u003cinput type=\"email\" id=\"email\" name=\"email\" class=\"w-100 f5 pv3 ph3 bg-light-gray bn\"  required placeholder=\"Your email..\"  aria-labelledby=\"email\"/\u003e\n    \u003cdiv class=\"requirements f6 gray glow i ph3 overflow-hidden\"\u003e\n      \n    \u003c/div\u003e\n\n    \u003clabel class=\"f6 b db mb1 mt3 sans-serif mid-gray\" for=\"message\"\u003e\u003c/label\u003e\n    \u003ctextarea id=\"message\" name=\"message\" class=\"w-100 f5 pv3 ph3 bg-light-gray bn h4\" aria-labelledby=\"message\" placeholder=\"Your message..\"\u003e\u003c/textarea\u003e\n\n    \u003cinput class=\"db w-100 mv2 white pa3 bn hover-shadow hover-bg-black bg-animate bg-black\" type=\"submit\" value=\"send\" /\u003e\n\n\u003c/form\u003e\n\n"
        },
        {
            "title": "Search",
            "date_published": "0001-01-01T00:00:00Z",
            "date_modified": "0001-01-01T00:00:00Z",
            "id": "https://shuyanmei.github.io/search/",
            "url": "https://shuyanmei.github.io/search/",
            "content_html": "\u003cp class=\"error message js-hidden\"\u003eYou must have Javascript enabled to use this function.\u003c/p\u003e\n\u003cp class=\"search-loading status message hidden\"\u003eLoading search index…\u003c/p\u003e\n\n\u003cdiv class=\"search-input hidden\"\u003e\n  \u003cform id=\"search-form\" class=\"search-form\" action=\"#\" method=\"post\" accept-charset=\"UTF-8\" role=\"search\"\u003e\n    \u003clabel for=\"query\" class=\"visually-hidden\"\u003eSearch\u003c/label\u003e\n    \u003cinput type=\"search\" id=\"query\" name=\"query\" class=\"search-text\" placeholder=\"Enter the terms you wish to search for.\" maxlength=\"128\"\u003e\n    \u003cbutton type=\"submit\" name=\"submit\" class=\"form-submit\" \u003eSearch\u003c/button\u003e\n  \u003c/form\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"search-results\"\u003e\u003c/div\u003e\n\n\u003ctemplate\u003e\n  \u003carticle class=\"search-result list-view\"\u003e\n    \u003cheader\u003e\n      \u003ch2 class=\"title title-submitted\"\u003e\u003ca href=\"#\"\u003eTitle here\u003c/a\u003e\u003c/h2\u003e\n      \u003cdiv class=\"submitted\"\u003e\u003ctime class=\"created-date\"\u003eDate here\u003c/time\u003e\u003c/div\u003e\n    \u003c/header\u003e\n    \u003cdiv class=\"content\"\u003eSummary here\u003c/div\u003e\n  \u003c/article\u003e\n\u003c/template\u003e\n\n"
        }
        ]
}
