Ha Pham

Data enthusiast.

Udacity Data Engineering Nanodegree Review (2022)

Introduction: As someone transitioning from an analyst to a data engineering role, I struggled to find a standardized road map to follow. There were so many resources to learn from that it was very easy to get lost. After trying out a few free courses and crafting my own learning path, I decided to take the paid Udacity Data Engineering Nanodegree last September to find out whether it offered anything better than the free resources....

July 26, 2023 · 6 min · Ha Pham

Generate Run-Length ID With SQL

What is a run-length ID? Sometimes during analysis work, you need to group consecutive sequences of values into different “runs” and calculate metrics for each run. For example: given a time series recording some entity’s state, you want to calculate how long, on average, an entity stays in a particular state. Or, a more specific example: given a series of event data from several users, you want to group each user’s events into sessions with a session cut-off threshold of your choice. This technique is also called sessionization....

February 7, 2022 · 5 min · Ha Pham
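The run-length ID idea in the excerpt above can be illustrated with a minimal sketch. It is written in plain Python rather than the SQL the post itself covers, and the function names `run_length_ids` and `session_ids` are hypothetical, chosen only for this illustration. The second function assumes the timestamps are already sorted and belong to a single user.

```python
def run_length_ids(values):
    """Give each maximal run of equal consecutive values its own increasing ID."""
    ids = []
    run_id = 0
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value != previous:
            # The value changed, so a new run starts here.
            run_id += 1
            previous = value
        ids.append(run_id)
    return ids


def session_ids(timestamps, cutoff):
    """Start a new session whenever the gap to the previous event exceeds cutoff.

    Assumes timestamps are sorted in ascending order (an assumption of this sketch).
    """
    ids = []
    session = 0
    last = None
    for ts in timestamps:
        if last is None or ts - last > cutoff:
            session += 1
        ids.append(session)
        last = ts
    return ids


states = ["on", "on", "off", "off", "off", "on"]
print(run_length_ids(states))  # [1, 1, 2, 2, 2, 3]

events = [0, 10, 20, 100, 105, 300]
print(session_ids(events, cutoff=30))  # [1, 1, 1, 2, 2, 3]
```

Once every row carries a run or session ID, per-run metrics such as duration or event count reduce to an ordinary group-by on that ID.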

How to run dbt CI with GitHub Action

If you are familiar with the modern data stack, dbt is probably no stranger. dbt tries to bring best practices from the software engineering world into data development, and one such practice is automated testing and continuous integration (CI). While dbt Cloud provides a “slim CI” feature that satisfies most basic needs, you will have more control over your CI jobs if you make use of your git provider’s CI/CD functions....

November 13, 2021 · 9 min · Ha Pham

Vietnam bombing history with data - Part 2: Rolling Thunder Operation

You can read Part 1 here. Full code to produce the charts and report: https://github.com/hoanghapham/vietnam_war_bombing Many notable operations took place during the Vietnam War: Rolling Thunder, Steel Tiger, Barrel Roll, Linebacker, Linebacker II… However, I chose to explore Rolling Thunder because of its interesting nature: the bombing strategy of the operation changed over time due to U.S. policy regarding China and the Soviet Union. The evolution of the operation: After the Geneva Conference in 1954, the U....

April 13, 2019 · 8 min · Ha Pham

Vietnam bombing history with data - Part 1

History as taught in Vietnamese schools is boring. Modern war history is even more boring, because of the very unattractive way textbooks present the narrative of war. We were taught that our army is brave, noble and great, and that we achieved impossible feats considering the size and technology level of our country. However, I have always been skeptical of those teachings. History as told by only one side is never complete, and I want to know what “the other side” can tell me about the war....

November 4, 2018 · 10 min · Ha Pham