hello-algo/en/docs/chapter_backtracking/subset_sum_problem.md

# Subset-Sum Problem

## Without Duplicate Elements

!!! question

    Given a positive integer array `nums` and a target positive integer `target`, find all possible combinations where the sum of elements in the combination equals `target`. The given array has no duplicate elements, and each element can be selected multiple times. Return these combinations in list form, where the list should not contain duplicate combinations.

For example, given the set $\{3, 4, 5\}$ and target integer $9$, the solutions are $\{3, 3, 3\}, \{4, 5\}$. Note the following two points:

- Elements in the input set can be selected repeatedly without limit.
- Subsets do not distinguish element order; for example, $\{4, 5\}$ and $\{5, 4\}$ are the same subset.

### Reference to Full Permutation Solution

Similar to the full permutation problem, we can imagine the process of generating subsets as a series of choices, and update the "sum of elements" in real-time during the selection process. When the sum equals `target`, we record the subset to the result list.

Unlike the full permutation problem, **elements in this problem's set can be selected unlimited times**, so we do not need to use a `selected` boolean list to track whether an element has been selected. We can make minor modifications to the full permutation code and initially obtain the solution:

```src
[file]{subset_sum_i_naive}-[class]{}-[func]{subset_sum_i_naive}
```

When we input array $[3, 4, 5]$ and target element $9$ to the above code, the output is $[3, 3, 3], [4, 5], [5, 4]$. **Although we successfully find all subsets that sum to $9$, there are duplicate subsets $[4, 5]$ and $[5, 4]$**.

This is because the search process distinguishes the order of selections, but subsets do not distinguish selection order. As shown in the figure below, selecting 4 first and then 5 versus selecting 5 first and then 4 are different branches, but they correspond to the same subset.

![Subset search and boundary pruning](subset_sum_problem.assets/subset_sum_i_naive.png)

To eliminate duplicate subsets, **one straightforward idea is to deduplicate the result list**. However, this approach is very inefficient for two reasons:

- When there are many array elements, especially when `target` is large, the search process generates many duplicate subsets.
- Comparing subsets (arrays) is very time-consuming, requiring sorting the arrays first, then comparing each element in them.

### Pruning Duplicate Subsets

**We consider deduplication through pruning during the search process**. Observing the figure below, duplicate subsets occur when array elements are selected in different orders, as in the following cases:

1. When the first and second rounds select $3$ and $4$ respectively, all subsets containing these two elements are generated, denoted as $[3, 4, \dots]$.
2. Afterward, when the first round selects $4$, **the second round should skip $3$**, because the subset $[4, 3, \dots]$ generated by this choice is completely duplicate with the subset generated in step `1.`

In the search process, each level's choices are tried from left to right, so the rightmost branches are pruned more.

1. The first two rounds select $3$ and $5$, generating subset $[3, 5, \dots]$.
2. The first two rounds select $4$ and $5$, generating subset $[4, 5, \dots]$.
3. If the first round selects $5$, **the second round should skip $3$ and $4$**, because subsets $[5, 3, \dots]$ and $[5, 4, \dots]$ are completely duplicate with the subsets described in steps `1.` and `2.`

![Different selection orders leading to duplicate subsets](subset_sum_problem.assets/subset_sum_i_pruning.png)

In summary, given an input array $[x_1, x_2, \dots, x_n]$, let the selection sequence in the search process be $[x_{i_1}, x_{i_2}, \dots, x_{i_m}]$. This selection sequence must satisfy $i_1 \leq i_2 \leq \dots \leq i_m$; **any selection sequence that does not satisfy this condition will cause duplicates and should be pruned**.

### Code Implementation

To implement this pruning, we initialize a variable `start` to indicate the starting point of traversal. **After making choice $x_{i}$, set the next round to start traversal from index $i$**. This ensures that the selection sequence satisfies $i_1 \leq i_2 \leq \dots \leq i_m$, guaranteeing subset uniqueness.

In addition, we have made the following two optimizations to the code:

- Before starting the search, first sort the array `nums`. When traversing all choices, **end the loop immediately when the subset sum exceeds `target`**, because subsequent elements are larger, and their subset sums must exceed `target`.
- Omit the element sum variable `total` and **use subtraction on `target` to track the sum of elements**. Record the solution when `target` equals $0$.

```src
[file]{subset_sum_i}-[class]{}-[func]{subset_sum_i}
```

The figure below shows the complete backtracking process when array $[3, 4, 5]$ and target element $9$ are input to the above code.

![Subset-sum I backtracking process](subset_sum_problem.assets/subset_sum_i.png)

## With Duplicate Elements in Array

!!! question

    Given a positive integer array `nums` and a target positive integer `target`, find all possible combinations where the sum of elements in the combination equals `target`. **The given array may contain duplicate elements, and each element can be selected at most once**. Return these combinations in list form, where the list should not contain duplicate combinations.

Compared to the previous problem, **the input array in this problem may contain duplicate elements**, which introduces new challenges. For example, given array $[4, \hat{4}, 5]$ and target element $9$, the output of the existing code is $[4, 5], [\hat{4}, 5]$, which contains duplicate subsets.

**The reason for this duplication is that equal elements are selected multiple times in a certain round**. In the figure below, the first round has three choices, two of which are $4$, creating two duplicate search branches that output duplicate subsets. Similarly, the two $4$'s in the second round also produce duplicate subsets.

![Duplicate subsets caused by equal elements](subset_sum_problem.assets/subset_sum_ii_repeat.png)

### Pruning Equal Elements

To solve this problem, **we need to limit equal elements to be selected only once in each round**. The implementation is quite clever: since the array is already sorted, equal elements are adjacent. This means that in a certain round of selection, if the current element equals the element to its left, it means this element has already been selected, so we skip the current element directly.

At the same time, **this problem specifies that each array element can only be selected once**. Fortunately, we can also use the variable `start` to satisfy this constraint: after making choice $x_{i}$, set the next round to start traversal from index $i + 1$ onwards. This both eliminates duplicate subsets and avoids selecting elements multiple times.

### Code Implementation

```src
[file]{subset_sum_ii}-[class]{}-[func]{subset_sum_ii}
```

The figure below shows the backtracking process for array $[4, 4, 5]$ and target element $9$, which includes four types of pruning operations. Combine the illustration with the code comments to understand the entire search process and how each pruning operation works.

![Subset-sum II backtracking process](subset_sum_problem.assets/subset_sum_ii.png)