398. 随机数索引 [medium]

https://leetcode-cn.com/problems/random-pick-index/

给定一个可能含有重复元素的整数数组，要求随机输出给定的数字的索引。您可以假设给定的数字一定存在于数组中。

注意：数组大小可能非常大。使用太多额外空间的解决方案将不会通过测试。

示例:

int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);

// pick(3) 应该返回索引 2,3 或者 4。每个索引的返回概率应该相等。
solution.pick(3);

// pick(1) 应该返回 0。因为只有nums[0]等于1。
solution.pick(1);

通过次数7,389 | 提交次数11,809

Second Try

2020-09-21

传说中的蓄水池算法，在一个非常长且不确定长度的数据流中随机选择m个数据。可以认为是开水龙头随机选水滴，要求关水龙头的时候被选中的m个水滴是随机的。

对于当前题目，其实理解了原理之后就很简单了，统计当前target已经出现了count次，那么新出现的target所属的index是否要替代当前记录的index？只要用随机数在[1, count + 1]中进行整数筛选，如果数值为count + 1, 则更换为新index，否则保留。用这样的方法，可以保证新出现的target，其实跟前面所有已经出现并消失的target始终拥有相同的权限，因此可以认为符合随机筛选。

class Solution:

    def __init__(self, nums: List[int]):
        self.nums = nums 

    def pick(self, target: int) -> int:
        cache = [-1, 0]
        for idx, n in enumerate(self.nums):
            if n == target:
                ori_idx, count = cache 
                if 1 <= random.randint(1, count + 1) <= count:
                    cache = [ori_idx, count + 1]
                else:
                    cache = [idx, count + 1]
        return cache[0]
        

执行用时：420 ms, 在所有 Python3 提交中击败了27.19%的用户
内存消耗：16.9 MB, 在所有 Python3 提交中击败了80.63%的用户

improved version

不使用列表，写法看起来简化了一些，速度提升了一点。

class Solution:

    def __init__(self, nums: List[int]):
        self.nums = nums 

    def pick(self, target: int) -> int:
        sel_idx = -1
        count = 0 
        for idx, n in enumerate(self.nums):
            if n == target:
                if random.randint(1, count + 1) == count + 1:
                    sel_idx = idx                     
                count += 1
        return sel_idx 

执行用时：364 ms, 在所有 Python3 提交中击败了72.36%的用户
内存消耗：16.7 MB, 在所有 Python3 提交中击败了100.00%的用户

failed try

这个写法本身没有问题，之所以错误仅仅是对题意和测试集的理解不同导致。

这个写法理解是有超长的数据流只流过一次，但是可以pick多个不同的target，这样需要保留每个数字的概率index。也就是数据流过的时候进行概率处理，但是pick的时候直接取值就行了。

看测试集，只load了一次数据，然后不停执行pick，因此其意思是每次都要重新过一遍数据流，这个写法每次pick的结果都是固定，也就不通过了。

class Solution:

    def __init__(self, nums: List[int]):
        self.cache = dict() 
        for idx, n in enumerate(nums):
            if n not in self.cache:
                self.cache[n] = [idx, 1]
                continue 
            oidx, count = self.cache[n]
            # 新的数量为n + 1,使用区间随机正数进行概率筛选，确保与之前的所有数字都在同样的保留概率
            # 在区间(1, n + 1)的结果为n + 1，则保留新数值，否则还是保留旧数值
            if 1 <= random.randint(1, count + 1) <= count:
                self.cache[n] = [oidx, count + 1]
            else:
                self.cache[n] = [idx, count + 1]

    def pick(self, target: int) -> int:
        return self.cache[target][0]

First Try

2020-09-21

在不了解什么是蓄水池算法的时候写出来的直观版本，只要记录每个numer的index列表，最后进行random choice就行了，很朴素，结果也通过了。

class Solution:

    def __init__(self, nums: List[int]):
        self.cache = collections.defaultdict(list)
        for idx, n in enumerate(nums):
            self.cache[n].append(idx)

    def pick(self, target: int) -> int:
        return random.choice(self.cache[target])
        


# Your Solution object will be instantiated and called as such:
# obj = Solution(nums)
# param_1 = obj.pick(target)

执行用时：476 ms, 在所有 Python3 提交中击败了6.52%的用户
内存消耗：22.6 MB, 在所有 Python3 提交中击败了9.09%的用户