HashMap

发表于 2020-06-17

HashMap 是一种 key-value 结构的容器。

HashMap 在 JDK 1.7 和 JDK 1.8 中的实现有所不同，下面就这两种版本分别进行学习。

JDK 1.7

主要属性

首先列举出 HashMap 中一些主要的属性：

// 默认初始容量的取值, 必须为 2 的 n 次方
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16, Also Known As 16

// 默认负载因子的取值
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// 实际存放数据的数组
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

// Map 中实际存放的 key-value 的个数
transient int size;

// 下次对 table 进行扩容的阈值(threshold = capacity * load factor)
int threshold;

// 负载因子
final float loadFactor;

内部类

HashMap 中有一个 Entry 内部类，用于存放 Map 中元素的相关信息：

static class Entry<K,V> implements Map.Entry<K,V> {
    // 用户设置的 key
    final K key;

    // 用户存放的 value
    V value;

    // 用于实现链表数据结构
    Entry<K,V> next;

    // 保存 key 的 hash
    int hash;

    // 省略无关代码
}

根据 HashMap 的一些属性不难看出，HashMap 中采用数组 + 链表的数据结构。数组中的一个个位置被称为 bucket，每一个 bucket 都有唯一的序号与之对应。

也就是说，当不同的 key 在数组中的位置产生冲突时，HashMap 会采取链表的方式来解决冲突。

主要方法

构造方法

HahsMap 中提供了三种构造函数，分别为：

public HashMap();

public HashMap(int initialCapacity);

public HashMap(int initialCapacity, float loadFactor);

可以看出在创建 HashMap 时，可以直接指定初始容量和负载因子来调用第三个构造函数。

也可以不提供相关的参数，也就是使用第一个或者第二个构造函数，HashMap 会使用默认参数去调用第三个构造函数。

在构造函数中，HashMap 会对参数进行校验，然后初始化一些属性。但是此时不会创建 table。

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                            initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                            loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();     // HashMap 中的 init 方法什么也没做
}

put 方法

put 方法是常用的两个方法之一，用于将 key-value 存入到 HashMap 中。

HashMap 在初始化 table 属性上，采用了懒初始化的策略。构造函数中并没有初始化 table，而是在使用 put 操作时，对 table 进行初始化。

public V put(K key, V value) {
    // 判断 table 是否初始化
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // 判断 key 是否为 null
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

put 操作有以下步骤：

判断 table 是否为空，若空则初始化
判断 key 是否为 null，若为 null 则使用 putForNullKey 方法来进行接下来的操作，说明 HashMap 支持 key 为 null
计算 key 的 hash
根据 hash 计算 key 在 table 中的下标
遍历相应下标位置的链表中的 Entry，如果 Entry 中的 key 和给定的 key 重复，则覆盖原来的 value，并返回 oldValue
如果进行到这一步，则说明没有重复的 key，则需要创建一个 Entry 来插入到链表中去
最后返回 null，说明当前 key 没有重复

get 方法

get 方法是 HashMap 中的另外一个常用方法，用于获取指定的 key 对应的 value。

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

在 get 方法中，先对 key 进行了判断，若 key 不为 null，则调用 getEntry 方法来查找相应的 Entry。

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
            e != null;
            e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

在 getEntry 方法中，用以下步骤来查找相应的 Entry。

根据 HashMap 中元素的数量来进行快速的判断。若 HashMap 为空，则直接返回 null
计算 key 的 hash
根据 hash 获取对应的 bucket，并遍历链表，查找并返回目标 Entry
若进行到这一步，则说明没有查找到对应的 Entry，应该返回 null

实现细节

前面学习了 HashMap 中的主要方法，那么在这些方法中用到的一些工具方法是如何实现的呢？

下面学习这些更细节的方法，对应了 HashMap 的其他功能，如计算 hash、计算 bucket 序号、扩容等。

hash 方法

hash() 方法是 HashMap 的关键方法，被用来计算 key 的 hash。

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

在 hash 方法中，采用了以下步骤来计算 key 的 hash:

如果 hashSeed 不为 0 并且 key 是 String 类型，则使用 sun.misc.Hashing.stringHash32 来计算 hash，并直接返回
调用 key 的 hashCode
对 hashCode 进行位运算，最终返回结果

在 hash() 方法中，进行了多次位运算，目的是使 hash 结果更加随机，减少冲突。

indexFor 方法

indexFor 方法用于计算 hash 对应的 bucket 序号。这个方法非常简单，其实际上就相当于拿 hash 对 table 的长度取模。

static int indexFor(int h, int length) {
    return h & (length-1);
}

在实现时为了提高效率，使用了位运算代替取模的运算。能这么做的原因是 HashMap 规定了 table 的长度必须为 2 的次方，也就是说 a % b = a & (b - 1)，a = 2 ^ n。

addEntry 方法

addEntry() 方法用于创建并向 bucket 中添加一个 Entry，也就是包含了 key-value、hash、nextEntry 信息的对象。在 put 方法中，如果 key 没有重复，就会调用这个方法。

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

这个方法中先判断了当前 HashMap 是否需要扩容，再将 Entry 插入。

在扩容前必须满足必须满足以下两点条件：

HashMap 中已有的元素数量大于等于扩容阈值
被添加的 Entry 所在的 bucket 没有元素

其中，比较容易被忽略的是条件 2。这点也很容易理解，扩容的原因是防止每个 bucket 下的链表过长导致降低查询效率。但是如果要向空的 bucket 中插入 Entry，显然不会降低查询效率，因此可以延迟扩容时机。

transfer 方法

对 HashMap 扩容时，会先在 resize 方法中创建一个新的 table，再调用 tansfer 方法将原来的 Entry 转移到 newTable 中。

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

可以看出在这个方法中，有两层循环。第一层循环遍历了 table 中的 bucket，第二层循环遍历了每个 bucket 下的链表中的 Entry。

在遍历到 Entry 时，会根据参数 rehash 来判断是否重新计算 hash，接着再计算该 Entry 在 newTable 中的 bucket 序号。同时将该 Entry 插入到 bucket 的链表头部。

如果没有经过重新计算 hash 的话，那么一个元素在新的 bucket 中的下标只有两种可能。

多线程下的问题

首先要强调的是我们不应该在多线程下使用 HashMap。因为 HashMap 在多线程下可能会遇到同步的问题，这一点在 JDK 的文档中也提到了：

/**
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 */

具体来说， HashMap 在多线程下存在出现无限循环的可能。

让我们回到 transfer 方法中，在这个方法里会将 HashMap 中已有的 Entry 移动到 newTable 中。值得注意的是，在移动时采用了头插法，也就是说在移动完成后，每个 bucket 下的链表都会变成原来的倒序。

假设以下场景：

有两个线程共享了同一个 HashMap 实例。在某时刻，线程一认为 HashMap 需要扩容，假设线程一正在对含有三个 Entry 的链表进行操作，在while 循环中第一个语句（如下）刚执行完成后，就被线程调度系统挂起了。

Entry<K,V> next = e.next;

此时线程二也发现了需要扩容，假设线程二顺利完成了扩容操作，线程二完成对 HashMap 的操作后又运行了一段时间也被挂起了。这时看似没有问题，但是其实已经给线程一埋下了坑。

轮到线程一继续执行时，该线程操作的链表已经被倒序了，也就是说之前记录的 next 和 e 的值已经是不正确的了。当该线程继续完成转移操作后，链表中就会出现环。

当线程一完成扩容后，HashMap 的结构已经存在了错误，此时如果使用 get 方法查询某个 key，而这个 key 对应的 bucket 下的链表中又刚好存在环，那么就有无限循环的可能。

JDK 1.8

在 JDK 1.7 中使用了链表来解决冲突的问题。但是当冲突过多时，就会导致链表过长，影响查询效率。因此在 JDK 1.8 中采用了红黑树对这个问题进行了优化。

此外，JDK 1.8 中还添加了 putIfAbsent 等方法，下面的分析中也会提到。

主要属性

与之前的版本相比，主要多了几个属性：

// 将链表树化的阈值
static final int TREEIFY_THRESHOLD = 8;
// 将树还原成链表的阈值
static final int UNTREEIFY_THRESHOLD = 6;
// 树化时 HashMap 中至少含有 bucket 的数量
static final int MIN_TREEIFY_CAPACITY = 64;

增加树化的阈值，也就是当一个 bucket 中的元素达到这个阈值时，就会将链表转换为红黑树。

这里的 TREEIFY_THRESHOLD 是根据 bucket 内的元素个数的概率分布来确定的。一个 bucket 内的元素个数满足泊松分布，大致概率如下：

/**
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million
*/

可以看见一个 bucket 内的元素大于 8 的概率已经很小了，所以此处选择了 8 作为树化的阈值。

对应的还有一个还原成链表的阈值 UNTREEIFY_THRESHOLD，这个阈值比树化的阈值小了 2，是为了防止频繁地在红黑树和链表之间转换。

最后还有一个 MIN_TREEIFY_CAPACITY，这个属性在 treeifyBin 中用到了。如果当前 table 中的 bucket 没有达到这个值，那么将会对 table 进行扩容，而不是直接进行树化的操作：

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

内部类

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    // 省略无关代码
}

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;

    // 省略其他代码
}

在 JDK 1.8 中，将原来的 Entry 改名为 Node，同时又添加了用于红黑树的数据结构 TreeNode。

主要方法

put 方法

折叠/展开代码

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

public V putIfAbsent(K key, V value) {
    return putVal(hash(key), key, value, true, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

JDK 1.8 中的 put 方法复杂了很多，主要是按照以下的步骤进行：

判断 table 是否需要初始化（使用 resize() 进行初始化）
根据 hash 计算桶的位置，并判断桶是否为空，若是则直接创建新的 Node 插入并跳到第 7 步
判断 bucket 中的第一个 Node 的 key 与提供的 key 是否相等，相等则记录这个 Node 的引用并跳到第 6 步
判断第一个 Node 是否为 TreeNode 实例，也就是 bucket 中是否为红黑树，若是则调用 putTreeVal() 来写入数据，并跳转到第 6 步（调用 putTreeVal() 时，若提供的 key 在红黑树中已存在，则并不会修改原 TreeNode 中的 value，而是得到这个 TreeNode 的引用；若不存在，才会插入一个新的 TreeNode 并得到 null）
遍历链表，若提供的 key 在链表中已存在，则不会修改数据，得到一个已存在的 Node 的引用；若提供的 key 在链表中不存在，才会插入一个新的 Node 并得到 null，然后判断是否需要将链表进行树化
若在第3、4、5步中得到的是 null，则说明提供的 key 在 map中并不存在，可以直接跳到第 7 步；若得到的是一个 Node 的引用，就说明提供的 key 在 map 中已经存在，继续对 onlyIfAbsent 和 oldValue 进行判断，只有当 onlyIfAbsent 为 true 且 oldValue 不为 null时，才不会修改数据，最后直接返回 oldValue
判断是否需要对 map 进行扩容，最后返回 null

可以看到 JDK 1.8 中的 put 方法和 putIfAbsent 方法都是调用了 putVal 方法。

get 方法

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

get 方法中也是区分链表和红黑树进行查找的，具体步骤如下：

先快速判断 table 是否为空、对应 bucket 是否为空
判断 bucket 中的第一个 Node 是否为要查找的，若是则直接返回
判断 bucket 中的是否为红黑树，若是，则按照红黑树的方式进行查找并返回
遍历链表中的 Node，若找到结果则返回
进行到这一步则说明没有查找到，返回 null

实现细节

hash 方法

JDK 1.8 中的 hash 方法相比 1.7 中的要简化了不少，但是依然保留了将 hashcode 的高位参与运算。

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

resize 方法

JDK 1.8 中的 resize 方法的改动也很大，主要变化有以下两点：

添加了对树形 bucket 的扩容
对链表进行扩容时，保留了链表的顺序

在进行扩容时，链表内的元素会保持原来的顺序，这样不会导致 JDK 1.7 中的无限循环，但是要注意 HashMap 依然不是线程安全的。

折叠/展开代码

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }