# Overview

![img.png](/files/MNBJQqbcitvvTOW9rZwH)

## repository

[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT/)

[![Stargazers over time](https://starchart.cc/collabH/repository.svg)](/doc/readme.md)

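### Introduction

* A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
* [Online documentation](https://repository-1.gitbook.io/bigdata-growth/)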

### RoadMap

![roadMap](/files/Ct4Cpb3NYWJedO2vWSst)

### Fundamentals

#### Data Structures

#### Distributed Systems Theory

* [Distributed Architecture](/doc/base/fen-bu-shi-li-lun/fen-bu-shi-jia-gou.md)

#### Computer Science Theory

* [LSM Storage Model](/doc/base/ji-suan-ji-li-lun/lsm-cun-chu-mo-xing.md)

#### Scala

* [ScalaOverView](/doc/base/scala/scalaoverview.md)

#### JVM

#### Java

**Concurrent Programming**

* [Introduction to Concurrent Programming](/doc/base/java/bing-fa-bian-cheng/ren-shi-bing-fa-bian-cheng.md)
* [Concurrency Utilities](/doc/base/java/bing-fa-bian-cheng/bing-fa-gong-ju-lei-concurrent.md)

**JDK Source Code**

**todo**

### Algorithms

* [Algorithm Problem Solutions](/doc/base/algorithm/suan-fa-ti-jie.md)

### BigData

#### cache

**Data Orchestration**

**Alluxio**

* [Alluxio Overview](/doc/bigdata/cache/alluxio/alluxiooverview.md)
* [Alluxio Deployment](/doc/bigdata/cache/alluxio/alluxiodeployment.md)
* [Integrating Alluxio with Compute Engines](/doc/bigdata/cache/alluxio/alluxiowithengine.md)

#### datalake

**hudi**

* [Hudi Overview](/doc/bigdata/datalake/hudi/hudioverview.md)
* [Hudi with Spark](/doc/bigdata/datalake/hudi/hudiwithspark.md)
* [Hudi with Flink](/doc/bigdata/datalake/hudi/hudiwithflink.md)
* [Hudi Tuning in Practice](/doc/bigdata/datalake/hudi/hudi-tiao-you-shi-jian.md)
* [Hudi Internals](/doc/bigdata/datalake/hudi/hudi-yuan-li-fen-xi.md)
* [Building a Data Lake with Hudi](/doc/bigdata/datalake/hudi/hudi-shu-ju-hu-shi-jian.md)

**iceberg**

* [Iceberg Overview](/doc/bigdata/datalake/iceberg/icebergoverview.md)
* [Iceberg with Flink](/doc/bigdata/datalake/iceberg/icebergwithflink.md)
* [Iceberg with Hive](/doc/bigdata/datalake/iceberg/icebergwithhive.md)
* [Iceberg with Spark](/doc/bigdata/datalake/iceberg/icebergwithspark.md)

#### kvstore

**Key-value stores, such as HBase and RocksDB (an embedded KV store)**

**RocksDB**

* [RocksDB Overview](/doc/bigdata/kvstore/rocksdb/rocksdboverview.md)
* [RocksDB Configuration](/doc/bigdata/kvstore/rocksdb/rocksdb-pei-zhi.md)
* [RocksDB Components](/doc/bigdata/kvstore/rocksdb/rocksdb-zu-jian-miao-shu.md)
* [RocksDB on Flink](/doc/bigdata/kvstore/rocksdb/rocksdb-on-flink.md)
* [RocksDB API](https://github.com/collabH/repository/blob/master/bigdata/kvstore/rocksdb/RocksDB%20API.xmind)

**HBase**

* [HBase Overview](/doc/bigdata/kvstore/hbase/hbaseoverview.md)
* [HBaseShell](https://github.com/collabH/repository/blob/master/bigdata/kvstore/hbase/HBase%20Shell.xmind)
* [HBaseJavaAPI](https://github.com/collabH/repository/blob/master/bigdata/kvstore/hbase/HBase%20Java%20API.xmind)
* [HBase with MapReduce](/doc/bigdata/kvstore/hbase/hbase-zheng-he-di-san-fang-zu-jian.md)
* [HBase Filters](/doc/bigdata/kvstore/hbase/hbase-guo-lv-qi.md)

#### Hadoop

**Study notes on the broader Hadoop ecosystem, mainly reading notes and source code analysis for HDFS, MapReduce, and YARN.**

**HDFS**

* [Hadoop Quick Start](https://github.com/collabH/repository/blob/master/bigdata/hadoop/Hadoop快速开始.xmind)
* [HDFSOverView](https://github.com/collabH/repository/blob/master/bigdata/hadoop/HDFS/HDFSOverView.xmind)
* [The Broader Hadoop Ecosystem](https://github.com/collabH/repository/blob/master/bigdata/hadoop/Hadoop广义生态系统.xmind)
* [Hadoop High Availability Configuration](/doc/bigdata/hadoop/hadoop-gao-ke-yong-pei-zhi.md)
* [Hadoop Common Analysis](https://github.com/collabH/repository/blob/master/bigdata/hadoop/HDFS/HadoopCommon包分析.pdf)
* [HDFS Cluster Management](/doc/bigdata/hadoop/hdfs/hdfs-ji-qun-guan-li.md)
* [HDFS Shell](/doc/bigdata/hadoop/hdfs/hdfs-shell-ming-ling.md)

**MapReduce**

* [MapReduce: A Distributed Processing Framework](/doc/bigdata/hadoop/mapreduce/fen-bu-shi-chu-li-kuang-jia-mapreduce.md)
* [MapReduce Overview](https://github.com/collabH/repository/blob/master/bigdata/hadoop/MapReduce/MapReduceOverView.xmind)
* [MapReduce Tuning](https://github.com/collabH/repository/blob/master/bigdata/hadoop/MapReduce/MapReduce调优.xmind)
* [MapReduce Data Operations](/doc/bigdata/hadoop/mapreduce/mapreduce-shu-ju-cao-zuo.md)
* [MapReduce Input and Output in Depth](/doc/bigdata/hadoop/mapreduce/mapreduce-shu-ru-shu-chu-pou-xi.md)
* [How MapReduce Works](/doc/bigdata/hadoop/mapreduce/mapreduce-de-gong-zuo-yuan-li-pou-xi.md)

**Yarn**

* [YARN Quick Start](/doc/bigdata/hadoop/yarn/yarn-kuai-su-ru-men.md)

**Production Configuration**

* [Hadoop High Availability Configuration](/doc/bigdata/hadoop/hadoop-gao-ke-yong-pei-zhi.md)
* [Hadoop Production Configuration](/doc/bigdata/hadoop/yarn/hadoop-xiang-guan-zu-jian-sheng-chan-ji-bie-pei-zhi.md)

#### Engine

**Compute engines, mainly Flink and Spark**

**Flink**

* Mainly summaries of the Flink documentation, Flink source code reading notes, and notes on new Flink features

**Core**

* [FlinkOverView](/doc/bigdata/engine/flink/core/flinkoverview.md)
* [Checkpoint Mechanism](/doc/bigdata/engine/flink/core/checkpoint-ji-zhi.md)
* [TableSQLOverview](/doc/bigdata/engine/flink/core/tablesqloverview.md)
* [DataStream API](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/FlinkDataStream%20API.xmind)
* [ProcessFunction API](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/ProcessFunction%20API.xmind)
* [Data Source](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/Data%20Source.xmind)
* [Table API](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/TABLE%20API.xmind)
* [Flink SQL](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/FlinkSQL.xmind)
* [Flink Hive](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/Flink%20Hive.xmind)
* [Flink CEP](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/Flink%20Cep.xmind)
* [Flink Function](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/core/Flink%20Function.xmind)

**SourceCode**

* [Flink Checkpoint Source Code Analysis](/doc/bigdata/engine/flink/sourcecode/flinkcheckpoint-yuan-ma-fen-xi.md)
* [Flink SQL Source Code Analysis](/doc/bigdata/engine/flink/sourcecode/flinksql-yuan-ma-jie-xi.md)
* [Flink Core Source Code Analysis](/doc/bigdata/engine/flink/sourcecode/flink-nei-he-yuan-ma-fen-xi.md)
* [Flink Network Flow Control and Backpressure](/doc/bigdata/engine/flink/sourcecode/flink-wang-luo-liu-kong-ji-fan-ya.md)
* [TaskExecutor Memory Model in Depth](/doc/bigdata/engine/flink/sourcecode/taskexecutor-nei-cun-mo-xing-yuan-li-shen-ru.md)
* [How Flink Windows Are Implemented and Used](/doc/bigdata/engine/flink/sourcecode/flink-chuang-kou-shi-xian-ying-yong-yuan-li.md)
* [Flink Execution Environment Source Code Analysis](/doc/bigdata/engine/flink/sourcecode/flink-yun-hang-huan-jing-yuan-ma-jie-xi.md)
* [Flink TimerService Mechanism](/doc/bigdata/engine/flink/sourcecode/flinktimerservice-ji-zhi-fen-xi.md)
* [StreamSource Source Code Analysis](/doc/bigdata/engine/flink/sourcecode/streamsource-yuan-jie-xi.md)
* [Flink State Management and Checkpointing](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/sourcecode/Flink状态管理与检查点机制.xmind)

**Book**

**Flink内核原理与实现 (Flink Internals: Principles and Implementation)**

* [Chapters 1-3 Reading Notes](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/books/Flink内核原理与实现/1-3章读书笔记.xmind)
* [Chapter 4: Time and Windows](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/books/Flink内核原理与实现/第4章时间与窗口.xmind)
* [Chapters 5-6: Type Serialization and Memory Management](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/books/Flink内核原理与实现/5-6章类型序列化和内存管理读书笔记.xmind)
* [Chapter 7: State Internals](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/books/Flink内核原理与实现/第7章状态原理.xmind)
* [Chapter 8: Job Submission](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/books/Flink内核原理与实现/第8章作业提交.xmind)
* [Chapter 9: Resource Management](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/books/Flink内核原理与实现/第9章资源管理.xmind)
* [Chapter 10: Job Scheduling](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/books/Flink内核原理与实现/第10章作业调度.xmind)
* [Chapters 11-13: Task Execution, Data Exchange, and More](/doc/bigdata/engine/flink/books/flink-nei-he-yuan-li-yu-shi-xian/di-1113-zhang-task-zhi-hang-shu-ju-jiao-huan-deng.md)

**Feature**

* [New Features in Flink 1.12](/doc/bigdata/engine/flink/feature/flink1.12-xin-te-xing.md)
* [New Features in Flink 1.13](/doc/bigdata/engine/flink/feature/flink1.13-xin-te-xing.md)
* [New Features in Flink 1.14](/doc/bigdata/engine/flink/feature/flink1.14-xin-te-xing.md)

**Practice**

* [Flink Pitfalls Guide](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/practice/Flink踩坑.xmind)
* [Troubleshooting a Flink Backpressure Issue](/doc/bigdata/engine/flink/practice/ji-lu-yi-ci-flink-fan-ya-wen-ti.md)
* [Flink SQL Tuning in Practice](https://github.com/collabH/repository/blob/master/bigdata/engine/flink/practice/Flink%20SQL调优.xmind)
* [Flink on K8s in Practice](/doc/bigdata/engine/flink/practice/flink-on-k8s.md)

**Connector**

* [Custom Table Connector](/doc/bigdata/engine/flink/connector/zi-ding-yi-tableconnector.md)

**monitor**

* [Building a Metrics Monitoring System for Flink Jobs](/doc/bigdata/engine/flink/monitor/da-jian-flink-ren-wu-zhi-biao-jian-kong-xi-tong.md)

**Spark**

**Mainly reading notes on Spark books, analysis of Spark's core components, hands-on practice with Spark APIs, and lessons learned running Spark in production.**

* [Spark Basics](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/Spark基础入门.xmind)
* [SparkOnDeploy](/doc/bigdata/engine/spark/sparkondeploy.md)
* [Spark Scheduling System](/doc/bigdata/engine/spark/spark-tiao-du-xi-tong.md)
* [Spark Compute Engine and Shuffle](/doc/bigdata/engine/spark/spark-ji-suan-yin-qing-he-shuffle.md)
* [Spark Storage System](/doc/bigdata/engine/spark/spark-cun-chu-ti-xi.md)
* [Spark Big Data Processing Reading Notes](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/Spark大数据处理读书笔记.xmind)

**Spark Core**

* [SparkCore](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/spark%20core/Spark%20Core.xmind)
* [SparkOperator](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/spark%20core/Spark%20Operator.xmind)
* [SparkConnector](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/spark%20core/Spark%20Connector.xmind)

**Spark SQL**

* [SparkSQLAPI](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/spark%20sql/Spark%20SQL%20API.xmind)
* [SparkSQL](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/spark%20sql/Spark%20SQL.xmind)
* [SparkSQL API](/doc/bigdata/engine/spark/spark-sql/sparksql-api.md)
* [SparkSQL Optimization Analysis](/doc/bigdata/engine/spark/spark-sql-1/sparksql-you-hua-fen-xi.md)

**Spark Practice**

* [Spark in Production](/doc/bigdata/engine/spark/practice/spark-sheng-chan-shi-jian.md)

**Spark Streaming**

* [SparkStreaming](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/spark%20streaming/Spark%20Steaming.xmind)
* [Spark Streaming with Flume](/doc/bigdata/engine/spark/spark-streaming/sparkstreaming-zheng-he-flume.md)

**Source Code Analysis**

* [Spark Source Code, from the Basics to the Depths](/doc/bigdata/engine/spark/cong-qian-dao-shen-pou-xi-spark-yuan-ma.md)
* [Source Code Analysis Series](https://github.com/collabH/repository/blob/master/bigdata/engine/spark/源码分析/README.md)

#### Collect

**Data collection frameworks, mainly binlog-based incremental capture and SQL snapshot approaches**

**Canal**

* [CanalOverView](/doc/bigdata/collect/canal/canaloverview.md)

**Debezium**

* [DebeziumOverView](/doc/bigdata/collect/debezium/debeziumoverview.md)
* [Debezium Pitfalls](https://github.com/collabH/repository/blob/master/bigdata/collect/debezium/Debezium踩坑.xmind)
* [Building a Debezium Monitoring System](/doc/bigdata/collect/debezium/debezium-jian-kong-xi-tong-da-jian.md)
* [Customizing Debezium](/doc/bigdata/collect/debezium/debezium-shi-yong-gai-zao.md)

**Flume**

* [Flume Quick Start](/doc/bigdata/collect/flume/flumeoverwrite.md)
* [Connecting Flume to Kafka](/doc/bigdata/collect/flume/flume-dui-jie-kafka.md)

**Sqoop**

* [SqoopOverview](/doc/bigdata/collect/sqoop/sqoopoverview.md)
* [Sqoop Hands-On](/doc/bigdata/collect/sqoop/sqoop-shi-zhan-cao-zuo.md)

#### MQ

**Message queues, mainly Kafka and Pulsar, which are widely used in big data**

**Kafka**

* [Kafka Overview](https://github.com/collabH/repository/blob/master/bigdata/mq/kafka/KafkaOverView.xmind)
* [Basic Concepts](/doc/bigdata/mq/kafka/ji-ben-gai-nian.md)
* [Kafka Monitoring](/doc/bigdata/mq/kafka/kafka-jian-kong.md)
* [Producer Source Code Analysis](/doc/bigdata/mq/kafka/sheng-chan-zhe-yuan-ma-pou-xi.md)
* [Consumer Source Code Analysis](/doc/bigdata/mq/kafka/xiao-fei-zhe-yuan-ma-pou-xi.md)
* [Kafka Shell](https://github.com/collabH/repository/blob/master/bigdata/mq/kafka/KafkaShell.xmind)
* [Kafka: The Definitive Guide Reading Notes](https://github.com/collabH/repository/blob/master/bigdata/mq/kafka/kafka权威指南/README.md)
* [深入理解Kafka (Understanding Kafka in Depth) Reading Notes](https://github.com/collabH/repository/blob/master/bigdata/mq/kafka/深入理解Kafka/README.md)

**Pulsar**

* [Quick Start](/doc/bigdata/mq/pulsar/1.-kuai-su-ru-men.md)
* [Principles and Practice](/doc/bigdata/mq/pulsar/2.-yuan-li-yu-shi-jian.md)

#### Zookeeper

* [ZooKeeper Internals and Configuration Parameters](/doc/bigdata/zookeeper/zookeeperoverview.md)
* [ZooKeeper Operations and Deployment](/doc/bigdata/zookeeper/zookeeper-cao-zuo-yu-bu-shu.md)

#### schedule

**Azkaban**

* [Azkaban in Production](/doc/bigdata/scheduler/azkaban-sheng-chan-shi-jian.md)

**DolphinScheduler**

* [DolphinScheduler Quick Start](/doc/bigdata/scheduler/dolphinscheduler-kuai-su-kai-shi.md)

#### olap

**OLAP engines, with a focus on Kudu and Impala, plus production practice and paper notes.**

**Hive**

* [Hive Overview](/doc/bigdata/olap/hive/hiveoverwrite.md)
* [Hive SQL](https://github.com/collabH/repository/blob/master/bigdata/olap/hive/Hive%20SQL.xmind)
* [Hive Tuning Guide](https://github.com/collabH/repository/blob/master/bigdata/olap/hive/Hive调优指南.xmind)
* [Hive Pitfalls and Solutions](https://github.com/collabH/repository/blob/master/bigdata/olap/hive/Hive踩坑解决方案.xmind)
* [Programming Hive Reading Notes](https://github.com/collabH/repository/blob/master/bigdata/olap/hive/hive编程指南/README.md)
* [Hive Shell and Beeline Commands](/doc/bigdata/olap/hive/hive-shell-he-beeline-ming-ling.md)
* [Hive Partitioned and Bucketed Tables](/doc/bigdata/olap/hive/hive-fen-qu-biao-he-fen-tong-biao.md)

**Presto**

* [Presto Overview](/doc/bigdata/olap/presto/prestooverview.md)

**clickhouse**

* [ClickHouse Quick Start](/doc/bigdata/olap/clickhouse/clickhouseoverview.md)
* [ClickHouse Table Engines](https://github.com/collabH/repository/blob/master/bigdata/olap/clickhouse/ClickHouse表引擎.xmind)

**Druid**

* [Druid Overview](/doc/bigdata/olap/druid/druidoverview.md)

**Kylin**

* [Kylin Overview](/doc/bigdata/olap/kylin/kylinoverwrite.md)

**Kudu**

* [KuduOverView](/doc/bigdata/olap/kudu/kuduoverview.md)
* [Kudu Table and Schema Design](/doc/bigdata/olap/kudu/kuduschemadesgin.md)
* [KuduConfiguration](/doc/bigdata/olap/kudu/kuduconfiguration.md)
* [Kudu Internals](/doc/bigdata/olap/kudu/kudu-yuan-li-fen-xi.md)
* [Kudu Pitfalls](https://github.com/collabH/repository/blob/master/bigdata/olap/kudu/Kudu踩坑.xmind)
* [Kudu Storage Architecture Diagrams](https://github.com/collabH/repository/blob/master/bigdata/olap/kudu/Kudu存储结构/README.md)
* [Kudu in Production](/doc/bigdata/olap/kudu/kudu-sheng-chan-shi-jian.md)

**paper**

* [Kudu Paper Notes](/doc/bigdata/olap/kudu/paper/kudupaper-yue-du.md)

**Impala**

* [ImpalaOverView](/doc/bigdata/olap/impala/impalaoverview.md)
* [ImpalaSQL](https://github.com/collabH/repository/blob/master/bigdata/olap/impala/Impala%20SQL.xmind)
* [Querying Kudu with Impala](/doc/bigdata/olap/impala/shi-yong-impala-cha-xun-kudu-biao.md)
* [Impala in Production](/doc/bigdata/olap/impala/impala-sheng-chan-shi-jian.md)

#### graph

**Graph databases**

**nebula graph**

* [1. Introduction](/doc/bigdata/graph/nebula-graph/1.-jian-jie.md)
* [2. Quick Start](/doc/bigdata/graph/nebula-graph-1/2.-kuai-su-ru-men.md)

#### tools

**Tools, including computing platforms and SQL syntax trees**

**zeppelin**

* [zeppelin](https://github.com/collabH/repository/blob/master/bigdata/tools/zeppelin/Zeppelin.xmind)

**SQL Syntax Trees**

**calcite**

* [ApacheCalciteOverView](/doc/bigdata/tools/sqltree/calcite/calciteoverview.md)

### Data Warehouse Construction

#### Theory

* [Data Modeling](/doc/datawarehouse/li-lun/datamodeler.md)
* [Data Warehouse Modeling](https://github.com/collabH/repository/blob/master/datawarehouse/理论/数据仓库建模.xmind)
* [Data Warehouse in Practice](/doc/datawarehouse/li-lun/shu-ju-cang-ku-shi-zhan.md)

#### Data Middle Platform Design

* [Data Middle Platform Design](/doc/datawarehouse/shu-ju-zhong-tai-mo-kuai-she-ji/shu-ju-zhong-tai-she-ji.md)
* [Design of thoth, an In-House Metadata Platform](/doc/datawarehouse/shu-ju-zhong-tai-mo-kuai-she-ji/thoth-zi-yan-yuan-shu-ju-ping-tai-she-ji.md)

#### Solutions in Practice

* [Kudu Cold Data Backup](/doc/datawarehouse/fang-an-shi-jian/kudu-shu-ju-leng-bei-fang-an.md)
* [Building a Real-Time Data Warehouse on Flink](/doc/datawarehouse/fang-an-shi-jian/ji-yu-flink-de-shi-shi-shu-cang-jian-she.md)

#### Reading Notes

* [Data Middle Platform Reading Notes](/doc/datawarehouse/li-lun/shu-ju-zhong-tai-du-shu-bi-ji.md)

### devops

* [Shell Commands](https://github.com/collabH/repository/blob/master/devops/Shell学习.xmind)
* [Linux Commands](https://github.com/collabH/repository/blob/master/devops/Linux学习.xmind)
* [OpenShift Basic Commands](/doc/datawarehouse/li-lun/devops/k8sopenshift-ke-hu-duan-ming-ling-shi-yong.md)

### maven

* [Creating a Maven Archetype](/doc/datawarehouse/li-lun/devops/maven/zhi-zuo-maven-gu-jia.md)
* [Maven Commands](/doc/datawarehouse/li-lun/devops/maven/maven-ming-ling.md)

### Service Monitoring

* [Prometheus](/doc/servicemonitor/prometheus/prometheus-shi-zhan.md)

### mac

* [iterm2](/doc/mac/iterm2.md)

## Contributing

* Contributions are welcome via [Gitter](https://gitter.im/collabH-repository/community)
* [Contributor Guide](/doc/contributing.md)

## Tech Talks

![](/files/9XW8GwkGKOpOxMZNuzm9)


