本文共 25328 字,大约阅读时间需要 84 分钟。
sql索引面试
A Heap table is a table in which, the data rows are not stored in any particular order within each data page. In addition, there is no particular order to control the data page sequence, that is not linked in a linked list. This is due to the fact that the heap table contains no clustered index.
堆表是一个表,其中的数据行未按任何特定顺序存储在每个数据页内。 此外,没有特殊的顺序来控制数据页顺序,即未在链接列表中链接。 这是由于堆表不包含聚簇索引这一事实。
A clustered table is a table that has a predefined clustered index on one column or multiple columns of the table that defines the storing order of the rows within the data pages and the order of the pages within the table, based on the clustered index key.
聚集表是一种在表的一列或多列上具有预定义聚集索引的表,该索引基于聚集索引键定义数据页内行的存储顺序和表内页的顺序。
The heap table can be identified by querying the system object that has one row per each partition with index_id value equal to 0. You can also query the sys.indexes system object also to show the heap table index details, that shows, the id of that index is 0 and the type of it is HEAP.
可以通过查询系统对象来识别堆表,该对象在每个分区中每个行都有一行,且index_id值等于0。您还可以查询sys.indexes系统对象,以显示堆表索引详细信息,该详细信息显示,该索引的ID为0,类型为HEAP。
For more information, see the article: .
有关更多信息,请参见文章: 。
SQL Server Engine uses an Index Allocation Map (IAM) to keep an entry for each page to track the allocation of these available pages. The IAM is considered as the only logical connection between the data pages, that the SQL Server Engine will use to move through the heap.
SQL Server Engine使用索引分配映射 (IAM)为每个页面保留一个条目,以跟踪这些可用页面的分配。 IAM被认为是数据页之间唯一的逻辑连接,SQL Server Engine将使用它们在堆中移动。
For more information, see the article: .
有关更多信息,请参见文章: 。
When a data modification operation is performed on heap table data pages, Forwarding Pointers will be inserted into the heap to point to the new location of the moved data. These forwarding pointers will cause performance issues over time due to visiting the old/original location vs the new location specified by the forwarding pointers to get a specific value.
在堆表数据页上执行数据修改操作时,会将转发指针插入到堆中,以指向已移动数据的新位置。 随着时间的流逝,这些转发指针将导致性能问题,原因是访问转发指针指定的旧位置/原始位置与新位置以获得特定值。
Starting from SQL Server version 2008, a new method was introduced to overcome the forwarding pointers performance issue, by using the ALTER TABLE REBUILD command, that will rebuild the heap table.
从SQL Server 2008版开始,引入了一种新方法,该方法通过使用ALTER TABLE REBUILD命令来克服转发指针性能问题,该方法将重建堆表。
For more information, see the article: .
有关更多信息,请参见文章: 。
A SQL Server index is considered as one of the most important factors in the performance tuning process. Indexes are created to speed up the data retrieval and the query processing operations from a database table or view, by providing swift access to the database table rows, without the need to scan all the table’s data, in order to retrieve the requested data.
SQL Server索引被认为是性能调整过程中最重要的因素之一。 通过提供对数据库表行的快速访问,而无需扫描所有表的数据即可检索索引,从而创建索引以加快从数据库表或视图的数据检索和查询处理操作。
You can imagine a table index akin to a book’s index that allows you to find the requested information very fast within your book, rather than reading all the book pages in order to find a specific item you are looking for.
您可以想象一个类似于书本索引的表索引,它使您可以在书本中快速找到所需的信息,而不是阅读所有书本页以查找所需的特定项目。
For more information, see the article: .
有关更多信息,请参见文章: 。
A SQL Server index is created using the shape of B-Tree structure, that is made up of 8K pages, with each page, in that structure, called an index node. The B-Tree structure provides the SQL Server Engine with a fast way to move through the table rows based on index key, that decides to navigate left or right, to retrieve the requested values directly, without scanning all the underlying table rows. You can imagine the potential performance degradation that may occur due to scanning large database table.
使用B-Tree结构的形状创建SQL Server索引,该结构由8K页组成,该结构中的每一页称为索引节点。 B树结构为SQL Server引擎提供了一种快速的方法,该方法可以基于索引键在表行中移动,该索引键决定向左或向右导航,以直接检索请求的值,而无需扫描所有基础表行。 您可以想象由于扫描大型数据库表可能导致的潜在性能下降。
The B-Tree structure of the index consists of three main levels:
索引的B树结构包含三个主要级别:
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
On Online Transaction Processing (OLTP) databases, workloads are used for transactional systems, in which most of the submitted queries are data modification queries.
在联机事务处理(OLTP)数据库上,工作负载用于事务系统,其中大多数提交的查询是数据修改查询。
In contrast, Online Analytical Processing (OLAP) database workloads are used for data warehousing systems, in which most of the submitted queries are data retrieval queries that filter, group, aggregate and join large data sets quickly.
相反, 在线分析处理(OLAP)数据库工作负载用于数据仓库系统,其中大多数提交的查询是数据检索查询,这些查询可以快速过滤,分组,聚合和联接大型数据集。
Creating a large number of indexes on a database table affects data modification (e.g. Updates) operation performance. When you add or modify a row in the underlying table, the row will also be adjusted appropriately in all related table indexes. Because of that, you need to avoid creating a large number of indexes on the heavily modified tables and create the minimum possible number of indexes, with the least possible number of columns on each index. For Online Analytical Processing (OLAP) workloads, in which tables have low modification requirements, you can create a large number of indexes that improve the performance of the data retrieval operations
在数据库表上创建大量索引会影响数据修改(例如,更新)操作的性能。 在基础表中添加或修改行时,该行还将在所有相关表索引中进行适当调整。 因此,您需要避免在经过大量修改的表上创建大量索引,并创建尽可能少的索引,而在每个索引上创建尽可能少的列。 对于表具有较低修改要求的联机分析处理(OLAP)工作负载,您可以创建大量索引来改善数据检索操作的性能
For more information, see the article: .
有关更多信息,请参见文章: 。
It takes the SQL Server Engine less time scanning the underlying table than traversing the index when searching for specific data. In this case, the index will not be used but it will still negatively affect the performance of data modification operations, as it will be always adjusted when modifying the underlying table’s data.
搜索特定数据时,与遍历索引相比,SQL Server Engine花费更少的时间扫描基础表。 在这种情况下,将不使用该索引,但它仍然会对数据修改操作的性能产生负面影响,因为在修改基础表的数据时将始终对其进行调整。
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
Setting the ONLINE option to ON when you create or rebuild the index will enable other data retrieving or modification processes on the underlying table to continue, preventing the index creation process from locking the table. On the other hand, the ONLINE index creation or rebuilding process will take longer time than the offline default index creation process.
在创建或重建索引时,将ONLINE选项设置为ON将使基础表上的其他数据检索或修改过程继续进行,从而防止索引创建过程锁定表。 另一方面,联机索引创建或重建过程将比脱机默认索引创建过程花费更长的时间。
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
SQL Server allows us to create only one Clustered index per each table, as the data can be sorted in the table using only one order criteria.
SQL Server允许我们在每个表中仅创建一个聚集索引,因为只能使用一个顺序条件在表中对数据进行排序。
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
For GUID columns, that are stored in UNIQUE IDENTIFIER columns, the main challenge that affects the clustered index key sorting performance is the nature of the GUID value that is larger than the integer data types, with 16 bytes size, and that it is generated in random manner, different from the IDENTITY integer values that are increasing continuously.
对于存储在UNIQUE IDENTIFIER列中的GUID列,影响聚簇索引键排序性能的主要挑战是GUID值的性质,该值大于16位字节的整数数据类型,并且在随机方式,不同于不断增加的IDENTITY整数值。
For the CHARACTER columns. The main challenges include limited sorting performance of the character data types, the large size, non-increasing values, non-static values that often tend to change in the business applications and not compared as binary values during the sorting process, as the characters comparison mechanism depends on the used collation.
对于CHARACTER列。 主要挑战包括字符数据类型的排序性能有限,大小大,值不递增,非静态值,这些值通常会在业务应用程序中发生变化,并且在排序过程中不作为二进制值进行比较,因为字符比较机制取决于所使用的排序规则。
For more information, see the article: .
有关更多信息,请参见文章: 。
A Non-clustered index is different from a Clustered index in that, the underlying table rows will not be stored and sorted based on the Non-clustered key, and the leaf level nodes of the Non-clustered index are made of index pages instead of data pages. The index pages of the Non-clustered index contain Non-clustered index key values with pointers to the storage location of these rows in the underlying heap table or the Clustered index.
非聚集索引与聚集索引的不同之处在于,将不会基于非聚集键存储基础表行并对其进行排序,并且非聚集索引的叶级节点由索引页而不是索引页组成数据页。 非聚集索引的索引页包含非聚集索引键值,这些键值具有指向基础堆表或聚集索引中这些行的存储位置的指针。
For more information, see the article: .
有关更多信息,请参见文章: 。
If a Non-Clustered index is built over a Heap table or view (read more about , that have no Clustered indexes) the leaf level nodes of that index hold the index key values and Row ID (RID) pointers to the location of the rows in the heap table. The RID consists of the file identifier, the data page number, and the number of rows on that data page.
如果在堆表或视图(非更多信息,没有聚簇索引)上构建了非聚簇索引,则该索引的叶级节点将保留索引键值和指向行在堆表中的位置。 RID由文件标识符,数据页号和该数据页上的行数组成。
On the other hand, if a Non-clustered index is created over a Clustered table, the leaf level nodes of that index contain Non-clustered index key values and clustering keys for the base table, that are the locations of the rows in the Clustered index data pages.
另一方面,如果在聚簇表上创建了非聚簇索引,则该索引的叶级节点包含基表的非聚簇索引键值和聚簇键,它们是聚簇中行的位置索引数据页。
A RID Lookup operation is performed to retrieve the rest of columns that are not available in the index from the heap table based on the ID of each row.
执行RID查找操作以根据每一行的ID从堆表中检索索引中不可用的其余列。
A Key Lookup operation is performed to retrieve the rest of columns that are not available in the index from the Clustered index, based on the Clustered key of each row,
根据每行的聚集键,执行键查找操作以从聚集索引中检索索引中不可用的其余列,
For more information, see the article: .
有关更多信息,请参见文章: 。
Rather than creating a Non-clustered index with a wide key, large columns that are used to cover the query can be included to the Non-clustered index as non-key columns, up to 1023 non-key columns, using the INCLUDE clause of the CREATE INDEX T-SQL statement, that was introduced in SQL Server 2005, with a minimum of one key column.
除了使用宽键创建非聚集索引外,使用以下内容的INCLUDE子句,用于覆盖查询的大列可以作为非关键列包含在非聚集索引中,最多1023个非关键列。 SQL Server 2005中引入的CREATE INDEX T-SQL语句,至少包含一个键列。
The INCLUDE feature extends the functionality of Non-clustered indexes, by allowing us to cover more queries by adding the columns as non-key columns to be stored and sorted only in the leaf level of the index, without considering that columns values in the root and intermediate levels of the Non-clustered index. In this case, the SQL Server Query Optimizer will locate all required columns from that index, without the need for any extra lookups. Using the included columns can help to avoid exceeding the Non-clustered size limit of 900 bytes and 16 columns in the index key, as the SQL Server Database Engine will not consider the columns in the Non-clustered index non-key when calculating the size and number of columns of the index key. In addition, SQL Server allows us to include the columns with data types that are not allowed in the index key, such as VARCHAR(MAX), NVARCHAR(MAX), text, ntext and image, as Non-clustered index non-key columns.
INCLUDE功能扩展了非聚集索引的功能,它允许我们通过将列添加为仅在索引叶级存储和排序的非关键列来添加更多的查询,而无需考虑根目录中的列值和非聚集索引的中间级别。 在这种情况下,SQL Server查询优化器将找到该索引中所有必需的列,而无需任何额外的查找。 使用包含的列有助于避免超出900字节的非聚集大小限制和索引键中的16列 ,因为SQL Server数据库引擎在计算大小时将不考虑非聚集索引非键中的列和索引键的列数。 此外,SQL Server允许我们将具有索引键中不允许的数据类型的列(如VARCHAR(MAX),NVARCHAR(MAX),text,ntext和image)包括为非聚集索引非键列。
For more information, review .
有关更多信息,请查看 。
Unique Indexes, by ensuring that there are no duplicate values in the index key, and the table rows, on which that index is created.
唯一索引,以确保有索引键的重复值,且表中的行,在其上创建的索引。
For more information, see the article: .
有关更多信息,请参见文章: 。
It uses a filter predicate to improve the performance of queries that retrieve a well-defined subset of rows from the table, by indexing the only portion of the table rows. The smaller size of the Filtered index, that consumes a small amount of the disk space compared with the full-table index size, and the more accurate filtered statistics, that cover the filtered index rows with only minimal maintenance cost, help in improving the performance of the queries by generating a more optimal execution plan.
它使用过滤谓词通过索引表行的仅一部分来提高查询性能,这些查询从表中检索行的明确定义的子集。 与全表索引大小相比,筛选索引的大小较小,占用的磁盘空间较少,而筛选统计信息的准确性更高,可以以最小的维护成本覆盖筛选的索引行,从而有助于提高性能通过生成更优化的执行计划来进行查询。
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
Index fragmentation can be resolved by rebuilding and reorganizing SQL Server indexes regularly. The Index Rebuild operation removes fragmentation by dropping the index and creating it again, defragmenting all index levels, compacting the index pages using the Fill Factor values specified in rebuild command, or using the existing value if not specified and updating the index statistics using FULLSCAN of all the data.
索引碎片可以通过定期重建和重新组织SQL Server索引来解决。 索引重建操作通过删除索引并再次创建索引,对所有索引级别进行碎片整理,使用rebuild命令中指定的“填充因子”值压缩索引页面或使用现有值(如果未指定)并使用FULLSCAN更新索引统计信息来消除碎片。所有数据。
The Index Reorganize operation physically reorders leaf level pages of the index to match the logical order of the leaf nodes. The index reorganizes operation will be always performed online. Microsoft recommends fixing index fragmentation issues by rebuilding the index if the fragmentation percentage of the index exceeds 30%, where it recommends fixing the index fragmentation issue by reorganizing the index if the index fragmentation percentage exceeds 5% and less than 30%.
索引重新组织操作对索引的叶级页面进行物理重新排序,以匹配叶节点的逻辑顺序。 索引重组操作将始终在线执行。 Microsoft建议如果索引的碎片百分比超过30% ,则通过重建索引来解决索引碎片问题;如果索引碎片的百分比超过5%且小于 30% ,Microsoft建议通过重组索引来解决索引碎片问题。
For more information, see the article: .
有关更多信息,请参见文章: 。
For more information, see the article: .
有关更多信息,请参见文章: 。
A well-designed index will enhance the performance of your system and speed up the data retrieval process. On the other hand, a badly-designed index will cause performance degradation on your system and will cost you extra disk space and delay in the data insertion and modification operations. It is better always to test the performance of the system before and after adding the index to the development environment, before adding it to the production environment.
精心设计的索引将提高系统性能,并加快数据检索过程。 另一方面,索引设计不当会导致系统性能下降,并浪费额外的磁盘空间,并会延迟数据插入和修改操作。 最好始终在将索引添加到开发环境之前和之后测试该系统的性能,然后再将其添加到生产环境。
翻译自:
sql索引面试
转载地址:http://cbiwd.baihongyu.com/