![]() ![]() That is, it gives the frequency of values smaller than the maximum value of the bucket. The frequency value recorded, is the cumulative frequency. For the other type of histogram, equi-height, each bucket will contain the minimum and maximum value for the range covered by the bucket. That means that each bucket contains the frequency of a single value. In the above case, the histogram type is singleton. The distribution of values can be found in the buckets array of the JSON document. The histogram is stored as a JSON document: For example, the following query will show the content of the histogram for the column l_linenumber in the table lineitem of the dbt3_sf1 database:įROM information_lumn_statistics I do not have a presentation in the MySQL and Friends devroom this year, but it reminded me that I had planned to post a follow-up to my presentation from last year.Īs part of the presentation, I showed how you can inspect the content of a histogram using the information schema table column_statistics. We repeat the sysbench runs with this query, and compare the results for index range scans:įOSDEM is coming up. SUM(l_extendedprice * l_discount) AS revenue To make our query use this resource group, we add a hint to the query: First, I will create a resource group, cpu1, that contains the virtual CPUs of processor 1:ĬREATE RESOURCE GROUP cpu1 TYPE=user VCPU=16-31,48-63 To investigate what effects this have for our case, I will use Resource Groups to make MySQL use only one processor for this query. When there is a lot of thread synchronization, running on multiple processors may increase our problems since the state of the mutex, or whatever is used for synchronization, will have to be synchronized between the caches of the CPUs. When using the AHI, the scaling seem to be even worse, but that is different story. ![]() (I have turned off the adaptive hash index (AHI) when running these tests, so all primary key look-ups will have to go through the root page. Further investigations are needed to understand where, but my current guess is that it is related to accessing the root page of the B-tree when looking up the row corresponding to the index entry. So why are the threads stalled? I think the most likely reason is that there is a synchronization bottleneck related to non-covering secondary index scans. This is an indication that the threads are regularly stalled, and there is a benefit from having other threads that can be scheduled when this happens. For index range scans, on the other hand, we see that we get higher throughput with 64 threads than with 32 threads. In other words, each thread doing table scan is able to execute without any significant interrupts. We can see that for table scan, there is no benefit from hyper-threading the maximum throughput is reached when having one thread per physical core. For index range scan, the increase is only about 7.8x and 9.5x for 32 and 64 threads, respectively. For 32 threads, the throughput with table scans is more than 30 times higher than with 1 thread. However, we see that while the table scan scales almost linearly up to 32 threads, this is not the case for the index range scan. The throughput when using the secondary index is of course higher than for table scan, since we only need to read data for 1 month instead of for all 84 months when using a full table scan. For comparison, I also ran the same query when forcing MySQL to use table scan instead of index range scan. MySQL will use the secondary index on l_shipdate when executing this query. I used a DBT-3 scale factor 1 database, and with a 4 GB InnoDB buffer pool, all data was cached in memory. ![]() The machine used have 2 processors with 16 physical cores each, so with hyper-threading there are in total 64 virtual CPUs. For each query execution, a random month was picked. This query was run by multiple parallel threads using sysbench on MySQL 8.0.17. SELECT SUM(l_extendedprice * l_discount) AS revenueĪND l_shipdate < DATE_ADD('', INTERVAL '1' MONTH)ĪND l_discount BETWEEN 0.05 - 0.01 AND 0.05 + 0.01 While the original Query 6 sums up the revenue for a whole year, my version will only compute it over a single month: I will run a variant of Query 6 of the TPC-H/ DBT-3 benchmark. In this blog post, I will investigate what effects using a multi-processor machine have on the scalability of parallel index range scans. Mark Callaghan reported this several years ago, and this does not seem to have changed. As discussed in the presentation, we observe that parallel range scans on secondary indexes does not scale very well in MySQL. As presented at the Percona Live Conference in Austin in May, at Alibaba we are working on adding support for parallel query execution to POLARDB for MySQL.
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |