Tuesday, February 28, 2012

2012 SouthEast LinuxFest

The 2012 SouthEast LinuxFest registration and call for papers are now live on their site.

MySQL Days at SELF!

"This year we are specifically seeking speakers from all over the database spectrum for Open Database Camp. ODC at SELF will consist of a track of database talks selected by SELF on Saturday, then an unconference on Sunday. So if you have a database related talk, please submit!" -- RFP Form 

Monday, February 27, 2012

MySQL Community on Freenode


I recently posted about how we, MySQL Community Managers, are trying to reach out more to the MySQL User groups and the MySQL Community.

The room has been quiet so far, but it is a new room. You can find us in #MySQL_Community & #MySQL_user_groups on Freenode.

A few common issues have come up that I have heard before. We as a community can work to address these concerns.
  • List of user groups
  • Few volunteers
  • Difficulty in getting good, legitimate speakers
  • Low attendance
  • Location for meetings
Some of these concerns have been addressed by the community in the past. 

If you have thoughts on any of these topics, come by the chat rooms on Freenode.


btw...
I already learned about a new MySQL Cluster training in Brazil from the room today. I look forward to hearing more from you all.

Tuesday, February 21, 2012

Oracle at DrupalCon Denver Conference 2012


Oracle is pleased to be a Bronze Sponsor of DrupalCon Denver.
 
DrupalCon will be held at the Colorado Convention Center in Denver on March 19-23, 2012. There are community events scheduled throughout the entire week, some officially by DrupalCon, others by sponsors and other community members.

Come visit Oracle Booth #400 and find out the latest information about MySQL and Linux! 
I will be at the booth so please stop by and say hello.

For event information and to register, click here.


The Oracle Events page with additional information can be found here.

Monday, February 20, 2012

InnoDB Tablespace Automatically Expanding Issues

I recently received a question asking about InnoDB and the ability to autoextend the ibdata files.

The question was basically this:
If the database has one ibdata file that is using autoextend but also has a max value set, will it start a new file once the max value is reached?

The answer is no, it will not create a new file. You are able to use autoextend and have more than one ibdata file, though.

To show this, I put together a simple test in a VirtualBox VM.

Started using:
innodb_data_file_path       = ibdata1:20M:autoextend:max:100M
innodb_autoextend_increment = 50

I downloaded the full employees sample dataset from Launchpad.

Started the server, and I have the 20M file:
-rw-rw---- 1 mysql mysql  20M Feb 20 10:53 ibdata1
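You can also confirm from inside the server that the option file values were parsed as intended (these are standard server variables, nothing assumed):

```sql
-- verify the data file layout and autoextend increment the server is using
SHOW VARIABLES LIKE 'innodb_data_file_path';
SHOW VARIABLES LIKE 'innodb_autoextend_increment';
```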

mysql -t < employees.sql
ERROR 1114 (HY000) at line 18 in file: 'load_salaries.dump': The table 'salaries' is full

I see the file grew to 100M and stopped:
-rw-rw---- 1 mysql mysql 100M Feb 20 10:48 ibdata1

I can start over with a larger max to prove this does work as well:
innodb_data_file_path       = ibdata1:20M:autoextend:max:2G
innodb_autoextend_increment = 50



mysql -t < employees.sql
...
+------------------+
| INFO             |
+------------------+
| LOADING salaries |
+------------------+
-rw-rw---- 1 mysql mysql 1020M Feb 20 10:54 ibdata1


You are able to use autoextend when using more than one ibdata file as well:

innodb_data_file_path       = ibdata1:20M;ibdata2:20M:autoextend:max:2G
innodb_autoextend_increment = 50

These are just simple examples, but I hope this answers the question.
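If you do hit the max on the last file, the server will not add a file for you, but you can add one manually and restart. A sketch of that change, assuming the first setup above (note that an autoextended file must be listed at its current, grown size, and only the last file may carry the autoextend clause):

```ini
# ibdata1 already grew to its 100M cap, so list it at 100M;
# ibdata2 is the new file and carries the autoextend clause
innodb_data_file_path = ibdata1:100M;ibdata2:20M:autoextend:max:2G
```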

Friday, February 17, 2012

MySQL User Group Day on Freenode!

As MySQL Community Managers, Dave and I are always looking for ways to gather feedback on MySQL.

We want to make ourselves available to you. The current plan is that on the last Monday of every month, Dave and I will be on Freenode in a channel called #MySQL_user_groups.

We will also try to be available as much as possible in a channel called #MySQL_Community.

You can join us and ask questions or just let us know your opinions. 

We want to hear about everything from your best presentation topics to the worst. What is the biggest struggle you find with user groups?
How could Oracle help? 


Thursday, February 16, 2012

MySQL for Database Administrators Training on Demand

I ran across the "Training on Demand" option for the MySQL for Database Administrators class. I thought I would share it in case you missed it.

https://blogs.oracle.com/MySQL/entry/mysql_for_dbas_training_on

http://education.oracle.com/pls/web_prod-plq-dad/db_pages.getCourseDesc?dc=D61762GC20

They are also offering a new course, MySQL Performance Tuning:
http://education.oracle.com/pls/web_prod-plq-dad/db_pages.getCourseDesc?dc=D61820GC20

More information can be found on the MySQL Database Administration and Development - Learning Path site.

http://education.oracle.com/pls/web_prod-plq-dad/db_pages.getlppage?page_id=212&path=SDAD

Tuesday, February 14, 2012

Summary Tables with MySQL

I was recently talking with a few people, and the concept of summary tables came up as a solution for them. The issue is, they had never thought of it. Hopefully this post will help them as well as others.

Summary tables are not a new concept. They are exactly what they sound like: basically summaries of existing data. Aggregated tables, materialized views and summary tables are very common in data warehouses. You can take advantage of these concepts as part of your reporting systems as well.

Summary tables are real tables, so you can take advantage of indexes and so on with them. In the examples I am using, I consider them more summary tables than aggregated tables. Depending on your application and needs, they could grow into more of an aggregated-table or materialized-view situation.

How you separate your data and tables is dependent on your reporting and application needs.

The following is a high-level example of how you can use summary tables.



Let us assume that you have a report that is populated with different aspects of website traffic.
This could be a report that includes impressions, hits and leads gathered via your site from some other sources. Others in the company, of course, want to spend money where it best makes money, so they need updated information reliably and do not want to wait for it.

Currently you have a table for your reporting tools, and you calculate ROI per report request via SQL.

For example:
CREATE TABLE `some_report` (
`traffic_source_id` int(5) unsigned NOT NULL DEFAULT '0',
`ad_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliate_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliates_key` varchar(25) DEFAULT '',
`date_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`impressions` int(5) unsigned DEFAULT '0',
`hits` int(5) unsigned DEFAULT '0',
`leads` int(5) unsigned DEFAULT '0',
UNIQUE KEY `traffic` (`traffic_source_id`,`affiliate_id`,`ad_id`,`date_time`,`affiliates_key`),
KEY `date_time` (`date_time`)
) ENGINE=InnoDB ;
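As a hypothetical sketch of that per-request calculation (the 5-per-lead return, the 100000.00 cost, and the date range are illustrative values, not from a real system):

```sql
-- hypothetical on-the-fly ROI query against the raw table;
-- the return per lead (5) and cost (100000.00) are assumed values
SELECT traffic_source_id, ad_id, affiliate_id, affiliates_key,
       SUM(impressions) AS impressions, SUM(hits) AS hits, SUM(leads) AS leads,
       ROUND(SUM(leads) * 5 / 100000.00, 2) AS roi
FROM some_report
WHERE date_time BETWEEN '2012-02-01 00:00:00' AND '2012-02-14 23:59:59'
GROUP BY traffic_source_id, ad_id, affiliate_id, affiliates_key;
```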

This table is populated hourly, so an insert could be similar to the queries below. Your system executes queries like this thousands of times per day because it is based on different traffic sources as well as affiliates and their affiliate keys. So this table is going to grow dynamically: hours per day * traffic_source_id * affiliate_id * ad_id * affiliates_key.

For example:
INSERT INTO some_report VALUES (100,4343,9839,'SomeID',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),89832,44916,22458);
INSERT INTO some_report VALUES (100,4343,9839,'SomeID2',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),93332,34716,23438);
INSERT INTO some_report VALUES (100,2343,9839,'SomeID',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),543232,44316,458);
INSERT INTO some_report VALUES (100,2343,9839,'SomeID2',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),89832,24916,2458);
INSERT INTO some_report VALUES (100,5343,9839,'SomeID',DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00'),1239832,2344916,2538);

Just for reference, let's assume:
100 = Google.com
4343 = An AdWords ad for MySQL
9839 = Some Marketing Company
SomeID = affiliates_key of some marketing company

You have reviewed your EXPLAIN output per query and your SQL is the fastest you can get it. It just takes time to calculate all the ROIs when reports span a lot of different entities. The web pages are taking too long and people are getting upset.

So you decide to change your tables and create a process to roll up your data across different summary tables using an ETL tool, Events, or cron-based scripts. Your system populates the RETURN_VALUED per ad_id and affiliate_id. You do not know what the affiliate is paying out to other affiliates_keys, but you are aware of what you spent and what a lead returns to you.

First, start with the lowest common denominator. Here you have data per hour.

CREATE TABLE `some_report_hour` (
`traffic_source_id` int(5) unsigned NOT NULL DEFAULT '0',
`ad_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliate_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliates_key` varchar(25) DEFAULT '',
`date_time` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`impressions` int(5) unsigned DEFAULT '0',
`hits` int(5) unsigned DEFAULT '0',
`leads` int(5) unsigned DEFAULT '0',
`cost` decimal(9,2) DEFAULT '00.00',
`roi` decimal(7,2) DEFAULT '00.00',
UNIQUE KEY `traffic` (`traffic_source_id`,`affiliate_id`,`ad_id`,`date_time`,`affiliates_key`),
KEY `date_time` (`date_time`)
) ENGINE=InnoDB ;


Then every hour your ETL / Event / stored procedure / cron job script populates more data into the new table. How you determine the RETURN_VALUED and COST values is up to you.

For example:
DELIMITER //
CREATE PROCEDURE some_report_hour_procedure(IN p_affiliate_id int(6), IN p_return_value int(6), IN p_cost decimal(5,3))
BEGIN
-- parameters are prefixed with p_ so they do not shadow the column names
INSERT INTO some_report_hour
SELECT traffic_source_id, ad_id, affiliate_id, affiliates_key, date_time, impressions, hits, leads, p_cost, ROUND((SUM(leads * p_return_value) / p_cost),2)
FROM some_report r
WHERE r.date_time = DATE_FORMAT(NOW() ,'%Y-%m-%d %H:00:00')
AND r.affiliate_id = p_affiliate_id
GROUP BY traffic_source_id, ad_id, affiliate_id, affiliates_key, date_time;
END //
DELIMITER ;

CREATE EVENT some_report_hour_event
ON SCHEDULE
EVERY 1 HOUR
COMMENT 'just an example; it depends on how you want to populate these values. This is for just one affiliate_id'
DO
CALL some_report_hour_procedure(9839,5,100000); # (affiliate_id , RETURN_VALUED , COST)

So the data is much like the following:

+-------------------+-------+--------------+----------------+---------------------+-------------+---------+-------+-----------+---------+
| traffic_source_id | ad_id | affiliate_id | affiliates_key | date_time           | impressions | hits    | leads | cost      | roi     |
+-------------------+-------+--------------+----------------+---------------------+-------------+---------+-------+-----------+---------+
|               100 |  2343 |         9839 | SomeID         | 2012-02-14 16:00:00 |      543232 |   44316 |   458 | 100000.00 |   22.90 |
|               100 |  2343 |         9839 | SomeID2        | 2012-02-14 16:00:00 |       89832 |   24916 |  2458 | 100000.00 |  122.90 |
|               100 |  4343 |         9839 | SomeID         | 2012-02-14 16:00:00 |       89832 |   44916 | 22458 | 100000.00 | 1122.91 |
|               100 |  4343 |         9839 | SomeID2        | 2012-02-14 16:00:00 |       93332 |   34716 | 23438 | 100000.00 | 1171.91 |
|               100 |  5343 |         9839 | SomeID         | 2012-02-14 16:00:00 |     1239832 | 2344916 |  2538 | 100000.00 |  126.90 |
+-------------------+-------+--------------+----------------+---------------------+-------------+---------+-------+-----------+---------+
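With the ROI already stored, a report request against this table becomes a plain indexed lookup. A sketch using the sample hour above:

```sql
-- the report no longer computes ROI; it just reads the stored value
SELECT affiliates_key, impressions, hits, leads, cost, roi
FROM some_report_hour
WHERE traffic_source_id = 100
  AND affiliate_id = 9839
  AND date_time = '2012-02-14 16:00:00';
```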


Nothing is dramatically different, but you have removed the ROI calculation from your query; it will be a simple select now. You can also use your ETL/Event/cron job script to keep a day table updated as well.

CREATE TABLE `some_report_day` (
`traffic_source_id` int(5) unsigned NOT NULL DEFAULT '0',
`ad_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliate_id` int(10) unsigned NOT NULL DEFAULT '0',
`affiliates_key` varchar(25) DEFAULT '',
`date_time` date NOT NULL DEFAULT '0000-00-00',
`impressions` int(5) unsigned DEFAULT '0',
`hits` int(5) unsigned DEFAULT '0',
`leads` int(5) unsigned DEFAULT '0',
`cost` decimal(9,2) DEFAULT '00.00',
`roi` decimal(7,2) DEFAULT '00.00',
UNIQUE KEY `traffic` (`traffic_source_id`,`affiliate_id`,`ad_id`,`date_time`,`affiliates_key`),
KEY `date_time` (`date_time`)
) ENGINE=InnoDB ;

Notice I left the field named date_time. This assumes you might present your report via column names and just adjust the FROM table on selects based on the type of report needed. This is, of course, dependent on your application.

Notice that your inserts are going to use GROUP BY to match your key.
For example:

CREATE EVENT some_report_day_event
ON SCHEDULE
EVERY 1 HOUR
COMMENT 'updates some_report_day hourly'
DO
REPLACE INTO some_report_day
SELECT traffic_source_id, ad_id, affiliate_id, affiliates_key,
       DATE_FORMAT(NOW() ,'%Y-%m-%d') AS date_time,
       SUM(impressions), SUM(hits), SUM(leads), SUM(cost),
       ROUND( SUM(leads) * 5 / SUM(cost) ,2) AS ROI
FROM some_report_hour r
WHERE r.date_time BETWEEN DATE_FORMAT(NOW() ,'%Y-%m-%d 00:00:00') AND DATE_FORMAT(NOW() ,'%Y-%m-%d 23:59:59')
GROUP BY traffic_source_id, ad_id, affiliate_id, affiliates_key;

Now the reports that gather data per day rather than per hour are going to be faster, as they are simple selects as well.

So the data is much like the following:

+-------------------+-------+--------------+----------------+------------+-------------+---------+-------+-----------+------+
| traffic_source_id | ad_id | affiliate_id | affiliates_key | date_time  | impressions | hits    | leads | cost      | roi  |
+-------------------+-------+--------------+----------------+------------+-------------+---------+-------+-----------+------+
|               100 |  2343 |         9839 | SomeID         | 2012-02-14 |      543232 |   44316 |   458 | 100000.00 | 0.02 |
|               100 |  2343 |         9839 | SomeID2        | 2012-02-14 |       89832 |   24916 |  2458 | 100000.00 | 0.12 |
|               100 |  4343 |         9839 | SomeID         | 2012-02-14 |       89832 |   44916 | 22458 | 100000.00 | 1.12 |
|               100 |  4343 |         9839 | SomeID2        | 2012-02-14 |       93332 |   34716 | 23438 | 100000.00 | 1.17 |
|               100 |  5343 |         9839 | SomeID         | 2012-02-14 |     1239832 | 2344916 |  2538 | 100000.00 | 0.13 |
+-------------------+-------+--------------+----------------+------------+-------------+---------+-------+-----------+------+
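A multi-day report against the day table is likewise a simple range select. A sketch using the sample values above:

```sql
-- day-level report: a plain range select, no aggregation at read time
SELECT date_time, impressions, hits, leads, cost, roi
FROM some_report_day
WHERE traffic_source_id = 100
  AND affiliate_id = 9839
  AND date_time BETWEEN '2012-02-01' AND '2012-02-14';
```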

You can continue this concept for monthly and yearly data.
You could also build summary tables broken down by affiliate_id and day, for example.
It really depends on what you're after. Just remember: data is never deleted, just copied into summary tables for ease of use and speed.
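The monthly rollup can follow the same pattern as the daily event. A sketch, assuming a hypothetical some_report_month table with the same columns as some_report_day and the same illustrative 5-per-lead return:

```sql
-- hypothetical monthly rollup; some_report_month is assumed to mirror some_report_day
CREATE EVENT some_report_month_event
ON SCHEDULE
EVERY 1 DAY
COMMENT 'updates some_report_month daily'
DO
REPLACE INTO some_report_month
SELECT traffic_source_id, ad_id, affiliate_id, affiliates_key,
       DATE_FORMAT(NOW() ,'%Y-%m-01') AS date_time,
       SUM(impressions), SUM(hits), SUM(leads), SUM(cost),
       ROUND( SUM(leads) * 5 / SUM(cost) ,2) AS roi
FROM some_report_day r
WHERE r.date_time BETWEEN DATE_FORMAT(NOW() ,'%Y-%m-01') AND LAST_DAY(NOW())
GROUP BY traffic_source_id, ad_id, affiliate_id, affiliates_key;
```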

This is a very simple example but hopefully it gets you thinking and started on how you can summarize your data easily.

Tuesday, February 7, 2012

The community helping customers restore faster with mysqldump

A big thanks to Xiaobin Lin for taking the time to submit bug #64248 and the related patch. The patch is based on MySQL 5.5.20.
This should help users restore their databases faster thanks to fast index creation. More information is available via the bug page. I have heard that this is just one of several patches he has contributed.
Contributions such as this help MySQL continue to deliver an always-improving product.

So a big “Thank You” from the MySQL team.

Friday, February 3, 2012

RMOUG Training Days 2012

The Rocky Mountain Oracle User Group (RMOUG) Training Days are almost here. This year they are going to have MySQL Oracle ACE Director Ronald Bradford talk on MySQL Security Essentials. I will also be giving a MySQL Crash Course. I was going to talk about replication, but after attending the IOUG User Summit and seeing the vast interest in MySQL from Oracle users, I decided to take a step back and do this talk first. I gave a similar talk at an RMOUG meetup hands-on lab in the past, but it has been updated.

Alex Gorbachev of Pythian also has a session, Monitoring MySQL with Oracle Cloud Control 12c.

I look forward to seeing everyone, and I think great things are on the horizon with RMOUG and MySQL.