Week 21 – Indexes and B-Trees

Weekly Learning

This week we took a fairly deep dive into indexing, the strategies behind it, and the advantages and disadvantages of each. This is by far the densest topic that we’ve covered so far. Again, I have to refer back to my personal experience dealing with databases. I’ve often been called out by DBA for either adding too many, not enough or simply for querying indexes in a not proper way. It certainly makes sense now why indexes are effective, but also why keeping them can be expensive. You index in excess and you’re essentially killing your disk space with data you simply cannot use. Not enough, and you  will be hitting the database for a scan of data you need.  But then, while reading becomes trivial with indexes, writing can be expensive, having to balance b-trees, or update sparse tables.

So what is the right solution? The reality is that you still need to solve the main problem of locating data fast in the database. So the trade-off of each approach mainly depend on the type of data, and the quantity to write. Sparse index over large amounts of data that can be spread over several sectors is a great choice, while for systems with little writes just having pointers on an ordered linked list can be an easy solution that is very effective.

Of all topics so far, this has been one that took me a while to click. The video was perhaps the best addition. Having a clear explanation of the context of b-trees and b+-trees was essential for success, and while the reading was good to get an idea of what is needed, ultimately having a voice tell you how it all works, was the best approach for me.

Posted in CST-363, CSUMB | Comments Off on Week 21 – Indexes and B-Trees

Week 20 – Project 1 Part 2

Weekly Learning

This week we worked on the exam. One of the interesting things about the exam is that despite it being open book, it is still a challenge. I’ve been very careful to craft each query and double check against each data point to make sure I am getting the answer I believe it best fits the question, but some of the queries are very complex and make me second guess myself.

We have also been working on the OLAP part of the project. I have to say that I had never enjoyed database work until now. Before, most of my queries were just basic selects, and getting data was more of a chore. Having practice more both with the designed, and how data is composed, I have started to enjoy writing queries that extract data. Where JOINS were a total mystery for me before, they are now almost second nature. I have been doing a lot of playing around with JOINS, GROUP BY, subqueries of virtual tables, and other fun features of myself.

The OLAP assignment has been a great opportunity to also start shifting my thinking into what data should be collected and processed. I realized because this is all fake data, our questions are probably naive. However, I have a good opportunity to compare it against the questions that my coworkers ask, or the assumptions they make when they request new features. Looking at how I would organize the data, and trying to understand what of my data is valuable is a very important step.

I guess I can only compare it to how people say that half of being good at a job such as programing, is knowing how to google. In this case, I’m getting much better at knowing what questions need to be asked, and how my data can either answer it, or what would need to change to make sure it can answer those questions.

Overall, I can honestly say I came to the class expecting it to be one of the classes that you dread to take, and push through, to being one of the classes I’ve most enjoyed.

Posted in CST-363, CSUMB | Comments Off on Week 20 – Project 1 Part 2

Week 19 – Project 1 – Database Design

Weekly Learning

This has been a very demanding week. Project one has been a challenge with both the creation of the database, as well as creating an application to properly visualize the data. I think we started working on it a little later than we should and that meant we had to pull some longer hours getting everything ready. However, I am pretty satisfied with the result with how little time we actually had to accomplish it.

Victor and I decided to go with a game database. This is fairly unconventional since it does not necessarily have as much transactional data as other application would. But, I think it presented an interesting challenge for both of us. Game have a lot of very interesting Business Intelligence data working behind them, as players move through different levels, or value one item over another when making a purchase. Understanding what your users do is important for any business, and games are no different. This data is a bit more abstract than a reservation system would be, since it must be derived from activity rather than numbers, but it’s equally important to break down those behaviors and make sense of them so the operators of the game can make better decisions that will capitalize and monetize the users.

I also spent a bit of time doing research on OLAP and data cubes. I have worked in the past with BI. But this is the first time I have actually sat down to understand how this data is constructed, processed and broken down to be useful on the backend. In the past I have just been a consumer of the data, and that was good enough to get me through the day, but as I have been digging through the topic, I have found it to be very interesting. The analogy of the hot kitchen is incredibly apt with some of the requests I have gotten in the past. Building the datasets was an ugly process but generated reports that could be very useful to some of my customers.

Posted in CST-363, CSUMB | Comments Off on Week 19 – Project 1 – Database Design

Week 18 – Database Design

Weekly Learning

Database Design is a very interesting topic. My personal experience designing databases has been very limited professionally. Normally, I have gone by what the data looks like and designing around it. Comparing this to how the chapter describes database design, I think the overall outcomes have not been very far from what is expected, but data but this has been certainly based on good tribal knowledge and mentorship. Having read and worked on the cases for normalization, I think I have a much better understanding of why those designs worked. More importantly, I understand why normalization is so important, and why we often sacrifice it to make data more accessible.

We’re also starting to deep our toes into Data Science and how sql supports aggregation. Since I am very much interested in the monetization and targeting strategies, this is a topic that is very dear to me. I often have to look for ways to make sense of purchase trends and patterns, and look for ways to visualize that data so users can look into it. In the past, I’ve used code to aggregate and process the data, or used specialized tools like Druid. When data gets to my hands, it’s often already gone through the ETL process, so i’m very much looking forward to the next couple of weeks to better understand the whole process. But using GROUP BY and other aggregators, already helps me!

Posted in CST-363, CSUMB | Comments Off on Week 18 – Database Design

Week 17 – Building Queries on MySQL

Weekly Learnings

This is the first week coming back from the winter break, and we’re jumping right into the action. First, I think the book we’re working with is fantastic! I love the simple, elegant, and pragmatic approach of only using the right side for examples and as reference. It makes both working through the exercises and understanding the logic behind it much easier.

I have been fortunate enough to use MySQL at work, but normally development is done through APIs, and actual access to the database is limited to some select statements to verify data. This week, was particularly satisfying as I finally got to get a solid grasp on both how JOIN statements work. This has been a question that I have previously gotten during interviews, and felt I have come short explaining properly because the concepts were not clear.

I also found this blog on the subject to be very good. It has a better way to graphically represent the INNER, and OUTER join relationships. I have to agree with the author that the ven diagram representation of this functions is just not adequate.

Say NO to Venn Diagrams When Explaining JOINs

Overall, i’m very much looking forward to this class! I think learning more about databases is fundamental for any developer!

Posted in CST-363, CSUMB | Comments Off on Week 17 – Building Queries on MySQL