For a Free Analysis, Call 1 888 534 6919
Please join me for my two seminar sessions on Saturday, August 13th in Ft Lauderdale as I cover the following two topics:
You can download the seminar materials at the following link:
FundingYourWebProject.pdf (4.66 mb)
Posted in: eCommerce | Facebook | Google | Public Appearances | Search Engine Optimization | Software Development | SQL Server | Web Marketing
Tags: FUZION Agency, Google, Google Maps, internet marketing, Jacksonville SEO, Jacksonville SEO Expert, Jared Nielsen, Marketing, search engine, sem, Seminars, seo, SEO Expert Jacksonville, Speaking Engagements, SQL, SQL Saturday, SQL Server, The FUZION Agency, web marketing,
We love warping Blogengine.net into applications that most people would never dream of. This applies to even some of our larger clients. In the process of pushing Blogengine.net to its limits, we began to see some if the limits of BlogEngine. We also gained a larger appreciation of the open source community since there are some really talented software developers out there that comb through these open source blogging platforms and learn it inside and out. My hat has to go out to Peter Kuhn aka Mister GoodCat of Pitorque Software (www.Pitorque.de). He helped us (through his blog post about memory leaks in BlogEngine.net) resolve a critical problem with one our sites that used BlogEngine.net. Hats off to Peter!
I first saw the problem when I was cruising my task manager (a wonderful hobby of mine) and discovered a whopping memory allocation exceeding 1GB. Now we have some large sites, but this clearly was inching up every time a page was viewed and needed to be dealt with.
As helpful as Task Manager is, knowing that a worker process is out of control is nice but WHICH ONE? Since we use host headers to map many websites to the same IP address, and we had multiple IIS configurations working on the same server, I saw the runaway application but didn't know how to identify which site wwas causing the issue.
First I needed the specific PID number for the worker process (which is hidden by default). By using the "View | Select Columns" menu in Task Manager I was able to check the box that made the PID of the worker application appear:
Now we know that our runaway application is number 3584. Great. As always, I learn by precept and precept so I continued to dig, finding an article online that allowed me to execute a special program that lists the IIS application name of any give worker process (by PID or "Process Identifier").
Now we need to discover which application is causing the problem. Here we identify a highly trafficked website that is causing the issue.
This is a website that has a large amount of traffic, benefitting from a large radio advertising budget and pay per click spend. It appears also that the memory climbs per page view so it will just climb and climb and climb.
Here is where our hero Peter Kuhn comes in. He was able to identify that the open source BlogEngine.net version 1.6.1 platform was not using static constructors properly within the code.
The first issue was the BlogRoll. He identified that "the BlogRollItem.Saved event is hooked up every time the control is created (and never released). To change that, make the constructor static, as well as the BlogRollItem_Saved event handler." (Peter Kuhn, www.Pitorque.de). His fix was to simply make the constructor static and recompile the code:
static Blogroll(){ BlogRollItem.Saved += new EventHandler<SavedEventArgs>(BlogRollItem_Saved);}private static void BlogRollItem_Saved(object sender, SavedEventArgs e){ ..
There is a similar problem in many places within the CodeFormatter class. The following elements needed to be set to Static Constructors:
Widget Zone Memory Leak Issues
Then his open source compatriots delved into a lengthy fix-find-fix-find iteration where they began to dig into the more esoteric implementations of WidgetZones (the box containing replaceable and editable widgets ... normally on the left sidebar). The eventual solution was to "Make the constructor static and change the event handler for the saved event, add a field for the dictionary and change the previous field XML_DOCUMENT to be a property that fetches the document from the dictionary." (Peter Kuhn, www.Pitorque.de)
static WidgetZone(){ WidgetEditBase.Saved += delegate { ReloadAllXml(); };}private static Dictionary<string, XmlDocument> _xmlDocumentByZone = new Dictionary<string, XmlDocument>();private const string DefaultZoneName = "be_WIDGET_ZONE";// For backwards compatibility or if a ZoneName is omitted, provide a default ZoneName.private string _ZoneName = DefaultZoneName; private XmlDocument XML_DOCUMENT{ get { // look up the document by zone name if (_xmlDocumentByZone.ContainsKey(ZoneName)) { return _xmlDocumentByZone[ZoneName]; } return null; }}
"The "RetrieveXml" method now is static and takes a zone name as parameter. The "ReloadAllXml" method is new." (Peter Kuhn, www.Pitorque.de)
private static void ReloadAllXml(){ // simply reload all xml documents when something has changed Dictionary<string, XmlDocument> newDocs = new Dictionary<string, XmlDocument>(); foreach (string zoneName in _xmlDocumentByZone.Keys) { XmlDocument doc = RetrieveXml(zoneName); newDocs.Add(zoneName, doc); } _xmlDocumentByZone = newDocs;}private static XmlDocument RetrieveXml(string zoneName){ WidgetSettings ws = new WidgetSettings(zoneName); ws.SettingsBehavior = new XMLDocumentBehavior(); XmlDocument doc = (XmlDocument)ws.GetSettings(); return doc;}
"During initialization, the document for a zone is added to the caching dictionary if it hasn't been loaded yet": (Peter Kuhn, www.Pitorque.de)
protected override void OnInit(EventArgs e){ if (XML_DOCUMENT == null) { // if there's no document for this zone name yet, load it XmlDocument doc = RetrieveXml(ZoneName); if (_xmlDocumentByZone.ContainsKey(ZoneName)) { _xmlDocumentByZone[ZoneName] = doc; } else { _xmlDocumentByZone.Add(ZoneName, doc); } } base.OnInit(e);}
Great News!
What was the outcome? A stunning decrease in load on our server from this single application of over 95%! I guess I can host a few more websites on this server. Thanks Mr. Kuhn!
For more information on this fix, get it from the source! Visit Peter Kuhn's personal blog at http://www.pitorque.de/MisterGoodcat/post/Fixing-the-memory-leaks-in-BlogEngineNET-1-6-1.aspx and keep tabs on any updates to this fix including a download of the source code file.
Jared Nielsen[email protected]The FUZION Agencywww.FUZION.org
Posted in: BlogEngine.net | Blogs | Software Development | SQL Server
Tags: architecture, best practices, FUZION, FUZION Agency, Jacksonville SEO Expert, Jared Nielsen, SEO Expert Jacksonville, SQL, SQL Performance Tuning, IIS, IIS 7.0, BlogEngine.net, Memory Leak, Task Manager, AppCMD, PID, Worker Process, Blogengine, Blogroll, Code Formatter, WidgetZone, Static Constructors, Peter Kuhn
Jared Nielsen is presenting a seminar titled "SQL and SEO" the FUZION of data and marketing
"Bringing marketing together with technology is a fine art that involves high tech tools, real life solutions, and most important human buy-in from the marketing and IT departments. This lecture will delve into the use of the latest SQL Server 2005 and SQL Server 2008 technologies including recursive common table expressions (CTE), hierarchical data models, and CLR stored procedures for the IT side of the house while also delving into the use of analytics, webmaster monitoring tools, web marketing tactics and search engine optimization techniques for the more marketing minded. Neither SQL nor SEO are four letter words and they go together perfectly! So too can your IT and marketing departments as you tackle the new age of IT Marketing."
You can sign up to attend this seminar at the SQL Saturday #61 website for the DC event at the following link:
http://www.sqlsaturday.com/viewsession.aspx?sat=61&sessionid=2784
Posted in: eCommerce | Public Appearances | Search Engine Optimization | SQL Server | Web Marketing
Tags: SEO, FUZION Seminar Facility, SQL, search engine, SQL Server, web marketing, SQL Saturday, Jacksonville SEO Expert, Jared Nielsen, FUZION, The FUZION Agency, Jacksonville, SEO Expert Jacksonville, Jacksonville SEO
This event is hosted by the great folks at SQL Saturday including Brian Knight of Pragmatic Works and many of the top industry leaders. Jared Nielsen will be giving a presentation on SQL and SEO - the financial benefits of proper database design blended with effective website search engine marketing techniques (SEO). Jared Nielsen is an Expert SEO consultant in Jacksonville and has used these skills in some of the largest websites in the world (Yahoo! Sports, ATP Tour, AOL). Also included are personal evaluations of websites that are brought by attendees of the event including reviewing such topics as the Atomic Data Model™ and Exclusionary Dominance™ techniques.
Make sure you attend or send your marketing VP, your corporate CEO or your technical SQL Server DBA to be there and enjoy the event. The SEO and SQL speech is at 100 Weldon Boulevard, Sanford, FL 32773 at 11:30am on Saturday, October 16th, 2010. You can find out more information on my session at the SQL Saturday Website
Posted in: Public Appearances | SQL Server
Tags: advertising, Direct Response, e-commerce, FUZION Agency, Google, internet marketing, jared nielsen, Marketing, natural search, sem, Seminars, SEO, Speaking Engagements, SQL, SQL Server, web marketing
As I continue to upgrade the NOP Commerce engine version 1.7 to SQL 2008 functionality, another query was taking 1:28 to load. I decided to implement Common Table Expressions to help improve the peformance of this paging query so it didn't rely on temp tables. In this case it loads every product image and then pages against it. Since product pictures are BLOB binary objects stored within the database itself, I didn't relish the thought of moving blob objects in an out of temp tables en masse.
The original query was structured to accomplish the following steps (abbreviated here) and ran in 1:28
CREATE TABLE #PageIndex -- Creates a temp tableINSERT INTO #PageIndex -- Inserts rows into the temp tableSELECT ... FROM [Nop_Picture] [p] --Selects pictures from the original table joined against the paged and sorted temp table INNER JOIN #PageIndex [pi] ON [p].PictureID = [pi].PictureIDDROP TABLE #PageIndex -- Disposes of the now unneeded temp table
This is the CTE that manages the paging for this high latency query and now runs in less than 3 seconds when in a worst case scenario (loading all rows)
WITH ProductPicture AS -- declares the CTE ( SELECT -- insert other fields here -- ROW_NUMBER() OVER (ORDER BY [p].PictureId, [p].PictureBinary, [p].Extension, [p].IsNew) AS RowNumber -- gives us a row number to page against FROM [Nop_Picture] [p]) SELECT -- other rows that you need -- FROM ProductPicture WHERE RowNumber > @PageLowerBound -- now we can specify what range we want to extract AND RowNumber < @PageUpperBound ORDER BY --rows that we want to sort by
Please feel free to download the functional source code that alters the existing NOP Commerce stored procedure to tune up your product photo queries. Make sure you pair this with the ProductLoadAllPaged fix I posted as well which you can find here.
RepairedProductPhotoQuery.zip (675.00 bytes)
Posted in: eCommerce | NOP Commerce | SQL Server | 1.7
Tags: Jared Nielsen, FUZION Agency, NOP Commerce, SQL Server, SQL Performance Tuning, CTE, Common Table Expressions, UDF, User Defined Functions, OVER, SQL Paging, Row_Number, 1.7
NOP Commerce is a wonderful platform but as with any open source system, it suffers from use cases that the smaller stores haven't tested for. In my particular case I ran into situations where large catalogs suffered from massive lag times when a particular stored procedure was executed. My general tendency is to rely on edge caching such as Akamai, Limelight or Mirror Image to cover any small latencies, but this particular one led to server hangs and crashes on the database server.
A quick look at the Query Execution Plan led me to the cause. A query that returned 19 rows was being forced to deal with rowsets exceeding 10,000,000 due to how the joins were structured. Here is a short treatise on how I amended the stored procedure to fix the issue. As with any open source application, there are certain considerations that need to be made... in this case, the open source queries were not designed to maximize use of SQL 2008 offerings. I have no such limitations but a lot more work should be done with the problem stored procedure (eliminating the temp tables in lieu of CTEs, etc)
The first tool that we use to diagnose this problem was the good old task manager (in an RDP session, right click a blank area of the bottom windows toolbar and select "Task Manager" then look at the performance tab). This clearly indicates a runaway memory leak in the SQL server process. The cliff you see in the diagram is when I stopped the entire MSSQL service so it didn't crash the server. You can see the massive punishment that the single stored procedure is inflicting on this server (isolated test server without any production load and only SQL Server running on the box). You don't see it here but once the memory maxes out, the CPU races to 100% and stays pegged there for the extended time it takes the stored proc to complete.
By sheer luck, the error generated by the resulting page indicated the specific stored procedure that caused the problem, but you can also use the SQL Performance Tracing tools. The problem query is NOP_ProductLoadAllPaged. This particular stored procedure is the root query that tends to load on all pages in this open source platform but particularly on pages where the page needs a list of "active products" including the product administration pages and any category or product detail pages. While some caching is implemented, this query runs over and over again and the NOP Commerce platform doesn't use ASP / SQL Dependency Cache which would also eliminate the need to run this proc over and over again.
Now to identify what was causing the problem with the query. I don't claim to be a SQL Expert, in fact I would refer you to many other experts besides myself since I focus on marketing, search engine optimization and many other specialties, but I knew enough to know that memory leaks are generally going to be caused by the SQL server having to chew through way too much data than is necessary to make any particular decision. In this case we could look to the query execution plan and see which steps were causing the most "resource hogging". In this case we were able to see the below fragment that gave a hint at the problem:
This identifies that 91% of the resource utilization of this query is improperly focused on one part of the execution plan. I won't pretend to be able to explain every step in an execution plan but once you mouse over this step you see the crux of the problem. A query that is supposed to generate a list of product ID numbers of products that are "Active" in the database is causing the SQL server to examine 8,900,170 rows to generate 20 resulting rows. I privately call this Join Hell, but you can pick your particular expletive. Here is the detail of this step:
Clearly a big problem. In my particular case I was abusing this open source platform that seemed to be focused on handling many single SKU Variant products, while my particular application (Tires that had over 24 product variants or Tire Sizes per product) were overloading the scalability of the NOP Commerce system. Just one single product was creating over 650 rows that needed to be processed... and when your database gets big enough that turns into an unmanageable problem.
These may seem non-critical, but in my case I wanted all of the junk out of the query so all system resources were being used to make the query as fast as possible.
INSERT INTO #DisplayOrderTmp ([ProductID], [Name], [Price], [DisplayOrder1], [DisplayOrder2])SELECT DISTINCT p.ProductID, p.Name, (SELECT Min(Price) from nop_ProductVariant where ProductID=p.ProductID), pcm.DisplayOrder, pmm.DisplayOrder FROM Nop_Product p with (NOLOCK) ............................ORDER BY (lots of criteria to order the resultset by)
I won't go into too much detail on why temp tables are being used, but a veteran will look at this and say "Oh... implementing paging". A common use for temp tables is to pull out a resultset, do some sort of transformation on them (sorting, parsing, filtering), and then another query will re-select from the temp table to display the final records. In true form, this was a paging implementation that isn't taking advantage of CTEs or modern SQL 2008 functions (for backward compatibility and the ability to use the system on multiple hosted platforms where the customer may not have direct access or control over the SQL server). I have no such limitations but I am going to have to reprogram the paging mechanism at a later date.
That doesn't mean that we can't clean this one up a bit. The first issue is we simply don't need to be returning all of these fields if we are just going to requery it again (that can be done in the final join) so this: [ProductID], [Name], [Price], [DisplayOrder1], [DisplayOrder2] can become simply [ProductID].
We also don't really care about any ORDER BY clauses in the original temp table inserts because the order can be all sorted out in the final query that derives from the temp table. This takes a small chunk of resources out of the temp table query. We just commented all of that out.
This isn't a universal rule, but there are circumstances where you need data from tables that are lower in the relational table hierarchy, but that is inherently going to expand the domain or scope of the query.
Simply put, if you want a list of Tires (each of which has 24 possible sizes), and you want to use data from the sub-sizes if they exist, you can use LEFT OUTER JOINs and exponentially expand the load on your query or you can use subqueries (or pre-compute summary data). I chose to use subqueries. Not optimal but it really tuned up the query. See how the original query
SELECT DISTINCT p.ProductID, p.Name, pv.Price, pcm.DisplayOrder, pmm.DisplayOrder FROM Nop_Product p with (NOLOCK) LEFT OUTER JOIN Nop_ProductVariant pv with (NOLOCK) ON p.ProductID = pv.ProductID LEFT OUTER JOIN Nop_ProductVariantLocalized pvl with (NOLOCK) ON pv.ProductVariantID = pvl.ProductVariantID AND pvl.LanguageID = @LanguageID LEFT OUTER JOIN Nop_ProductLocalized pl with (NOLOCK) ON p.ProductID = pl.ProductID AND pl.LanguageID = @LanguageID LEFT OUTER JOIN Nop_Product_Category_Mapping pcm with (NOLOCK) ON p.ProductID=pcm.ProductID LEFT OUTER JOIN Nop_Product_Manufacturer_Mapping pmm with (NOLOCK) ON p.ProductID=pmm.ProductID LEFT OUTER JOIN Nop_ProductTag_Product_Mapping ptpm with (NOLOCK) ON p.ProductID=ptpm.ProductID
All of these LEFT OUTER JOINs are basically saying "You may not need this data in all circumstances, but just in case you need them, here they are". That is a dangerous statement to make on a query that forms the central processing core of a website commerce platform.
The real challenging ones are the JOINs that use Nop_ProductVariant. Visualize this... Take a Tire and see if it matches a search word. This takes 1 row and checks to see if it's a match. Now visualize the alternative. Take a Tire and then pull each of its possible 24 sizes... then see if any of those match a keyword. This takes 1 row and causes 24 other rows to get involved. This can quickly grow out of control.
NOP Commerce adds in another layer. It pulls product variants, and any possible additional product variants that are cloned foreign language variations of that same variant. These "localizations" are referenced in the above JOIN query with the Nop_ProductVariantLocalized table. This adds even more complexity to the core query. Since I deal in tires and not "Tyres" (overseas), I'm comfortable with eliminating the Localization feature from my root query. If I decide to go multilingual, I can reconsider, but I suspect that in that case I would clone my entire database and have it translated under a completely separate website. The core developers of this open source platform are in Russia so they have a completely different basis to make this particular decision.
Consider a reworking of the above query where we stick with a simple rule... Every row should only interact with a single row. This simple math will keep our query simple, even if we use query techniques that may appear inefficient. In our case, take the reworded JOIN section:
SELECT DISTINCT p.ProductID, p.Name, (SELECT Min(Price) from nop_ProductVariant where ProductID=p.ProductID), pcm.DisplayOrder, pmm.DisplayOrder FROM Nop_Product p with (NOLOCK) LEFT OUTER JOIN Nop_Product_Category_Mapping pcm with (NOLOCK) ON p.ProductID=pcm.ProductID LEFT OUTER JOIN Nop_Product_Manufacturer_Mapping pmm with (NOLOCK) ON p.ProductID=pmm.ProductID LEFT OUTER JOIN Nop_ProductTag_Product_Mapping ptpm with (NOLOCK) ON p.ProductID=ptpm.ProductID
Note how we eliminated the LEFT OUTER JOIN of Nop_ProductVariant from the query completely and got rid of the Nop_ProductVariantLocalized table as well. This eliminates the exponential scope creep of the query and now lets us focus on the "spirit" of the query rather than the "law" of the query.
Why were we querying pv.Price? This is used further down in the stored procedure to extract only those products whose price was between a specified PriceMin and PriceMax value for certain parts of the application. Because I believe the core "soul" of the platform was not designed to focus on multi variant products, the original developers were trying to be real specific by iterating through all prices of all variants of a single product. In reality, we just need to pull the minimum price of all product Variants and if that falls within the specified range, I'm ok with that. I can see disagreements on this, but in a situation where I have to trade overall site performance for the rare circumstance that a PriceMin and PriceMax excludes a particular product line, I can live with that. Just make sure your pricing is non-zero and you should be fine. If you want to get really specific without iterating, you could change that to an average aggregate function that finds the "midpoint" price of all possible SKU variations for a given product.
Note the huge performance gain of using a subquery rather than a massive nested tree of LEFT OUTER JOINs. Now a single row is extracting a single result value for price and off we go. Fast and easy.
We have the same problem, but not as elegant a solution, when we come across the check to see if a product is flagged as available during a particular time period or not... and also check to see if that product has one of any possible product variants marked as available during a date range (ones that the admin has set expiration dates for in the store administration tool).
In this case I decided to change tactics and use a User Defined Scalar Valued Function (UDF). Because the logic got a bit complex and I knew that the SQL engine would optimize the UDF for me, I chose to create a function that would decide if a product had any product variants of which any one of them was marked as active or published. Here is the source for this:
SET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGO-- =============================================-- Author: www.FUZION.org - Jared Nielsen-- Create date: September 18, 2010-- Description: Determines whether at least one Product Variant is available for a certain time period-- =============================================CREATE FUNCTION dbo.FUZION_isAvailableToday( @ProductID int)RETURNS bitASBEGIN DECLARE @isAvailable bit SELECT @isAvailable = CASE WHEN getutcdate() between isnull(pv.AvailableStartDateTime, '1/1/1900') and isnull(pv.AvailableEndDateTime, '1/1/2999') THEN 1 END FROM Nop_ProductVariant PV WHERE PV.ProductID=@ProductID RETURN @isAvailableENDGO
Now this UDF had to be created because I couldn't actually assume that a MIN() or MAX() function would give me a valid answer. If I pulled the MIN(AvailableStartDateTime) and the MAX(AvailableEndTime) from all product variants that were related to a product, I could easily get a wrong answer. Take a scenario where one product variant had a start date of 1/1/2010 and an end date of 1/2/2010 while another product variant had a start date of 1/15/2010 and an end date of 1/20/2010. My Min() would return 1/1/2010 and my Max() would return 1/20/2010. However if it was 1/10/2010 it shouldn't match either availability window, but with my Min() Max() assumption it would be included... not acceptable.
Now I can just ask the UDF if this product is available. It will check to see (once per row) and then I will get a single, scalar answer back which I can now use in my WHERE clause of the original query. I also don't like the hard-coded time window here at the year 2999... I probably will substitute it with GETDATE()+1 in the ISNULL clause but that's just me being particular. I don't plan on being around in the year 3000 so maybe I don't care all that much.... but it's bad design.
In similar fashion, we need to test against product variants to see if they match any specified keywords. This can be handled with yet another scalar UDF:
CREATE FUNCTION [dbo].[FUZION_isMatchingKeywords]( @ProductID int , @Keywords nvarchar(max) , @SearchDescriptions int)RETURNS bitASBEGIN DECLARE @isMatching bit SELECT @isMatching = CASE WHEN patindex(@Keywords, isnull(p.name, '')) > 0 THEN 1 WHEN patindex(@Keywords, isnull(pv.name, '')) > 0 THEN 1 WHEN patindex(@Keywords, isnull(pv.sku , '')) > 0 THEN 1 WHEN (@SearchDescriptions = 1 and patindex(@Keywords, isnull(p.ShortDescription, '')) > 0) THEN 1 WHEN (@SearchDescriptions = 1 and patindex(@Keywords, isnull(p.FullDescription, '')) > 0) THEN 1 WHEN (@SearchDescriptions = 1 and patindex(@Keywords, isnull(pv.Description, '')) > 0) THEN 1 ELSE 0 END FROM Nop_Product P INNER JOIN Nop_ProductVariant PV ON P.ProductId=PV.ProductID WHERE P.ProductId=@ProductID RETURN @isMatchingEND
Now we are ready to see the results. See the diagram to the right as we run the very same query on the same test installation. A tiny little hiccup as the stored procedure is run, a negligible memory utilization footprint and we are golden. Problem solved!
Jared Nielsen is a Search Engine Optimization (SEO) Expert, Web Advertising and Marketing Consultant, SQL Engineer, and Software Developer in Jacksonville, Florida. He is no longer taking clients, but instead invests in online ventures that show potential, particularly in the brick and mortar commerce business, and financial sectors. If you feel that you have an online project that would be a good match for your brick and mortar business, contact him directly at this link. Contact Jared Nielsen at the FUZION Agency
----------------------------------------------------------------------------Full Source of the modified stored procedure
RepairedQuery.zip (2.78 kb)
Posted in: NOP Commerce | eCommerce | SQL Server
Tags: SQL, SQL Server, NOP Commerce, Query Execution Plan, SQL Performance Tuning, Task Manager, SEO