Cache Aggressively
When designing an app using ASP.NET, make sure you design with an eye on caching. On server versions of the OS, you have a lot of options for tweaking the use of caches on the server and client side. There are several features and tools in ASP that you can make use of to gain performance.
Output Caching—Stores the static result of an ASP request. Specified using the
<@% OutputCache %>
directive:- Duration—Time item exists in the cache
- VaryByParam—Varies cache entries by Get/Post params
- VaryByHeader—Varies cache entries by Http header
- VaryByCustom—Varies cache entries by browser
- Override to vary by whatever you want:
- Fragment Caching—When it is not possible to store an entire page (privacy, personalization, dynamic content), you can use fragment caching to store parts of it for quicker retrieval later.a) VaryByControl—Varies the cached items by values of a control
- Cache API—Provides extremely fine granularity for caching by keeping a hashtable of cached objects in memory (System.web.UI.caching). It also:a) Includes Dependencies (key, file, time)b) Automatically expires unused itemsc) Supports Callbacks
- Fragment Caching—When it is not possible to store an entire page (privacy, personalization, dynamic content), you can use fragment caching to store parts of it for quicker retrieval later.
Caching intelligently can give you excellent performance, and it's important to think about what kind of caching you need. Imagine a complex e-commerce site with several static pages for login, and then a slew of dynamically-generated pages containing images and text. You might want to use Output Caching for those login pages, and then Fragment Caching for the dynamic pages. A toolbar, for example, could be cached as a fragment. For even better performance, you could cache commonly used images and boilerplate text that appear frequently on the site using the Cache API. For detailed information on caching (with sample code), check out the ASP. NET Web site.
Use Session State Only If You Need To
One extremely powerful feature of ASP.NET is its ability to store session state for users, such as a shopping cart on an e-commerce site or a browser history. Since this is on by default, you pay the cost in memory even if you don't use it. If you're not using Session State, turn it off and save yourself the overhead by adding <@% EnabledSessionState = false %> to your asp. This comes with several other options, which are explained at the ASP. NET Web site.
For pages that only read session state, you can choose EnabledSessionState=readonly. This carries less overhead than full read/write session state, and is useful when you need only part of the functionality and don't want to pay for the write capabilities.
Use View State Only If You Need To
An example of View State might be a long form that users must fill out: if they click Back in their browser and then return, the form will remain filled. When this functionality isn't used, this state eats up memory and performance. Perhaps the largest performance drain here is that a round-trip signal must be sent across the network each time the page is loaded to update and verify the cache. Since it is on by default, you will need to specify that you do not want to use View State with <@% EnabledViewState = false %>. You should read more about View State on the the ASP. NET Web site to learn about some of the other options and settings to which you have access.
Avoid STA COM
Apartment COM is designed to deal with threading in unmanaged environments. There are two kinds of Apartment COM: single-threaded and multithreaded. MTA COM is designed to handle multithreading, whereas STA COM relies on the messaging system to serialize thread requests. The managed world is free-threaded, and using Single Threaded Apartment COM requires that all unmanaged threads essentially share a single thread for interop. This results in a massive performance hit, and should be avoided whenever possible. If you can't port the Apartment COM object to the managed world, use <@%AspCompat = "true" %> for pages that use them. For a more detailed explanation of STA COM, see the MSDN Library.
Batch Compile
Always batch compile before deploying a large page into the Web. This can be initiated by doing one request to a page per directory and waiting until the CPU idles again. This prevents the Web server from being bogged down with compilations while also trying to serve out pages.
Remove Unnecessary Http Modules
Depending on the features used, remove unused or unnecessary http modules from the pipeline. Reclaiming the added memory and wasted cycles can provide you with a small speed boost.
Avoid the Autoeventwireup Feature
Instead of relying on autoeventwireup, override the events from Page. For example, instead of writing a Page_Load() method, try overloading the public void OnLoad() method. This allows the run time from having to do a CreateDelegate() for every page.
Encode Using ASCII When You Don't Need UTF
By default, ASP.NET comes configured to encode requests and responses as UTF-8. If ASCII is all your application needs, eliminated the UTF overhead can give you back a few cycles. Note that this can only be done on a per-application basis.
Use the Optimal Authentication Procedure
There are several different ways to authenticate a user and some of more expensive than others (in order of increasing cost: None, Windows, Forms, Passport). Make sure you use the cheapest one that best fits your needs.
Tips for Porting and Developing in Visual Basic
A lot has changed under the hood from Microsoft® Visual Basic® 6 to Microsoft® Visual Basic® 7, and the performance map has changed with it. Due to the added functionality and security restrictions of the CLR, some functions are simply unable to run as quickly as they did in Visual Basic 6. In fact, there are several areas where Visual Basic 7 gets trounced by its predecessor. Fortunately, there are two pieces of good news:
- Most of the worst slowdowns occur during one-time functions, such as loading a control for the first time. The cost is there, but you only pay it once.
- There are a lot of areas where Visual Basic 7 is faster, and these areas tend to lie in functions that are repeated during run time. This means that the benefit grows over time, and in several cases will outweigh the one-time costs.
The majority of the performance issues come from areas where the run time does not support a feature of Visual Basic 6, and it has to be added to preserve the feature in Visual Basic 7. Working outside of the run time is slower, making some features far more expensive to use. The bright side is that you can avoid these problems with a little effort. There are two main areas that require work to optimize for performance, and few simple tweaks you can do here and there. Taken together, these can help you step around performance drains, and take advantage of the functions that are much faster in Visual Basic 7.
Error Handling
The first concern is error handling. This has changed a lot in Visual Basic 7, and there are performance issues related to the change. Essentially, the logic required to implement OnErrorGoto and Resume is extremely expensive. I suggest taking a quick look at your code, and highlighting all the areas where you use the Err object, or any error-handling mechanism. Now look at each of these instances, and see if you can rewrite them to use try/catch. A lot of developers will find that they can convert to try/catch easily for most of these cases, and they should see a good performance improvement in their program. The rule of thumb is "if you can easily see the translation, do it."
Here's an example of a simple Visual Basic program that uses On Error Goto compared with the try/catch version.
Sub SubWithError() On Error Goto SWETrap Dim x As Integer Dim y As Integer x = x / y SWETrap: Exit Sub End Sub Sub SubWithErrorResumeLabel() On Error Goto SWERLTrap Dim x As Integer Dim y As Integer x = x / y SWERLTrap: Resume SWERLExit End Sub SWERLExit: Exit Sub | Sub SubWithError() Dim x As Integer Dim y As Integer Try x = x / y Catch Return End Try End Sub Sub SubWithErrorResumeLabel() Dim x As Integer Dim y As Integer Try x = x / y Catch Goto SWERLExit End Try SWERLExit: Return End Sub |
The speed increase is noticeable. SubWithError() takes 244 milliseconds using OnErrorGoto, and only 169 milliseconds using try/catch. The second function takes 179 milliseconds compared to 164 milliseconds for the optimized version.
Use Early Binding
The second concern deals with objects and typecasting. Visual Basic 6 does a lot of work under the hood to support casting of objects, and many programmers aren't even aware of it. In Visual Basic 7, this is an area that out of which you can squeeze a lot of performance. When you compile, use early binding. This tells the compiler to insert a Type Coercion is only done when explicitly mentioned. This has two major effects:
- Strange errors become easier to track down.
- Unneeded coercions are eliminated, leading to substantial performance improvements.
When you use an object as if it were of a different type, Visual Basic will coerce the object for you if you don't specify. This is handy, since the programmer has to worry about less code. The downside is that these coercions can do unexpected things, and the programmer has no control over them.
There are instances when you have to use late binding, but most of the time if you're not sure then you can get away with early binding. For Visual Basic 6 programmers, this can be a bit awkward at first, since you have to worry about types more than in the past. This should be easy for new programmers, and people familiar with Visual Basic 6 will pick it up in no time.
Turn On Option Strict and Explicit
With Option Strict on, you protect yourself from inadvertent late binding and enforce a higher level of coding discipline. For a list of the restrictions present with Option Strict, see the MSDN Library. The caveat to this is that all narrowing type coercions must be explicitly specified. However, this in itself may uncover other sections of your code that are doing more work than you had previously thought, and it may help you stomp some bugs in the process.
Option Explicit is less restrictive than Option Strict, but it still forces programmers to provide more information in their code. Specifically, you must declare a variable before using it. This moves the type-inference from the run time into compile time. This eliminated check translates into added performance for you.
I recommend that you start with Option Explicit, and then turn on Option Strict. This will protect you from a deluge of compiler errors, and allow you to gradually start working in the stricter environment. When both of these options are used, you ensure maximum performance for your application.
Use Binary Compare for Text
When comparing text, use binary compare instead of text compare. At run time, the overhead is much lighter for binary.
Minimize the Use of Format()
When you can, use toString() instead of format(). In most cases, it will provide you with the functionality you need, with much less overhead.
Use Charw
Use charw instead of char. The CLR uses Unicode internally, and char
must be translated at run time if it is used. This can result in a substantial performance loss, and specifying that your characters are a full word long (using charw)
eliminates this conversion.Optimize Assignments
Use exp += val instead of exp = exp + val. Since exp
can be arbitrarily complex, this can result in lots of unnecessary work. This forces the JIT to evaluate both copies of exp, and many times this is not needed. The first statement can be optimized far better than the second, since the JIT can avoid evaluating the exp twice.Avoid Unnecessary Indirection
When you use byRef, you pass pointers instead of the actual object. Many times this makes sense (side-effecting functions, for example), but you don't always need it. Passing pointers results in more indirection, which is slower than accessing a value that is on the stack. When you don't need to go through the heap, it is best to avoid it.
Put Concatenations in One Expression
If you have multiple concatenations on multiple lines, try to stick them all on one expression. The compiler can optimize by modifying the string in place, providing a speed and memory boost. If the statements are split into multiple lines, the Visual Basic compiler will not generate the Microsoft Intermediate Language (MSIL) to allow in-place concatenation. See the StringBuilder example discussed earlier.
Include Return Statements
Visual Basic allows a function to return a value without using the return statement. While Visual Basic 7 supports this, explicitly using return allows the JIT to perform slightly more optimizations. Without a return statement, each function is given several local variables on stack to transparently support returning values without the keyword. Keeping these around makes it harder for the JIT to optimize, and can impact the performance of your code. Look through your functions and insert return as needed. It doesn't change the semantics of the code at all, and it can help you get more speed from your application.
Tips for Porting and Developing in Managed C++
Microsoft is targeting Managed C++ (MC++) at a specific set of developers. MC++ is not the best tool for every job. After reading this document, you may decide that C++ is not the best tool, and that the tradeoff costs are not worth the benefits. If you aren't sure about MC++, there are many good resources to help you make your decision This section is targeted at developers who have already decided that they want to use MC++ in some way, and want to know about the performance aspects of it.
For C++ developers, working Managed C++ requires that several decisions be made. Are you porting some old code? If so, do you want to move the entire thing to managed space or are you instead planning to implement a wrapper? I'm going to focus on the 'port-everything' option or deal with writing MC++ from scratch for the purposes of this discussion, since those are the scenarios where the programmer will notice a performance difference.
Benefits of the Managed World
The most powerful feature of Managed C++ is the ability to mix and match managed and unmanaged code at the expression level. No other language allows you to do this, and there are some powerful benefits you can get from it if used properly. I'll walk through some examples of this later on.
The managed world also gives you huge design wins, in that a lot of common problems are taken care of for you. Memory management, thread scheduling and type coercions can be left to the run time if you desire, allowing you to focus your energies on the parts of the program that need it. With MC++, you can choose exactly how much control you want to keep.
MC++ programmers have the luxury of being able to use the Microsoft Visual C® 7 (VC7) backend when compiling to IL, and then using the JIT on top of that. Programmers that are used to working with the Microsoft C++ compiler are used to things being lightning-fast. The JIT was designed with different goals, and has a different set of strengths and weaknesses. The VC7 compiler, not bound by the time restrictions of the JIT, can perform certain optimizations that the JIT cannot, such as whole-program analysis, more aggressive inlining and enregistration. There are also some optimizations that can be performed only in typesafe environments, leaving more room for speed than C++ allows.
Because of the different priorities in the JIT, some operations are faster than before while others are slower. There are tradeoffs you make for safety and language flexibility, and some of them aren't cheap. Fortunately, there are things a programmer can do to minimize the costs.
Porting: All C++ Code Can Compile to MSIL
Before we go any further, it's important to note that you can compile any C++ code into MSIL. Everything will work, but there's no guarantee of type-safety and you pay the marshalling penalty if you do a lot of interop. Why is it helpful to compile to MSIL if you don't get any of the benefits? In situations where you are porting a large code base, this allows you to gradually port your code in pieces. You can spend your time porting more code, rather than writing special wrappers to glue the ported and not-yet-ported code together if you use MC++, and that can result in a big win. It makes porting applications a very clean process. To learn more about compiling C++ to MSIL, take a look at the /clr compiler option.
However, simply compiling your C++ code to MSIL doesn't give you the security or flexibility of the managed world. You need to write in MC++, and in v1 that means giving up a few features. The list below is not supported in the current version of the CLR, but may be in the future. Microsoft chose to support the most common features first, and had to cut some others in order to ship. There is nothing that prevents them from being added later, but in the meantime you will need to do without them:
- Multiple Inheritance
- Templates
- Deterministic Finalization
You can always interoperate with unsafe code if you need those features, but you will pay the performance penalty of marshalling data back and forth. And bear in mind that those features can only be used inside the unmanaged code. The managed space has no knowledge of their existence. If you are deciding to port your code, think about how much you rely on those features in your design. In a few cases, the redesign is too expensive and you will want to stick with unmanaged code. This is the first decision you should make, before you start hacking.
Advantages of MC++ Over C# or Visual Basic
Coming from an unmanaged background, MC++ preserves a lot of the ability to handle unsafe code. MC++'s ability to mix managed and unmanaged code smoothly provides the developer with a lot of power, and you can choose where on the gradient you want to sit when writing your code. On one extreme, you can write everything in straight, unadulterated C++ and just compile with /clr. On the other, you can write everything as managed objects and deal with the language limitations and performance problems mentioned above.
But the real power of MC++ comes when you choose somewhere in between. MC++ allows you to tweak some of the performance hits inherent in managed code, by giving you precise control over when to use unsafe features. C# has some of this functionality in the unsafe keyword, but it's not an integral part of the language and it is far less useful than MC++. Let's step through some examples showing the finer granularity available in MC++, and we'll talk about the situations where it comes in handy.
Generalized "byref" pointers
In C# you can only take the address of some member of a class by passing it to a ref parameter. In MC++, a byref pointer is a first-class construct. You can take the address of an item in the middle of an array and return that address from a function:
Byte* AddrInArray( Byte b[] ) { return &b[5]; }
We exploit this feature for returning a pointer to the "characters" in a System.String via our helper routine, and we can even loop through arrays using these pointers:
System::Char* PtrToStringChars(System::String*); for( Char*pC = PtrToStringChars(S"boo"); pC != NULL; pC++ ) { ... *pC ... }
You can also do a linked-list traversal with injection in MC++ by taking the address of the "next" field (which you cannot do in C#):
Node **w = &Head; while(true) { if( *w == 0 || val < (*w)->val ) { Node *t = new Node(val,*w); *w = t; break; } w = &(*w)->next; }
In C#, you can't point to "Head", or take the address of the "next" field, so you have make a special-case where you're inserting at the first location, or if "Head" is null. Moreover, you have to look one node ahead all the time in the code. Compare this to what a good C# would produce:
if( Head==null || val < Head.val ) { Node t = new Node(val,Head); Head = t; }else{ // we know at least one node exists, // so we can look 1 node ahead Node w=Head; while(true) { if( w.next == null || val < w.next.val ){ Node t = new Node(val,w.next.next); w.next = t; break; } w = w.next; } }
User Access to Boxed Types
A performance problem common with OO languages is the time spent boxing and unboxing values. MC++ gives you a lot more control over this behavior, so you won't have to dynamically (or statically) unbox to access values. This is another performance enhancement. Just place __box keyword before any type to represent its boxed form:
__value struct V { int i; }; int main() { V v = {10}; __box V *pbV = __box(v); pbV->i += 10; // update without casting }
In C# you have to unbox to a "v", then update the value and re-box back to an Object:
struct B { public int i; } static void Main() { B b = new B(); b.i = 5; object o = b; // implicit box B b2 = (B)o; // explicit unbox b2.i++; // update o = b2; // implicit re-box }
STL Collections vs. Managed Collections—v1
The bad news: In C++, using the STL Collections was often just as fast as writing that functionality by hand. The CLR frameworks are very fast, but they suffer from boxing and unboxing issues: everything is an object, and without template or generic support, all actions have to be checked at run time.
The good news: In the long term, you can bet that this problem will go away as generics are added to the run time. Code you deploy today will experience the speed boost without any changes. In the short term, you can use static casting to prevent the check, but this is no longer safe. I recommend using this method in tight code where performance is absolutely critical, and you've identified two or three hot spots.
Use Stack Managed Objects
In C++, you specify that an object should be managed by the stack or the heap. You can still do this in MC++, but there are restrictions you should be aware of. The CLR uses ValueTypes for all stack-managed objects, and there are limitations to what ValueTypes can do (no inheritance, for example). More information is available on the MSDN Library.
Corner Case: Beware Indirect Calls Within Managed Code—v1
In the v1 run time, all indirect function calls are made natively, and therefore require a transition into unmanaged space. Any indirect function call can only be made from native mode, which means that all indirect calls from managed code need a managed-to-unmanaged transition. This is a serious problem when the table returns a managed function, since a second transition must then be made to execute the function. When compared to the cost of executing a single Call instruction, the cost is fifty- to one hundred times slower than in C++!
Fortunately, when you are calling a method that resides within a garbage-collected class, optimization removes this. However, in the specific case of a regular C++ file that has been compiled using /clr, the method return will be considered managed. Since this cannot be removed by optimization, you are hit with the full double-transition cost. Below is an example of such a case.
//////////////////////// a.h: ////////////////////////// class X { public: void mf1(); void mf2(); }; typedef void (X::*pMFunc_t)(); ////////////// a.cpp: compiled with /clr ///////////////// #include "a.h" int main(){ pMFunc_t pmf1 = &X::mf1; pMFunc_t pmf2 = &X::mf2; X *pX = new X(); (pX->*pmf1)(); (pX->*pmf2)(); return 0; } ////////////// b.cpp: compiled without /clr ///////////////// #include "a.h" void X::mf1(){} ////////////// c.cpp: compiled with /clr //////////////////// #include "a.h" void X::mf2(){}
There are several ways to avoid this:
- Make the class into a managed class ("__gc")
- Remove the indirect call, if possible
- Leave the class compiled as unmanaged code (e.g. do not use /clr)
Minimize Performance Hits—version 1
There are several operations or features that are simply more expensive in MC++ under version 1 JIT. I'll list them and give some explanation, and then we'll talk about what you can do about them.
- Abstractions—This is an area where the beefy, slow C++ backend compiler wins heavily over the JIT. If you wrap an int inside a class for abstraction purposes, and you access it strictly as an int, the C++ compiler can reduce the overhead of the wrapper to practically nothing. You can add many levels of abstraction to the wrapper, without increasing the cost. The JIT is unable to take the time necessary to eliminate this cost, making deep abstractions more expensive in MC++.
- Floating Point—The v1 JIT does not currently perform all the FP-specific optimizations that the VC++ backend does, making floating point operations more expensive for now.
- Multidimensional Arrays—The JIT is better at handling jagged arrays than multidimensional ones, so use jagged arrays instead.
- 64 bit Arithmetic—In future versions, 64-bit optimizations will be added to the JIT.
What You Can Do
At every phase of development, there are several things you can do. With MC++, the design phase is perhaps the most important area, as it will determine how much work you end up doing and how much performance you get in return. When you sit down to write or port an application, you should consider the following things:
- Identify areas where you use multiple inheritance, templates, or deterministic finalization. You will have to get rid of these, or else leave that part of your code in unmanaged space. Think about the cost of redesigning, and identify areas that can be ported.
- Locate performance hot spots, such as deep abstractions or virtual function calls across managed space. These will also require a design decision.
- Look for objects that have been specified as stack-managed. Make sure they can be converted into ValueTypes. Mark the others for conversion to heap-managed objects.
During the coding stage, you should be aware of the operations that are more expensive and the options you have in dealing with them. One of the nicest things about MC++ is that you come to grips with all the performance issues up front, before you start coding: this is helpful in paring down work later on. However, there are still some tweaks you can perform while you code and debug.
Determine which areas make heavy use of floating point arithmetic, multidimensional arrays or library functions. Which of these areas are performance critical? Use profilers to pick the fragments where the overhead is costing you most, and pick which option seems best:
- Keep the whole fragment in unmanaged space.
- Use static casts on the library accesses.
- Try tweaking boxing/unboxing behavior (explained later).
- Code your own structure.
Finally, work to minimize the number of transitions you make. If you have some unmanaged code or an interop call sitting in a loop, make the entire loop unmanaged. That way you'll only pay the transition cost twice, rather than for each iteration of the loop.
No comments:
Post a Comment