Q: How do the "async" and "await" keywords work in .NET?
A: First of all, open the "AsyncCSharpConsoleTest" sln in your projects sandbox.
Async method signatures normally declare Task or Task<T> as the return type (async void is also allowed, but is meant mainly for event handlers). BUT the body of an async method never explicitly returns a Task (or Task<T>): the compiler wraps the result in the task for you. If the method signature specifies a return type of Task, then the body returns nothing (a bare "return;", or it simply runs off the end). If the method signature specifies a return type of Task<T>, then the body must return a value of type T.
Async methods should always contain an “await” expression somewhere in the method body. If not, the method will execute synchronously (and the compiler will warn about it). At the point where the async method reaches an “await” on an operation that has not finished yet, the async method suspends and returns control to its caller. Once the awaited operation finishes, the async method resumes, continuing to execute after the await expression.
The value of a method call which is preceded by await is of type T, taken from the Task<T> in the called method’s return declaration; it is not a Task. (Awaiting a plain Task yields no value at all.)
What’s confusing about this: when I think about other kinds of async programming that I’ve done, there are a few models:
• call an async method, passing in a callback function which the main thread will eventually call with the results of the async operation. In this case, there’s never any explicit “awaiting”: just put what you want to do (after the async operation completes) into the callback.
• The other way is more reminiscent of “await”: you create a function, or a delegate – some kind of function pointer – and then you create a new thread, passing in the function, and start the thread. At some point later on, you will wait on the main thread for the spawned thread to complete; note that you really _must_ wait at some point: presumably, you will need the return value from the spawned thread, and won’t be able to go any further without it.
Note that you can always get hold of the Task when you call an async method: just don’t await it right away. Once you have the task, you can call .ContinueWith() on it if you want a more “traditional” callback-like scheme, or if the calling method can’t be marked “async” (e.g., a console app’s Main() before C# 7.1).
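The same ideas can be tried out quickly in Python's asyncio, which is only an analogue of the .NET model, not the .NET API. In this minimal sketch the names fetch_answer, caller and main are made up for illustration; add_done_callback stands in for .ContinueWith(), and asyncio.run() plays the part of a non-async Main() starting things off.

```python
import asyncio


async def fetch_answer() -> int:
    # Declared async, yet the body just returns an int (the "T" in Task<T>).
    await asyncio.sleep(0.01)  # suspension point: control goes back to the loop
    return 42


async def caller() -> dict:
    # Awaiting unwraps the result: "direct" is an int, not a task.
    direct = await fetch_answer()

    # Keeping the task instead allows a callback-style continuation,
    # similar in spirit to Task.ContinueWith() (an analogy, not the same API).
    seen = []
    task = asyncio.create_task(fetch_answer())
    task.add_done_callback(lambda t: seen.append(t.result()))
    await task
    await asyncio.sleep(0)  # give the loop one more turn for pending callbacks

    return {"direct": direct, "via_callback": seen}


def main() -> dict:
    # A non-async entry point has to start the event loop itself.
    return asyncio.run(caller())
```

Calling main() returns a dictionary holding the directly awaited value and the list of values collected by the callback.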
Q: What is an event loop, and how does it work? How is this related to single-threaded, non-blocking servers?
A: (This is my guess as to how it basically works. I haven't been able to find a clear explanation of this anywhere.) We’re talking about a single thread, running. There’s an infinite while loop running on the thread. Each time through, the loop checks to see if anyone has added any events to a queue, a queue to which any thread can add an event (only the main, single thread can remove events from the queue). If there are events in the queue, the loop pulls the next event, and handles it based on the type of event. If there are no events in the queue, the loop waits on a trigger which gets pulled if anyone adds an event to the queue.
In a single-threaded Web server, anytime the main thread calls a potentially blocking (long-running) operation, the call is asynchronous, and the caller (the main thread) passes in a pointer to the requisite callback function (the function that the callee will “call back” when it is done). When the main thread calls this asynchronous function, the work is handed off to run elsewhere (in this mental model, on a new thread of execution; in practice it is often a worker-pool thread or the operating system's own asynchronous I/O). When that work finishes, it doesn’t actually “call back” the callback function: it raises an event to the main thread, an event which includes the callback function reference and the arguments to the callback. When the main thread event loop pulls that event from the event queue, it is the main thread that actually calls the "callback" function.
Here's some pseudo-code. This is my guess as to how this works. Needs more research to be solid:
// There is a thread-global queue ("eventQueue") into which anyone can put an event object.
// Each event object contains the event type, and an associated data object. In the case of
// an "execute callback" event, the event object would contain a callback function and a list
// of arguments for the callback.
//
// "isRunning" gets true
// loop while isRunning is true
//     if there are events in the eventQueue
//         take the next event in the eventQueue
//         switch over the event type for the event
//             if it's a callback event, call the callback passing the associated arguments
//     else, wait on a trigger which gets triggered when someone puts an event in the eventQueue
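The pseudo-code can be turned into a small runnable Python sketch. It follows the same guess about how an event loop works; it is not how any particular server implements it. The names run_event_loop, slow_double and on_done are hypothetical, and queue.Queue.get() stands in for "wait on a trigger" because it blocks while the queue is empty.

```python
import queue
import threading


def run_event_loop(event_queue):
    """Handle events on the calling thread until a "stop" event arrives."""
    results = []
    is_running = True
    while is_running:
        # get() blocks while the queue is empty: the "wait on a trigger" step.
        event_type, callback, args = event_queue.get()
        if event_type == "callback":
            # The callback runs here, on the loop's thread, not on the worker.
            results.append(callback(*args))
        elif event_type == "stop":
            is_running = False
    return results


def slow_double(event_queue, callback, value):
    """Hypothetical long-running operation executed on a worker thread."""
    result = value * 2
    # Instead of calling back directly, post an event for the loop thread.
    event_queue.put(("callback", callback, (result,)))


def on_done(result):
    # Record which thread actually executed the callback.
    return (threading.get_ident(), result)


def demo():
    events = queue.Queue()
    workers = [
        threading.Thread(target=slow_double, args=(events, on_done, n))
        for n in (1, 2, 3)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    events.put(("stop", None, ()))
    return threading.get_ident(), run_event_loop(events)
```

demo() returns the loop thread's id together with the (thread id, result) pairs produced by the callbacks, so it is easy to confirm that every callback ran on the loop's thread even though the work was done elsewhere.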
Q: Windows memory: what are private bytes, working set, virtual bytes, etc?
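Q: How do I determine the encoding format of a file? Especially, how do I determine whether a file is unicode, utf-8, or straight ASCII encoded?
A: Read the first two or three bytes of the file and look for a byte order mark (BOM). Corresponding encodings and hex codes are as follows:
unicode (UTF-16) Little Endian = "\xFF\xFE" (note the "endianness" of the file has nothing to do with unicode itself, only with the byte order in which each 16-bit unit was written: PCs (Intel) are little endian, older Macs (Motorola) are big endian).
unicode (UTF-16) Big Endian = "\xFE\xFF"
utf8 = "\xEF\xBB\xBF" (utf-8 is a byte-oriented encoding, so it has no endianness: the signature is always these three bytes in this order. Also note: utf-8 can be saved "with signature" (i.e., the leading bytes) or "without signature". The signature is optional precisely because there is no byte order to indicate; it only marks the file as utf-8.)
ASCII = straight to content (no leading bytes; note that utf-8 saved "without signature" also goes straight to content, so the absence of a signature does not prove that a file is ASCII).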
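A minimal Python sketch of the check, using the BOM constants from the standard codecs module. The function names and return labels are made up for illustration, and UTF-32 (whose little-endian BOM also begins with FF FE) is deliberately ignored here.

```python
import codecs


def detect_bom(first_bytes: bytes) -> str:
    """Guess an encoding from the leading bytes of a file."""
    # Check the three-byte UTF-8 signature before the two-byte UTF-16 marks.
    if first_bytes.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8 with signature"
    if first_bytes.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16 little endian"
    if first_bytes.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16 big endian"
    # Could be ASCII, utf-8 without signature, or some other encoding.
    return "no BOM"


def detect_file(path: str) -> str:
    # Only the first few bytes are needed.
    with open(path, "rb") as handle:
        return detect_bom(handle.read(3))
```

A "no BOM" answer only says that the file has no signature; telling ASCII from unsigned utf-8 requires looking at the content itself.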
Q: What is a Unicode-compliant font?
A: A Unicode-compliant font would be a font that has a "CMAP" that maps all or part of Unicode characters to glyphs in the font. Both new Type1 and Truetype fonts typically have such a CMAP; older fonts usually do not.
Q: What's the difference between Composition and Aggregation in UML?
A: Over time, I've seen the question about the difference between Composition and Aggregation re-occur over and over again.
Now, this is not really a pattern issue, but it has caused a great deal of confusion and argument. Therefore, it seems worthy of note. The poster then offered this definition:
Now I think I've come about a succinct rule-of-thumb definition, which highlights their difference, and as a bonus, exposes their dual nature:
COMPOSITION means the parts can not exist without the whole.
AGGREGATION means the whole can not exist without its parts.
EXAMPLE1: If you destroy a database all its tables will be destroyed as well. Therefore we can say there is a composition relationship between a database and its tables.
Note that destroying the tables (only) does not destroy the database itself.
He also noted that any binary association might be a Composition, Aggregation or a looser relationship where neither object has any dependence upon the other. However, by this definition, no association may mix the roles of both Composition and Aggregation.
In UML, placing a black diamond at the 'owner' side of an association line represents Composition. Conversely, a 'white' or empty diamond represents Aggregation. In either case, these indicate an attribute or quality specification of the association. Exactly what this quality is remains the subject of some controversy.
Stephen Albin offered the official description of these concepts from the UML specification:
According to The UML Reference Manual (Rumbaugh), p. 148, "The distinction between aggregation and association is often a matter of taste rather than a difference in semantics…aggregation is association. Aggregation conveys the thought that the aggregate is inherently the sum of its parts…the only real semantics that it adds to association is the constraint that chains of aggregate links may not form cycles." "In spite of the few semantics attached to aggregation, everybody thinks it is necessary (for different reasons). Think of it as a modeling placebo."
In response, the original poster modified his definition:
In reaction to this vagueness I propose the following 'dual' definition to determine if an association is a composition, aggregation, or neither:
COMPOSITION: If destroying the whole destroys (or renders meaningless) the parts, then we have a composition.
AGGREGATION: If destroying the components destroys (or renders meaningless) the whole, then we have an aggregation.
Destroying a database destroys its tables. But destroying the tables of a database does not destroy the database.
Dismantling the group "Simon and Garfunkel" does not destroy Paul or Art. But destroying Paul and Art would be the end of the group "Simon and Garfunkel."
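The poster's two examples can be mirrored in code. This Python sketch is only one way to read the rule of thumb: the classes and their methods are invented for illustration, and "destroying" is modelled with a flag because Python objects are not destroyed explicitly.

```python
class Table:
    def __init__(self, name):
        self.name = name
        self.dropped = False


class Database:
    """Composition: the whole creates its parts and takes them down with it."""

    def __init__(self, table_names):
        self.tables = [Table(name) for name in table_names]

    def destroy(self):
        for table in self.tables:
            table.dropped = True  # the parts do not outlive the whole
        self.tables = []


class Musician:
    def __init__(self, name):
        self.name = name
        self.active = True


class Duo:
    """Aggregation: the whole only refers to parts that exist on their own."""

    def __init__(self, first, second):
        self.members = [first, second]

    def disband(self):
        self.members = []  # the musicians themselves are untouched
```

Destroying a Database marks every Table as dropped, while disbanding a Duo leaves both Musician objects exactly as they were.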
Martin Fowler added this observation:
This is a messy part of the UML. Aggregation is there for reasons that are more due to the interpersonal dynamics of building standards than any technical reason. I recommend that you ignore aggregation. Composition is occasionally useful, but I wouldn't worry about it too much unless you want to be a really fancy UMLer.
On the whole aggregation and composition cause far more confusion than they are worth.
Although aggregation is not well defined in UML, I use the contrast between Composition and Aggregation to help provide a visual clue as to the strength of 'ownership' between two objects. Then I typically add supporting comments to the model to indicate my precise meaning. This approach at least gives some instructions for the programmer in the resulting diagram. Perhaps at some point in the future we can expect greater precision from UML (or its successor.)
For more information on this topic, see Martin's notes on Composition and Aggregation.
Q: What are the different parts of a URI?
http = protocol
minifolders = server name (or third-level domain)
buzzsaw = second-level domain name
com = top-level domain name
buzzsaw.com = domain name
FQDN (fully qualified domain name) = minifolders.buzzsaw.com
In this case, 'www' refers to "the" webserver at the given domain. It is also the "server name", though in this case it's only possible to have _one_ "server name": www.
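The breakdown above can be reproduced with Python's standard urllib.parse module. This is a rough sketch: the function name is hypothetical, the full URL is assembled here from the parts listed above as an assumption, and the "last two labels are the domain name" rule is naive (it is wrong for suffixes such as co.uk).

```python
from urllib.parse import urlsplit


def split_host(url: str) -> dict:
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,
        "fqdn": parts.hostname,
        # Everything left of the registered domain, e.g. "minifolders" or "www".
        "server_name": ".".join(labels[:-2]),
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "domain_name": ".".join(labels[-2:]),
    }
```

For example, split_host("http://minifolders.buzzsaw.com") reports "http" as the protocol, "minifolders" as the server name and "buzzsaw.com" as the domain name.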
Q: What are the "heap" and the "stack," and what's the salient difference(s) between them?
A: In certain programming languages including C and Pascal, a heap is an area of pre-reserved computer main storage (memory) that a program process can use to store data in some variable amount that won't be known until the program is running. For example, a program may accept different amounts of input from one or more users for processing and then do the processing on all the input data at once. Having a certain amount of heap storage already obtained from the operating system makes it easier for the process to manage storage and is generally faster than asking the operating system for storage every time it's needed. The process manages its allocated heap by requesting a "chunk" of the heap (called a heap block) when needed, returning the blocks when no longer needed, and doing occasional "garbage collecting," which makes blocks available that are no longer being used and also reorganizes the available space in the heap so that it isn't being wasted in small unused pieces.
The term is apparently inspired by another term, stack. A stack is similar to a heap except that the blocks are taken out of storage in a certain order and returned in the same way. In Pascal, a subheap is a portion of a heap that is treated like a stack.
In programming, a stack is a data area or buffer used for storing requests that need to be handled. The IBM Dictionary of Computing says that a stack is always a push-down list, meaning that as new requests come in, they push down the old ones. Another way of looking at a push-down list - or stack - is that the program always takes its next item to handle from the top of the stack. (This "LIFO" or "last-in first-out" order is unlike other arrangements such as "FIFO" or "first-in first-out.")
Java: All objects (aka composite types) in Java are allocated on the heap, whereas all local primitive types (declared within a method body) are allocated on the stack. Primitives that are class members are allocated on the heap with the object. Local object references (a variable which stores the address of an object) are allocated on the stack, but the referenced object is still allocated on the heap. Each thread of execution has its own stack, so that local primitives are inherently thread-safe (no thread can access the stack of another thread). The heap is accessible to all threads, so that any mutable object (in a multi-threaded environment) is potentially not thread-safe.
The entire discussion of heap vs. stack ends up being intertwined with how programming languages treat different kinds of data. First of all, stack allocation tends to be much faster than heap allocation. Another common distinction made in programming languages is that "value based" data types (generally the more primitive data types like ints, bools, enums, and structs) are allocated on the stack, while "reference based" data types (objects, which in Java and .NET includes strings) are allocated on the heap. The stack is managed by scope: when a stack based variable goes out of scope, its memory is de-allocated. Objects on the heap, these days, are usually managed by "garbage collectors"; i.e., a software monitor which decides when it's time to clean up and de-allocate the object memory resources.
Value and reference based variables are also managed differently when they are used in an application. When a programmer assigns or passes a value based variable, a copy of that variable is made. Among other things, this means that: a) more memory is consumed and b) modifications to the copy will NOT modify the original variable. Reference based variables are the opposite. Copying or passing a reference simply creates another reference to THE SAME object (in fact, the new variable is a copy of the _address_ of the original: both reference variables hold the same address to the object). So, if you modify the copy, the system will first "de-reference" the copy to get the SAME object to which the original reference STILL refers, and then modify that object. This means that: a) modifying the copy modifies the original and b) relatively little memory is consumed: you are not copying the entire object, only the ADDRESS of the object.
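The difference between copying a reference and copying a value can be seen in a few lines of Python. One caveat: Python has no stack-allocated value types, so copy.copy() is used here as a stand-in for the copy that a value based assignment would make. The function name is made up for illustration.

```python
import copy


def copy_semantics_demo():
    original = [1, 2, 3]

    alias = original               # copies only the reference (the "address")
    snapshot = copy.copy(original) # copies the contents into a new object

    alias.append(4)      # changes the one object both names refer to
    snapshot.append(99)  # changes only the independent copy

    return original, alias, snapshot
```

After the call, original and alias are the very same list and both show the appended 4, while snapshot carries the 99 and has left the original alone.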
Q: What's a "cache?"
A: A cache (pronounced CASH) is a place to store something temporarily. The files you automatically request by looking at a Web page are stored on your hard disk in a cache subdirectory under the directory for your browser (for example, Internet Explorer). When you return to a page you've recently looked at, the browser can get it from the cache rather than the original server, saving you time and the network the burden of some additional traffic. You can usually vary the size of your cache, depending on your particular browser.
Computers include caches at several levels of operation, including cache memory and a disk cache. Caching can also be implemented for Internet content by distributing it to multiple servers that are periodically refreshed. (The use of the term in this context is closely related to the general concept of a distributed information base.)
Altogether, we are aware of these types of caches:
- International, national, regional, organizational and other "macro" caches to which highly popular information can be distributed and periodically updated and from which most users would obtain information.
- Local server caches (for example, corporate LAN servers or access provider servers that cache frequently accessed files). This is similar to the previous idea, except that the decision of what data to cache may be entirely local.
- Your Web browser's cache, which contains the most recent Web files that you have downloaded and which is physically located on your hard disk (and possibly some of the following caches at any moment in time).
- A disk cache (either a reserved area of RAM or a special hard disk cache) where a copy of the most recently accessed data and adjacent (most likely to be accessed) data is stored for fast access.
- RAM itself, which can be viewed as a cache for data that is initially loaded in from the hard disk (or other I/O storage systems).
- L2 cache memory, which historically sat on a separate chip from the microprocessor (on modern processors it is usually on the same chip) but is faster to access than regular RAM.
- L1 cache memory on the same chip as the microprocessor.
Also see: buffer, which, like a cache, is a temporary place for data, but with the primary purpose of coordinating communication between programs or hardware rather than improving process speed.
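The browser-cache idea (keep a copy of what you fetched so the second request is served locally) can be sketched in Python with functools.lru_cache from the standard library. The fetch_page function and the URL are hypothetical, and the "network trip" is simulated by appending to a list.

```python
from functools import lru_cache

origin_requests = []  # every simulated trip to the original server


@lru_cache(maxsize=128)  # the cache size can be varied, as in a browser
def fetch_page(url: str) -> str:
    origin_requests.append(url)  # only reached on a cache miss
    return f"<html>content of {url}</html>"


def visit_twice(url: str) -> int:
    fetch_page(url)  # first visit: goes to the "server"
    fetch_page(url)  # second visit: answered from the cache
    return origin_requests.count(url)
```

visit_twice() reports how many times the simulated server was actually contacted for that URL; fetch_page.cache_info() shows the hit and miss counters kept by the cache.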
Q: What's an "access violation?"
A: An access violation is the attempt by a computer process to access a memory area that it does not own or have permission to access.
With modern operating systems, each process is given one or more segments of system memory where it can store and retrieve information. Each process can request more or less memory as needed, and the request will be acknowledged by the operating system with the address of the granted memory section. Typically, the process that requested the memory is the only one allowed to read or write it.
An access violation occurs when a process attempts to access a portion of memory assigned to another application, or an unused memory area, without having permission to do so. It is typically the result of a programming bug, for example a wrong pointer. In the popular C programming language, the most frequent cause of access violations is the use of a pointer that has been set to the NULL value, that is, zero. This address is typically reserved by the operating system, and an access to it is handled as a sure symptom of a serious programming error.
Q: Are static methods thread-safe?
A: The static keyword for methods really has nothing to do with the variables defined within the method. So, static methods may or may not be thread-safe: it depends on what data they access and how they access it. If a method reads or modifies static/global variables (or any other data shared between threads), then it may not be thread-safe. This is true of any method, in fact, not just static methods. Local variables cannot be shared among threads because each thread gets its own stack; thus, the local variables of a static method are always private to each call (thread-safe), unless they are explicitly declared static (in languages that allow static locals, such as C and C++). Note that a local reference to a shared object is only as thread-safe as the object it refers to. So, any code which acts on static data, whether it's in a static method or not, has the potential of not being thread-safe, depending on the use of the static data; in a multi-threaded system, to avoid race conditions, mutable static data must be locked (or otherwise synchronized) during access, otherwise the threads will (probably, eventually) corrupt the data.
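A small Python sketch of the point: the static method below is safe only because access to the shared class-level data is guarded by a lock, while its local variable needs no protection at all. The class and function names are made up for illustration.

```python
import threading


class Tally:
    total = 0                 # class-level ("static") data shared by all threads
    _lock = threading.Lock()  # guards every read-modify-write of total

    @staticmethod
    def record(amount):
        doubled = amount * 2  # local variable: private to this call
        with Tally._lock:
            Tally.total += doubled


def run(thread_count=8, repeats=1000):
    Tally.total = 0

    def work():
        for _ in range(repeats):
            Tally.record(1)

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return Tally.total
```

With the lock in place the final total is always thread_count * repeats * 2; without it, the unsynchronized "+=" on the shared value may lose updates.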
Q: What is COM?
A: The Component Object Model (COM) is a Microsoft platform for software componentry introduced by Microsoft in 1993. It is used to enable interprocess communication [i.e. a common translation interface between executables which are not written in the same language or designed in the same way?] and dynamic object creation in any programming language that supports the technology. The term COM is often used in the software development world as an umbrella term that encompasses the OLE, OLE Automation, ActiveX, COM+ and DCOM technologies. Although COM was introduced in 1993, Microsoft did not begin emphasizing the name COM until 1997.
Although it has been implemented on several platforms, it is primarily used with Microsoft Windows. COM is expected to be replaced to at least some extent by the Microsoft .NET framework, and support for Web Services through the Windows Communication Foundation (WCF). Networked DCOM uses binary proprietary formats, while WCF uses XML-based SOAP messaging. COM also competes with CORBA and JavaBeans as component software systems.
Another definition: "COM is a specification and a set of services that permit us to create applications that are language-independent, modular, object-oriented, distributed, customizable, and upgradable. Let us now take apart this definition word by word." From:
Q: What is an ActiveX control?
A: An ActiveX control is a component program object that can be re-used by many application programs within a computer or among computers in a network. The technology for creating ActiveX controls is part of Microsoft's overall ActiveX set of technologies, chief of which is the Component Object Model (COM). ActiveX controls can be downloaded as small programs or animations for Web pages, but they can also be used for any commonly-needed task by an application program in the latest Windows and Macintosh environments. In general, ActiveX controls replace the earlier OCX (Object Linking and Embedding custom controls). An ActiveX control is roughly equivalent in concept and implementation to the Java applet.
An ActiveX control can be created in any programming language that recognizes Microsoft's Component Object Model. The distributed support for COM is called the Distributed Component Object Model (DCOM). In implementation, an ActiveX control is a dynamic link library (DLL) module. An ActiveX control runs in what is known as a container, an application program that uses the Component Object Model program interfaces. This reusable component approach to application development reduces development time and improves program capability and quality. Windows application development programs such as PowerBuilder and Microsoft Access take advantage of ActiveX controls. Visual Basic and C++ are commonly used to write ActiveX controls.
Q: What are the basic principles of public-key cryptography (e.g., RSA)?
A: Public-key cryptography uses two separate keys: a public key (aka the 'lock') which is published to the outside world, and a private (unlock) key. People use the key-pair to do two things:
- The public key is used to encrypt (lock) data, and only owners of the corresponding private key can decrypt or unlock that data.
- The private key is used to sign data, and a receiver of the data can use the public key to verify that someone with the private key signed the data.
Following are excerpts from this file: Public key cryptography and digital signing -- An excellent explanation with some diagrams
Encryption and Decryption
Encryption is a mechanism by which a message is transformed so that only the sender and recipient can read it. For instance, suppose that Alice wants to send a private message to Bob. To do so, she first needs Bob’s public-key; since everybody can see his public-key, Bob can send it over the network in the clear without any concerns. Once Alice has Bob’s public-key, she encrypts the message using Bob’s public-key and sends it to Bob. Bob receives Alice’s message and, using his private-key, decrypts it.
Digital Signature and Verification
Digital signature is a mechanism by which a message is authenticated, i.e., proving that a message is effectively coming from a given sender, much like a signature on a paper document. For instance, suppose that Alice wants to digitally sign a message to Bob. To do so, she uses her private-key to encrypt the message (in practice, a short hash of the message rather than the whole message); she then sends the message along with her public-key (typically, the public key is attached to the signed message). Since Alice’s public-key is the only key that can decrypt that message, a successful decryption constitutes a Digital Signature Verification, meaning that there is no doubt that it is Alice’s private key that encrypted the message.
Both encryption and digital signature can be combined, hence providing privacy and authentication.
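The lock/unlock and sign/verify roles can be tried out with "textbook" RSA on tiny numbers. This Python sketch is for illustration only: the primes 61 and 53 and the exponent 17 are arbitrary small values chosen here, there is no padding or hashing, and nothing about it is secure. It needs Python 3.8 or later for the modular inverse via pow().

```python
# Toy key pair built from two small primes (illustrative values only).
P, Q = 61, 53
N = P * Q                      # modulus, part of both keys
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent: (E, N) is the public key
D = pow(E, -1, PHI)            # private exponent: (D, N) is the private key


def encrypt(message: int) -> int:
    # Anyone holding the public key can lock a message.
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    # Only the private key unlocks it.
    return pow(ciphertext, D, N)


def sign(message: int) -> int:
    # Signing applies the private key to the message.
    return pow(message, D, N)


def verify(message: int, signature: int) -> bool:
    # Anyone holding the public key can check the signature.
    return pow(signature, E, N) == message
```

Messages here are plain integers smaller than N; decrypt(encrypt(m)) gives back m, and verify(m, sign(m)) succeeds only for the message that was actually signed.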
Q: How does SSH work?
A: SSH uses public-key cryptography to authenticate clients and servers, and to set up the encryption of the communication between them. I.e., in SSH, the server has its own public and private keys (its "host key"), and a client can have its own key pair as well. Each exchanges public keys with the other, and (somehow) independently agrees to trust the other's public key (i.e., actual humans exchange the public keys, or some other system provides a mechanism by which the client and server can agree to trust each other's public keys.) Once trust is established, the client and server can authenticate each other relying on the principles of public-key cryptography, and they agree on a shared session key which is used to encrypt the actual communication (data).
Q: How do security certificates work?
A: (Note: see "What are the basic principles of public-key cryptography?" above.) A security certificate, whether it is a personal certificate or a Web site certificate, associates an identity with a public key. Only the owner of the certificate (should) know the corresponding private key. The private key allows the owner to:
- Decrypt information (using the private key) which was encrypted with the corresponding public key. This ensures privacy.
- Create a digital signature and sign things with it; these signatures can be verified with the public key. This validates that whoever signed the message has the private key.
When you send your certificate to other people, you are actually giving them your public key, so they can send you encrypted information that only you can decrypt and read with your private key.
The digital signature component of a security certificate is your electronic identity card. The digital signature tells the recipient that the information actually came from you and has not been forged or tampered with. I.e., a message signed with a private key can only be decrypted by the corresponding public key. So, if the message can be decrypted with a given public key, it _must have_ been encrypted with the corresponding private key.
Before you can start sending encrypted or digitally signed information, you must obtain a certificate and set up Internet Explorer to use it. When you visit a secure Web site (one whose address starts with https), the site automatically sends you its certificate.
In cryptography, a certificate authority or certification authority (CA, see below) is an entity which issues digital certificates for use by other parties. It is an example of a trusted third party. CAs are characteristic of many public key infrastructure (PKI) schemes.
There are many commercial CAs that charge for their services. Institutions and governments may have their own CAs, and there are free CAs.
A CA will issue a public key certificate which states that the CA attests that the public key contained in the certificate belongs to the person, organization, server, or other entity noted in the certificate. A CA's obligation in such schemes is to verify an applicant's credentials, so that users (relying parties) can trust the information in the CA's certificates. The usual idea is that if the user trusts the CA and can verify the CA's signature, then they can also verify that a certain public key does indeed belong to whoever is identified in the certificate.
Another way to think about it: A certificate, in its most basic form, contains a distinguished name (DN) and a public key, the combination of which is itself signed by some private key. The DN identifies an entity (a company, for example) that holds the private key that matches the public key of the certificate. Whichever entity signs the combination of the DN and public key creates a certificate by doing so. If a certificate is signed by the entity which holds the corresponding private key, it is "self-signed" (e.g., the certificate of a root certificate authority). Usually, certificates are signed by another entity, namely a "trusted third party". E.g., your company's certificate will be signed by a certificate authority, an authority which most other users have agreed to trust. In this way, users can trust your company's certificate (i.e., via a transitive relationship).
Q: What is a Root Certificate?
A: A security certificate which identifies the Root Certificate Authority.
Digital certificates are verified using a chain of trust. The trust anchor for the digital certificate is the Root Certificate Authority (CA).
A certificate authority can issue multiple certificates in the form of a tree structure. A root certificate is the top-most certificate of the tree, the private key of which is used to "sign" other certificates. All certificates below the root certificate inherit the trustworthiness of the root certificate - a signature by a root certificate is somewhat analogous to "notarizing" an identity in the physical world.
Many software applications assume these root certificates are trustworthy on the user's behalf. For example, a Web browser uses them to verify identities within SSL/TLS secure connections. However, this implies that the user trusts their browser's publisher, the certificate authorities it trusts, and anyone the certificate authority may have issued a certificate-issuing-certificate, to faithfully verify the identity and intentions of all parties that own the certificates. This (transitive) trust in a root certificate is the usual case and is integral to the X.509 certificate chain model.
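The chain-of-trust idea can be sketched with the same kind of toy RSA arithmetic. Everything in this Python sketch is hypothetical: the certificate is a plain dictionary rather than X.509, the key sizes are tiny, and the names ("Toy Root CA", "www.example.test") are invented. It only shows the shape of the check: verify each signature with the issuer's public key and stop when a trusted root is reached.

```python
import hashlib


def make_key(p, q, e=17):
    # Toy RSA key pair from two small primes (illustration only, not secure).
    n = p * q
    return {"n": n, "e": e, "d": pow(e, -1, (p - 1) * (q - 1))}


def _digest(cert, modulus):
    # Hash the signed fields (name, public key, issuer), reduced to fit the toy modulus.
    text = f'{cert["subject"]}|{cert["public"]}|{cert["issuer"]}'
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % modulus


def issue(subject, subject_key, issuer, issuer_key):
    # The issuer signs the subject's name and public key with its private key.
    cert = {
        "subject": subject,
        "public": (subject_key["n"], subject_key["e"]),
        "issuer": issuer,
    }
    n = issuer_key["n"]
    cert["signature"] = pow(_digest(cert, n), issuer_key["d"], n)
    return cert


def verify_chain(cert, known_certs, trusted_roots, max_depth=5):
    # Walk from the certificate towards a root, checking each signature.
    for _ in range(max_depth):
        issuer = cert["issuer"]
        issuer_cert = trusted_roots.get(issuer, known_certs.get(issuer))
        if issuer_cert is None:
            return False
        n, e = issuer_cert["public"]
        if pow(cert["signature"], e, n) != _digest(cert, n):
            return False
        if issuer in trusted_roots:
            return True  # reached a trust anchor
        cert = issuer_cert
    return False  # never reached a trusted root


root_key = make_key(61, 53)
site_key = make_key(67, 71)
root_cert = issue("Toy Root CA", root_key, "Toy Root CA", root_key)  # self-signed
site_cert = issue("www.example.test", site_key, "Toy Root CA", root_key)
```

The site certificate verifies only when the root certificate is in the trusted set, which is the situation described for a mock certificate authority: until its root certificate is installed as trusted, certificates it issued are rejected.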
A root certificate is part of a public key infrastructure scheme. The most common commercial variety is based on the ITU-T X.509 standard, which normally includes a digital signature from a certificate authority (CA).
Note that in Buzzsaw, OPs created a 'mock' certificate authority for Buzzsaw.com which we use in most of our test environments. We did this so that we could more cheaply create ssl certificates for all of the BZ test environments. Because of this, we need to install the mock Buzzsaw.com root certificate into the list of Trusted Root Certification Authorities on all of our machines which need to access the various test environments. The actual certificate for Buzzsaw Production is issued by a true certificate authority, one which most browsers/computers already trust by default.
from Wikipedia: http://en.wikipedia.org/wiki/Root_certificate
Q: What Are Active Server Pages?
A: Active Server Pages (ASPs) are Web pages that contain server-side scripts in addition to the usual mixture of text and HTML (Hypertext Markup Language) tags. Server-side scripts are special commands you put in Web pages that are processed before the pages are sent from your Personal Web Server to the Web browser of someone who's visiting your Web site. When you type a URL in the Address box or click a link on a Web page, you're asking a Web server on a computer somewhere to send a file to the Web browser (sometimes called a "client") on your computer. If that file is a normal HTML file, it looks exactly the same when your Web browser receives it as it did before the Web server sent it. After receiving the file, your Web browser displays its contents as a combination of text, images, and sounds.
In the case of an Active Server Page, the process is similar, except there's an extra processing step that takes place just before the Web server sends the file. Before the Web server sends the Active Server Page to the Web browser, it runs all server-side scripts contained in the page. Some of these scripts display the current date, time, and other information. Others process information the user has just typed into a form, such as a page in the Web site's guestbook.
To distinguish them from normal HTML pages, Active Server Pages are given the ".asp" extension.
Q: What are ODBC and DSN?
A: Open Database Connectivity (ODBC) is an open standard application programming interface (API) for accessing a database. By using ODBC statements in a program, you can access files in a number of different databases, including Access, dBase, DB2, Excel, and Text. In addition to the ODBC software, a separate module or driver is needed for each database to be accessed. The main proponent and supplier of ODBC programming support is Microsoft.
Data source name (DSN) is a data structure that contains the information about a specific database that an Open Database Connectivity (ODBC) driver needs in order to connect to it. Included in the DSN, which resides either in the registry or as a separate text file, is information such as the name, directory and driver of the database, and, depending on the type of DSN, the ID and password of the user. The developer creates a separate DSN for each database. To connect to a particular database, the developer specifies its DSN within a program. In contrast, DSN-less connections require that all the necessary information be specified within the program.
Q: What does it mean to index in SQL server?
A: Note: the Wikipedia topic on indexes is very useful and has links to associated topics. Much of the following is taken from here: http://en.wikipedia.org/wiki/Index_(database)
Also, some great info here: http://databases.aspfaq.com/database/should-i-index-my-database-table-s-and-if-so-how.html
First, think about an index in a book. This is a way of taking the contents of the book and reorganizing it into alphabetical categories which reference corresponding pages in the book. The idea is to make it much faster to find information in the book. Similarly, when you insert data into a database, the data can come in any order: it's not automatically organized in a way which makes it easy to later retrieve. The database host software (e.g., SQL Server) doesn't know anything about the data you're adding, so it can't automatically organize it in a way which will make it easier to search. But, without some higher level of organization, when you try to find data in a database table, all the database host can do is look at each row, one after another, trying to find the one you want. This is known as a "full table scan".
A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns [i.e., "significant" columns; ones which will help the database host re-organize the data in a way which will better help you find what you want]. The disk space required to store the index is typically less than the storage of the table (since indexes usually contain only the key fields according to which the table is to be arranged, and exclude all the other details in the table). In a relational database an index is a copy of part of a table.
Some databases extend the power of indexes even further by allowing indexes to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the uppercase versions of the last_name field in the index.
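As a rough illustration of an index on an expression, the sketch below uses SQLite through Python's `sqlite3` module, simply because it runs anywhere; SQL Server and other hosts have their own syntax for this. The table contents, the index name `idx_people_upper_last` and the `query_plan` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")],
)

# The index stores upper(last_name), not the raw last_name values.
conn.execute("CREATE INDEX idx_people_upper_last ON people (upper(last_name))")

def query_plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

sql = "SELECT first_name FROM people WHERE upper(last_name) = ?"
plan = query_plan(sql, ("TURING",))
matches = conn.execute(sql, ("TURING",)).fetchall()
print(plan)     # the plan should mention idx_people_upper_last
print(matches)
```

The query has to use the same expression as the index definition; a filter on plain `last_name` would not be able to use this index.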
Indexes are defined as unique or non-unique. A unique index acts as a constraint on the table by preventing duplicate entries in the index and thus duplicate values in the indexed columns.
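A quick way to see a unique index acting as a constraint is to try inserting a duplicate. This sketch again uses SQLite via Python as a stand-in for any database host; the `customers` table and index name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email_address TEXT, name TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_customers_email ON customers (email_address)")

conn.execute("INSERT INTO customers VALUES (?, ?)", ("ann@example.com", "Ann"))

try:
    # Same email_address again: the unique index rejects the row.
    conn.execute("INSERT INTO customers VALUES (?, ?)", ("ann@example.com", "Another Ann"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(duplicate_rejected, row_count)
```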
Clustered (Integral) and Non-Clustered Indexes
Index architectures are classified as clustered or non-clustered. Clustered indexes are indexes that are built based on the same key by which the data is ordered on disk. In some relational database management systems such as Microsoft SQL Server, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to data that resides elsewhere, as is the case with a non-clustered index. Due to the fact that the clustered index corresponds (at the leaf level) to the actual data, the data in the table is sorted as per the index, and therefore, only one clustered index can exist in a given table (whereas many non-clustered indexes can exist, limited by the particular RDBMS vendor). Unclustered indexes are indexes that are built on any key. Each relation can have a single clustered index and many unclustered indexes. Clustered indexes usually store the actual records within the data structure and as a result can be much faster than unclustered indexes. Unclustered indexes are forced to store only record IDs in the data structure and require at least one additional I/O operation to retrieve the actual record. 'Intrinsic' might be a better adjective than 'clustered' — indicating that the index is an integral part of the data structure storing the table.
Indexes can be implemented using a variety of data structures. Popular indices include balanced trees, B+ trees and hashes.
The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.
For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name. Some databases can do this, others just won’t use the index.
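The phone book example can be tried out directly. The sketch below uses SQLite via Python (other database hosts may plan these queries differently); the `phone_book` table, its sample row and the index name are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (city TEXT, last_name TEXT, first_name TEXT, phone TEXT)"
)
conn.execute(
    "INSERT INTO phone_book VALUES (?, ?, ?, ?)",
    ("Springfield", "Smith", "Jo", "555-0100"),
)
# Column order matters: city first, then last name, then first name.
conn.execute(
    "CREATE INDEX idx_phone_book ON phone_book (city, last_name, first_name)"
)

def query_plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Filtering on the leading column can use the index.
plan_by_city = query_plan("SELECT phone FROM phone_book WHERE city = 'Springfield'")
# Filtering only on the second column falls back to reading every row.
plan_by_last_name = query_plan("SELECT phone FROM phone_book WHERE last_name = 'Smith'")

print(plan_by_city)
print(plan_by_last_name)
```

In SQLite's plan output, "SEARCH" means the index was used to jump to the matching rows, while "SCAN" means every row was visited.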
Applications and limitations
Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Finkelstein';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the b-tree data structure until the Finkelstein entry has been found; this is much less computationally expensive than a full table scan.
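To watch the switch from a full table scan to an index lookup, the sketch below runs the same query before and after creating an index. It uses SQLite via Python rather than SQL Server, and the sample rows, index name and `query_plan` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ira", "Finkelstein"), ("Ada", "Lovelace"), ("Alan", "Turing")],
)

QUERY = "SELECT first_name FROM people WHERE last_name = 'Finkelstein'"

def query_plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# No index yet: the only option is to look at every row.
plan_without_index = query_plan(QUERY)

conn.execute("CREATE INDEX idx_people_last_name ON people (last_name)")

# With the index in place, the planner can search it instead.
plan_with_index = query_plan(QUERY)

print(plan_without_index)
print(plan_with_index)
```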
Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';. This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database still must perform a full table scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term the database software is unable to use the underlying b-tree data structure. This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: select email_address from customers where reverse(email_address) like reverse('%@yahoo.com');. This puts the wild-card at the right most part of the query (now moc.oohay@%) which the index on reverse(email_address) can satisfy.
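The idea behind the reversed index can be shown without a database at all. In this sketch a sorted Python list stands in for the b-tree on reverse(email_address), and a binary search stands in for the index lookup; the sample addresses and function name are invented for the illustration.

```python
from bisect import bisect_left

emails = ["ann@yahoo.com", "bob@example.org", "cy@yahoo.com", "dee@gmail.com"]

# Stand-in for an index on reverse(email_address): reversed keys in sorted order.
reversed_index = sorted((email[::-1], email) for email in emails)

def emails_ending_with(suffix):
    """Find addresses with the given suffix via a prefix search on reversed keys."""
    prefix = suffix[::-1]  # "@yahoo.com" becomes "moc.oohay@"
    start = bisect_left(reversed_index, (prefix,))  # jump straight to the first candidate
    matches = []
    for key, email in reversed_index[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match either
        matches.append(email)
    return matches

yahoo_users = emails_ending_with("@yahoo.com")
print(yahoo_users)
```

Because the wildcard now sits at the end of the reversed pattern, all matching keys are adjacent in sorted order, which is exactly what an ordered index needs.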
-Clustered indexes define the physical sorting of a database table’s rows in the storage media. For this reason, each database table may have only one clustered index. If a PRIMARY KEY constraint is created for a database table and no clustered index currently exists for that table, SQL Server automatically creates a clustered index on the primary key.
-Non-clustered indexes are created outside of the database table and contain a sorted list of references to the table itself. SQL Server 2000 supports a maximum of 249 non-clustered indexes per table. However, it’s important to keep in mind that non-clustered indexes slow down the data modification and insertion process, so indexes should be kept to a minimum.
SQL Server provides a wonderful facility known as the Index Tuning Wizard which greatly enhances the index selection process. To use this tool, first use SQL Profiler to capture a trace of the activity for which you wish to optimize performance. You may wish to run the trace for an extended period of time to capture a wide range of activity. Then, using Enterprise Manager, start the Index Tuning Wizard and instruct it to recommend indexes based upon the captured trace. It will not only suggest appropriate columns for queries but also provide you with an estimate of the performance increase you’ll experience after making those changes!
Q: What is Ajax?
A: Like DHTML, Ajax is not a technology in itself, but a term that refers to the use of a group of technologies.
The "core" and defining element of Ajax is the XMLHttpRequest object, which gives browsers the ability to make dynamic and asynchronous data requests without having to reload a page, eliminating the need for page refreshes and postbacks. This means that parts of a web page can load (or remain loaded) and "serve" a user (by showing data or allowing input), while other parts of the web page are retrieving data or otherwise responding to user input. Ultimately, this means that a web page can be more user-responsive.
Ajax uses a combination of:
- XHTML (or HTML) and CSS, for marking up and styling information.
- The XMLHttpRequest object, for exchanging data asynchronously with the web server. XML is sometimes used as the format for transferring data between the server and client, although any format will work, including preformatted HTML, plain text and JSON. The data may be generated dynamically by some form of server-side code.
Q: What is a MIME type?
A: Multipurpose Internet Mail Extensions. Now commonly referred to as an "Internet Media Type", a MIME type is a two-part identifier for file formats on the Internet. The identifiers were originally defined in RFC 2046 for use in e-mail sent through SMTP, but their use has expanded to other protocols such as HTTP and SIP.
A media type is composed of a type, a subtype, and zero or more optional parameters. For example, subtypes of text type have an optional charset parameter that can be included to indicate the character encoding, and subtypes of multipart type often define a boundary between parts.
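The type, subtype and parameters can be pulled apart with Python's standard `email` package, as in this sketch. The `parse_media_type` helper and the sample values are assumptions made for the example.

```python
from email.message import Message

def parse_media_type(value):
    """Split a media type string into its type, subtype and common parameters."""
    msg = Message()
    msg["Content-Type"] = value
    return {
        "type": msg.get_content_maintype(),
        "subtype": msg.get_content_subtype(),
        "charset": msg.get_param("charset"),    # None when the parameter is absent
        "boundary": msg.get_param("boundary"),  # None when the parameter is absent
    }

html = parse_media_type("text/html; charset=UTF-8")
multipart = parse_media_type('multipart/mixed; boundary="frontier"')
print(html)
print(multipart)
```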
See http://en.wikipedia.org/wiki/Internet_media_type for a list of common types.
Q: What is MIME?
A: MIME (Multi-Purpose Internet Mail Extensions) is an extension of the original Internet e-mail protocol (SMTP) that lets people use the protocol to exchange:
- text in character sets other than US-ASCII;
- non-text attachments;
- multi-part message bodies; and
- header information in non-ASCII character sets.
Virtually all human-written Internet e-mail and a fairly large proportion of automated e-mail is transmitted via SMTP in MIME format. Internet e-mail is so closely associated with the SMTP and MIME standards that it is sometimes called SMTP/MIME e-mail.
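The features listed above can be seen in a message built with Python's standard `email` package. This is only a sketch: the addresses, subject, body text and attachment bytes are made up, and nothing is actually sent.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Grüße"                      # non-ASCII header text
msg.set_content("Schöne Grüße aus Köln")      # body text outside US-ASCII

# A non-text attachment turns the message into a multi-part body.
msg.add_attachment(
    b"\x00\x01\x02",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

parts = list(msg.iter_parts())
print(msg.get_content_type())
for part in parts:
    print(part.get_content_type(), part.get_filename())
```

Each part carries its own Content-Type header, which is how the receiving mail client knows to show the first part as text and offer the second as a file.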
The content types defined by MIME standards are also of importance outside of e-mail, such as in communication protocols like HTTP for the World Wide Web. HTTP requires that data be transmitted in the context of e-mail-like messages, even though the data may not actually be e-mail.
Q: What is an Application Pool?
A: An application pool is a configuration that links one or more applications to a set of one or more worker processes. Because applications in an application pool are separated from other applications by worker process boundaries, an application in one application pool is not affected by problems caused by applications in other application pools.
By creating new application pools and assigning Web sites and applications to them, you can make your server more efficient and reliable, and keep your other applications available even when the worker process serving the new application pool has problems.
Q: What's the difference between Localization, Internationalization, and Globalization?
A: A successfully localized service or product is one that seems to have been developed within the local culture. Localization is the process of adapting the text and applications of a product or service to make it acceptable for a particular cultural or linguistic market. Translation is the central activity of localization, but localization goes beyond literal translation: in addition to idiomatic language translation, numerous locale details must be considered, such as currency, national regulations and holidays, cultural sensitivities, product or service names, gender roles, and geographic examples.
Internationalization is planning and implementing products and services so that they can easily be localized for specific languages and cultures.
This process requires a combination of both international and technical expertise, and generally involves both deploying new systems and reengineering existing ones. Once the internationalized platform is in place, rollouts in new countries or cultures should be significantly more cost efficient, timely and market effective.
Globalization is an approach to business strategy that aims to address all of the logistical and organizational challenges an enterprise faces as it expands its supporting content, assets and message across cultures and markets to new clients. Globalization incorporates internationalization and localization to achieve this goal.
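In code, the split between internationalization and localization might look roughly like the sketch below. The `CATALOG` dictionary, locale codes and `render_greeting` function are hypothetical; real applications would normally use a resource or message catalog mechanism provided by their platform.

```python
from datetime import date

# Localization: the per-locale resources (text and formats) for each market.
CATALOG = {
    "en-US": {"greeting": "Hello", "date_format": "%m/%d/%Y"},
    "de-DE": {"greeting": "Hallo", "date_format": "%d.%m.%Y"},
}

# Internationalization: the code hard-codes no language or date format;
# it looks everything up by locale, so adding a market means adding data only.
def render_greeting(locale_code, name, when):
    resources = CATALOG[locale_code]
    formatted_date = when.strftime(resources["date_format"])
    return f"{resources['greeting']}, {name} ({formatted_date})"

print(render_greeting("en-US", "Ana", date(2024, 3, 5)))
print(render_greeting("de-DE", "Ana", date(2024, 3, 5)))
```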
Q: What's a cookie?
A: ASP.NET help on Cookies (HttpCookie): A cookie is a small bit of text that accompanies requests and pages as they go between the Web server and browser. The cookie contains information the Web application can read whenever the user visits the site.
For example, if a user requests a page from your site and your application sends not just a page, but also a cookie containing the date and time, when the user's browser gets the page, the browser also gets the cookie, which it stores in a folder on the user's hard disk. (Note: If you do not set the cookie's expiration, the cookie is created but it is not stored on the user's hard disk. Instead, the cookie is maintained as part of the user's session information. When the user closes the browser, the cookie is discarded. A non-persistent cookie like this is useful for information that needs to be stored for only a short time or that for security reasons should not be written to disk on the client computer, e.g., an authentication ticket ID.)
Later, if the user requests a page from your site again, when the user enters the URL the browser looks on the local hard disk for a cookie associated with the URL. If the cookie exists, the browser sends the cookie to your site along with the page request. Your application can then determine the date and time that the user last visited the site. You might use the information to display a message to the user or check an expiration date.
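The round trip described above can be sketched with Python's standard `http.cookies` module instead of ASP.NET's HttpCookie. The cookie name `last_visit`, the helper functions and the timestamp value are assumptions for the example, and the sketch uses the Max-Age attribute as the expiration setting to keep the output deterministic.

```python
from http.cookies import SimpleCookie

def build_set_cookie(last_visit, max_age_seconds=None):
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["last_visit"] = last_visit
    cookie["last_visit"]["path"] = "/"
    if max_age_seconds is not None:
        # With an expiration the browser may persist the cookie to disk.
        cookie["last_visit"]["max-age"] = max_age_seconds
    return cookie["last_visit"].OutputString()

def read_cookie_header(header_value):
    """Server side: parse the Cookie request header sent back by the browser."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}

persistent = build_set_cookie("20240115T093000", max_age_seconds=86400)
session_only = build_set_cookie("20240115T093000")   # discarded when the browser closes
returned = read_cookie_header("last_visit=20240115T093000")

print(persistent)
print(session_only)
print(returned)
```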
Cookies are associated with a Web site, not with a specific page, so the browser and server will exchange cookie information no matter what page the user requests from your site. As the user visits different sites, each site might send a cookie to the user's browser as well; the browser stores all the cookies separately.
Cookies help Web sites store information about visitors. More generally, cookies are one way of maintaining continuity in a Web application—that is, of persisting some sort of state between individual web requests (which generally contain no state information: i.e., each request comes in as if it is unique, one time only, so that the server knows very little specific info about the request without some added data).
In our app, here's how the authentication cookie works:
- Everything is shipped over the wire using HTTPS encryption.
- User logs in, sending his user/pwrd to the server in a base-64 encoded Authentication header.
- Server receives header, validates it, and responds with a standard Set-Cookie header which includes a bunch of data (domain, timestamp, encoded user): this is the response cookie.
- Client gets the response cookie and stores it in memory. Then, with each request to the domain which was set in the response cookie, the client sends the same cookie data as a standard HTTP "Cookie" header.
- The server verifies the cookie with each request. The cookie is valid for the duration of the "session", a duration which is established by the original response cookie in an 'expires' key-value.
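The steps above can be sketched as follows. This is a simplified model, not the app's actual code: the in-memory `USERS` and `sessions` stores, the 30-minute duration, the function names and the random token (standing in for the cookie's "encoded user" data) are all assumptions, and the HTTPS transport is left out.

```python
import base64
import secrets

USERS = {"alice": "s3cret"}   # hypothetical credential store
SESSION_SECONDS = 1800        # hypothetical session duration (the 'expires' value)
sessions = {}                 # cookie token -> (user, expires_at)

def encode_credentials(user, password):
    """Client side: base-64 encode "user:password" for the login header."""
    return base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

def log_in(encoded_credentials, now):
    """Server side: validate the header and issue a cookie token, or None."""
    try:
        decoded = base64.b64decode(encoded_credentials).decode("utf-8")
    except ValueError:
        return None
    user, _, password = decoded.partition(":")
    if USERS.get(user) != password:
        return None
    token = secrets.token_urlsafe(16)
    sessions[token] = (user, now + SESSION_SECONDS)
    return token

def verify_cookie(token, now):
    """Server side: check the cookie on each request; None if unknown or expired."""
    entry = sessions.get(token)
    if entry is None:
        return None
    user, expires_at = entry
    if now >= expires_at:
        del sessions[token]
        return None
    return user

token = log_in(encode_credentials("alice", "s3cret"), now=1000)
print(verify_cookie(token, now=1010))   # within the session: the user name
print(verify_cookie(token, now=2800))   # at the expiry time: None
```

Note that base-64 is an encoding, not encryption, which is why the first step (HTTPS for everything) matters.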