DocumentDB - Indexing Records
  • Date: 2024-11-03


By default, DocumentDB automatically indexes every property in a document as soon as the document is added to the database. However, you can take control and fine-tune your own indexing policy to reduce storage and processing overhead when there are specific documents and/or properties that never need to be indexed.

The default indexing policy, which tells DocumentDB to index every property automatically, is suitable for many common scenarios. But you can also implement a custom policy that exercises fine control over exactly what gets indexed and what doesn’t, along with other indexing-related functionality.

DocumentDB supports the following types of indexing −

    Hash

    Range

Hash

A hash index enables efficient querying for equality, i.e., searching for documents where a given property equals an exact value, rather than matching on a range of values such as less than, greater than, or between.

You can perform range queries with a hash index, but DocumentDB will not be able to use the hash index to find matching documents and will instead need to sequentially scan each document to determine if it should be selected by the range query.

You won’t be able to sort your documents with an ORDER BY clause on a property that has just a hash index.

Range

With a range index defined for a property, DocumentDB can efficiently query for documents against a range of values. It also allows you to sort the query results on that property using ORDER BY.

DocumentDB allows you to define both a hash and a range index on any or all properties, which enables efficient equality and range queries, as well as ORDER BY.
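As a rough illustration of choosing index types, the following sketch creates a collection whose strings and numbers both use range indexes, one possible configuration that serves equality, range, and ORDER BY queries alike. The class name, method name, and the collection ID rangeindexing are hypothetical; the database ID mydb is assumed to exist, and the code presumes the Microsoft.Azure.DocumentDB .NET SDK is referenced.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public static class RangeIndexSketch
{
    // Hypothetical helper, not part of the tutorial's sample project.
    public static async Task CreateRangeIndexedCollection(DocumentClient client)
    {
        // "rangeindexing" is an assumed collection ID used only for this sketch.
        var collectionDefinition = new DocumentCollection { Id = "rangeindexing" };

        // Override the default index types for all paths:
        // range indexes on strings and numbers, at maximum precision (-1).
        collectionDefinition.IndexingPolicy = new IndexingPolicy(
            new RangeIndex(DataType.String) { Precision = -1 },
            new RangeIndex(DataType.Number) { Precision = -1 });

        await client.CreateDocumentCollectionAsync("dbs/mydb", collectionDefinition);

        // With a range index on strings, sorting on a string property is permitted.
        var sorted = client.CreateDocumentQuery(
            "dbs/mydb/colls/rangeindexing",
            "SELECT * FROM c ORDER BY c.lastName").ToList();
    }
}
```

Swapping one of the entries for a HashIndex would trade ORDER BY and efficient range support on that data type for an index geared to equality lookups only.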

Indexing Policy

Every collection has an indexing policy that dictates which types of indexes are used for numbers and strings in every property of every document.

    You can also control whether or not documents get indexed automatically as they are added to the collection.

    Automatic indexing is enabled by default, but you can override that behavior when adding a document, telling DocumentDB not to index that particular document.

    You can disable automatic indexing so that by default, documents are not indexed when added to the collection. Similarly, you can override this at the document level and instruct DocumentDB to index a particular document when adding it to the collection. This is known as manual indexing.

Include / Exclude Indexing

An indexing policy can also define which path or paths should be included in or excluded from the index. This is useful if you know that there are certain parts of a document that you never query against and certain parts that you do.

In these cases, you can reduce indexing overhead by telling DocumentDB to index just those particular portions of each document added to the collection.
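A minimal sketch of path inclusion and exclusion might look like the following. It indexes every path except one subtree. The class name, method name, the collection ID pathindexing, and the /notes/* path are all hypothetical, chosen only to illustrate a part of the document that is assumed never to be queried.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public static class PathIndexingSketch
{
    // Hypothetical helper, not part of the tutorial's sample project.
    public static async Task CreateCollectionWithExcludedPath(DocumentClient client)
    {
        // "pathindexing" is an assumed collection ID used only for this sketch.
        var collectionDefinition = new DocumentCollection { Id = "pathindexing" };

        // Include every path in the index by default...
        collectionDefinition.IndexingPolicy.IncludedPaths.Add(
            new IncludedPath { Path = "/*" });

        // ...except the assumed "notes" subtree, which we never filter or sort on.
        collectionDefinition.IndexingPolicy.ExcludedPaths.Add(
            new ExcludedPath { Path = "/notes/*" });

        await client.CreateDocumentCollectionAsync("dbs/mydb", collectionDefinition);
    }
}
```

Documents added to such a collection are still stored in full; only the excluded subtree is left out of the index.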

Automatic Indexing

Let’s take a look at a simple example of automatic indexing.

Step 1 − First we create a collection called autoindexing. Because we do not explicitly supply a policy, this collection uses the default indexing policy, which means that automatic indexing is enabled on this collection.

Here we are using ID-based routing for the database self-link, so we don’t need to know its resource ID or query for it before creating the collection. We can just use the database ID, which is mydb.

Step 2 − Now let’s create two documents, both with the last name of Upston.

// Requires: System, System.Linq, System.Threading.Tasks,
// Microsoft.Azure.Documents, and Microsoft.Azure.Documents.Client
private async static Task AutomaticIndexing(DocumentClient client) {
   Console.WriteLine();
   Console.WriteLine("**** Override Automatic Indexing ****");

   // Create collection with automatic indexing (the default policy)

   var collectionDefinition = new DocumentCollection {
      Id = "autoindexing"
   };

   var collection = await client.CreateDocumentCollectionAsync("dbs/mydb",
      collectionDefinition);

   // Add a document (indexed)
   dynamic indexedDocumentDefinition = new {
      id = "MARK",
      firstName = "Mark",
      lastName = "Upston",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document indexedDocument = await client
      .CreateDocumentAsync("dbs/mydb/colls/autoindexing", indexedDocumentDefinition);

   // Add another document (request no indexing)
   dynamic unindexedDocumentDefinition = new {
      id = "JANE",
      firstName = "Jane",
      lastName = "Upston",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document unindexedDocument = await client
      .CreateDocumentAsync("dbs/mydb/colls/autoindexing", unindexedDocumentDefinition,
      new RequestOptions { IndexingDirective = IndexingDirective.Exclude });

   // Unindexed document won't get returned when querying on non-ID (or self-link) property

   var upstonDocs = client.CreateDocumentQuery("dbs/mydb/colls/autoindexing",
      "SELECT * FROM c WHERE c.lastName = 'Upston'").ToList();

   Console.WriteLine("Documents WHERE lastName = 'Upston': {0}", upstonDocs.Count);

   // Unindexed document will get returned when using no WHERE clause

   var allDocs = client.CreateDocumentQuery("dbs/mydb/colls/autoindexing",
      "SELECT * FROM c").ToList();
   Console.WriteLine("All documents: {0}", allDocs.Count);

   // Unindexed document will get returned when querying by ID (or self-link) property

   Document janeDoc = client.CreateDocumentQuery("dbs/mydb/colls/autoindexing",
      "SELECT * FROM c WHERE c.id = 'JANE'").AsEnumerable().FirstOrDefault();
   Console.WriteLine("Unindexed document self-link: {0}", janeDoc.SelfLink);

   // Delete the collection

   await client.DeleteDocumentCollectionAsync("dbs/mydb/colls/autoindexing");
}

The first one, for Mark Upston, gets added to the collection and is then immediately indexed automatically based on the default indexing policy.

But when the second document, for Jane Upston, is added, we pass request options with IndexingDirective.Exclude, which explicitly instructs DocumentDB not to index this document, despite the collection’s indexing policy.

At the end, we run different types of queries against both documents.

Step 3 − Let’s call the AutomaticIndexing task from CreateDocumentClient.

private static async Task CreateDocumentClient() {
   // Create a new instance of the DocumentClient
   using (var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey)) {
      await AutomaticIndexing(client);
   }
}

When the above code is compiled and executed, you will receive the following output.

**** Override Automatic Indexing ****
Documents WHERE lastName = 'Upston': 1
All documents: 2
Unindexed document self-link: dbs/kV5oAA==/colls/kV5oAOEkfQA=/docs/kV5oAOEkfQACAAAAAAAAAA==/

As you can see, we have two such documents, but the query returns only the one for Mark because the one for Jane isn’t indexed. If we query again without a WHERE clause to retrieve all the documents in the collection, we get a result set with both documents, because unindexed documents are always returned by queries that have no WHERE clause.

We can also retrieve unindexed documents by their ID or self-link. So when we query for Jane’s document by her ID, JANE, we see that DocumentDB returns the document even though it isn’t indexed in the collection.

Manual Indexing

Let’s take a look at a simple example of manual indexing by overriding automatic indexing.

Step 1 − First we’ll create a collection called manualindexing and override the default policy by explicitly disabling automatic indexing. This means that, unless we request otherwise, new documents added to this collection will not be indexed.

private async static Task ManualIndexing(DocumentClient client) {
   Console.WriteLine();
   Console.WriteLine("**** Manual Indexing ****");

   // Create collection with manual indexing (automatic indexing disabled)

   var collectionDefinition = new DocumentCollection {
      Id = "manualindexing",
      IndexingPolicy = new IndexingPolicy {
         Automatic = false,
      },
   };

   var collection = await client.CreateDocumentCollectionAsync("dbs/mydb",
      collectionDefinition);

   // Add a document (unindexed)
   dynamic unindexedDocumentDefinition = new {
      id = "MARK",
      firstName = "Mark",
      lastName = "Doe",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document unindexedDocument = await client
      .CreateDocumentAsync("dbs/mydb/colls/manualindexing", unindexedDocumentDefinition);

   // Add another document (request indexing)
   dynamic indexedDocumentDefinition = new {
      id = "JANE",
      firstName = "Jane",
      lastName = "Doe",
      addressLine = "123 Main Street",
      city = "Brooklyn",
      state = "New York",
      zip = "11229",
   };

   Document indexedDocument = await client.CreateDocumentAsync
      ("dbs/mydb/colls/manualindexing", indexedDocumentDefinition, new RequestOptions {
      IndexingDirective = IndexingDirective.Include });

   // Unindexed document won't get returned when querying on non-ID (or self-link) property

   var doeDocs = client.CreateDocumentQuery("dbs/mydb/colls/manualindexing",
      "SELECT * FROM c WHERE c.lastName = 'Doe'").ToList();
   Console.WriteLine("Documents WHERE lastName = 'Doe': {0}", doeDocs.Count);

   // Unindexed document will get returned when using no WHERE clause

   var allDocs = client.CreateDocumentQuery("dbs/mydb/colls/manualindexing",
      "SELECT * FROM c").ToList();
   Console.WriteLine("All documents: {0}", allDocs.Count);

   // Unindexed document will get returned when querying by ID (or self-link) property

   Document markDoc = client
      .CreateDocumentQuery("dbs/mydb/colls/manualindexing",
      "SELECT * FROM c WHERE c.id = 'MARK'")
      .AsEnumerable().FirstOrDefault();
   Console.WriteLine("Unindexed document self-link: {0}", markDoc.SelfLink);

   // Delete the collection

   await client.DeleteDocumentCollectionAsync("dbs/mydb/colls/manualindexing");
}

Step 2 − Now we again create two documents, as before. This time we do not supply any special request options for Mark’s document, so because of the collection’s indexing policy, this document will not get indexed.

Step 3 − Now when we add the second document, for Jane, we use RequestOptions with IndexingDirective.Include to tell DocumentDB that it should index this document, which overrides the collection’s indexing policy that says that it shouldn’t.

At the end, we run different types of queries against both documents.

Step 4 − Let’s call the ManualIndexing task from CreateDocumentClient.

private static async Task CreateDocumentClient() {
   // Create a new instance of the DocumentClient
   using (var client = new DocumentClient(new Uri(EndpointUrl), AuthorizationKey)) {
      await ManualIndexing(client);
   }
}

When the above code is compiled and executed you will receive the following output.

**** Manual Indexing ****
Documents WHERE lastName = 'Doe': 1
All documents: 2
Unindexed document self-link: dbs/kV5oAA==/colls/kV5oANHJPgE=/docs/kV5oANHJPgEBAAAAAAAAAA==/

Again, the query returns only one of the two documents, but this time it returns Jane Doe, which we explicitly requested to be indexed. As before, querying without a WHERE clause retrieves all the documents in the collection, including the unindexed document for Mark. We can also query for the unindexed document by its ID, which DocumentDB returns even though it’s not indexed.
