Monday, December 25, 2023

Shopify Bulk Query Example

When should I use bulk query? Read on and let's analyse it together.

This post builds upon my previous post, Consuming Shopify Product API Example. We'll compare the bulk query with the non-bulk query we did in that example.

Shopify Bulk Query Changes

These are the changes we made to the previous example so we can run a Shopify bulk query.

main.go


// snipped...

func main() {
	config, err := config.Load("./conf/config.json")
	if err != nil {
		panic(err)
	}

	products := service.BulkQuery(config)

	r := chi.NewRouter()
	r.Use(middleware.Logger)

	r.Get("/", router.GetStatus)

	r.Mount("/products", router.GetProducts(config))
	r.Mount("/cached-products", router.GetCachedProducts(config, products))

	if err := http.ListenAndServe(fmt.Sprintf(":%v", config.Port), r); err != nil {
		panic(err)
	}
}

Just a few changes here. We created a new package named service to handle the bulk query operations, and a new /cached-products path that serves the cached products. The bulk query runs once at start up, before the server begins listening. Quick and easy.

model.go


package service

type Node struct {
	ID          string   `json:"id,omitempty"`
	Title       string   `json:"title,omitempty"`
	Handle      string   `json:"handle,omitempty"`
	Vendor      string   `json:"vendor,omitempty"`
	ProductType string   `json:"productType,omitempty"`
	Tags        []string `json:"tags,omitempty"`
	Namespace   string   `json:"namespace,omitempty"`
	Key         string   `json:"key,omitempty"`
	Value       string   `json:"value,omitempty"`
	ParentID    string   `json:"__parentId,omitempty"`
}

type Product struct {
	ID          string      `json:"id,omitempty"`
	Title       string      `json:"title,omitempty"`
	Handle      string      `json:"handle,omitempty"`
	Vendor      string      `json:"vendor,omitempty"`
	ProductType string      `json:"productType,omitempty"`
	Tags        []string    `json:"tags,omitempty"`
	Metafields  []Metafield `json:"metafields,omitempty"`
}

type Metafield struct {
	Namespace string `json:"namespace,omitempty"`
	Key       string `json:"key,omitempty"`
	Value     string `json:"value,omitempty"`
	ParentID  string `json:"__parentId,omitempty"`
}

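Here we model the data types we use. Node is a flat record that can hold any line of the bulk query result, whether it is a product or a metafield (a metafield line carries a __parentId pointing at its product). Product and Metafield make up the product tree that we serve.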

products.go


// snipped...
func GetCachedProducts(config config.Config, products []service.Product) chi.Router {
	router := chi.NewRouter()

	router.Get("/", func(w http.ResponseWriter, r *http.Request) {

		jsonBytes := marshaller.Marshal(products)

		w.Header().Set(contentType, applicationJson)
		w.WriteHeader(http.StatusOK)
		w.Write(jsonBytes)
	})

	return router
}

A function that handles requests to the new /cached-products path.

bulk-query.go


// snipped...
func BulkQuery(config config.Config) []Product {
	fmt.Println("++ bulk query")
	bulkQueryGql := fmt.Sprintf(`
	mutation {
		bulkOperationRunQuery(
			// ... snipped ...
		}
	}
	`)

	query := GqlQuery{
		Query: bulkQueryGql,
	}

	client := &http.Client{}

	responseBody, err := sendRequest(client, query, config)
	if err != nil {
		panic(err)
	}

	gqlResp := marshaller.Unmarshal[GqlResponse](responseBody)

	if gqlResp.Data.BulkOperationRunQuery.BulkOperation.Status == "CREATED" {
		fmt.Println("Created at: ", gqlResp.Data.BulkOperationRunQuery.BulkOperation.CreatedAt)
		currentOperationQueryGql := fmt.Sprintf(`
		query CurrentBulkOperation {
			currentBulkOperation {
				completedAt
				createdAt
				errorCode
				fileSize
				id
				objectCount
				status
				url
			}
		}
		`)

		query = GqlQuery{
			Query: currentOperationQueryGql,
		}

		for {
			time.Sleep(time.Second * 2)

			responseBody, err := sendRequest(client, query, config)
			if err != nil {
				panic(err)
			}

			gqlResp = marshaller.Unmarshal[GqlResponse](responseBody)

			if gqlResp.Data.CurrentBulkOperation.Status == "CANCELED" ||
				gqlResp.Data.CurrentBulkOperation.Status == "CANCELING" ||
				gqlResp.Data.CurrentBulkOperation.Status == "EXPIRED" ||
				gqlResp.Data.CurrentBulkOperation.Status == "FAILED" {
				fmt.Println("Status: ", gqlResp.Data.CurrentBulkOperation.Status)
				break
			}

			if gqlResp.Data.CurrentBulkOperation.Status == "COMPLETED" {
				fmt.Println("URL: ", gqlResp.Data.CurrentBulkOperation.URL)
				productFile, err := downloadFile("products.tmp", gqlResp.Data.CurrentBulkOperation.URL)
				if err != nil {
					fmt.Println("Download failed: ", err)
					break
				}
				return parseProductsFile(productFile)
			}
		}
	}

	return make([]Product, 0)
}
// snipped...

This is where all the magic happens. We issue a bulk query request via a mutation operation. We then unmarshal the response and check whether the bulk operation was created. If it was, we poll the current bulk operation every two seconds until it is completed, canceled, failed, etc. Once it is completed, we download the result and save it to a temporary file. This file is in JSONL (JSON Lines) format, so we have to parse it in order to build the product tree.
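The download step is snipped from the listing above, so here is a minimal sketch of what such a helper could look like. The name downloadTo, its signature and the local test server are assumptions made for illustration, not the actual downloadFile from the repo. It streams the response body straight to disk instead of loading it into memory.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// sampleLine is a placeholder JSONL line, not real store data.
const sampleLine = `{"id":"gid:\/\/shopify\/Product\/1","title":"Example"}`

// downloadTo is a hypothetical helper. It streams the body found at url
// into the file at path and returns the number of bytes written.
func downloadTo(path, url string) (int64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	out, err := os.Create(path)
	if err != nil {
		return 0, err
	}

	n, err := io.Copy(out, resp.Body)
	// Report a failed close only if the copy itself succeeded.
	if cerr := out.Close(); err == nil {
		err = cerr
	}
	return n, err
}

// demo serves sampleLine from a local test server (standing in for the
// Shopify download URL), downloads it and returns the file content.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, sampleLine)
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "bulk")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "products.tmp")
	if _, err := downloadTo(path, srv.URL); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Print(content)
}
```

Checking the HTTP status before writing the file avoids saving an error page as if it were product data.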

You can also subscribe to a webhook to be notified when the bulk operation finishes. It is recommended over polling as it limits the number of redundant API calls. But for the purposes of this example, we'll do polling.

For more details about Shopify bulk operations, go to Perform bulk operations with the GraphQL Admin API. Go to Bulk Operation Status to learn about the other valid status values.

The JSONL file looks something like this:


{"id":"gid:\/\/shopify\/Product\/8787070189860","title":"The Videographer Snowboard","handle":"the-videographer-snowboard","vendor":"Quickstart (5cec88e7)","productType":"","tags":[]}
{"id":"gid:\/\/shopify\/Product\/8787070222628","title":"The Minimal Snowboard","handle":"the-minimal-snowboard","vendor":"Quickstart (5cec88e7)","productType":"","tags":[]}
{"id":"gid:\/\/shopify\/Product\/8787070386468","title":"The Archived Snowboard","handle":"the-archived-snowboard","vendor":"Snowboard Vendor","productType":"","tags":["Archived","Premium","Snow","Snowboard","Sport","Winter"]}

Running the Shopify Bulk Query Example

On start up, you should see something like below. Shopify has returned a download URL, which means the bulk query has completed and we are ready to serve the cached products.

Comparison with Non-Bulk Query

Now let's compare the new way of pulling data to the old way.

Can you spot the difference? In terms of speed, the new way was clearly much faster: just 4ms compared to 279ms. That's because the cached products are served from memory instead of being fetched from Shopify on every request. What's more, the old way not only took longer, it also returned less data: 639 B compared to 4.65 KB. In other words, the old way returned only 3 products, while the new way returned all products, including metafields. That's the icing on the cake. As for the start up time of the app, the difference was negligible.

Shopify Bulk Query Wrap Up

So, would you use a bulk query now? It is up to you to identify a potential bulk query. Queries that use pagination to get all pages of results are the most common candidates. There are limitations on a bulk query though. For example, you can't pull data that's nested more than two levels deep. Check the Shopify documentation for more information.

There you have it. Another way to pull product data from the Shopify GraphQL Admin API. If you have a better way of doing things (e.g. how to parse the JSONL better), just raise a pull request. Happy to look at it. Grab the repo here, github.com/jpllosa/shopify-product-api, it's on the bulk-query branch.