walmart

Resource URIs

Use walmart as a database/sql-style driver so a host program can address Walmart as walmart:// URIs.

walmart is a command-line tool, but the walmart Go package is also a small driver that makes Walmart addressable through resource URIs. A host program registers it the way a program registers a database driver with database/sql, then dereferences walmart:// URIs without knowing anything about how Walmart is fetched.

The host that does this today is ant, a single binary that puts one URI namespace over a family of site tools. The examples below use ant; any program that links the package gets the same behaviour.

Mounting the driver

A host enables the driver with one blank import, exactly like import _ "github.com/lib/pq":

import _ "github.com/tamnd/walmart-cli/walmart"

The package's init registers a domain with the scheme walmart for the hosts www.walmart.com and walmart.com. The standalone walmart binary does not change.
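The registration pattern can be sketched in a single file. This is a minimal illustration of the database/sql-style idea, not the package's actual API: the Driver interface, Register, Lookup, and Schemes are hypothetical names, and the init function here stands in for the one that would live in the imported driver package.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Driver is a hypothetical interface; the real walmart package may expose something different.
type Driver interface {
	Hosts() []string
}

var (
	mu      sync.RWMutex
	drivers = map[string]Driver{}
)

// Register stores a driver under its scheme, much as sql.Register does for database drivers.
func Register(scheme string, d Driver) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := drivers[scheme]; dup {
		panic("registry: Register called twice for scheme " + scheme)
	}
	drivers[scheme] = d
}

// Lookup returns the driver registered for a scheme, if any.
func Lookup(scheme string) (Driver, bool) {
	mu.RLock()
	defer mu.RUnlock()
	d, ok := drivers[scheme]
	return d, ok
}

// Schemes lists the registered schemes in sorted order.
func Schemes() []string {
	mu.RLock()
	defer mu.RUnlock()
	out := make([]string, 0, len(drivers))
	for s := range drivers {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

// walmartDriver stands in for the driver that the blank import would bring along.
type walmartDriver struct{}

func (walmartDriver) Hosts() []string {
	return []string{"www.walmart.com", "walmart.com"}
}

// init runs before main, which is why a blank import is enough to mount a driver.
func init() {
	Register("walmart", walmartDriver{})
}

func main() {
	fmt.Println(Schemes())
}
```

Because registration happens in init, the host never names the driver in its own code; it only looks drivers up by scheme.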

Addressing records

A URI has the form scheme://authority/id, where the authority names the record type. The resolver types are:

| URI | What it is |
| --- | --- |
| walmart://product/<id> | one product, keyed by its item id |
| walmart://store/<id> | a store's public profile |
| walmart://category/<id> | a category, keyed by its numeric id |

ant get walmart://product/5037034321        # the product record
ant get walmart://category/3944             # the category record
ant url walmart://product/5037034321        # the live https URL
ant resolve https://www.walmart.com/cp/electronics/3944  # a pasted link, back to its URI

product, store, and category are best-effort: from a datacenter they may hit Walmart's bot wall and report need-auth, the same as the matching commands. See what anonymous access reaches.
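For a host written in Go, the scheme://authority/id shape can be split with the standard net/url package. The sketch below is only an illustration: Ref and ParseRef are hypothetical names, not part of the walmart package.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Ref is a hypothetical parsed form of a walmart:// URI.
type Ref struct {
	Authority string // for example "product" or "category"
	ID        string // may be empty for URIs such as walmart://deals
}

// ParseRef splits a walmart:// URI into its authority and id.
func ParseRef(raw string) (Ref, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Ref{}, err
	}
	if u.Scheme != "walmart" {
		return Ref{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return Ref{}, fmt.Errorf("missing authority in %q", raw)
	}
	// u.Path is already percent-decoded and carries a leading slash.
	return Ref{Authority: u.Host, ID: strings.TrimPrefix(u.Path, "/")}, nil
}

func main() {
	ref, err := ParseRef("walmart://product/5037034321")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref.Authority, ref.ID)
}
```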

Collections

ls lists the members of a collection. Each list operation has its own authority, so they never shadow one another:

| URI | What it lists |
| --- | --- |
| walmart://search/<query> | products matching a keyword |
| walmart://category/<id> | the items in a category |
| walmart://categories/<id> | a category's child categories |
| walmart://stores/<zip> | stores near a ZIP code |
| walmart://deals | the current rollbacks |
| walmart://trending | the trending products |

ant ls walmart://search/cordless%20drill     # products matching the keyword
ant ls walmart://category/3944               # the items in the category
ant ls walmart://categories/3944             # the child categories
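A search query has to be percent-encoded before it goes into the URI, as the %20 in the first command shows. A small sketch of building such a URI with the standard library follows; SearchURI is a hypothetical helper, not something the package is known to export.

```go
package main

import (
	"fmt"
	"net/url"
)

// SearchURI builds a walmart://search/<query> URI, escaping the query as one path segment.
// Hypothetical helper for illustration only.
func SearchURI(query string) string {
	return "walmart://search/" + url.PathEscape(query)
}

func main() {
	// Spaces become %20, matching the form used on the command line.
	fmt.Println(SearchURI("cordless drill"))
}
```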

Walking the graph

Every record carries explicit edges to the records it points at, so a host can breadth-first crawl the site and write it to disk without scraping URLs out of free text. The edges are:

| From | Field | Edge to |
| --- | --- | --- |
| Listing | item | walmart://product/<id> |
| Deal | item | walmart://product/<id> |
| Product | category_id | walmart://category/<id> |
| Product | variants | walmart://product/<id> (each sibling) |
| Category | parent_id | walmart://category/<id> (up) |
| Category | children | walmart://category/<id> (down, each) |

A search result or a category page links straight through to the full product; a product links up to its leaf category and across to its colour, size, and configuration siblings; a category links both up to its parent and down to its children. Starting from any node, --follow walks these edges:

ant export walmart://categories/ --follow 3 --to ./data   # crawl the taxonomy down three levels
ant export walmart://product/5037034321 --follow 1 --to ./data  # a product, its category, and its variants

Each record is written under its minted URI with its edges intact, so the saved set reconstructs the slice of the site that was reached: the category tree, the products in each category, and the variant clusters that tie products together.
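The walk itself is an ordinary depth-limited breadth-first search. The sketch below shows the idea over a tiny in-memory graph. The ids and edges are invented for illustration, Crawl is a hypothetical function, and treating --follow N as "at most N hops from the start" is an assumption about how the host counts depth.

```go
package main

import "fmt"

// edges is an invented stand-in for the edge URIs a host would read from fetched records.
var edges = map[string][]string{
	"walmart://product/1":    {"walmart://category/10", "walmart://product/2"},
	"walmart://product/2":    {"walmart://category/10", "walmart://product/1"},
	"walmart://category/10":  {"walmart://category/100"},
	"walmart://category/100": {},
}

// Crawl visits start and every URI within depth hops of it, breadth-first,
// visiting each URI at most once. next returns the outgoing edges of a URI.
func Crawl(start string, depth int, next func(string) []string) []string {
	type item struct {
		uri  string
		hops int
	}
	seen := map[string]bool{start: true}
	queue := []item{{start, 0}}
	var order []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur.uri)
		if cur.hops == depth {
			continue // depth limit reached: record the node but do not expand it
		}
		for _, n := range next(cur.uri) {
			if !seen[n] {
				seen[n] = true
				queue = append(queue, item{n, cur.hops + 1})
			}
		}
	}
	return order
}

func main() {
	lookup := func(uri string) []string { return edges[uri] }
	// One hop from a product reaches its category and its variant sibling.
	for _, uri := range Crawl("walmart://product/1", 1, lookup) {
		fmt.Println(uri)
	}
}
```

The seen set is what keeps variant clusters, whose members point at each other, from looping the crawl.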

These edge fields stay out of the table and CSV views (they would be noise in a terminal) but are always present in the JSON and JSONL a host reads.
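A host reading the JSONL can turn those fields into URIs to follow. The text above names the fields but not their JSON types, so the sketch below assumes that category_id is a string and variants is an array of item-id strings. The Product struct, the id field, and the sample line are all invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Product mirrors only the edge fields named in the table; the JSON types are assumed.
type Product struct {
	ID         string   `json:"id"`
	CategoryID string   `json:"category_id"`
	Variants   []string `json:"variants"`
}

// ParseProduct decodes one JSONL line into a Product.
func ParseProduct(line string) (Product, error) {
	var p Product
	err := json.Unmarshal([]byte(line), &p)
	return p, err
}

// Edges mints the URIs this product points at: its category first, then each variant.
func (p Product) Edges() []string {
	var out []string
	if p.CategoryID != "" {
		out = append(out, "walmart://category/"+p.CategoryID)
	}
	for _, v := range p.Variants {
		out = append(out, "walmart://product/"+v)
	}
	return out
}

func main() {
	// Invented sample record, not real output from the tool.
	p, err := ParseProduct(`{"id":"1","category_id":"10","variants":["2","3"]}`)
	if err != nil {
		panic(err)
	}
	for _, e := range p.Edges() {
		fmt.Println(e)
	}
}
```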

Why this is the same code

The driver and the binary share one definition per operation. A resolver op answers both walmart product on the command line and ant get walmart://product/... through a host, from the same handler and the same client. There is no second implementation to keep in step.
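One way to picture this arrangement is a single table of handlers with two thin front ends over it. The sketch below is a guess at the shape, not the project's code: Handler, ops, FromCLI, and FromURI are hypothetical names, and the handler returns a placeholder string instead of fetching anything.

```go
package main

import (
	"fmt"
	"strings"
)

// Handler is a hypothetical signature for one operation.
type Handler func(id string) (string, error)

// ops holds one definition per operation, shared by both front ends.
var ops = map[string]Handler{
	"product": func(id string) (string, error) {
		return "product record " + id, nil // placeholder for the real fetch
	},
}

func call(op, id string) (string, error) {
	h, ok := ops[op]
	if !ok {
		return "", fmt.Errorf("unknown operation %q", op)
	}
	return h(id)
}

// FromCLI dispatches arguments such as ["product", "5037034321"].
func FromCLI(args []string) (string, error) {
	if len(args) != 2 {
		return "", fmt.Errorf("usage: <op> <id>")
	}
	return call(args[0], args[1])
}

// FromURI dispatches a URI such as walmart://product/5037034321.
func FromURI(uri string) (string, error) {
	if !strings.HasPrefix(uri, "walmart://") {
		return "", fmt.Errorf("unsupported URI %q", uri)
	}
	op, id, _ := strings.Cut(strings.TrimPrefix(uri, "walmart://"), "/")
	return call(op, id)
}

func main() {
	a, _ := FromCLI([]string{"product", "5037034321"})
	b, _ := FromURI("walmart://product/5037034321")
	fmt.Println(a == b)
}
```

Both entry points reduce their input to an operation name and an id, so neither carries logic of its own that could drift from the other.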