What should I do now?
The software-development curriculum at Learners Guild in Oakland is being revised again (the last major overhaul took place in June), this time with substantial involvement by Learners themselves. During my 27th week there (ending on 10 November), while the staff were making the final revisions, Learners continued to receive plenty of advice on how to progress toward eligibility for gainful employment, along with access to expert coaching and presentations in support of further learning, and had almost complete freedom to choose which aspects of software development to study. Most of my peers in “phase 4” devoted their mornings to practicing solving programming problems under time pressure and explaining their solutions as would be required in a job interview, then spent their afternoons working in teams on projects to add features to the Guild’s own software.
I decided to do something that would let me learn more about a near-universal problem faced by the inhabitants of the digital universe: search. I happened to own a collection of about 15,000 documents that (like any document repository) could use better search tools, so I was motivated to learn how to make website searching accurate, user-friendly, and powerful.
Despite its ubiquity on the web, search was only barely mentioned in the Guild’s curriculum. For example, in working on a Guild module, I had developed a search interface for movies, but it merely let the user enter a query and fed that query to IMDb to run whatever search IMDb chose to run. As I mentioned at the time, the results given by IMDb could be mystifying in their irrelevance. And if IMDb couldn’t do better than that, there must be a demand for people who can improve search engines.
The search begins
I started by investigating tools for creating search engines. Before long I decided to figure out how to use Solr, an open-source project that the Apache Software Foundation has been developing for the last 11 years. It is written in Java, which I last studied or used a decade ago. So I intended to use Solr as a tool in developing a document-repository website, but not to contribute to the development of Solr itself. Nonetheless, as I began to study its documentation, I did find some inconsistent or ambiguous parts and contributed corrections to the project.
Solr is complicated. Consider that it can analyze HTML, XML, Microsoft Word, Microsoft Excel, PowerPoint, OpenOffice, PDF, RTF, plain text, and other file types that it encounters in your collection. When I say “analyze”, I mean reduce what it finds to a set of attributes and values. For example, if your repository contains documents on treaties, you can tell it to find treaty names, subjects, parties, execution dates, and ratification dates, and Solr will try to do that, converting a heterogeneous collection of documents to a body of structured data. With those data you can, in principle, answer users’ questions about particular treaties or treaty statistics. How good a job does Solr do, though? That’s one of the questions I wanted to answer, but first I needed to figure out how to use it, so I could develop a site offering search with Solr. Once it worked at all, I would then evaluate it (getting some judgments from test users) and try to make it work better.
Studying Solr was frustrating. Although it is extensively documented, there are gaps that make it difficult to figure some basic things out. For example, if you want to use Solr to give users the contexts in which the terms they search for appear in files, how do you do that? I expected that use case to be covered prominently, but it wasn’t.
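For readers facing the same question, Solr’s highlighting parameters appear to be the relevant mechanism. What follows is only a minimal sketch, not the setup I ended up with. It assumes a Solr server at the default local address, a hypothetical core named documents, and a hypothetical stored text field named content. The function name build_highlight_query_url is mine. The sketch only builds the request URL, so it runs without a server.

```python
from urllib.parse import urlencode


def build_highlight_query_url(base_url, core, field, term, snippets=2, fragsize=100):
    """Build a Solr select URL that asks for highlighted snippets.

    The core and field names passed in are assumptions for illustration;
    substitute the ones defined in your own Solr schema.
    """
    params = {
        "q": f"{field}:{term}",   # search for the term in the given field
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field(s) to draw snippets from
        "hl.snippets": snippets,  # maximum snippets per field
        "hl.fragsize": fragsize,  # approximate snippet length in characters
        "wt": "json",             # ask for a JSON response
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical server address, core name, and field name.
url = build_highlight_query_url(
    "http://localhost:8983/solr", "documents", "content", "treaty"
)
print(url)
```

If the request is sent to a running server, the response should include a highlighting section, keyed by document ID, containing the snippets in which the search term appears. Highlighting generally requires that the field be stored, so that is worth checking in the schema first.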
By week’s end, I was making progress, but had nothing to demonstrate yet. I was tempted to give up, but reminded myself that this difficulty would make competence in Solr even more valuable in the market than if it were easy to learn. When the debugging became hard to tolerate, I got an hour of help from one of the professional developers who coach phase 4. This showed me that there is no magic bullet. The pro did the same kind of detective work that I had been doing, only faster and with more tricks in his inventory. Solr, ESLint, and similar mature projects are far too complex to be fully understood. Debugging requires not just reading the documentation, but also poking at the tool and seeing how it behaves until one finds a solution, even if one doesn’t understand why the solution is a solution. Sometimes it is wise to settle for that much.