[{"data":1,"prerenderedAt":1430},["ShallowReactive",2],{"blog-list-en":3},[4],{"id":5,"title":6,"body":7,"config":1414,"date":1415,"description":1416,"draft":1417,"extension":1418,"image":1414,"meta":1419,"navigation":1420,"path":1421,"seo":1422,"stem":1423,"tags":1424,"toolbar":1414,"translationKey":1428,"updated":1415,"__hash__":1429},"blog/en/blog/zero-hallucination-qa.md","How I Built Zero-Hallucination Q&A in Our Reader",{"type":8,"value":9,"toc":1378},"minimark",[10,18,33,36,41,48,53,58,76,81,94,99,133,136,140,158,165,169,184,189,226,233,237,250,275,280,397,415,421,423,427,434,449,456,476,482,484,488,491,497,499,503,526,536,597,600,611,621,628,630,634,645,651,658,662,669,677,684,688,698,741,752,758,760,764,778,786,792,795,830,840,848,854,861,865,874,880,891,893,897,903,907,914,918,941,948,950,954,960,1018,1024,1026,1030,1037,1056,1060,1080,1091,1093,1097,1108,1111,1134,1145,1151,1153,1157,1172,1183,1185,1189,1210,1221,1223,1227,1246,1252,1254,1258,1345,1352,1363],[11,12,13],"p",{},[14,15],"img",{"alt":16,"src":17},"Cover: Zero-hallucination Q&A","https://cdn.linghuxiong.com/resources/snapshots/ai-chat-cover.png",[19,20,21],"blockquote",{},[11,22,23,24,28,29,32],{},"This post shares how we implemented ",[25,26,27],"strong",{},"zero-hallucination Q&A"," in our AI reader: answers are strictly grounded in the text of the book you have open, and key claims can be ",[25,30,31],{},"traced in one click"," to the exact passage. If you are building AI reading, document Q&A, or RAG-style apps, we hope three iterations of lessons and the final architecture are useful.",[34,35],"hr",{},[37,38,40],"h2",{"id":39},"i-evolution-in-three-stages","I. Evolution in three stages",[11,42,43,44,47],{},"Zero-hallucination Q&A was not designed perfectly on day one. It evolved under tension between ",[25,45,46],{},"cost, latency, and accuracy",". 
Below is a chronological view of three stages—useful context for why the current architecture looks the way it does.",[49,50],"mermaid",{":config":51,"code":52},"config","flowchart%20LR%0A%20%20%20%20P1%5BStage%201%3A%20Full-text%20dump%5D%20--%3E%20P2%5BStage%202%3A%20LLM%20key-sentence%20extract%5D%0A%20%20%20%20P2%20--%3E%20P3%5BStage%203%3A%20Segment%20index%20%2B%20Tool%20retrieval%5D%0A%20%20%20%20P1%20-.-%3E%7CSlow%2C%20costly%2C%20inaccurate%20on%20long%20books%7C%20X1%5BRetired%5D%0A%20%20%20%20P2%20-.-%3E%7CLost%20detail%2C%20still%20slow%7C%20X2%5BRetired%5D%0A%20%20%20%20P3%20--%3E%7CCurrent%7C%20OK%5BZero%20hallucination%20%2B%20traceable%5D",[54,55,57],"h3",{"id":56},"stage-1-dump-the-full-book-into-context-simplestand-first-to-break","Stage 1: Dump the full book into context (simplest—and first to break)",[11,59,60,63,64,67,68,71,72,75],{},[25,61,62],{},"Approach:"," When a user opens a book and asks a question, put ",[25,65,66],{},"all extracted body text"," into the system prompt or user message and let the chat model answer. 
If the book exceeds about ",[25,69,70],{},"400k characters",", ",[25,73,74],{},"hard-truncate","—only the beginning is kept; later chapters are invisible to the model.",[11,77,78],{},[25,79,80],{},"Pros:",[82,83,84,88,91],"ul",{},[85,86,87],"li",{},"Very low implementation cost; almost no preprocessing;",[85,89,90],{},"Works reasonably on short books and simple documents—the model really “saw the whole book”;",[85,92,93],{},"Simple UX: ask and get an answer, no “please wait while we analyze” state.",[11,95,96],{},[25,97,98],{},"Cons (quickly unacceptable):",[82,100,101,107,113,123],{},[85,102,103,106],{},[25,104,105],{},"Slow responses:"," Every question resends a huge payload; time-to-first-token and total latency grow with book length;",[85,108,109,112],{},[25,110,111],{},"High token cost:"," You pay for the full book input on every question;",[85,114,115,118,119,122],{},[25,116,117],{},"Long books distort badly:"," After 400k characters, the second half, appendices, and conclusions may as well not exist—and the UI often ",[25,120,121],{},"does not clearly say"," truncation happened;",[85,124,125,128,129,132],{},[25,126,127],{},"Zero retrieval granularity:"," The model must “find a needle in a haystack” across hundreds of thousands of characters—easy to miss details and easier to produce ",[25,130,131],{},"plausible-sounding summaries with no basis","—exactly what reading apps must avoid.",[11,134,135],{},"Stage 1 is fine for an MVP, not for a product-grade solution.",[54,137,139],{"id":138},"stage-2-use-a-lighter-llm-to-extract-key-sentences-compress-contextbut-too-aggressively","Stage 2: Use a lighter LLM to extract key sentences (compress context—but too aggressively)",[11,141,142,144,145,148,149,152,153,157],{},[25,143,62],{}," Before Q&A (or on first open), run a ",[25,146,147],{},"cheaper model"," over the body: split by spine chapter (or chunk the whole book), extract ",[25,150,151],{},"key sentences",", keep position tags like 
",[154,155,156],"code",{},"[fFile-start-end]",", then concatenate excerpts into a shorter context for later Q&A.",[11,159,160,161,164],{},"Typical pipeline: ",[25,162,163],{},"Extract → Cache → Chat",". Extract once (offline or on demand), store a “key sentence bundle,” reuse it for every question—same idea as many document-QA prototypes that compress first, then answer.",[11,166,167],{},[25,168,80],{},[82,170,171,178,181],{},[85,172,173,174,177],{},"Each question sends ",[25,175,176],{},"much less text","; per-request token use drops vs. stage 1;",[85,179,180],{},"Preprocessing can be cached; no re-extract per question on the same book;",[85,182,183],{},"Position tags lay groundwork for citations.",[11,185,186],{},[25,187,188],{},"Cons (still fails on long books):",[82,190,191,197,207,216],{},[85,192,193,196],{},[25,194,195],{},"Heavy detail loss:"," “Key sentences” are model-selected; qualifiers, counterexamples, and argument chains are often dropped—answers become “correct but one-sided”;",[85,198,199,202,203,206],{},[25,200,201],{},"Context still large on long books:"," Even key-sentence bundles for big works are sizable—latency and cost are ",[25,204,205],{},"eased, not solved",";",[85,208,209,212,213,206],{},[25,210,211],{},"Double LLM error:"," Extraction may miss; Q&A may misread excerpts—errors ",[25,214,215],{},"stack",[85,217,218,221,222,225],{},[25,219,220],{},"Static context:"," Whether the user asks about one chapter or whole-book structure, the model always gets the ",[25,223,224],{},"same pre-extracted blob","—no dynamic narrowing by question.",[11,227,228,229,232],{},"Lesson: the issue is not “whether we compress,” but ",[25,230,231],{},"whether compression is on-demand and whether we can return to source text",".",[54,234,236],{"id":235},"stage-3-segment-index-tool-retrieval-on-demand-source-text-back-current","Stage 3: Segment index + Tool retrieval on demand + source text back (current)",[11,238,239,241,242,249],{},[25,240,62],{}," Inspired by 
",[243,244,248],"a",{"href":245,"rel":246},"https://github.com/VectifyAI/PageIndex",[247],"nofollow","PageIndex",". Vs. stage 2, three core shifts:",[251,252,253,259,269],"ol",{},[85,254,255,258],{},[25,256,257],{},"Preprocessing produces a structured index"," (TOC-level summaries + exact character spans), not excerpts used directly as Q&A context;",[85,260,261,264,265,268],{},[25,262,263],{},"Each question uses Tool Calling to retrieve on demand",", then ",[25,266,267],{},"pulls source text with position tags"," to answer;",[85,270,271,274],{},[25,272,273],{},"System prompt + frontend"," enforce citation format and support click-to-jump highlights in the reader.",[11,276,277],{},[25,278,279],{},"Three-stage comparison:",[281,282,283,302],"table",{},[284,285,286],"thead",{},[287,288,289,293,296,299],"tr",{},[290,291,292],"th",{},"Dimension",[290,294,295],{},"Stage 1 (full dump)",[290,297,298],{},"Stage 2 (key sentences)",[290,300,301],{},"Stage 3 (current)",[303,304,305,324,338,352,366,383],"tbody",{},[287,306,307,311,314,317],{},[308,309,310],"td",{},"Context per question",[308,312,313],{},"Whole book (or truncated front half)",[308,315,316],{},"Pre-extracted key sentences",[308,318,319,320,323],{},"Only ",[25,321,322],{},"source"," snippets relevant to the question",[287,325,326,329,332,335],{},[308,327,328],{},"Long-book accuracy",[308,330,331],{},"Collapses past ~400k chars",[308,333,334],{},"Depends on extraction; loses detail",[308,336,337],{},"Retrieve by TOC/span; no hard full-book truncate",[287,339,340,343,346,349],{},[308,341,342],{},"Response speed",[308,344,345],{},"Slow",[308,347,348],{},"Somewhat better; long books still slow",[308,350,351],{},"Retrieve + short context—noticeably faster",[287,353,354,357,360,363],{},[308,355,356],{},"Token cost",[308,358,359],{},"Very high",[308,361,362],{},"Medium-high",[308,364,365],{},"Amortized preprocess + pay per need",[287,367,368,371,374,377],{},[308,369,370],{},"Traceability",[308,372,373],{},"Weak (hard to 
cite)",[308,375,376],{},"Tags exist but content is secondarily filtered",[308,378,379,380],{},"Footnotes map to ",[25,381,382],{},"real source spans",[287,384,385,388,391,394],{},[308,386,387],{},"Engineering complexity",[308,389,390],{},"Low",[308,392,393],{},"Medium",[308,395,396],{},"High",[11,398,399,402,403,406,407,410,411,414],{},[25,400,401],{},"Why we stopped at stage 3:"," For reading, zero hallucination is not “show the model as much text as possible,” but ",[25,404,405],{},"“before answering, fetch source evidence for the question.”"," Stages 1–2 fought ",[25,408,409],{},"context size","; stage 3 splits the pipeline into ",[25,412,413],{},"index (preprocess) → retrieve (Tool) → evidence (source) → answer (constrained generation)","—balancing accuracy, cost, and traceability.",[11,416,417,418,232],{},"Below we detail ",[25,419,420],{},"stage 3",[34,422],{},[37,424,426],{"id":425},"ii-problem-statement-in-book-qa-hallucination-hurts-more-than-in-generic-chat","II. Problem statement: In book Q&A, hallucination hurts more than in generic chat",[11,428,429,430,433],{},"Users forgive occasional errors in a general chatbot. 
In ",[25,431,432],{},"book Q&A",", the cost is higher:",[82,435,436,443,446],{},[85,437,438,439,442],{},"Users ask what ",[25,440,441],{},"this book"," says—not what lives in the model’s parametric memory;",[85,444,445],{},"A plausible-sounding “view from the book” can mislead notes, citations, and reshares;",[85,447,448],{},"Without sources, users cannot verify—trust is hard to build.",[11,450,451,452,455],{},"So “zero hallucination” becomes three ",[25,453,454],{},"enforceable"," rules:",[251,457,458,464,470],{},[85,459,460,463],{},[25,461,462],{},"Book questions must query the book first:"," Anything plausibly about the open book must go through retrieval (Tool) before an answer;",[85,465,466,469],{},[25,467,468],{},"Answers must be traceable:"," Key claims carry position tags the UI can parse and jump to;",[85,471,472,475],{},[25,473,474],{},"Say when you cannot find it:"," If the book does not contain it, say so—do not dress up general knowledge as “what the book says.”",[11,477,478,479,481],{},"The rest follows ",[25,480,420],{}," data flow and how these rules are implemented.",[34,483],{},[37,485,487],{"id":486},"iii-architecture-preprocess-tool-retrieval-constrained-generation-clickable-citations","III. 
Architecture: Preprocess → Tool retrieval → Constrained generation → Clickable citations",[49,489],{":config":51,"code":490},"flowchart%20TB%0A%20%20%20%20subgraph%20prep%20%5BOffline%20%2F%20first-time%20preprocess%5D%0A%20%20%20%20%20%20%20%20A%5BSplit%20book%20by%20TOC%20or%20length%5D%20--%3E%20B%5BLLM%20segment%20summaries%5D%0A%20%20%20%20%20%20%20%20B%20--%3E%20C%5BPersist%20Segment%20cache%20locally%5D%0A%20%20%20%20end%0A%0A%20%20%20%20subgraph%20ask%20%5BUser%20question%5D%0A%20%20%20%20%20%20%20%20D%5BUser%20input%5D%20--%3E%20E%7BSegment%20cache%20exists%3F%7D%0A%20%20%20%20%20%20%20%20E%20--%3E%7CNo%7C%20F%5BExtract%20full%20text%20%2F%20ask%20to%20preprocess%5D%0A%20%20%20%20%20%20%20%20F%20--%3E%20prep%0A%20%20%20%20%20%20%20%20E%20--%3E%7CYes%7C%20G%5BRegister%20Tool%20Calling%5D%0A%20%20%20%20end%0A%0A%20%20%20%20subgraph%20retrieve%20%5BTool%20retrieval%5D%0A%20%20%20%20%20%20%20%20G%20--%3E%20H%7BQuestion%20type%7D%0A%20%20%20%20%20%20%20%20H%20--%3E%7COverview%20%2F%20review%7C%20I%5Bget_full_book_segment_summaries%5D%0A%20%20%20%20%20%20%20%20H%20--%3E%7CFacts%20%2F%20people%20%2F%20chapter%7C%20J%5Bget_related_segment_summaries%5D%0A%20%20%20%20%20%20%20%20J%20--%3E%20K%5BLLM%20picks%20segment%20IDs%20from%20summary%20catalog%5D%0A%20%20%20%20%20%20%20%20K%20--%3E%20L%5BFetch%20source%20by%20span%20%2B%20position%20tags%5D%0A%20%20%20%20%20%20%20%20I%20--%3E%20M%5BConcatenate%20all%20segment%20summaries%5D%0A%20%20%20%20end%0A%0A%20%20%20%20subgraph%20answer%20%5BGenerate%20%26%20display%5D%0A%20%20%20%20%20%20%20%20L%20--%3E%20N%5BTool%20results%20back%20to%20model%5D%0A%20%20%20%20%20%20%20%20M%20--%3E%20N%0A%20%20%20%20%20%20%20%20N%20--%3E%20O%5BSystem%20prompt%20citation%20rules%5D%0A%20%20%20%20%20%20%20%20O%20--%3E%20P%5BStream%20answer%20%2B%20position%20footnotes%5D%0A%20%20%20%20%20%20%20%20P%20--%3E%20Q%5BRender%20clickable%20footnotes%5D%0A%20%20%20%20%20%20%20%20Q%20--%3E%20R%5BClick%20%E2%86%92%20preview%20%E2%86%92%20jump%20%26%2
0highlight%5D%0A%20%20%20%20end",[11,492,493,494],{},"Core idea: ",[25,495,496],{},"do not let the model “answer from memory”—make it “gather evidence, then answer, and mark sources.”",[34,498],{},[37,500,502],{"id":501},"iv-preprocessing-turn-the-whole-book-into-a-searchable-segment-index","IV. Preprocessing: Turn the whole book into a searchable segment index",[11,504,505,506,509,510,513,514,517,518,521,522,525],{},"If every question still used ",[25,507,508],{},"stage 1"," full-book context, long books blow token budgets and retrieval is too coarse. Stage 3: on first AI chat for a book, run a ",[25,511,512],{},"segment summary job"," in the background—split by ",[25,515,516],{},"TOC"," or ",[25,519,520],{},"text length"," into ",[154,523,524],{},"Segment","s, summarize each, persist in local IndexedDB.",[11,527,528,529,531,532,535],{},"Each ",[154,530,524],{}," holds summary plus ",[25,533,534],{},"physical position in the body",":",[281,537,538,548],{},[284,539,540],{},[287,541,542,545],{},[290,543,544],{},"Field",[290,546,547],{},"Meaning",[303,549,550,564,577,587],{},[287,551,552,561],{},[308,553,554,557,558],{},[154,555,556],{},"startFileIndex"," / ",[154,559,560],{},"endFileIndex",[308,562,563],{},"Spine file index (PDF: one file per page)",[287,565,566,574],{},[308,567,568,557,571],{},[154,569,570],{},"startOffset",[154,572,573],{},"endOffset",[308,575,576],{},"Character start/end",[287,578,579,584],{},[308,580,581],{},[154,582,583],{},"sequence",[308,585,586],{},"Linear reading order",[287,588,589,594],{},[308,590,591],{},[154,592,593],{},"title",[308,595,596],{},"TOC title",[11,598,599],{},"Splitting balances precision and cost: if a TOC node’s body is under ~20KB, summarize that node only; sibling nodes may merge into batches (15–20KB) before LLM calls; unstructured long blocks split in ~30–40k character ranges.",[11,601,602,603,606,607,610],{},"The summary system prompt requires ",[25,604,605],{},"keeping inline position tags"," 
(",[154,608,609],{},"[fNumber-Number-Number]",") so Tool-fetched source aligns with spine offsets. Core constraint:",[612,613,619],"pre",{"className":614,"code":616,"language":617,"meta":618},[615],"language-text","If summary content relates to a passage, keep the trailing position tag [fNumber-Number-Number] (e.g. [f1-90-109]).\nTags are atomic—do not alter, merge, or omit any character or digit.\n","text","",[154,620,616],{"__ignoreMap":618},[11,622,623,624,627],{},"After preprocessing, Q&A depends on a ",[25,625,626],{},"structured segment index",", not whole-book context—the engineering prerequisite for zero hallucination on long books.",[34,629],{},[37,631,633],{"id":632},"v-position-tag-system-encode-where-into-text","V. Position tag system: Encode “where” into text",[11,635,636,637,640,641,644],{},"Zero hallucination requires content from source ",[25,638,639],{},"and"," machine-parseable, UI-jumpable ",[25,642,643],{},"provenance",". We use inline tags:",[612,646,649],{"className":647,"code":648,"language":617},[615],"[f{fileIndex}-{startChar}-{endChar}]\n",[154,650,648],{"__ignoreMap":618},[11,652,653,654,657],{},"Example: ",[154,655,656],{},"[f5-123-165]"," = spine file 5 (0-based), characters 123–165.",[54,659,661],{"id":660},"_51-how-tags-are-written-into-body-text","5.1 How tags are written into body text",[11,663,664,665,668],{},"The extraction layer appends ",[154,666,667],{},"[f{fileIndex}-{start}-{end}]"," at segment ends:",[612,670,675],{"className":671,"code":673,"language":674,"meta":618},[672],"language-typescript","const position = `[f${fileIndex}-${absOffset}-${absOffset + segment.length}]`;\nfileLines.push(segment.text.trim() + position);\n","typescript",[154,676,673],{"__ignoreMap":618},[11,678,679,680,683],{},"Whether preprocessing summaries or Tool excerpts, positions align with ",[25,681,682],{},"spine character offsets","—not model-guessed page numbers.",[54,685,687],{"id":686},"_52-constraints-on-model-output","5.2 Constraints on model 
output",[11,689,690,691,697],{},"The system prompt includes ",[25,692,693],{},[694,695,696],"span",{},"Position Citation Rules","—five core points:",[251,699,700,710,720,726,735],{},[85,701,702,705,706,709],{},[25,703,704],{},"Standard format:"," Must use ",[154,707,708],{},"[f_fileIndex-startChar-endChar]","; all three numeric parts required;",[85,711,712,715,716,719],{},[25,713,714],{},"Copy only from current sources:"," Footnotes must be ",[25,717,718],{},"verbatim"," from this turn’s system/user messages or Tool returns;",[85,721,722,725],{},[25,723,724],{},"No fabrication:"," Do not compute, edit, or invent positions;",[85,727,728,731,732,206],{},[25,729,730],{},"Prefer omission:"," If no valid tag exists in context, answer normally—",[25,733,734],{},"output no position tags",[85,736,737,740],{},[25,738,739],{},"Inline with claims:"," Tags follow the relevant sentence; no citation dumps at the end.",[11,742,743,744,747,748,751],{},"The UI also filters occasional ",[25,745,746],{},"two-part"," invalid tags (e.g. ",[154,749,750],{},"[f1-293]",") before render.",[11,753,754],{},[14,755],{"alt":756,"src":757},"Citation trace popup","https://cdn.linghuxiong.com/resources/snapshots/ai-chat.png",[34,759],{},[37,761,763],{"id":762},"vi-tool-calling-retrieve-first-answer-second","VI. 
Tool Calling: Retrieve first, answer second",[11,765,766,767,770,771,774,775,232],{},"When chat is bound to a book (",[154,768,769],{},"resourceId"," present, ",[154,772,773],{},"chatType === 'chat'","), we register two Tools with executors before each generation—standard OpenAI-style ",[25,776,777],{},"function calling loop",[54,779,781,782,785],{"id":780},"_61-get_related_segment_summaries-targeted-segment-lookup","6.1 ",[154,783,784],{},"get_related_segment_summaries"," — Targeted segment lookup",[11,787,788,789,232],{},"For: concepts, characters, plot, chapter details—",[25,790,791],{},"clear retrieval intent",[11,793,794],{},"Flow:",[251,796,797,804,810,813,823],{},[85,798,799,800,803],{},"Model rewrites user wording into ",[25,801,802],{},"terms likely to appear in the book"," (“Optimize Search Queries” in system prompt);",[85,805,806,807,206],{},"Call Tool with ",[154,808,809],{},"question",[85,811,812],{},"Batch all segment summaries by token budget (~30k tokens per batch, max 5 batches);",[85,814,815,816,819,820,206],{},"Each batch: separate LLM request picks relevant segment IDs (max 5) from ",[154,817,818],{},"{ id, title, summary }",", JSON like ",[154,821,822],{},"{\"Thinking\":\"...\",\"answer\":[\"1\",\"3\"]}",[85,824,825,826,829],{},"For selected segments, pull ",[25,827,828],{},"tagged source text"," from spine—not summaries—as Tool result.",[11,831,832,835,836,839],{},[25,833,834],{},"Key design: Tool returns source, not summaries."," The model answers from real paragraphs with inline ",[154,837,838],{},"[f…]",", avoiding “summary → re-summary” drift.",[54,841,843,844,847],{"id":842},"_62-get_full_book_segment_summaries-whole-book-overview","6.2 ",[154,845,846],{},"get_full_book_segment_summaries"," — Whole-book overview",[11,849,850,851,232],{},"For: “summarize the book,” “review this book,” “overall structure/themes”—",[25,852,853],{},"global view",[11,855,856,857,860],{},"Concatenate all segment ",[154,858,859],{},"summary"," fields in reading 
order, so that key chapters are not missed, as can happen when relevance is judged only chunk by chunk.",[54,862,864],{"id":863},"_63-system-prompt-book-first-tools-first","6.3 System prompt: Book first, tools first",[11,866,867,868,873],{},"With a bound book, ",[25,869,870],{},[694,871,872],"Core Principles for Reading Assistant"," applies:",[612,875,878],{"className":876,"code":877,"language":617},[615],"1. Book First, Tool First\n   - Any question possibly about the book must call tools first;\n   - Answers must rely mainly on retrieval—never invent “book content” without retrieval.\n\n2. General Knowledge as Fallback Only\n   - Only for: casual chat / user explicitly skips the book / tools return nothing;\n   - If the book lacks it, say “not mentioned in this book” before general knowledge.\n\n3. Direct Style\n   - Get to the point—avoid “based on the provided materials…” and similar filler.\n",[154,879,877],{"__ignoreMap":618},[11,881,882,883,886,887,890],{},"Generation runs the tool loop: ",[154,884,885],{},"tool_calls"," → execute → append ",[154,888,889],{},"role: tool"," → continue until final text. With tools enabled, the thinking channel is turned off to avoid protocol conflicts.",[34,892],{},[37,894,896],{"id":895},"vii-frontend-traceability-from-footnote-to-highlight","VII. 
Frontend traceability: From footnote to highlight",[11,898,899,900,902],{},"Model output ",[154,901,656],{}," is not shown raw; render layer turns it into clickable citations.",[54,904,906],{"id":905},"_71-footnote-rendering","7.1 Footnote rendering",[11,908,909,910,913],{},"Normalize tags to Markdown links like ",[154,911,912],{},"[1]([f5-123-165])",", render as numbered footnotes; dedupe same position to avoid UI clutter.",[54,915,917],{"id":916},"_72-click-interaction","7.2 Click interaction",[251,919,920,929,935],{},[85,921,922,925,926,928],{},[25,923,924],{},"First click:"," Parse ",[154,927,838],{}," → fileIndex + offsets → extract spine text → preview (optional TOC title);",[85,930,931,934],{},[25,932,933],{},"Same footnote again:"," Close preview;",[85,936,937,940],{},[25,938,939],{},"Confirm jump:"," Open reader view, highlight character range.",[11,942,943,944,947],{},"From copied model tag to user-visible source, the chain ",[25,945,946],{},"never passes through another LLM call","—deterministic and reproducible.",[34,949],{},[37,951,953],{"id":952},"viii-edge-cases-and-honest-degradation","VIII. 
Edge cases and honest degradation",[11,955,956,957,535],{},"Zero hallucination ≠ “always has an answer”—it means ",[25,958,959],{},"no evidence, no fabrication",[281,961,962,972],{},[284,963,964],{},[287,965,966,969],{},[290,967,968],{},"Scenario",[290,970,971],{},"Behavior",[303,973,974,982,994,1002,1010],{},[287,975,976,979],{},[308,977,978],{},"Segment summaries not ready",[308,980,981],{},"Extract full text and summarize first",[287,983,984,987],{},[308,985,986],{},"Tool finds nothing",[308,988,989,990,993],{},"Return ",[154,991,992],{},"(No relevant segment excerpts found…)","; model should say not in book",[287,995,996,999],{},[308,997,998],{},"Invalid two-part tags from model",[308,1000,1001],{},"Frontend filters; no broken footnotes",[287,1003,1004,1007],{},[308,1005,1006],{},"Casual chat",[308,1008,1009],{},"System prompt allows general knowledge off-book",[287,1011,1012,1015],{},[308,1013,1014],{},"Export chat",[308,1016,1017],{},"Footnotes can become reader deep links for sharing/archiving",[11,1019,1020],{},[14,1021],{"alt":1022,"src":1023},"Chat export","https://cdn.linghuxiong.com/resources/snapshots/ai-chat-export.png",[34,1025],{},[37,1027,1029],{"id":1028},"ix-design-trade-off-why-not-vector-rag","IX. Design trade-off: Why not “vector RAG”?",[11,1031,1032,1033,1036],{},"Peers building document Q&A often ask: if you do retrieval-augmented generation, why not ",[25,1034,1035],{},"Embedding + vector DB Top-K","?",[11,1038,1039,1040,1043,1044,1047,1048,1051,1052,1055],{},"We ",[25,1041,1042],{},"are doing RAG","—retrieve before generate. The difference: “RAG” in community speech often implies ",[25,1045,1046],{},"vector similarity","; our stage 3 is ",[25,1049,1050],{},"segment index + Tool on-demand source pull","—",[25,1053,1054],{},"no vector layer by design",". 
The reasons below are architectural; they do not deny the value of vector RAG.",[54,1057,1059],{"id":1058},"scope-not-no-retrieval-but-no-vector-retrieval","Scope: not “no retrieval,” but “no vector retrieval”",[82,1061,1062,1071],{},[85,1063,1064,1067,1068,206],{},[25,1065,1066],{},"Broad RAG:"," retrieve → generate → ",[25,1069,1070],{},"we do this",[85,1072,1073,1076,1077,232],{},[25,1074,1075],{},"Vector RAG:"," recall via embedding similarity → ",[25,1078,1079],{},"not in this version",[11,1081,1082,1083,1086,1087,1090],{},"Preprocessing builds a ",[25,1084,1085],{},"segment summary index","; the model picks segments via Tools and gets ",[25,1088,1089],{},"source text",". Retrieval exists without a separate embedding model and vector index upkeep.",[34,1092],{},[54,1094,1096],{"id":1095},"reason-1-custom-llm-providerskeep-the-integration-surface-small","Reason 1: Custom LLM providers—keep the integration surface small",[11,1098,1099,1100,1103,1104,1107],{},"Users can plug in ",[25,1101,1102],{},"their own API keys",", custom base URLs, or ",[25,1105,1106],{},"local Ollama","; the chat model is their choice, and cost and data path stay under their control.",[11,1109,1110],{},"Typical vector RAG widens integration:",[82,1112,1113,1124,1127],{},[85,1114,1115,1116,1119,1120,1123],{},"Besides the ",[25,1117,1118],{},"chat model",", you usually need an ",[25,1121,1122],{},"embedding model"," (another name, sometimes another endpoint);",[85,1125,1126],{},"Local Ollama needs a separate embedding model plus dimension/API compatibility;",[85,1128,1129,1130,1133],{},"More failure modes: chat works but ",[25,1131,1132],{},"empty retrieval","—embedding, index, or dimension mismatch; harder to debug than one provider end-to-end.",[11,1135,1136,1137,1140,1141,1144],{},"Here, ",[25,1138,1139],{},"segment picking and answering share one provider config","—no “chat on A, index on B.” For ",[25,1142,1143],{},"pluggable LLM"," apps, that often beats a few points of 
recall.",[11,1146,1147],{},[14,1148],{"alt":1149,"src":1150},"Custom AI providers","https://cdn.linghuxiong.com/resources/snapshots/ai-customize-providers.png",[34,1152],{},[54,1154,1156],{"id":1155},"reason-2-embeddings-bind-to-the-indexprovider-switches-are-expensive","Reason 2: Embeddings bind to the index—provider switches are expensive",[11,1158,1159,1160,1163,1164,1167,1168,1171],{},"In vector RAG, ",[25,1161,1162],{},"vectors are not a universal intermediate format","—they are coordinates under one embedding model. Index with A, query with B: similarity is usually ",[25,1165,1166],{},"not comparable",", so switching often requires ",[25,1169,1170],{},"full re-embedding",", and the vector dimension (768 / 1024 / 1536 …) is locked into the storage schema.",[11,1173,1174,1175,1178,1179,1182],{},"Stage 3 persists ",[25,1176,1177],{},"structured summaries + character spans",", not vectors; switching chat models ",[25,1180,1181],{},"does not rebuild the index","; the evidence chain (source positions) stays the same, which fits the goal of “try different LLMs anytime.”",[34,1184],{},[54,1186,1188],{"id":1187},"reason-3-structured-routing-is-often-enough-for-toc-heavy-long-docs","Reason 3: Structured routing is often enough for TOC-heavy long docs",[11,1190,1191,1192,1195,1196,1199,1200,1203,1204,1209],{},"E-books and PDFs usually have ",[25,1193,1194],{},"chapter structure","; preprocessing yields ",[25,1197,1198],{},"segment titles + summaries",". 
For “what does chapter X say” or “how does the book define Y,” pick segments from the catalog then ",[25,1201,1202],{},"pull source"," works well in practice; Tool returns ",[25,1205,1206,1207],{},"source with ",[154,1208,838],{},", so zero hallucination stays anchored on character spans.",[11,1211,1212,1213,1216,1217,1220],{},"Vectors help fuzzy semantics, cross-language, long-span literal mismatch; for ",[25,1214,1215],{},"TOC + preprocess + strong traceability"," readers, investing in ",[25,1218,1219],{},"Tool + source return + citation rules"," often has better ROI.",[34,1222],{},[54,1224,1226],{"id":1225},"future-hybrid-recall-not-a-rewrite","Future: Hybrid recall, not a rewrite",[11,1228,1229,1230,1233,1234,1237,1238,1241,1242,1245],{},"We may add ",[25,1231,1232],{},"vector coarse recall"," (embedding only for Top-N chapter candidates), still ending in ",[25,1235,1236],{},"pick segment → source → clickable trace","—zero-hallucination rules unchanged. If added: embedding ",[25,1239,1240],{},"optional",", explicit ",[25,1243,1244],{},"re-index"," prompts when models change—avoid silent wrong retrieval.",[11,1247,1248,1249],{},"Until then: ",[25,1250,1251],{},"any OpenAI-compatible chat API works; changing chat model does not rebuild local index.",[34,1253],{},[37,1255,1257],{"id":1256},"x-summary","X. 
Summary",[281,1259,1260,1273],{},[284,1261,1262],{},[287,1263,1264,1267,1270],{},[290,1265,1266],{},"Step",[290,1268,1269],{},"Method",[290,1271,1272],{},"Role",[303,1274,1275,1286,1299,1312,1323,1334],{},[287,1276,1277,1280,1283],{},[308,1278,1279],{},"Preprocess",[308,1281,1282],{},"Split by TOC/length + segment summary cache",[308,1284,1285],{},"Long books searchable & locatable",[287,1287,1288,1291,1296],{},[308,1289,1290],{},"Position tags",[308,1292,1293,1295],{},[154,1294,156],{}," in source",[308,1297,1298],{},"Machine-parseable provenance",[287,1300,1301,1304,1309],{},[308,1302,1303],{},"Tool retrieval",[308,1305,1306,1307],{},"Per-question segments / full-book summaries, return ",[25,1308,322],{},[308,1310,1311],{},"Force evidence before answer",[287,1313,1314,1317,1320],{},[308,1315,1316],{},"System prompt",[308,1318,1319],{},"Book first, no fake tags, say when missing",[308,1321,1322],{},"Constrain generation",[287,1324,1325,1328,1331],{},[308,1326,1327],{},"Frontend",[308,1329,1330],{},"Footnote → preview → jump & highlight",[308,1332,1333],{},"User verifies evidence",[287,1335,1336,1339,1342],{},[308,1337,1338],{},"No vector retrieval",[308,1340,1341],{},"Single provider; swap chat model without re-index",[308,1343,1344],{},"Lower integration & migration cost",[11,1346,1347,1348,1351],{},"“Zero hallucination” does not mean the model never errs—it means ",[25,1349,1350],{},"engineering locks output to an evidence chain",": no retrieval → do not pose as book content; with retrieval → give verifiable source positions.",[11,1353,1354,1355,1358,1359,1362],{},"If you build AI reading or document Q&A, we hope the path ",[25,1356,1357],{},"full dump → key sentences → Tool-first on-demand retrieval",", plus ",[25,1360,1361],{},"inline position tags + source return",", is a useful reference implementation.",[19,1364,1365],{},[11,1366,1367,1368,1373,1374,232],{},"These are lessons from building 
the ",[243,1369,1372],{"href":1370,"rel":1371},"https://reader.linghuxiong.com",[247],"Foxycape"," AI reader and are offered for reference only. Try the reader on the ",[243,1375,1377],{"href":1376},"/en#download","download page",{"title":618,"searchDepth":1379,"depth":1379,"links":1380},2,[1381,1387,1388,1389,1390,1394,1401,1405,1406,1413],{"id":39,"depth":1379,"text":40,"children":1382},[1383,1385,1386],{"id":56,"depth":1384,"text":57},3,{"id":138,"depth":1384,"text":139},{"id":235,"depth":1384,"text":236},{"id":425,"depth":1379,"text":426},{"id":486,"depth":1379,"text":487},{"id":501,"depth":1379,"text":502},{"id":632,"depth":1379,"text":633,"children":1391},[1392,1393],{"id":660,"depth":1384,"text":661},{"id":686,"depth":1384,"text":687},{"id":762,"depth":1379,"text":763,"children":1395},[1396,1398,1400],{"id":780,"depth":1384,"text":1397},"6.1 get_related_segment_summaries — Targeted segment lookup",{"id":842,"depth":1384,"text":1399},"6.2 get_full_book_segment_summaries — Whole-book overview",{"id":863,"depth":1384,"text":864},{"id":895,"depth":1379,"text":896,"children":1402},[1403,1404],{"id":905,"depth":1384,"text":906},{"id":916,"depth":1384,"text":917},{"id":952,"depth":1379,"text":953},{"id":1028,"depth":1379,"text":1029,"children":1407},[1408,1409,1410,1411,1412],{"id":1058,"depth":1384,"text":1059},{"id":1095,"depth":1384,"text":1096},{"id":1155,"depth":1384,"text":1156},{"id":1187,"depth":1384,"text":1188},{"id":1225,"depth":1384,"text":1226},{"id":1256,"depth":1379,"text":1257},null,"2026-06-03","Engineering notes on zero-hallucination Q&A in an AI reader—answers grounded in the current book, with one-click citations back to exact passages.",false,"md",{},true,"/en/blog/zero-hallucination-qa",{"title":6,"description":1416},"en/blog/zero-hallucination-qa",[1425,1426,1427],"reader","AI","engineering","zero-hallucination-qa","uvw654rlcM4E60wP4tYzIynEFr2kFaRn6sdNW-J_HhI",1780489852806]