annoy4s

Pishen Tsai
July 05, 2016
Transcript

  1. Pros
    • Fully supports all of Annoy's functionality (indexing/querying, Euclidean/Angular).
    • No rewrite: it uses the optimized C++ code provided by Annoy.
    • Easily parallelized in Scala: queries.par.map(q => annoy.query(q))
    • JVM with C++ native code is fast and type-safe.
    • Annoy itself is fast (10x faster than lsh4s).
  2. Cons
    • Platform dependent. For now, you need to compile the C++ code yourself if you're not using linux-x86-64:
        > compileNative
        > publish
    • May not be as simple as lsh4s when broadcasting the index to each worker in Spark.
    • My C++ skill is poor, as is my JN* knowledge.
  3. JNA in one page
    libraryDependencies += "net.java.dev.jna" % "jna" % "4.2.2"
    src/main/cpp/annoyjava.cpp
    src/main/scala/annoy4s/AnnoyLibrary.scala
    src/main/resources/linux-x86-64/libannoy.so
    [Diagram labels: functions, mapping, compile, call]
  4. AnnoyIndexInterface<int, float> *createEuclidean(int f) {
      return new AnnoyIndex<int, float, Euclidean, Kiss64Random>(f);
    }

    val annoy: Pointer = lib.createEuclidean(64)

    [Diagram labels: memory address space, JVM -Xmx2G, AnnoyIndex, annoy]
  5. void getNnsByItem(AnnoyIndexInterface<int, float> *ptr, int item, int n, int search_k,
                       int *result, float *distances) {
      vector<int> resultV;
      vector<float> distancesV;
      ptr->get_nns_by_item(item, n, search_k, &resultV, &distancesV);
      std::copy(resultV.begin(), resultV.end(), result);
      std::copy(distancesV.begin(), distancesV.end(), distances);
    }

    val result = Array.fill(10)(-1)
    val distances = Array.fill(10)(-1.0f)
    lib.getNnsByItem(annoy, item, 10, -1, result, distances)

    [Annotations: under GC's control, free automatically]
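The wrapper pattern on slides 4 and 5 can be tried without Annoy or a JVM. The sketch below is an illustration under stated assumptions, not the annoy4s source. StubIndex is a hypothetical brute-force stand-in for AnnoyIndexInterface<int, float>, and createStub, addItemStub, getNnsByItemStub and deleteStub are invented names. It assumes the functions are exported with C linkage, since JNA looks up native symbols by their unmangled names. Two things here are additions that do not appear on the slides: the wrapper returns the number of results written, and there is an explicit delete function for the natively allocated index.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for AnnoyIndexInterface<int, float>: a brute-force
// Euclidean index, used only so this sketch compiles without Annoy.
class StubIndex {
public:
    explicit StubIndex(int f) : f_(static_cast<std::size_t>(f)) {}

    // Appends one f-dimensional vector; its id is its insertion order.
    void add_item(const float* v) { data_.insert(data_.end(), v, v + f_); }

    // Same output style as on the slide: neighbours are appended to vectors.
    void get_nns_by_item(int item, std::size_t n, std::vector<int>* result,
                         std::vector<float>* distances) const {
        const std::size_t count = data_.size() / f_;
        const std::size_t query = static_cast<std::size_t>(item);
        std::vector<std::pair<float, int>> ranked;
        for (std::size_t i = 0; i < count; ++i) {
            float sum = 0.0f;
            for (std::size_t d = 0; d < f_; ++d) {
                const float diff = data_[i * f_ + d] - data_[query * f_ + d];
                sum += diff * diff;
            }
            ranked.emplace_back(std::sqrt(sum), static_cast<int>(i));
        }
        std::sort(ranked.begin(), ranked.end());  // nearest first
        for (std::size_t i = 0; i < ranked.size() && i < n; ++i) {
            result->push_back(ranked[i].second);
            distances->push_back(ranked[i].first);
        }
    }

private:
    std::size_t f_;
    std::vector<float> data_;
};

extern "C" {

// Allocates on the native heap, outside the JVM heap limited by -Xmx.
StubIndex* createStub(int f) {
    assert(f > 0);
    return new StubIndex(f);
}

void addItemStub(StubIndex* ptr, const float* v) { ptr->add_item(v); }

// Copies at most n neighbours into caller-owned buffers and returns the
// number written (the return value is an addition, not on the slide).
int getNnsByItemStub(const StubIndex* ptr, int item, int n,
                     int* result, float* distances) {
    assert(ptr != nullptr && n >= 0);
    std::vector<int> resultV;
    std::vector<float> distancesV;
    ptr->get_nns_by_item(item, static_cast<std::size_t>(n), &resultV, &distancesV);
    std::copy(resultV.begin(), resultV.end(), result);
    std::copy(distancesV.begin(), distancesV.end(), distances);
    return static_cast<int>(resultV.size());
}

// The garbage collector never sees native memory, so it is released explicitly.
void deleteStub(StubIndex* ptr) { delete ptr; }

}  // extern "C"
```

The buffers passed as result and distances belong to the caller. On the Scala side they are the arrays created with Array.fill, which live on the JVM heap and are garbage-collected, while the index itself stays in native memory until it is deleted. Returning the count lets the caller tell real results from the -1 placeholders when fewer than n neighbours exist.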