Building an ML powered Android Livestreaming App by Etienne Caron

This presentation explores the application of computer vision and machine learning models for real-time video and audio processing. We’ll demonstrate how this technology can enable the creation of a whole new category of live-streaming applications.

Developing a video conferencing application has historically been fairly complex. We will start with a brief overview of the LiveKit open-source APIs, showcasing how to build a simple and intuitive video streaming Android application.

Next, we will explore integrating various ML-powered agents into the experience. We will also illustrate how to use reactive programming techniques to create easily understandable and modifiable multi-stage processing pipelines.
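
To give a concrete feel for what such a multi-stage pipeline can look like, here is a minimal, hypothetical sketch using Python async generators, in the same style as the agent code shown later in the deck. The stage names (source, preprocess, infer) are illustrative only and not taken from the talk.

    import asyncio

    async def source(n=5):
        # Stand-in for a stream of incoming frames.
        for i in range(n):
            yield i
            await asyncio.sleep(0.01)

    async def preprocess(frames):
        # Stand-in for a preprocessing stage (e.g. resize / color conversion).
        async for f in frames:
            yield f * 2

    async def infer(frames):
        # Stand-in for an ML inference stage.
        async for f in frames:
            yield f + 1

    async def main():
        # Stages compose like a pipeline and can be added, removed or reordered independently.
        async for result in infer(preprocess(source())):
            print(result)

    asyncio.run(main())
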

Looking forward to exploring this exciting and innovative topic with you!

https://youtu.be/aUQg_5HEI2M

DevFest Montreal

GDG Montreal

November 14, 2024

Transcript

  1. PRODUCE DETECTION EXPLAINED. A high-level explanation of past Kanastruk customer work. AUTHOR: @kanawish, DATE: 24/04/20. [Pipeline diagram: Image Classifier, BackgroundSubtractor, 15fps, frames (f1) (f2) (f3) (...)]
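
One plausible reading of that diagram is that background subtraction gates which frames are worth handing to a heavier image classifier. A hedged OpenCV sketch of that idea follows; it is not the Kanastruk implementation, and classify() is a hypothetical stand-in for the actual model.

    import cv2

    # Hedged sketch: background subtraction decides when to run the classifier.
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    def classify(frame):
        # Hypothetical stand-in for the image classifier shown on the slide.
        return "produce?"

    def process(frame):
        mask = subtractor.apply(frame)                 # foreground mask for this frame
        if cv2.countNonZero(mask) > 0.01 * mask.size:  # something new appeared in view
            return classify(frame)
        return None
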
  2. 💡 ❤ 🛠 Put yourself in the user's shoes. Imagine the future. Make it real. Design Thinking Workshops
  3. Installing and spinning up a local LiveKit Server (LiveKit Server)

     ➜ ~ brew install livekit
     OK
     ➜ ~ livekit-server
     one of key-file or keys must be provided
     ➜ ~ livekit-server --dev
     INFO livekit starting in development mode
     INFO livekit no keys provided, using placeholder keys {"API Key": "devkey", "API Secret": "secret"}
     INFO livekit starting LiveKit server {"portHttp": 7880, "nodeID": "ND_SUoZgzemKouv", "nodeIP": "192.168.50.240", "version": "1.7.2", "bindAddresses": ["127.0.0.1", "::1"], "rtc.portTCP": 7881, "rtc.portUDP": {"Start":7882,"End":0}}

  4. [Diagram: three Client Apps connect to the LiveKit Server; separate Auth Services issue Auth Tokens. Mascot card: Official Blob ID, Name: Face Listening to You, Status: Not an Emoji yet, Cuteness: Yes]
  5. [Same diagram, now also showing the LiveKit Server's room, track and participant services alongside the Auth Services and Auth Tokens]
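
The auth service's main job is to mint a LiveKit access token with the right grants for a given identity and room. As a rough, hedged sketch using the LiveKit Python server SDK (livekit-api) and the "devkey"/"secret" placeholder keys printed by livekit-server --dev earlier; the identity and room name are placeholders, and the exact API should be checked against the SDK docs:

    from livekit import api

    # Hedged sketch: mint a join token with the dev keys from `livekit-server --dev`.
    token = (
        api.AccessToken("devkey", "secret")
        .with_identity("host-user")
        .with_grants(api.VideoGrants(room_join=True, room="my-livestream"))
        .to_jwt()
    )
    print(token)  # handed to the client, which presents it to the LiveKit server
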
  6. Installing the Livestream REST / Web App Sample.

     ➜ livekit git clone git@github.com:livekit-examples/livestream.git
     Cloning into 'livestream'...
     remote: Enumerating objects: 379, done.
     remote: Counting objects: 100% (157/157), done.
     remote: Compressing objects: 100% (97/97), done.
     remote: Total 379 (delta 76), reused 93 (delta 51), pack-reused 222 (from 1)
     Receiving objects: 100% (379/379), 537.75 KiB | 3.11 MiB/s, done.
     Resolving deltas: 100% (154/154), done.
     ➜ livekit cd livestream
     ➜ livestream git:(main) npm install
     added 511 packages, and audited 512 packages in 20s
     ➜ livestream git:(main) ✗ cp .env.example .env.development
     ➜ livestream git:(main) ✗ vi .env.development

  7. Running the Livestream REST / Web App Sample.

     ➜ livestream git:(main) npm run dev
     > [email protected] dev
     > next dev
     ▲ Next.js 14.0.1
     - Local: http://localhost:3000
     - Environments: .env.development
     /bin/sh: pnpm: command not found
     ✓ Ready in 2.7s
     ◦ Compiling /page ...
     ✓ Compiled /page in 2.8s (1025 modules)
     ✓ Compiled in 481ms (446 modules)
     ◦ Compiling /favicon.ico/route ...
     ✓ Compiled /favicon.ico/route in 535ms (1030 modules)

  8. [Screen map: Start Screen, Join Screen, Stream Options Screen, Participant Info Screen, Participant List Screen, Room Screen, Invited Screen, Start Screen, Room Container Screen]
  9. var userName by rememberSaveable(stateSaver = TextFieldValue.Saver) {
         mutableStateOf(TextFieldValue(preferencesManager.getUsername()))
     }
     var chatEnabled by rememberSaveable { mutableStateOf(true) }
     var viewerJoinRequestEnabled by rememberSaveable { mutableStateOf(true) }
     var cameraPosition by remember { mutableStateOf(CameraPosition.FRONT) }

  10. var response: CreateStreamResponse? = null
      try {
          response = livestreamApi.createStream(
              CreateStreamRequest(
                  metadata = RoomMetadata(
                      creatorIdentity = Participant.Identity(userName.text),
                      enableChat = chatEnabled,
                      allowParticipation = viewerJoinRequestEnabled,
                  )
              )
          ).body()
      } catch (e: Exception) {
          Timber.e(e) { "error" }
      }

  11. if (response != null) {
          Timber.e { "response received: $response" }
          appModel.run {
              connected(
                  authToken = response.authToken,
                  connectionDetails = response.connectionDetails,
                  isHost = true,
                  initialCamPos = cameraPosition
              )
              mainNav.mainNavigate(RoomContainerRoute)
          }
      } else {
          Timber.e { "response failed!" }
      }

  12. Auth + Livestream Back-end & Front-end

      /**
       * Apis used for the Livestream example.
       */
      interface LivestreamApi {
          @POST("/api/create_stream")
          suspend fun createStream(
              @Body body: CreateStreamRequest
          ): Response<CreateStreamResponse>

          @POST("/api/join_stream")
          suspend fun joinStream(
              @Body body: JoinStreamRequest
          ): Response<JoinStreamResponse>
      }

  13. Auth + Livestream Back-end & Front-end

      /**
       * Apis that require an Authentication: Token <token> header
       */
      interface AuthenticatedLivestreamApi {
          @POST("/api/invite_to_stage")
          suspend fun inviteToStage(
              @Body body: IdentityRequest
          ): Response<Unit>

          @POST("/api/remove_from_stage")
          suspend fun removeFromStage(
              @Body body: IdentityRequest
          ): Response<Unit>

          @POST("/api/raise_hand")
          suspend fun requestToJoin(): Response<Unit>

          @POST("/api/stop_stream")
          suspend fun stopStream(): Response<Unit>
      }

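
Since these routes expect an "Authentication: Token <token>" header, any client can exercise them once it holds the token returned by /api/create_stream or /api/join_stream. A hedged Python sketch against the local web app from slide 7 (the path and header scheme come from the interfaces above; everything else is a placeholder):

    import requests

    BASE_URL = "http://localhost:3000"  # the Next.js sample app started earlier

    def raise_hand(auth_token: str) -> None:
        # Sketch only: asks the host to be invited on stage.
        resp = requests.post(
            f"{BASE_URL}/api/raise_hand",
            headers={"Authorization": f"Token {auth_token}"},
        )
        resp.raise_for_status()
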
  14. Auth + Livestream Back-end & Front-end. [Diagram: Client Apps talk to the Web App (http://localhost:3000) through the Auth and Livestream APIs, and to the LiveKit Server through the LiveKit APIs (room, participants, tracks, etc.)]
  15. RoomScope(
          url = connectionDetails.wsUrl,
          token = connectionDetails.token,
          audio = rememberEnableMic(enableAudio),
          video = rememberEnableCamera(enableVideo),
          roomOptions = DefaultRoomOptions { options ->
              options.copy(
                  videoTrackCaptureDefaults = LocalVideoTrackOptions(
                      position = initialCamPos
                  )
              )
          },
          liveKitOverrides = DefaultLKOverrides(context),
          onConnected = { Timber.d("RoomScreenContainer -> onConnected") },
          onDisconnected = {
              Toast.makeText(context, "Disconnected from livestream.", Toast.LENGTH_LONG).show()
              mainNav.mainPopBackstack(if (isHost) StartRoute else JoinRoute, false)
          },
          onError = { _, error ->
              if (error is RoomException.ConnectException) {
                  Toast.makeText(
                      context,
                      "Error while joining. Check the code and try again.",
                      Toast.LENGTH_LONG
                  ).show()
                  mainNav.mainPopBackstack(if (isHost) StartRoute else JoinRoute, false)
              }
          }
      ) { room ->

  16. fun RoomNavHost(
          cameraPosition: MutableState<CameraPosition>,
          showOptionsDialogOnce: MutableState<Boolean>,
          roomNav: RoomNav = koinInject()
      ) {
          // ...
          NavHost(
              navController = roomNavHostController,
              startDestination = RoomRoute
          ) {
              composable<RoomRoute> {
                  // Pass in 'view state' that belongs to container.
                  RoomScreen(
                      cameraPosition = cameraPosition,
                      showOptionsDialogOnce = showOptionsDialogOnce
                  )
              }
              bottomSheet(StreamOptionsRoute.name) { StreamOptionsScreen() }
              bottomSheet(ParticipantListRoute.name) { ParticipantListScreen() }
              // FIXME -
              bottomSheet(ParticipantInfoRoute.name + "/{sid}") {
                  val sid = it.arguments?.getString("sid")
                  ParticipantInfoScreen(participantSid = sid)
              }
              bottomSheet(InvitedToStageRoute.name) { InvitedToStageScreen() }
          }
      }

  17. val tracks = rememberTracks(usePlaceholders = setOf(Track.Source.CAMERA))
      val hostParticipant = rememberHostParticipant(roomMetadata.creatorIdentity)
      val hostTrack = tracks.firstOrNull { track -> track.participant == hostParticipant }

      // Get all the tracks for all the other participants.
      val stageParticipants = rememberOnStageParticipants(roomMetadata.creatorIdentity)
      val stageTracks = stageParticipants.map { p ->
          tracks.firstOrNull { track -> track.participant == p }
      }

      // Prioritize the host to the top.
      val videoTracks = listOf(hostTrack).plus(stageTracks)

      val metadatas = rememberParticipantMetadatas()

  18. ParticipantGrid(
          videoTracks = videoTracks,
          isHost = isHost,
          modifier = Modifier
              .constrainAs(hostScreen) {
                  width = Dimension.matchParent
                  height = Dimension.fillToConstraints
                  top.linkTo(parent.top)
                  bottom.linkTo(chatBar.top)
              }
      )

  19. VideoTrackView(
          room = RoomLocal.current,
          trackReference = trackReference,
          mirror = isHost,
          scaleType = ScaleType.Fill,
          modifier = Modifier
              .clip(RoundedCornerShape(8.dp))
              .then(modifier)
      )

  20. Connectivity. Wut? In order of preference:
      • ICE over UDP: ideal connection type, used in majority of conditions
      • TURN with UDP (3478): used when ICE/UDP is unreachable
      • ICE over TCP: used when network disallows UDP (i.e. over VPN or corporate firewalls)
      • TURN with TLS: used when firewall only allows outbound TLS connections
  21. Connectivity. In order of preference:
      • ICE over UDP: ideal connection type, used in majority of conditions
      • TURN with UDP (3478): used when ICE/UDP is unreachable
      • ICE over TCP: used when network disallows UDP (i.e. over VPN or corporate firewalls)
      • TURN with TLS: used when firewall only allows outbound TLS connections
  22. async def draw_color_cycle(output_source: rtc.VideoSource, width, height):
          argb_frame = bytearray(width * height * 4)
          arr = np.frombuffer(argb_frame, dtype=np.uint8)

          framerate = 1 / 30
          hue = 0.0

          while True:
              start_time = asyncio.get_event_loop().time()

              rgb = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
              rgb = [(x * 255) for x in rgb]  # type: ignore
              argb_color = np.array(rgb + [255], dtype=np.uint8)

              arr.flat[::4] = argb_color[0]
              arr.flat[1::4] = argb_color[1]
              arr.flat[2::4] = argb_color[2]
              arr.flat[3::4] = argb_color[3]

              frame = rtc.VideoFrame(width, height, rtc.VideoBufferType.RGBA, argb_frame)
              output_source.capture_frame(frame)
              hue = (hue + framerate / 3) % 1.0

              code_duration = asyncio.get_event_loop().time() - start_time
              await asyncio.sleep(1 / 30 - code_duration)

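
For context, draw_color_cycle needs an rtc.VideoSource that is actually published into a room. A hedged sketch of that wiring with the LiveKit Python SDK follows; ROOM_URL, TOKEN and the track name are placeholders, and the calls should be verified against the SDK's publish examples.

    import asyncio
    from livekit import rtc

    WIDTH, HEIGHT = 1280, 720
    ROOM_URL = "ws://localhost:7880"   # the local dev server from earlier slides
    TOKEN = "<access token>"           # e.g. minted as in the auth sketch above

    async def main():
        room = rtc.Room()
        await room.connect(ROOM_URL, TOKEN)

        # Create a video source, wrap it in a local track and publish it.
        source = rtc.VideoSource(WIDTH, HEIGHT)
        track = rtc.LocalVideoTrack.create_video_track("color-cycle", source)
        await room.local_participant.publish_track(
            track, rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA)
        )

        # Drive frames into the published track using the slide's function.
        await draw_color_cycle(source, WIDTH, HEIGHT)

    asyncio.run(main())
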
  23. async def draw_face_mask_to_video_loop(
          input_stream: rtc.VideoStream,
          output_source: rtc.VideoSource,
          show_window=True
      ):
          landmarker = FaceLandmarker.create_from_options(options)

          # cv2 commands are only for _local_ window/preview
          if show_window:
              cv2.namedWindow("livekit_video", cv2.WINDOW_NORMAL)
              cv2.startWindowThread()

          async for frame_event in input_stream:
              buffer: VideoFrame = frame_event.frame
              arr = np.frombuffer(buffer.data, dtype=np.uint8)
              arr = arr.reshape((buffer.height, buffer.width, 3))

              mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=arr)
              detection_result = landmarker.detect_for_video(
                  mp_image, frame_event.timestamp_us
              )
              draw_landmarks_on_image(arr, detection_result)

              frame = rtc.VideoFrame(buffer.width, buffer.height, rtc.VideoBufferType.RGB24, buffer.data)
              output_source.capture_frame(frame)

              if show_window:
                  arr = cv2.cvtColor(arr, cv2.COLOR_RGB2BGR)
                  cv2.imshow("livekit_video", arr)
                  if cv2.waitKey(1) & 0xFF == ord("q"):
                      break

          landmarker.close()
          if show_window:
              cv2.destroyAllWindows()

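
Similarly, input_stream here is an rtc.VideoStream wrapped around a subscribed remote track. A hedged sketch of how an agent could obtain it (the event name and enum values follow the LiveKit Python SDK; treat the helper as an approximation, not the talk's exact code):

    import asyncio
    from livekit import rtc

    def attach_face_mask_agent(room: rtc.Room, output_source: rtc.VideoSource):
        # When a remote video track shows up, feed it into the landmark loop above.
        @room.on("track_subscribed")
        def on_track_subscribed(track, publication, participant):
            if track.kind == rtc.TrackKind.KIND_VIDEO:
                asyncio.ensure_future(
                    draw_face_mask_to_video_loop(rtc.VideoStream(track), output_source)
                )
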
  24. async def handle_frame_event(frame_event: VideoFrameEvent, output_source: rtc.VideoSource):
          buffer: VideoFrame = frame_event.frame
          arr = np.frombuffer(buffer.data, dtype=np.uint8)
          arr = arr.reshape((buffer.height, buffer.width, 3))

          src_image = cv2.cvtColor(arr, cv2.COLOR_RGB2BGR)

          gray = cv2.cvtColor(src_image, cv2.COLOR_BGR2GRAY)
          cv2.imshow(windows[0], gray)

          blurred = cv2.GaussianBlur(gray, (7, 7), 0)
          cv2.imshow(windows[1], blurred)

          _, thresh = cv2.threshold(blurred, 120, 255, cv2.THRESH_BINARY_INV)
          cv2.imshow(windows[2], thresh)

          contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
          dest_image = src_image
          for contour in contours:
              cv2.drawContours(dest_image, [contour], -1, (0, 255, 0), 2)
          cv2.imshow(windows[4], dest_image)

          frame = rtc.VideoFrame(
              buffer.width, buffer.height, rtc.VideoBufferType.RGB24,
              cv2.cvtColor(dest_image, cv2.COLOR_BGR2RGB).data
          )
          output_source.capture_frame(frame)
          cv2.waitKey(1)